[kepler-dev] Thoughts on an "R" Actor

Fri Jun 11 06:49:55 PDT 2004

Hi Dan,

a couple other semi-random thoughts to add to yours:

* A long time ago I wrote client/server software to interact with a
program running remotely through its command line. A generic client API
(in C++; a colleague wrote a Java API that I haven't used but that
should work) could be subclassed to fit a specific interface, so it was
convenient to use. I used it to control spatial simulations remotely
from a locally running GUI. The advantage was that the server kept the
program running, so there was no need to keep the pipe open between
invocations. Something like this is probably (although not
necessarily) overkill for R running locally, but it may be an option if
we need to run R as some sort of web service. I don't even know where
that software is now, but I can probably track it down, and I have a
paper and user manual about it that I can share (if I find them).

* I looked at the R code closely and I can say for sure that, even in
its embedded form, it's definitely not suitable for running in a
multithreaded server: there is no thread safety whatsoever. So in server
mode, if we need to use it that way, we'll need a copy/fork of the R
process per user, or at least we'll have to serialize requests (not a
great idea given that R jobs can be big). That can be computationally
taxing. On the other hand, users may not want, or be able, to run R,
particularly if we defer things like spatial scaling or kriging to it.

* I've started tinkering with the R parser, inserting callbacks to
produce a semantic analysis of an arbitrary R script, so that we could
process a script and obtain a custom actor declaration whose inputs and
outputs are characterized correctly. That would help in creating actors
in R, rather than the "R actor". What I still don't know is how much we
can infer from the SEXP type, which is all that R passes around, so
there may not be a lot to do there. But I'll definitely let you know
what I find in the next few weeks. R integration has always been at the
top of my non-SEEK priority list anyway.
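
To make the "serialize requests" option from the second point a bit
more concrete, here is a minimal Java sketch. It is only an
illustration under assumptions: the class name SerializedEngine and
the Function<String, String> stand-in for "hand this text to R" are
hypothetical, and nothing here talks to R itself. All callers share
one worker thread, so the wrapped engine never sees two calls at once.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Funnels every request through one worker thread, so the wrapped
// (non-thread-safe) engine never sees two calls at the same time.
class SerializedEngine implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Function<String, String> engine; // hypothetical stand-in for the R call

    SerializedEngine(Function<String, String> engine) {
        this.engine = engine;
    }

    // Queue a request; callers on any thread get a Future back immediately.
    Future<String> submit(String request) {
        Callable<String> task = () -> engine.apply(request);
        return worker.submit(task);
    }

    // Convenience wrapper: queue the request and block until it has run.
    String call(String request) {
        try {
            return submit(request).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public void close() {
        worker.shutdown(); // lets queued work finish, then stops the thread
    }

    public static void main(String[] args) {
        // Toy engine that just echoes; a real one would hand the text to R.
        try (SerializedEngine r = new SerializedEngine(s -> "ran: " + s)) {
            System.out.println(r.call("summary(x)"));
        }
    }
}
```

The queue is strictly first-in, first-out, which is exactly why this
is unattractive for big jobs: one long request holds up everyone
behind it. One process per user avoids that at the cost of memory.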

Ciao,
f

On Thu, 2004-06-10 at 18:04, Dan Higgins wrote:
> Hi All,
> 
> I have been working on trying to understand some of the details of the
> R system (http://www.r-project.org/) and how it might be integrated
> into Kepler. For those who are unfamiliar with R, "R is a system for
> statistical computation and graphics. It consists of a language plus a
> run-time environment with graphics, a debugger, access to certain
> system functions, and the ability to run programs stored in script
> files." (from the "R FAQ"). R is a powerful system for statistical and
> other calculations. It is comparable to Matlab or SAS but has the
> advantage of being free, easily extended, and available for PCs, Macs
> (OS X), and Unix systems. There are also numerous extensions from a
> variety of sources. It thus appears to be fairly widely accepted and
> used by numerous researchers.
> 
> A first-cut on building an R actor would seem to be to use a local
> version of R (since it can be freely installed on almost any computer)
> and run it as a sub-process to Kepler. An obvious method for doing
> this is to use one of the CommandLine/Exec actors.
> 
> I say 'one of ...' because there are at least 2 existing actors for
> running arbitrary subprocesses from within Kepler/Ptolemy. The
> "CommandLine" actor can be found in the Kepler graph editor tree
> under "actors/kepler/spa/CommandLine". The author listed in the source
> is Ilkay Altintas, and this actor runs under the 3.0.2 version of
> Ptolemy/Kepler. A second similar actor, called "Exec", is included with
> the Ptolemy 4.0Beta release under "MoreLibraries/Esoteric/Exec". The
> Exec actor was written for Ptolemy 4 by Chris Brooks and (I think)
> uses some new features that are not available in version 3.0.2.
> [Specifically, there is an "Expert Mode" for setting additional
> parameters.]
> 
> Both the CommandLine and Exec actors use the Java 'exec' method to
> launch a subprocess. They differ in the details, however. CommandLine
> actually launches a command processor ('cmd.exe/command.exe' on
> Windows and 'sh' on Mac/Linux) so that the command entered by a user
> is essentially identical to that entered in a terminal window to
> launch a process. This can include I/O redirection like "< myfile.in".
> In the Exec actor, the command follows the underlying Java method
> more closely and has ports for input and output streams. The command
> string cannot include redirection. Both actors wait for the subprocess
> to finish before their 'fire' action completes.
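
The difference between the two launch styles described above can be
sketched in plain Java. This is not the source of either actor; the
class and method names (LaunchStyles, runThroughShell, runDirect) are
made up for illustration, and the sketch assumes a Unix-like system
with "sh" on the PATH. The first method hands the whole string to a
shell, so redirection and pipes work; the second passes arguments
straight to the program and supplies input on its standard input.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

class LaunchStyles {

    // CommandLine style: a shell parses the string, so "<", ">" and "|" work.
    static String runThroughShell(String commandLine) {
        return run(List.of("sh", "-c", commandLine), "");
    }

    // Exec style: no shell; arguments are passed as-is, input arrives on stdin.
    static String runDirect(List<String> argv, String stdin) {
        return run(argv, stdin);
    }

    // Start the process, feed it stdin, collect its output, wait for exit.
    // Writing all input before reading is fine for small inputs only; large
    // ones would need a separate thread to avoid filling the pipe buffers.
    private static String run(List<String> argv, String stdin) {
        try {
            Process p = new ProcessBuilder(argv).redirectErrorStream(true).start();
            try (OutputStream out = p.getOutputStream()) {
                out.write(stdin.getBytes(StandardCharsets.UTF_8));
            }
            String output = new String(p.getInputStream().readAllBytes(),
                                       StandardCharsets.UTF_8);
            p.waitFor(); // like both actors, block until the subprocess ends
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The shell interprets the pipe.
        System.out.print(runThroughShell("echo hello | tr a-z A-Z"));
        // Without a shell, "<" is just another argument handed to echo.
        System.out.print(runDirect(List.of("echo", "<", "myfile.in"), ""));
        // Input has to come through stdin instead.
        System.out.print(runDirect(List.of("cat"), "hello\n"));
    }
}
```

With R installed, the shell form would presumably be something like
runThroughShell("R --vanilla < myfile.in"), while the direct form
would pass the script text as the stdin argument. Neither call has
been tried against R here.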
> 
> Now consider just how we might integrate R into Kepler. R can be run
> in an interactive mode (start up; type a command; see response; type
> another command) or in a batch mode (start R with a script file that
> contains a series of commands and write the results to an output file).
> Creating an R workflow in the batch mode is fairly easy. A screen shot
> of a workflow which uses the CommandLine actor to run R to create a
> jpeg plot and then display it is shown below.
> 
> 
> 
> The script file used in the example is:
> 
> x <- seq(-10, 10, length = 50)
> y <- x
> rotsinc <- function(x, y) {
>     sinc <- function(x) {
>         y <- sin(x)/x
>         y[is.na(y)] <- 1
>         y
>     }
>     10 * sinc(sqrt(x^2 + y^2))
> }
> sinc.exp <- expression(z == Sinc(sqrt(x^2 + y^2)))
> z <- outer(x, y, rotsinc)
> jpeg(filename = "RTest.jpg", width = 480, height = 480,
>      pointsize = 12, quality = 75, bg = "white")
> par(bg = "white")
> persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue")
> 
> It can be seen in this batch approach that one can get the results
> from an R calculation from the output stream or from a file created by
> R that is then read by other Kepler actors. A problem comes up,
> however, if one considers how to dynamically input instructions/data
> to R. In batch mode, this could require the dynamic creation of script
> files, although it would be nicer if ports for inputting
> data/instructions existed for an R actor. One thus has the question of
> how to import information from other parts of a workflow to an R
> actor.
> 
> And what about using R in an interactive mode? Both the CommandLine
> actor and the Exec actor start a subprocess and then wait for it to
> finish. This means that the R code is loaded, executed, and then
> removed from memory. For an interactive environment (or for the case
> where the R calculation is repeatedly executed), it would be desirable
> to load R only once! There doesn't seem to be any reason why the R
> process has to be stopped between firings. One could keep the process
> in memory (a static variable?) and simply read the input stream,
> execute it, write the output to the output stream, and then wait for
> the next input as part of a fire event.  [Or perhaps there needs to be
> some class level R actor and a set of instances that do certain
> calculations by communicating with the class actor???]
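
The "load once, then read, execute, write, wait" idea above might
look roughly like the following Java sketch. Everything named here
(PersistentSession, fire, the "__DONE__" marker) is hypothetical and
not part of Kepler or Ptolemy. The process is started once; each
firing writes a command plus a second command that prints a marker
line, then reads output until the marker appears. "sh" stands in for
R so the sketch runs without R installed.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

class PersistentSession implements AutoCloseable {
    static final String MARKER = "__DONE__"; // hypothetical end-of-reply sentinel

    private final Process process;
    private final BufferedWriter toProcess;
    private final BufferedReader fromProcess;
    private final String printMarker; // command that makes the interpreter print MARKER

    PersistentSession(List<String> argv, String printMarker) {
        this.process = start(argv);
        this.toProcess = new BufferedWriter(
                new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8));
        this.fromProcess = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8));
        this.printMarker = printMarker;
    }

    private static Process start(List<String> argv) {
        try {
            return new ProcessBuilder(argv).redirectErrorStream(true).start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One "firing": send a command, then collect output up to the marker line.
    synchronized String fire(String command) {
        try {
            toProcess.write(command);
            toProcess.newLine();
            toProcess.write(printMarker);
            toProcess.newLine();
            toProcess.flush();
            StringBuilder reply = new StringBuilder();
            String line;
            while ((line = fromProcess.readLine()) != null && !line.equals(MARKER)) {
                reply.append(line).append('\n');
            }
            return reply.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            toProcess.close(); // end of input tells the interpreter to exit
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            process.destroy();
        }
    }

    public static void main(String[] args) {
        try (PersistentSession s =
                 new PersistentSession(List.of("sh"), "echo " + MARKER)) {
            s.fire("x=41");                      // state set in the first firing...
            System.out.print(s.fire("echo $x")); // ...is still there in the second
        }
    }
}
```

For R itself the arguments might be something like "R", "--vanilla",
"--slave" with cat("__DONE__\n") as the marker command, but that is
an untested assumption (newer R releases spell --slave as --no-echo).
Holding one such session in a static field would match the "static
variable?" suggestion above.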
> 
> In any case, it is possible to simulate an interactive R session using
> save/load workspace options when starting and ending an R session. But
> it would be useful if the CommandLine actor had an 'inport' port to
> receive commands. Also, it might be useful if the Exec actor really
> had input and output streams instead of the String tokens currently
> used (to handle long inputs).
> 
> That ends these semi-random thoughts for now.
> 
> Any comments or suggestions?
> 
> Dan
> -- 
> *******************************************************************
> Dan Higgins                                  higgins at nceas.ucsb.edu
> http://www.nceas.ucsb.edu/    Ph: 805-892-2531
> National Center for Ecological Analysis and Synthesis (NCEAS) 
> 735 State Street - Room 205
> Santa Barbara, CA 93195
> *******************************************************************