[kepler-dev] Thoughts on an "R" Actor

Thu Jun 10 15:04:08 PDT 2004

Hi All,

I have been working on trying to understand some of the details of the R 
system (http://www.r-project.org/) and how it might be integrated into 
Kepler. For those who are unfamiliar with R, "R is a system for 
statistical computation and graphics. It consists of a language plus a 
run-time environment with graphics, a debugger, access to certain system 
functions, and the ability to run programs stored in script files." 
(from the "R FAQ"). R is a powerful system for statistical and other 
calculations. It is comparable to Matlab or SAS but has the advantage of 
being free, easily extended, and available for PCs, Macs (OS X), and 
Unix systems. There are also numerous extensions from a variety of 
sources. It thus appears to be fairly widely accepted and used by 
numerous researchers.

A first-cut on building an R actor would seem to be to use a local 
version of R (since it can be freely installed on almost any computer) 
and run it as a sub-process to Kepler. An obvious method for doing this 
is to use one of the CommandLine/Exec actors.

I say 'one of ...' because there are at least 2 existing actors for 
running arbitrary subprocesses from within Kepler/Ptolemy. The 
"CommandLine" actor can be found in the Kepler graph editor tree 
under "actors/kepler/spa/CommandLine". The author listed in the source 
is Ilkay Altintas, and this actor runs under the 3.0.2 version of 
Ptolemy/Kepler. A second similar actor, called "Exec", is included with 
the Ptolemy 4.0Beta release under "MoreLibraries/Esoteric/Exec". The 
Exec actor was written for Ptolemy 4 by Chris Brooks and (I think) uses 
some new features that are not available in version 3.0.2. 
[Specifically, there is an "Expert Mode" for setting additional parameters.]

Both the CommandLine and Exec actors use the Java 'exec' method to 
launch a subprocess. They differ in the details, however. CommandLine 
actually launches a command processor ('cmd.exe' or 'command.com' on 
Windows and 'sh' on Mac/Linux), so the command entered by a user is 
essentially identical to one typed in a terminal window to launch a 
process. This can include I/O redirection like "< myfile.in". In the 
Exec actor, the command follows the underlying Java method more closely, 
and the actor has ports for input and output streams. The command string 
cannot include redirection. Both actors wait for the subprocess to 
finish before their 'fire' action completes.
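To make the difference concrete, here is a minimal Java sketch of the two launch styles. It is not taken from either actor's source; the class and method names are made up for illustration, and the whitespace split in direct() only mirrors what Runtime.exec(String) does with a single command string.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: LaunchStyles and its methods are hypothetical names. */
final class LaunchStyles {

    /** CommandLine style: hand the whole string to a command processor. */
    static List<String> viaShell(String osName, String userCommand) {
        List<String> argv = new ArrayList<>();
        if (osName.toLowerCase().contains("windows")) {
            argv.add("cmd.exe");
            argv.add("/c");
        } else {
            argv.add("sh");
            argv.add("-c");
        }
        // The shell parses this string, so "<", ">" and "|" keep their meaning.
        argv.add(userCommand);
        return argv;
    }

    /** Exec style: no shell; the string is only split on whitespace. */
    static List<String> direct(String userCommand) {
        // A "<" here is passed to the program as an ordinary argument.
        return new ArrayList<>(Arrays.asList(userCommand.trim().split("\\s+")));
    }

    /** Starts the process and blocks until it exits, as both actors do. */
    static int runAndWait(List<String> argv)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(argv).inheritIO().start();
        return p.waitFor();
    }
}
```

With viaShell the redirection in "R --no-save < myfile.in" is handled by the shell; with direct the same string would hand R a literal "<" argument, which is why the Exec-style command cannot include redirection.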

Now consider just how we might integrate R into Kepler. R can be run in 
an interactive mode (start up; type a command; see response; type 
another command) or in a batch mode (start R with a script file which 
has a series of commands and write the results to an output file). 
Creating an R workflow in the batch mode is fairly easy. A screen shot 
of a workflow which uses the CommandLine actor to run R to create a jpeg 
plot and then display it is shown below.

The script file used in the example is:

x <- seq(-10, 10, length = 50)
y <- x
rotsinc <- function(x, y) {
    sinc <- function(x) {
        y <- sin(x)/x
        y[is.na(y)] <- 1
        y
    }
    10 * sinc(sqrt(x^2 + y^2))
}
sinc.exp <- expression(z == Sinc(sqrt(x^2 + y^2)))
z <- outer(x, y, rotsinc)   
jpeg(filename = "RTest.jpg", width = 480, height = 480, pointsize = 12,
     quality = 75, bg = "white")
par(bg = "white")
persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue")
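
For reference, a batch run like this could also be started from Java without going through a shell. The sketch below assumes that an "R" executable is on the PATH and that the script above has been saved under some name such as RTest.R (that file name is an assumption; only the jpeg name RTest.jpg comes from the script). It uses R's standard "R CMD BATCH [options] infile [outfile]" form.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: RBatch is a hypothetical helper, not a Kepler class. */
final class RBatch {

    /** Builds: R CMD BATCH --no-save <scriptFile> <outputFile> */
    static List<String> batchCommand(String scriptFile, String outputFile) {
        return Arrays.asList("R", "CMD", "BATCH", "--no-save",
                             scriptFile, outputFile);
    }

    /** Runs the script in workDir and waits; returns R's exit status. */
    static int run(File workDir, String scriptFile, String outputFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb =
                new ProcessBuilder(batchCommand(scriptFile, outputFile));
        // Relative paths such as "RTest.jpg" in the script resolve here.
        pb.directory(workDir);
        return pb.start().waitFor();
    }
}
```

Calling RBatch.run(dir, "RTest.R", "RTest.Rout") should leave both the transcript and the jpeg in dir, where downstream actors can pick them up.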

It can be seen in this batch approach that one can get the results from 
an R calculation from the output stream or from a file created by R that 
is then read by other Kepler actors. A problem comes up, however, if one 
considers how to dynamically input instructions/data to R. In batch 
mode, this could require the dynamic creation of script files, although 
it would be nicer if ports for inputting data/instructions existed for an 
R actor. One thus has the question of how to import information from 
other parts of a workflow to an R actor.
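
As a rough idea of what "dynamic creation of script files" might look like, the Java sketch below turns an array of numbers arriving from elsewhere in a workflow into an R assignment and writes it in front of a fixed script body. All names here are hypothetical, and the number formatting is deliberately naive.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.StringJoiner;

/** Illustrative only: RScriptWriter is a hypothetical helper. */
final class RScriptWriter {

    /**
     * Renders an R vector assignment, e.g. x <- c(1.0, 2.5).
     * Caveat: Double.toString gives "Infinity", which R does not parse
     * (R spells it Inf), so infinite values would need special handling.
     */
    static String assignVector(String name, double[] values) {
        StringJoiner joiner = new StringJoiner(", ", name + " <- c(", ")");
        for (double v : values) {
            joiner.add(Double.toString(v));
        }
        return joiner.toString();
    }

    /** Writes the data line followed by the script body to a temp file. */
    static Path writeScript(String dataLine, String body) throws IOException {
        Path script = Files.createTempFile("kepler-r-", ".R");
        String text = dataLine + System.lineSeparator()
                + body + System.lineSeparator();
        Files.write(script, text.getBytes(StandardCharsets.UTF_8));
        return script;
    }
}
```

The resulting file can then be handed to a batch run of R. This works, but it makes the file system the interface, which is exactly the awkwardness that real input ports would avoid.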

And what about using R in an interactive mode? Both the CommandLine 
actor and the Exec actor start a subprocess and then wait for it to 
finish. This means that the R code is loaded, executed, and then removed 
from memory. For an interactive environment (or for the case where the 
R calculation is repeatedly executed), it would be desirable to load R 
only once! There doesn't seem to be any reason why the R process has to 
be stopped between firings. One could keep the process in memory (a 
static variable?) and simply read the input stream, execute it, write 
the output to the output stream, and then wait for the next input as 
part of a fire event.  [Or perhaps there needs to be some class level R 
actor and a set of instances that do certain calculations by 
communicating with the class actor???]
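
One possible shape for such a long-lived process is sketched below in Java. It is only a sketch under several assumptions: the RSession class and the sentinel string are invented here, R is assumed to read commands from standard input and write results to standard output, and every block of output is assumed to end with a newline so that the sentinel lands on a line of its own.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

/** Illustrative only: a hypothetical wrapper around one long-lived R. */
final class RSession {
    /** Marker printed by R after each request; an arbitrary choice. */
    static final String SENTINEL = "__KEPLER_DONE__";

    private final BufferedWriter toR;
    private final BufferedReader fromR;

    RSession(Writer toR, Reader fromR) {
        this.toR = new BufferedWriter(toR);
        this.fromR = new BufferedReader(fromR);
    }

    /** Starts the process once; 'command' is whatever launches R. */
    static RSession start(String... command) throws IOException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true).start();
        return new RSession(new OutputStreamWriter(p.getOutputStream()),
                            new InputStreamReader(p.getInputStream()));
    }

    /** Sends one chunk of R code and collects output up to the sentinel. */
    String eval(String rCode) throws IOException {
        toR.write(rCode);
        toR.write("\n");
        // Ask R to print the marker once the code above has run.
        toR.write("cat(\"" + SENTINEL + "\\n\")\n");
        toR.flush();
        StringBuilder out = new StringBuilder();
        String line;
        // Stops at the marker line, or at end of stream if R has exited.
        while ((line = fromR.readLine()) != null && !line.equals(SENTINEL)) {
            out.append(line).append('\n');
        }
        return out.toString();
    }
}
```

An actor might call RSession.start(...) once (for example in initialize(), or lazily into a static field shared by all instances), call eval() from each fire(), and shut the process down in wrapup(). Whether R echoes the commands it reads depends on the options it is started with, so the returned text may need some filtering.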

In any case, it is possible to simulate an interactive R session using 
save/load workspace options when starting and ending an R session. But 
it would be useful if the CommandLine actor had an input port to 
receive commands. Also, it might be useful if the Exec actor really had 
input and output streams instead of the String tokens currently used (to 
handle long inputs).
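
The save/load idea can be sketched in Java as follows, assuming R's standard --restore and --save startup options, which reload and rewrite the workspace file (.RData) in the working directory. The class name and the "step.out" file name are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: one "turn" of a simulated interactive session. */
final class RWorkspaceStep {

    /** Restore the previous workspace on start, save it again on exit. */
    static List<String> command() {
        return Arrays.asList("R", "--restore", "--save", "--quiet");
    }

    /** Feeds 'script' to R on stdin; output goes to step.out in workDir. */
    static int run(File workDir, File script)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command());
        pb.directory(workDir);              // .RData is kept here
        pb.redirectInput(script);           // same effect as "< script"
        pb.redirectErrorStream(true);
        pb.redirectOutput(new File(workDir, "step.out"));
        return pb.start().waitFor();
    }
}
```

Each firing then sees the variables left behind by the previous one, at the cost of restarting R and reading and writing the workspace file every time.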

That ends these semi-random thoughts for now.

Any comments or suggestions?

Dan

-- 
*******************************************************************
Dan Higgins                                  higgins at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Ph: 805-892-2531
National Center for Ecological Analysis and Synthesis (NCEAS) 
735 State Street - Room 205
Santa Barbara, CA 93195
*******************************************************************

-------------- next part --------------
A non-text attachment was scrubbed...
Name: CommandLine.gif
Type: image/gif
Size: 19027 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/kepler-dev/attachments/20040610/f880dede/CommandLine.gif
-------------- next part --------------
A non-text attachment was scrubbed...
Name: exec.gif
Type: image/gif
Size: 17234 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/kepler-dev/attachments/20040610/f880dede/exec.gif
-------------- next part --------------
A non-text attachment was scrubbed...
Name: R-example.gif
Type: image/gif
Size: 62558 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/kepler-dev/attachments/20040610/f880dede/R-example.gif