[kepler-dev] [Bug 1342] - need R actor

Fri Mar 11 15:19:56 PST 2005

http://bugzilla.ecoinformatics.org/show_bug.cgi?id=1342

jones at nceas.ucsb.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |2043
              nThis|                            |

------- Additional Comments From jones at nceas.ucsb.edu  2005-03-11 15:19 -------
OK, to summarize the phone call from last week.  We all agree an effective R
actor needs to provide the following capabilities:

1) Easily import existing R scripts by
   * letting the user choose the script
   * letting the user define ports for all script inputs and outputs
   * actor searches script and automagically provides plumbing to 

2) Execute the R script in an existing workflow without any additional
configuration on the part of the user.  That is, if a workflow with an R actor
in it works on one Kepler instance (e.g., Windows), it should work on all other
Kepler instances without modification (i.e., no path fixes, temp file renaming, etc.).
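
As a rough illustration of the "no path fixes" point, here is a minimal sketch
(in Python, purely for discussion; the actor itself would live in Kepler) of
how an actor could pick an output location that works on any platform.  The
function name portable_output_path is hypothetical.  It relies on the system
temp directory rather than a hard-coded path, and on the fact that R accepts
forward slashes in file names on Windows as well as on Unix.

```python
import os
import tempfile
from pathlib import Path


def portable_output_path(port_name: str, suffix: str) -> str:
    """Return a unique temp file path that can be pasted into an R script.

    Hypothetical helper: the name and signature are assumptions for this
    sketch, not part of Kepler.
    """
    # mkstemp creates the file in the platform's temp directory, so nothing
    # in the saved workflow depends on a particular machine's layout.
    fd, raw_path = tempfile.mkstemp(prefix=port_name + "_", suffix=suffix)
    os.close(fd)
    # Forward slashes avoid backslash escaping inside R string literals.
    return Path(raw_path).as_posix()
```

Calling this twice for the same port gives two different paths, which is also
what keeps two instances of the same actor from colliding.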

The key thing we are trying to achieve here is ease of use through transparent
functionality.  If someone binds an R script for Linear Regression to the R
actor, they should be able to register that in the user library and from then on
all other users should be able to use that regression actor without modification
(just drop it in place, connect the ports, and run).

Handling (1), the import of actors, will probably require a new GUI component, or
at least a specialized configuration dialog that can take the R script as an
attribute.  The process might look like this for importing an R script:
a) User drags generic R actor onto canvas
b) User opens the configure dialog for that actor and
   b1) pastes the R script of interest into the script field
   b2) creates ports for all of the inputs expected by the R script

Ideally that would be it and the R actor would be able to map between the port
name and the tokens in the script.  For example, assume a script needed two
vectors of floats as input named 'x' and 'y' in the script and produced a
scatterplot in an image file of type image/gif called 'scatterplot'.  When the
user imports the actor, he would create two input ports named x and y of type
array of float, and one output port named 'scatterplot' of type 'Object' (I'm
not sure about that last type decision).  The R actor would then automatically
and without further configuration be able to take data coming in on the two
input ports (x and y) and provide it to the executing R script, and take the
output file and provide it on the output port (scatterplot).  This means reading
the temporary file that the R script generates using an InputStream and emitting
the data on the port.  In addition, because the R script uses a temporary file
for staging the output, ideally the R actor will 'know' to substitute a temporary
file name instead of the default 'scatterplot' name in the script so that two or
more instances of this actor in the same or different workflows don't overwrite
each other's output graphs.
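
To make the port-to-token mapping concrete, here is a small sketch in Python
(for discussion only, not a proposed implementation).  It assumes the script
names its output file with a quoted string equal to the port name, e.g.
png("scatterplot"), and that input values are finite numbers.  The names
r_literal, bind_script and read_output are hypothetical.  The example uses
png() only because it is a standard R graphics device; the file suffix is a
parameter.

```python
import os
import re
import tempfile
from pathlib import Path


def r_literal(values):
    """Render a list of finite numbers as an R numeric vector literal."""
    return "c(" + ", ".join(repr(float(v)) for v in values) + ")"


def bind_script(script, inputs, output_ports, suffix=".png"):
    """Return (runnable_script, {output_port: temp_path}).

    inputs maps input port names to lists of numbers; each becomes an R
    assignment placed ahead of the user's script.
    """
    preamble = [name + " <- " + r_literal(vals) for name, vals in inputs.items()]
    out_paths = {}
    for port in output_ports:
        fd, raw_path = tempfile.mkstemp(prefix=port + "_", suffix=suffix)
        os.close(fd)
        path = Path(raw_path).as_posix()
        out_paths[port] = path
        # Only quoted occurrences of the port name are replaced, so an R
        # variable that happens to share the name is left untouched.
        pattern = r"""(["'])""" + re.escape(port) + r"\1"
        script = re.sub(pattern, lambda m, p=path: '"' + p + '"', script)
    return "\n".join(preamble + [script]), out_paths


def read_output(path):
    """Read the staged output file as bytes, ready to emit on the port."""
    with open(path, "rb") as stream:
        return stream.read()
```

With inputs {"x": [1, 2.5], "y": [3, 4]} and the output port "scatterplot",
the bound script starts with the two assignments for x and y, and the quoted
"scatterplot" is replaced by a unique temp file path, so a second instance of
the actor gets a different file.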

With this functionality we will be able to import a suite of R actors that do
common tasks that scientists can use, such as regression, ANOVA, means by group,
etc.   These new actors are described in a new set of bugs (see bug #2043).