[kepler-dev] [Bug 2963] - Add data structure for tabular data and associated metadata

Wed Sep 12 10:29:24 PDT 2007

http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2963

------- Comment #1 from higgins at nceas.ucsb.edu  2007-09-12 10:29 -------
There is already support for handing an R dataframe from one RExpression to
another. If RExpression has an output port assigned to a dataframe, it writes
the dataframe to a file and passes a string starting with 'dataframe:' followed
by the file path. Another RExpression actor can access this as an input.
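
As a rough illustration of the handoff described above, the receiving side
only needs to recognize the 'dataframe:' prefix and recover the file path.
The sketch below is plain Python, not Kepler's actual code; the helper name
parse_dataframe_token and the example path are hypothetical, and the real
RExpression implementation may differ.

```python
DATAFRAME_PREFIX = "dataframe:"  # prefix named in the comment above


def parse_dataframe_token(token):
    """Return the file path carried by a 'dataframe:' string, or None.

    Hypothetical helper: it only mirrors the convention described in the
    text (prefix followed by a file path), not Kepler's real code.
    """
    if not token.startswith(DATAFRAME_PREFIX):
        return None  # an ordinary string, not a dataframe reference
    return token[len(DATAFRAME_PREFIX):]


# Example with a made-up path
path = parse_dataframe_token("dataframe:/tmp/example_df.sav")
```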

The rough Kepler equivalent of a dataframe is a Kepler/PTII record. Each
field of the record has a name (the column name) and an array of values (the
column vector). See $KEPLER/demos/R/emlToRecord_R.xml for an example of
creating a record/dataframe and importing it into the RExpression actor.

There is one shortcoming with representing a dataframe as a record of arrays:
the Kepler record is not ordered; i.e., record fields are always referenced
by name, not by location in an array. Thus, the nth item added when creating a
record may not be in the nth location when it is accessed. R dataframe columns,
by contrast, can be referenced by name or by index.
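
The ordering problem can be sketched in a few lines. This is plain Python,
not PTII code: the record is modeled as a mapping whose fields are read back
sorted by name (an assumption chosen only to show that retrieval order need
not match creation order), and column_order is a hypothetical extra list that
restores positional access. All column names and values are made up.

```python
# Columns in the order they were created (illustrative data only).
created = [("site", ["A", "B"]), ("depth", [1.5, 2.0]), ("count", [10, 12])]

# Model of a name-keyed record: fields are reachable only by name, and
# iteration here is by sorted name (an assumption for this sketch).
record = dict(created)
names_on_access = sorted(record)

# Hypothetical fix: carry the creation order alongside the record.
column_order = [name for name, _ in created]


def column_at(index):
    """Return the column at a zero-based position (R's df[[i]] is one-based)."""
    return record[column_order[index]]
```

With the extra list, column_at(0) returns the "site" column even though
"site" is last in names_on_access.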

I suggest that we consider an extension of the R dataframe concept. This would
include not only the list of named column arrays of the dataframe, but also
additional associated metadata (i.e. all the EML attribute info) and possibly
semantic information. (Dan Higgins)
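
One possible shape for such an extended structure is sketched below in plain
Python. Everything here is an assumption for discussion: the class name
AnnotatedTable, its methods, and the metadata keys (unit, definition) are
placeholders and are not taken from EML or from any existing Kepler class.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedTable:
    """Hypothetical container: ordered named columns plus per-column metadata."""
    names: list = field(default_factory=list)      # column order
    columns: dict = field(default_factory=dict)    # name -> list of values
    metadata: dict = field(default_factory=dict)   # name -> attribute info

    def add_column(self, name, values, **attribute_info):
        if name in self.columns:
            raise ValueError("duplicate column name: " + name)
        self.names.append(name)
        self.columns[name] = list(values)
        self.metadata[name] = dict(attribute_info)

    def column(self, key):
        """Look up a column by name (str) or zero-based position (int)."""
        name = self.names[key] if isinstance(key, int) else key
        return self.columns[name]


# Illustrative data and metadata only
table = AnnotatedTable()
table.add_column("depth", [1.5, 2.0], unit="meter", definition="sample depth")
table.add_column("site", ["A", "B"], definition="site code")
```

Keeping the column order in its own list allows lookup by name or by index,
and the metadata mapping leaves room for attribute-level information to
travel with the values.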