[kepler-dev] Attributes for Kepler (and Ptolemy) Tokens
higgins at nceas.ucsb.edu
Tue Mar 25 11:47:19 PDT 2008
I have been spending some time lately learning Python with the
particular goal of using the Python/Jython actor in Kepler. One thing
that I have noted is that Python has some interesting similarities to R.
In particular, both languages have the ability to attach 'attributes' to
arbitrary objects. It strikes me that this is a very useful way to
attach various types of metadata to data objects - a capability that is
the basis of the KNB/EML data packages that are stored in the NCEAS Metacat
and used in Kepler EML data source actors.
Kepler passes data between actors as Tokens, which I think of as
references to the actual data (one level of abstraction from the actual
data). However, at least as far as I understand it, there is no way to
attach attributes to Tokens. *I would like to propose adding a
'Dictionary' member (i.e. a Hashtable) to the base Token class*. This
would allow any Kepler token to carry a named list of 'attributes'.
Example labels (keys) for these attributes might be 'name', 'unit', or
the name of a more complex metadata element (e.g. an XML fragment). The
default value of this Dictionary member could be null so that it would
have no effect on existing workflows using existing tokens, and it would
have minimal effect on new workflows unless it was deliberately
populated with attributes of interest.
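For concreteness, here is a minimal Java sketch of what the proposed member might look like. AttributedToken and AttributedDoubleToken are hypothetical stand-ins, not Ptolemy's actual ptolemy.data.Token hierarchy, and I've used HashMap in place of the legacy Hashtable. The dictionary stays null until the first attribute is set, which is what would leave existing tokens unaffected.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class illustrating the proposal; NOT ptolemy.data.Token.
abstract class AttributedToken {
    // Null by default: a token that never gets attributes costs one null reference.
    private Map<String, Object> attributes = null;

    // Returns the named attribute, or null if it (or the whole dictionary) is absent.
    Object getAttribute(String name) {
        return attributes == null ? null : attributes.get(name);
    }

    // Creates the dictionary lazily on the first call.
    void setAttribute(String name, Object value) {
        if (attributes == null) {
            attributes = new HashMap<>();
        }
        attributes.put(name, value);
    }

    // Read-only view; empty when nothing has been attached.
    Map<String, Object> getAttributes() {
        return attributes == null
                ? Collections.<String, Object>emptyMap()
                : Collections.unmodifiableMap(attributes);
    }

    boolean hasAttributes() {
        return attributes != null && !attributes.isEmpty();
    }
}

// Hypothetical concrete token carrying a double, similar in spirit to a DoubleToken.
class AttributedDoubleToken extends AttributedToken {
    private final double value;

    AttributedDoubleToken(double value) {
        this.value = value;
    }

    double doubleValue() {
        return value;
    }
}
```

Because getAttribute simply returns null for a token that has never had attributes attached, actors that know nothing about attributes would behave exactly as before.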
Any comments/thoughts on this?
Some additional thoughts:
One item that led to these thoughts is the R dataframe object that
is very useful in R for manipulating table-like structures. In R, a
dataframe is an ordered list of column data. The columns are basically
arrays of the same length but not necessarily of the same data type -
i.e. one might be strings, another doubles, etc. The columns (and rows)
can be named. A dataframe is thus very similar to a relational database
table and functions for subsetting, searching, and other RDB-like
operations exist in R.
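As a point of comparison for the Kepler question below, here is a rough Java sketch of the structure just described: an ordered list of named, equal-length columns whose element types may differ, plus one RDB-like operation (a row subset loosely modeled on R's df[rows, ]). SimpleDataFrame and its methods are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical minimal dataframe: ordered, named, equal-length columns.
class SimpleDataFrame {
    private final List<String> names = new ArrayList<>();
    private final List<List<Object>> columns = new ArrayList<>();

    // Appends a column, enforcing that all columns have the same length.
    void addColumn(String name, List<?> values) {
        if (!columns.isEmpty() && values.size() != nrow()) {
            throw new IllegalArgumentException("column '" + name + "' has "
                    + values.size() + " rows, expected " + nrow());
        }
        names.add(name);
        columns.add(new ArrayList<Object>(values));
    }

    int nrow() {
        return columns.isEmpty() ? 0 : columns.get(0).size();
    }

    int ncol() {
        return columns.size();
    }

    // Column names in insertion order, as in R.
    List<String> names() {
        return new ArrayList<>(names);
    }

    Object get(int row, String column) {
        int j = names.indexOf(column);
        if (j < 0) {
            throw new IllegalArgumentException("no column " + column);
        }
        return columns.get(j).get(row);
    }

    // Returns a new frame holding only the rows whose index satisfies the
    // predicate; column names and column order are preserved.
    SimpleDataFrame subsetRows(IntPredicate keep) {
        SimpleDataFrame out = new SimpleDataFrame();
        for (int j = 0; j < columns.size(); j++) {
            List<Object> col = columns.get(j);
            List<Object> kept = new ArrayList<>();
            for (int i = 0; i < col.size(); i++) {
                if (keep.test(i)) {
                    kept.add(col.get(i));
                }
            }
            out.addColumn(names.get(j), kept);
        }
        return out;
    }
}
```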
How would one pass dataframe objects between arbitrary actors in
Kepler using Kepler tokens? My first thought would be as Ptolemy
RecordTokens where each item (i.e. column) in the Record is an ArrayToken.
The columns in the Record each have an associated label (name), but they
are not ordered except by the alphabetical order of the names (since a
RecordToken is just a dictionary or hash table). To get the ordering of
the dataframe, one could create a DataframeToken that was an array of
column arrays, but then how does one attach names (and other metadata)
to each column array?
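The dilemma can be shown with plain Java collections standing in for the Ptolemy types (an assumption for illustration; this does not use the real RecordToken or ArrayToken API). A java.util.TreeMap mimics the alphabetical label ordering described above, while a plain list keeps the column order but has nowhere to put the names. DataframeEncodings is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the two encodings discussed above.
class DataframeEncodings {
    // (a) Record-like: label -> column. TreeMap iterates its keys in sorted
    // order, so columns "temp", "site", "depth" come back as depth, site, temp.
    static Map<String, List<?>> asRecord(List<String> names, List<List<?>> columns) {
        Map<String, List<?>> record = new TreeMap<>();
        for (int j = 0; j < names.size(); j++) {
            record.put(names.get(j), columns.get(j));
        }
        return record;
    }

    // (b) Array-like: the column order survives, but the names are dropped
    // because a bare list of columns has no slot for per-column metadata.
    static List<List<?>> asArray(List<List<?>> columns) {
        return new ArrayList<>(columns);
    }
}
```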
So you can see that the idea of including a Dictionary member in
Token is driven in part by the desire to create a 'dataframe-like' token.
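Under the proposal, a DataframeToken could be an ordered array of column tokens, each carrying its own 'name' (and, say, 'unit') attribute. The sketch below is hypothetical throughout: ColumnToken and DataframeToken are illustrative classes, not existing Kepler or Ptolemy types, and the attribute keys are only examples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical column token: an ordered array of values plus an optional,
// lazily created attribute dictionary (the proposed Token extension).
class ColumnToken {
    private final List<Object> values;
    private Map<String, Object> attributes; // null until the first setAttribute

    ColumnToken(List<?> values) {
        this.values = new ArrayList<Object>(values);
    }

    List<Object> values() {
        return Collections.unmodifiableList(values);
    }

    Object getAttribute(String key) {
        return attributes == null ? null : attributes.get(key);
    }

    ColumnToken setAttribute(String key, Object value) {
        if (attributes == null) {
            attributes = new HashMap<>();
        }
        attributes.put(key, value);
        return this; // allows chained calls
    }
}

// Hypothetical DataframeToken: an ordered array of equal-length columns, each
// naming itself through its 'name' attribute rather than through a record label.
class DataframeToken {
    private final List<ColumnToken> columns;

    DataframeToken(List<ColumnToken> columns) {
        int n = columns.isEmpty() ? 0 : columns.get(0).values().size();
        for (ColumnToken c : columns) {
            if (c.values().size() != n) {
                throw new IllegalArgumentException("columns must have equal length");
            }
        }
        this.columns = new ArrayList<>(columns);
    }

    // Column names in their original order, read from each column's attributes.
    List<String> columnNames() {
        List<String> names = new ArrayList<>();
        for (ColumnToken c : columns) {
            names.add((String) c.getAttribute("name"));
        }
        return names;
    }

    // Looks a column up by its 'name' attribute; returns null if none matches.
    ColumnToken column(String name) {
        for (ColumnToken c : columns) {
            if (name.equals(c.getAttribute("name"))) {
                return c;
            }
        }
        return null;
    }
}
```

Because the names travel with the columns instead of serving as record labels, the column order set by the upstream actor is preserved.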