[kepler-dev] Attributes for Kepler (and Ptolemy) Tokens

Tue Mar 25 11:47:19 PDT 2008

Hi All,

    I have been spending some time lately learning Python with the 
particular goal of using the Python/Jython actor in Kepler. One thing 
that I have noted is that Python has some interesting similarities to R. 
In particular, both languages have the ability to attach 'attributes' to 
arbitrary objects. It strikes me that this is a very useful way to 
attach various types of metadata to data objects - a capability that is 
the basis of knb/eml data packages that are stored in the NCEAS Metacat 
and used in Kepler EML data source actors.

    Kepler passes data between actors as Tokens, which I think of as 
references to the actual data (one level of abstraction from the actual 
data). However, at least as far as I understand it, there is no way to 
attach attributes to Tokens. *I would like to propose adding a 
'Dictionary' member (i.e. a Hashtable) to the base Token class*. This 
would allow any Kepler token to carry a named list of 'attributes'. 
Example labels (keys) for these attributes might be 'name', 'unit', or 
a more complex named metadata element (e.g. an XML fragment). The 
default value of this Dictionary member could be null so that it would 
have no effect on existing workflows using existing tokens, and it would 
have minimal effect on new workflows unless it was deliberately 
populated with attributes of interest.
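
    To make the proposal concrete, here is a minimal sketch in Python 
(not the actual Ptolemy Java code). The Token class and the 
set_attribute/get_attribute method names are hypothetical, chosen only 
for illustration. The point is that the dictionary stays null (None) 
until someone deliberately adds an attribute.

```python
from typing import Any, Dict, Optional


class Token:
    """Hypothetical stand-in for a Kepler/Ptolemy Token (not the real Java class)."""

    def __init__(self, value: Any) -> None:
        self.value = value
        # None by default, so tokens that never use attributes are unaffected.
        self._attributes: Optional[Dict[str, Any]] = None

    def set_attribute(self, key: str, value: Any) -> None:
        # Create the dictionary lazily, only when first needed.
        if self._attributes is None:
            self._attributes = {}
        self._attributes[key] = value

    def get_attribute(self, key: str, default: Any = None) -> Any:
        # A token without a dictionary simply reports the default.
        if self._attributes is None:
            return default
        return self._attributes.get(key, default)


# An existing-style token: no attributes are ever created.
plain = Token(3.14)

# A token that deliberately carries metadata.
temperature = Token(21.5)
temperature.set_attribute("name", "air_temperature")
temperature.set_attribute("unit", "degC")
```

    Actors that do not know about attributes would keep reading the 
value as before; only actors that ask for a key such as 'unit' would 
see the extra information.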

    Any comments/thoughts on this?

Dan Higgins

Some additional thoughts:
    One item that led to these thoughts is the R dataframe object that 
is very useful in R for manipulating table-like structures. In R, a 
dataframe is an ordered list of column data. The columns are basically 
arrays of the same length but not necessarily of the same data type - 
i.e. one might be strings, another doubles, etc. The columns (and rows) 
can be named. A dataframe is thus very similar to a relational database 
table and functions for subsetting, searching, and other RDB-like 
operations exist in R.
    How would one pass dataframe objects between arbitrary actors in 
Kepler using Kepler tokens? My first thought would be as Ptolemy 
RecordTokens where each item (i.e. column) in the Record is an ArrayToken. 
The columns in the Record each have an associated label (name), but they 
are not ordered except by the alphabetical order of the names (since a 
RecordToken is just a dictionary or hash table). To get the ordering of 
the dataframe, one could create a DataframeToken that was an array of 
column arrays, but then how does one attach names (and other metadata) 
to each column array?
    So you can see that the idea of adding a Dictionary member to 
Token is driven in part by the desire to create a 'dataframe-like' token 
for Kepler.
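
    As a rough illustration of the dataframe idea above, the following 
Python sketch stores the columns in an ordered list and keeps each 
column's name (and any other metadata) in that column's own attribute 
dictionary. The ArrayColumn and DataframeToken classes and their 
methods are hypothetical; they are not existing Kepler or Ptolemy 
classes.

```python
from typing import Any, Dict, List, Optional


class ArrayColumn:
    """Hypothetical column: an array of values plus an optional attribute dictionary."""

    def __init__(self, values: List[Any],
                 attributes: Optional[Dict[str, Any]] = None) -> None:
        self.values = list(values)
        self.attributes = attributes  # stays None if no metadata is given


class DataframeToken:
    """Hypothetical token: an ordered list of equal-length columns."""

    def __init__(self, columns: List[ArrayColumn]) -> None:
        lengths = {len(col.values) for col in columns}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        # A list (not a dictionary) keeps the columns in the order given.
        self.columns = list(columns)

    def names(self) -> List[Optional[str]]:
        # Column names come from each column's attributes, in column order.
        return [(col.attributes or {}).get("name") for col in self.columns]

    def column(self, name: str) -> ArrayColumn:
        # Look a column up by the 'name' attribute it carries.
        for col in self.columns:
            if (col.attributes or {}).get("name") == name:
                return col
        raise KeyError(name)


frame = DataframeToken([
    ArrayColumn(["A", "B", "C"], {"name": "site"}),
    ArrayColumn([1.5, 2.0, 0.7], {"name": "depth", "unit": "m"}),
])
```

    Here frame.names() gives the columns in their original order 
('site' before 'depth') rather than alphabetically, and the unit for 
'depth' travels with the column itself.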