[kepler-dev] Attributes for Kepler (and Ptolemy) Tokens

Christopher Brooks cxh at eecs.berkeley.edu
Tue Mar 25 12:16:39 PDT 2008


Hi Dan,
Hmm, interesting idea.

The dictionary sounds a bit like a RecordToken.  RecordTokens use
a TreeMap as the inner data structure.  Perhaps attaching a RecordToken
to a Token might help with data management and operations on the metadata.
I don't fully understand the DataFrame example, but it does not sound
like RecordToken would help there.
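To make the TreeMap point concrete: a RecordToken behaves roughly like a sorted map from labels to tokens, so label order is alphabetical rather than insertion order. A minimal Python sketch of that behavior (the class and method names here are illustrative, not the real Ptolemy API):

```python
# Sketch of RecordToken-like behavior: labels map to values, and the
# labels come back in alphabetical order (Ptolemy uses a TreeMap
# internally).  Illustrative names only, not the Ptolemy API.

class SimpleRecord:
    def __init__(self, labels, values):
        self._fields = dict(zip(labels, values))

    def labels(self):
        # Alphabetical order, regardless of the order labels were given in.
        return sorted(self._fields)

    def get(self, label):
        return self._fields[label]

r = SimpleRecord(["unit", "name"], ["meters", "depth"])
print(r.labels())     # alphabetical, not insertion, order
print(r.get("name"))
```

This alphabetical ordering is the same property Dan runs into below when considering RecordToken as a dataframe carrier.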

One issue with adding a field to Token is that even if the reference to
the dictionary is null, the field itself will still add space to every
Token instance.  Can anyone confirm this?

Right now, I don't think Token itself holds any data; the data is part
of each subclass.

It might be worth looking at how the unit system in ptolemy/data/unit
is implemented.  It looks like we ended up making ScalarToken larger
by adding:
    protected int[] _unitCategoryExponents = null;

The notion of adding metadata to a token is of interest to us, Edward
might have some input.

_Christopher
--------

    
    Hi All,
    
        I have been spending some time lately learning Python with the 
    particular goal of using the Python/Jython actor in Kepler. One thing 
    that I have noted is that Python has some interesting similarities to R. 
    In particular, both languages have the ability to attach 'attributes' to 
    arbitrary objects. It strikes me that this is a very useful way to 
    attach various types of metadata to data objects - a capability that is 
    the basis of knb/eml data packages that are stored in the NCEAS Metacat 
    and used in Kepler EML data source actors.
    
        Kepler passes data between actors as Tokens, which I think of as 
    references to the actual data (one level of abstraction from the actual 
    data). However, at least as far as I understand it, there is no way to 
    attach attributes to Tokens. *I would like to propose adding a 
    'Dictionary' member (i.e. a Hashtable) to the base Token class*. This 
    would allow any Kepler token to carry a named list of 'attributes'. 
    Example labels (keys) for these attributes might be 'name', 'unit', or 
    the name of some more complex metadata element (e.g. an XML fragment). The 
    default value of this Dictionary member could be null so that it would 
    have no effect on existing workflows using existing tokens, and it would 
    have minimal effect on new workflows unless it was deliberately 
    populated with attributes of interest.
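    In Python terms, the proposal might look something like the sketch
    below. (The names are hypothetical; the real Token class is Java, and
    the member there would presumably be a Hashtable defaulting to null.)

```python
class Token:
    """Sketch of the proposal: the base class gains an optional
    attribute dictionary that stays None unless populated."""

    def __init__(self):
        self._attributes = None  # default null: no effect on existing workflows

    def set_attribute(self, key, value):
        if self._attributes is None:
            self._attributes = {}  # allocate lazily, only on first use
        self._attributes[key] = value

    def get_attribute(self, key):
        if self._attributes is None:
            return None
        return self._attributes.get(key)


class DoubleToken(Token):
    """As in Ptolemy, the actual data lives in the subclass."""

    def __init__(self, value):
        super().__init__()
        self.value = value


t = DoubleToken(3.2)
t.set_attribute("name", "water_depth")
t.set_attribute("unit", "meters")
print(t.get_attribute("unit"))
```

    A token that never has attributes set pays only for the single null
    reference, which is the "minimal effect on new workflows" point above.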
    
        Any comments/thoughts on this?
    
    Dan Higgins
    
    Some additional thoughts:
        One item that led to these thoughts is the R dataframe object, which 
    is very useful in R for manipulating table-like structures. In R, a 
    dataframe is an ordered list of column data. The columns are basically 
    arrays of the same length but not necessarily of the same data type - 
    i.e. one might be strings, another doubles, etc. The columns (and rows) 
    can be named. A dataframe is thus very similar to a relational database 
    table and functions for subsetting, searching, and other RDB-like 
    operations exist in R.
        How would one pass dataframe objects between arbitrary actors in 
    Kepler using Kepler tokens? My first thought would be as Ptolemy 
    RecordTokens where each item (i.e. column) in the Record is an ArrayToken. 
    The columns in the Record each have an associated label (name), but they 
    are not ordered except by the alphabetical order of the names (since a 
    RecordToken is just a dictionary or hash table). To get the ordering of 
    the dataframe, one could create a DataframeToken that was an array of 
    column arrays, but then how does one attach names (and other metadata) 
    to each column array?
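    One hypothetical shape for such a token, sketched in Python (none of
    these class names exist in Ptolemy or Kepler): keep an ordered list of
    column names alongside the column data, with a per-column metadata
    dictionary.

```python
class DataFrameToken:
    """Hypothetical dataframe-like token: an ordered list of named
    columns, each carrying its own metadata dictionary."""

    def __init__(self):
        self._names = []      # preserves column order
        self._columns = {}    # name -> list of values
        self._metadata = {}   # name -> dict of attributes

    def add_column(self, name, values, **metadata):
        self._names.append(name)
        self._columns[name] = list(values)
        self._metadata[name] = metadata

    def column_names(self):
        # Insertion order, unlike a RecordToken's alphabetical labels.
        return list(self._names)

    def column(self, name):
        return self._columns[name]

    def metadata(self, name):
        return dict(self._metadata[name])


df = DataFrameToken()
df.add_column("site", ["A", "B"], description="site code")
df.add_column("depth", [1.2, 3.4], unit="meters")
print(df.column_names())   # order preserved: site before depth
```

    With an attribute dictionary on the base Token class, the per-column
    metadata could instead ride on each ArrayToken itself, which is the
    connection to the proposal above.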
        So you can see that the idea of including a Dictionary member to 
    Token is driven in part by the desire to create a 'dataframe-like' token 
    for Kepler.
    
--------


More information about the Kepler-dev mailing list