[kepler-dev] Attributes for Kepler (and Ptolemy) Tokens

Edward A. Lee eal at eecs.berkeley.edu
Tue Mar 25 15:37:44 PDT 2008


This is an interesting idea, and if we did this as a generalization
of the unit system that didn't add overhead, that would
be great. While we are at it, the unit system should be generalized
to support arbitrary ontologies given as a partial order relation
over a set.
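
To fix ideas, such an ontology could be as simple as this (a
hypothetical interface, just to illustrate; nothing like it exists
in Ptolemy II today):

    public interface Ontology<C> {
        // True if concept a is less than or equal to concept b in the
        // partial order, e.g. isLessThanOrEqual("meters", "length").
        boolean isLessThanOrEqual(C a, C b);
    }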

The difference between RecordToken and Dan's proposal is that
RecordToken works with the Ptolemy II type system.  Type inference,
for example, tells you what fields to expect in a RecordToken
anywhere in the model. And subtyping is cleanly supported
(a record with more fields is a subtype of a record with fewer).
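
For instance, a minimal sketch using the ptolemy.data classes (the
printed output in the comments is approximate):

    import ptolemy.data.DoubleToken;
    import ptolemy.data.RecordToken;
    import ptolemy.data.StringToken;
    import ptolemy.data.Token;
    import ptolemy.kernel.util.IllegalActionException;

    public class RecordTokenDemo {
        public static void main(String[] args) throws IllegalActionException {
            // The labels and field types are part of the token's type, so
            // type inference can propagate them through the model.
            RecordToken record = new RecordToken(
                    new String[] { "name", "value" },
                    new Token[] { new StringToken("temperature"),
                                  new DoubleToken(21.5) });
            System.out.println(record.getType()); // {name = string, value = double}
            System.out.println(record.get("value")); // 21.5
        }
    }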

Dan's proposal would work outside the type system. As such, any
use of it has to check at run time that the desired fields are
available; it cannot simply assume they are. This moves towards a
more dynamically typed system.
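
Concretely, every consumer would end up with a check like this
(getAttribute() here is hypothetical; no such method exists on
Token today):

    // Hypothetical fragment, not current Ptolemy II API.
    Token token = input.get(0);
    Object unit = token.getAttribute("unit");
    if (unit == null) {
        // The type system offers no guarantee, so the missing-attribute
        // case must be handled at run time by every consumer.
        throw new IllegalActionException("Expected a 'unit' attribute.");
    }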

BTW: I've often wished Java had this feature. We have several
places in Ptolemy II where we use a hashtable to associate some
data element with a Java object. This is not as efficient in
general, but it is an alternative.
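
The pattern looks roughly like this (a generic sketch, not a specific
Ptolemy II class; a WeakHashMap keeps the side table from leaking):

    import java.util.Map;
    import java.util.WeakHashMap;

    public class MetadataRegistry {
        // Associates metadata with arbitrary objects without modifying
        // their classes. Entries disappear when the key is collected.
        private static final Map<Object, Object> _metadata =
                new WeakHashMap<Object, Object>();

        public static void put(Object key, Object value) {
            _metadata.put(key, value);
        }

        public static Object get(Object key) {
            return _metadata.get(key); // May be null; callers must check.
        }
    }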

Edward


At 01:25 PM 3/25/2008, Christopher Brooks wrote:
>Hi Dan,
>Right, I had forgotten about the Java Dictionary class.  I think of
>things as Maps . . .
>
>Perhaps we could refactor the unit system work so that Token had a
>reference to an object that could be both your proposed Dictionary/Map
>attribute and the unit system.  I see the unit system as additional type
>information, which is very similar to the attribute.
>
>To do this refactoring, we would add a reference to Token and remove
>the unit system reference in ScalarToken.  This would handle my
>object-size concerns rather nicely, though it would require some heavy
>lifting in the unit system.  Still, designing the attribute extension
>to Token to be flexible enough to handle the unit system would make
>that refactoring possible.
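>
>One possible shape for that refactoring, purely as a sketch (the field
>and method names here are invented, not existing Ptolemy II code):
>
>    import java.util.Map;
>
>    public class Token {
>        // One shared slot for user attributes and unit information;
>        // null by default, so plain tokens pay only one extra reference.
>        private Map<String, Object> _attributes = null;
>
>        public Object getAttribute(String name) {
>            return (_attributes == null) ? null : _attributes.get(name);
>        }
>    }
>
>ScalarToken could then drop its _unitCategoryExponents field and store
>the exponent array under a reserved key such as "_units".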
>
>_Christopher
>
>--------
>
>    Hi Christopher,
>    
>        My terminology may be a bit confusing, since different languages use 
>    different terms for what is basically the same thing. For instance, in 
>    Java, Hashtable is derived from java.util.Dictionary, but the JavaDocs 
>    now say that the Dictionary class is obsolete and one should use the 
>    newer Map interface. So I do think of a RecordToken as a dictionary
>    object. [TreeMap is essentially just a Map sorted by its keys.]
>        And, yes, I agree that even a null value in a member would increase
>    the space needed for Tokens, so that might be a disadvantage, but I
>    don't know how large a one. It still seems useful to me to be able to
>    add attributes to any token, and both R and Python provide that
>    ability for any of their objects.
>    
>    Dan
>    
>    Christopher Brooks wrote:
>    > Hi Dan,
>    > Hmm, interesting idea.
>    >
>    > The dictionary sounds a bit like a RecordToken.  RecordTokens use
>    > a TreeMap as the inner data structure.  Perhaps attaching a
>    > RecordToken to a Token might help with data management and
>    > operations on the metadata.
>    > I don't fully understand the DataFrame example, but it does not sound
>    > like RecordToken would help there.
>    >
>    > One issue with adding to Token is that even if the reference to a
>    > dictionary is null, it will still add space to Token.  Can anyone
>    > confirm this?
>    >
>    > Right now, I don't think Tokens have any data; the data is part of
>    > the subclass.
>    >
>    > It might be worth looking at how the unit system in ptolemy/data/unit
>    > is implemented.  It looks like we ended up making ScalarToken larger
>    > by adding:
>    >     protected int[] _unitCategoryExponents = null;
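>    >
>    > (If I recall the encoding correctly, each slot of that array holds
>    > the exponent of one base unit category, so with categories
>    > {length, time} a velocity in meters/second would carry something like
>    >
>    >     _unitCategoryExponents = new int[] { 1, -1 }; // length^1 * time^-1
>    >
>    > even though most tokens leave the field null.)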
>    >
>    > The notion of adding metadata to a token is of interest to us;
>    > Edward might have some input.
>    >
>    > _Christopher
>    > --------
>    >
>    >     
>    >     Hi All,
>    >     
>    >         I have been spending some time lately learning Python with the
>    >     particular goal of using the Python/Jython actor in Kepler. One thing
>    >     that I have noted is that Python has some interesting similarities to
>    >     R. In particular, both languages have the ability to attach
>    >     'attributes' to arbitrary objects. It strikes me that this is a very
>    >     useful way to attach various types of metadata to data objects - a
>    >     capability that is the basis of knb/eml data packages that are stored
>    >     in the NCEAS Metacat and used in Kepler EML data source actors.
>    >     
>    >         Kepler passes data between actors as Tokens, which I think of as
>    >     references to the actual data (one level of abstraction from the
>    >     actual data). However, at least as far as I understand it, there is
>    >     no way to attach attributes to Tokens. *I would like to propose
>    >     adding a 'Dictionary' member (i.e. a Hashtable) to the base Token
>    >     class*. This would allow any Kepler token to carry a named list of
>    >     'attributes'. Example labels (keys) for these attributes might be
>    >     'name', 'unit', or some more complex named metadata element (e.g. an
>    >     XML fragment). The default value of this Dictionary member could be
>    >     null so that it would have no effect on existing workflows using
>    >     existing tokens, and it would have minimal effect on new workflows
>    >     unless it was deliberately populated with attributes of interest.
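>    >
>    >         As a sketch, the proposal amounts to something like this (the
>    >     names are just illustrative, not a final API):
>    >
>    >         import java.util.Hashtable;
>    >         import java.util.Map;
>    >
>    >         public class Token {
>    >             // Defaults to null, so tokens without attributes cost only
>    >             // one extra reference and existing workflows are unchanged.
>    >             private Map<String, Object> _attributes = null;
>    >
>    >             public void setAttribute(String key, Object value) {
>    >                 if (_attributes == null) {
>    >                     _attributes = new Hashtable<String, Object>();
>    >                 }
>    >                 _attributes.put(key, value);
>    >             }
>    >
>    >             public Object getAttribute(String key) {
>    >                 return (_attributes == null) ? null : _attributes.get(key);
>    >             }
>    >         }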
>    >     
>    >         Any comments/thoughts on this?
>    >     
>    >     Dan Higgins
>    >     
>    >     Some additional thoughts:
>    >         One item that led to these thoughts is the R dataframe object
>    >     that is very useful in R for manipulating table-like structures. In
>    >     R, a dataframe is an ordered list of column data. The columns are
>    >     basically arrays of the same length but not necessarily of the same
>    >     data type - i.e. one might be strings, another doubles, etc. The
>    >     columns (and rows) can be named. A dataframe is thus very similar to
>    >     a relational database table, and functions for subsetting,
>    >     searching, and other RDB-like operations exist in R.
>    >         How would one pass dataframe objects between arbitrary actors in
>    >     Kepler using Kepler tokens? My first thought would be as Ptolemy
>    >     RecordTokens where each item (i.e. column) in the Record is an
>    >     ArrayToken. The columns in the Record each have an associated label
>    >     (name), but they are not ordered except by the alphabetical order of
>    >     the names (since a RecordToken is just a dictionary or hash table).
>    >     To get the ordering of the dataframe, one could create a
>    >     DataframeToken that was an array of column arrays, but then how does
>    >     one attach names (and other metadata) to each column array?
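>    >
>    >         For concreteness, the RecordToken-of-ArrayTokens encoding would
>    >     look roughly like this (constructor signatures from memory, so treat
>    >     the details as approximate):
>    >
>    >         import ptolemy.data.ArrayToken;
>    >         import ptolemy.data.DoubleToken;
>    >         import ptolemy.data.RecordToken;
>    >         import ptolemy.data.StringToken;
>    >         import ptolemy.data.Token;
>    >         import ptolemy.kernel.util.IllegalActionException;
>    >
>    >         public class DataframeDemo {
>    >             public static void main(String[] args)
>    >                     throws IllegalActionException {
>    >                 // Two same-length columns with different element types.
>    >                 Token[] species = { new StringToken("oak"),
>    >                                     new StringToken("pine") };
>    >                 Token[] height = { new DoubleToken(12.3),
>    >                                    new DoubleToken(8.7) };
>    >                 RecordToken dataframe = new RecordToken(
>    >                         new String[] { "species", "height" },
>    >                         new Token[] { new ArrayToken(species),
>    >                                       new ArrayToken(height) });
>    >                 // The record's labels come back in alphabetical order,
>    >                 // so the dataframe's original column order is lost.
>    >                 System.out.println(dataframe);
>    >             }
>    >         }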
>    >         So you can see that the idea of including a Dictionary member
>    >     in Token is driven in part by the desire to create a
>    >     'dataframe-like' token for Kepler.
>    >     
>    > --------
>    >   
>--------
>_______________________________________________
>Kepler-dev mailing list
>Kepler-dev at ecoinformatics.org
>http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev

------------ 
Edward A. Lee
Chair of EECS and Robert S. Pepper Distinguished Professor
231 Cory Hall, UC Berkeley, Berkeley, CA 94720-1770
phone: 510-642-0253, fax: 510-642-2845
eal at eecs.Berkeley.EDU, http://www.eecs.berkeley.edu/Faculty/Homepages/lee.html  


