[kepler-dev] Attributes for Kepler (and Ptolemy) Tokens
Christopher Brooks
cxh at eecs.berkeley.edu
Tue Mar 25 12:16:39 PDT 2008
Hi Dan,
Hmm, interesting idea.
The dictionary sounds a bit like a RecordToken. RecordTokens use
a TreeMap as the inner data structure. Perhaps attaching a RecordToken
to a Token might help with data management and operations on the metadata.
I don't fully understand the DataFrame example, but it does not sound
like RecordToken would help there.
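
As a rough illustration of why a TreeMap-backed record would not keep DataFrame column order, here is a minimal plain-Java sketch. It does not use the Ptolemy API; the class name LabelOrderSketch and the column labels are made up for the example. A TreeMap iterates its keys in sorted order, while a LinkedHashMap iterates them in insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Ptolemy or Kepler.
class LabelOrderSketch {
    /** Put the labels into the given map and return them in its iteration order. */
    static List<String> iterationOrder(Map<String, Integer> map, String... labels) {
        for (int i = 0; i < labels.length; i++) {
            map.put(labels[i], i);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        String[] columns = {"site", "date", "temp"};
        // Sorted by key: the original column order is lost.
        System.out.println(iterationOrder(new TreeMap<>(), columns));
        // Insertion order: the original column order is kept.
        System.out.println(iterationOrder(new LinkedHashMap<>(), columns));
    }
}
```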
One issue with adding a field to Token is that even if the reference to
the dictionary is null, the field itself still adds space to every Token
instance. Can anyone confirm this?
Right now, I don't think Tokens have any data; the data is part of the
subclass.
It might be worth looking at how the unit system in ptolemy/data/unit
is implemented. It looks like we ended up making ScalarToken larger
by adding:
protected int[] _unitCategoryExponents = null;
The notion of adding metadata to a token is of interest to us. Edward
might have some input.
_Christopher
--------
Hi All,
I have been spending some time lately learning Python with the
particular goal of using the Python/Jython actor in Kepler. One thing
that I have noted is that Python has some interesting similarities to R.
In particular, both languages have the ability to attach 'attributes' to
arbitrary objects. It strikes me that this is a very useful way to
attach various types of metadata to data objects - a capability that is
the basis of knb/eml data packages that are stored in the NCEAS Metacat
and used in Kepler EML data source actors.
Kepler passes data between actors as Tokens, which I think of as
references to the actual data (one level of abstraction from the actual
data). However, at least as far as I understand it, there is no way to
attach attributes to Tokens. *I would like to propose adding a
'Dictionary' member (i.e. a Hashtable) to the base Token class*. This
would allow any Kepler token to carry a named list of 'attributes'.
Example labels (keys) for these attributes might be a 'name', 'unit', or
some named more complex metadata element (e.g. an XML fragment). The
default value of this Dictionary member could be null so that it would
have no effect on existing workflows using existing tokens, and it would
have minimal effect on new workflows unless it was deliberately
populated with attributes of interest.
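
A minimal sketch of what such a member might look like, assuming a map that stays null until the first attribute is set. The class AttributedTokenSketch and its methods are hypothetical stand-ins for illustration, not the existing Token API, and the setter mutates the object in place purely for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a token base class; not the Ptolemy Token API. */
class AttributedTokenSketch {
    // Null until the first attribute is set, so tokens that never use
    // attributes carry only this one reference field.
    private Map<String, Object> _attributes = null;

    /** Return the named attribute, or null if it was never set. */
    Object getAttribute(String name) {
        return _attributes == null ? null : _attributes.get(name);
    }

    /** Set the named attribute, allocating the map on first use. */
    void setAttribute(String name, Object value) {
        if (_attributes == null) {
            _attributes = new HashMap<>();
        }
        _attributes.put(name, value);
    }

    /** Return true if at least one attribute has been set. */
    boolean hasAttributes() {
        return _attributes != null && !_attributes.isEmpty();
    }
}
```

Code that never calls setAttribute() sees no change in behavior; the only cost is the extra reference field on each instance.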
Any comments/thoughts on this?
Dan Higgins
Some additional thoughts:
One item that led to these thoughts is the R dataframe object that
is very useful in R for manipulating table-like structures. In R, a
dataframe is an ordered list of column data. The columns are basically
arrays of the same length but not necessarily of the same data type -
e.g. one might be strings, another doubles, etc. The columns (and rows)
can be named. A dataframe is thus very similar to a relational database
table and functions for subsetting, searching, and other RDB-like
operations exist in R.
How would one pass dataframe objects between arbitrary actors in
Kepler using Kepler tokens? My first thought would be as Ptolemy
RecordTokens where each item (i.e. a column) in the Record is an ArrayToken.
The columns in the Record each have an associated label (name), but they
are not ordered except by the alphabetical order of the names (since a
RecordToken is just a dictionary or hash table). To get the ordering of
the dataframe, one could create a DataframeToken that was an array of
column arrays, but then how does one attach names (and other metadata)
to each column array?
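
One way to picture the data structure being asked for is the plain-Java sketch below: columns kept in insertion order, each with a name and its own attribute map. DataframeSketch and Column are hypothetical names chosen for the example and are not existing Kepler or Ptolemy classes; a real token would hold ArrayTokens rather than Object arrays.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dataframe-like container; not an existing Kepler or Ptolemy class. */
class DataframeSketch {
    /** One column: its values plus free-form metadata such as a unit. */
    static class Column {
        final Object[] values;
        final Map<String, Object> attributes = new LinkedHashMap<>();

        Column(Object[] values) {
            this.values = values;
        }
    }

    // LinkedHashMap iterates in insertion order, so column order is preserved.
    private final Map<String, Column> _columns = new LinkedHashMap<>();
    private int _rowCount = -1;

    /** Append a named column; all columns must have the same length. */
    Column addColumn(String name, Object... values) {
        if (_rowCount >= 0 && values.length != _rowCount) {
            throw new IllegalArgumentException("All columns must have the same length");
        }
        _rowCount = values.length;
        Column column = new Column(values);
        _columns.put(name, column);
        return column;
    }

    /** Column names in the order the columns were added. */
    List<String> columnNames() {
        return new ArrayList<>(_columns.keySet());
    }

    Column column(String name) {
        return _columns.get(name);
    }
}
```

With this shape, something like df.addColumn("temp", 12.5, 13.1).attributes.put("unit", "degC") attaches metadata to a single column without disturbing the column order.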
So you can see that the idea of adding a Dictionary member to
Token is driven in part by the desire to create a 'dataframe-like' token
for Kepler.