[seek-dev] data typing in ptolemy
ludaesch at sdsc.edu
Thu Oct 9 11:19:04 PDT 2003
This is a very interesting issue and is directly related to the semantic
typing issues Shawn has been working on recently.
Can you directly work with Chad and Matt on this to make sure the
extensions will be compatible with what we've been discussing?
I can't join today's IRC session, but maybe we can have a phone
conf. on this next week.
>>>>> "CB" == Chad Berkley <berkley at nceas.ucsb.edu> writes:
CB> Matt and I had a conversation on IRC the other day that we thought might
CB> be of interest to those on these lists.
CB> Basically, I am now dealing with typing issues within ptolemy. Problems
CB> arise when you get missing values in the data. Ptolemy's type hierarchy
CB> does not allow missing values in data tokens, so Matt and I were
CB> talking about extending the ptolemy typing system to allow missing
CB> values. It occurred to us that the typing system will also need to be
CB> extended to allow for semantic typing in the future.
CB> The type class hierarchy currently looks like the following:
CB>                          Token
CB>                            |
CB>              ----------------------------
CB>              |                          |
CB>         ScalarToken        AbstractConvertableToken
CB>              |                          |
CB>       ---------------...*       -----------------
CB>       |             |           |               |
CB>  DoubleToken     IntToken  BooleanToken    StringToken
CB> *Note that ScalarToken also includes LongToken and ComplexToken.
CB> In addition to this Token hierarchy (Tokens are the means by which you
CB> pass data between actors over ports) there is also a port typing
CB> hierarchy implemented in the class BaseType. BaseType is the means by
CB> which you actually specify a port's type. It looks like this:
CB>                              BaseType
CB>                                 |
CB>     ------------------------------------------------------------
CB>     |            |             |             |           |
CB> BooleanType  ComplexType  GeneralType    IntType   DoubleType  ....*
CB> * BaseType also includes EventType, LongType, NumericalType, ObjectType,
CB> ScalarType, StringType, UnknownType, and UnsignedByteType.
CB> Basically, in order to extend this typing system, we must extend both of
CB> these hierarchies, since Tokens are the means by which data is transferred
CB> between ports and BaseTypes are the means by which you allow (or
CB> disallow) a port to accept different types of data.
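A minimal Java sketch of the two parallel hierarchies described above, assuming a heavily simplified model. The nested classes and the PortType.accepts check are hypothetical stand-ins, not Ptolemy's real Token/BaseType API, and the real type system also handles things such as lossless conversion between types, which this sketch ignores.

```java
// Hypothetical, heavily simplified model of the Token and BaseType
// hierarchies. These are NOT Ptolemy's classes; they only illustrate that
// data (tokens) and port declarations (types) live in separate trees.
public class TypeModel {

    // Token side: what actually travels between actors.
    static abstract class Token {}
    static abstract class ScalarToken extends Token {}
    static abstract class AbstractConvertableToken extends Token {}

    static class DoubleToken extends ScalarToken {
        final double value;
        DoubleToken(double value) { this.value = value; }
    }

    static class StringToken extends AbstractConvertableToken {
        final String value;
        StringToken(String value) { this.value = value; }
    }

    // Type side: what a port declares it will accept.
    enum PortType {
        DOUBLE(DoubleToken.class),
        STRING(StringToken.class),
        GENERAL(Token.class);

        private final Class<? extends Token> tokenClass;

        PortType(Class<? extends Token> tokenClass) { this.tokenClass = tokenClass; }

        // Accept a token whose class is the declared class or a subclass of it.
        boolean accepts(Token t) { return tokenClass.isInstance(t); }
    }

    public static void main(String[] args) {
        System.out.println(PortType.DOUBLE.accepts(new DoubleToken(1.5)));  // true
        System.out.println(PortType.DOUBLE.accepts(new StringToken("x")));  // false
        System.out.println(PortType.GENERAL.accepts(new StringToken("x"))); // true
    }
}
```

Extending the system therefore means adding both a new token class and a type that admits it; adding only one side leaves ports unable to accept the new data.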
CB> Extending the hierarchy
CB> There are two different ways that I see to extend the hierarchy. The
CB> first is to extend the base class Token with our own tree of token types
CB> extending from the root of the tree. This would probably give us the
CB> most flexibility in implementing types the way we need to. However, the
CB> main drawback I see is that we would not be able to use
CB> most existing actors because their ports are typed according to the
CB> current hierarchy. I think that one fact pretty much eliminates this
CB> approach from the options.
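To make that drawback concrete, a hypothetical Java sketch: a new token tree (SeekToken and SeekDoubleToken are illustrative names only) hung directly off Token is not a DoubleToken, so an existing actor that only accepts DoubleToken rejects it. The instanceof check is an assumed stand-in for Ptolemy's port type checking.

```java
// Hypothetical sketch of the "new tree from the root" approach.
// Class names are illustrative; this is not Ptolemy code.
public class SeparateRootDemo {

    static abstract class Token {}

    // Existing leaf type that current actors' ports are declared against.
    static class DoubleToken extends Token {
        final double value;
        DoubleToken(double value) { this.value = value; }
    }

    // Our own tree: rooted at Token but unrelated to DoubleToken.
    static abstract class SeekToken extends Token {}

    static class SeekDoubleToken extends SeekToken {
        final Double value; // null could mark a missing value
        SeekDoubleToken(Double value) { this.value = value; }
    }

    // Stand-in for an existing actor whose input port is typed as double.
    static boolean existingActorAccepts(Token t) {
        return t instanceof DoubleToken;
    }

    public static void main(String[] args) {
        System.out.println(existingActorAccepts(new DoubleToken(2.0)));     // true
        System.out.println(existingActorAccepts(new SeekDoubleToken(2.0))); // false
    }
}
```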
CB> The second approach I see is to extend each of the leaf token types.
CB> For example, extend DoubleToken to ExtendedDoubleToken and add our
CB> additional functionality there. This keeps our type system within the
CB> bounds of the current ptolemy hierarchy but limits our flexibility in
CB> extension. We are basically limited to the hierarchy that already
CB> exists. It is still unclear to me what the effects of doing this will
CB> be on existing actors. For instance, if we extend DoubleToken to allow
CB> missing values, and an actor with a port of BaseType.DoubleType gets an
CB> ExtendedDoubleToken, it would still need to be able to handle whatever
CB> value we assign as a missing value code. This is problematic, because
CB> we are then restricted to using an actual double value as a missing
CB> value code (e.g. -999.999), which we've always maintained is bad data
CB> practice. It could also cause problems because the actor cannot
CB> distinguish -999.999 from a normal value and will simply operate on it.
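A hypothetical Java sketch of the sentinel problem with the second approach. ExtendedDoubleToken, MISSING_CODE and the two mean functions are illustrative assumptions, not Ptolemy API; the point is that an actor written only against DoubleToken averages the missing-value code in as ordinary data, while an actor that knows about the extension can skip it.

```java
import java.util.List;

// Hypothetical sketch: a subclass of a double token that flags missing
// values but must still carry a real double, here the sentinel -999.999.
public class SentinelDemo {

    static final double MISSING_CODE = -999.999;

    static class DoubleToken {
        final double value;
        DoubleToken(double value) { this.value = value; }
        double doubleValue() { return value; }
    }

    static class ExtendedDoubleToken extends DoubleToken {
        final boolean missing;
        ExtendedDoubleToken(double value, boolean missing) {
            super(missing ? MISSING_CODE : value); // sentinel stored as the value
            this.missing = missing;
        }
        static ExtendedDoubleToken missingValue() { return new ExtendedDoubleToken(0.0, true); }
    }

    // Existing actor: only knows DoubleToken, so the sentinel is averaged in.
    static double legacyMean(List<? extends DoubleToken> tokens) {
        double sum = 0.0;
        for (DoubleToken t : tokens) sum += t.doubleValue();
        return sum / tokens.size();
    }

    // Actor written against the extension: skips flagged values.
    static double awareMean(List<ExtendedDoubleToken> tokens) {
        double sum = 0.0;
        int n = 0;
        for (ExtendedDoubleToken t : tokens) {
            if (!t.missing) { sum += t.doubleValue(); n++; }
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    public static void main(String[] args) {
        List<ExtendedDoubleToken> data = List.of(
            new ExtendedDoubleToken(10.0, false),
            new ExtendedDoubleToken(20.0, false),
            ExtendedDoubleToken.missingValue());
        System.out.println(legacyMean(data)); // far below zero, skewed by the sentinel
        System.out.println(awareMean(data));  // 15.0
    }
}
```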
CB> This same problem comes up (but to a lesser extent) when you think about
CB> extending this system for semantics. What does an existing actor do
CB> with semantic information stored in the token? It can ignore it, but
CB> that may be detrimental to the analysis.
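A similar hypothetical sketch for semantics: SemanticDoubleToken carries a unit string, an assumed stand-in for richer semantic metadata. An existing actor that only sees the double adds 1 km and 500 m as 501, while an actor that reads the annotation gets 1500 m.

```java
// Hypothetical sketch: a double token that also carries a semantic
// annotation (a unit string). None of these classes are part of Ptolemy.
public class SemanticDemo {

    static class DoubleToken {
        final double value;
        DoubleToken(double value) { this.value = value; }
    }

    static class SemanticDoubleToken extends DoubleToken {
        final String unit; // assumed annotation, e.g. "m" or "km"
        SemanticDoubleToken(double value, String unit) {
            super(value);
            this.unit = unit;
        }
    }

    // Existing actor: sees only the double and ignores the annotation.
    static double legacyAdd(DoubleToken a, DoubleToken b) {
        return a.value + b.value;
    }

    // Semantics-aware actor: normalises both inputs to metres first.
    static double awareAddMetres(SemanticDoubleToken a, SemanticDoubleToken b) {
        return toMetres(a) + toMetres(b);
    }

    static double toMetres(SemanticDoubleToken t) {
        switch (t.unit) {
            case "m":  return t.value;
            case "km": return t.value * 1000.0;
            default:   throw new IllegalArgumentException("unknown unit: " + t.unit);
        }
    }

    public static void main(String[] args) {
        SemanticDoubleToken a = new SemanticDoubleToken(1.0, "km");
        SemanticDoubleToken b = new SemanticDoubleToken(500.0, "m");
        System.out.println(legacyAdd(a, b));      // 501.0
        System.out.println(awareAddMetres(a, b)); // 1500.0
    }
}
```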
CB> Another possible option
CB> The other possible solution to the missing value problem is to simply
CB> not send any data over the port when a missing value is encountered. I
CB> have modified the EML ingestion actor to dynamically create one typed
CB> port for each attribute in the data package. These ports can then be
CB> hooked up to other actors. The data is sent asynchronously and relies
CB> on the receiving ports to queue the data until all the input data is
CB> present to run the analysis.
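A hypothetical Java sketch of that queueing behaviour: a receiving actor keeps one FIFO queue per input port and fires only once every input holds a token. This is a simplification for illustration, not how any particular Ptolemy director or receiver is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical receiving actor: one FIFO queue per input port; it fires
// (consumes one token from every input) only when all inputs are non-empty.
public class QueuedReceiver {

    private final List<Queue<Double>> inputs = new ArrayList<>();
    private final List<double[]> fired = new ArrayList<>();

    QueuedReceiver(int portCount) {
        for (int i = 0; i < portCount; i++) inputs.add(new ArrayDeque<>());
    }

    // Called when an upstream port sends a token to input `port`.
    void receive(int port, double value) {
        inputs.get(port).add(value);
        tryFire();
    }

    // Fire as many times as there are complete sets of inputs.
    private void tryFire() {
        while (inputs.stream().allMatch(q -> !q.isEmpty())) {
            double[] args = new double[inputs.size()];
            for (int i = 0; i < args.length; i++) args[i] = inputs.get(i).remove();
            fired.add(args);
        }
    }

    List<double[]> firings() { return fired; }

    public static void main(String[] args) {
        QueuedReceiver plotter = new QueuedReceiver(2);
        plotter.receive(0, 1.0);  // X arrives first; nothing fires yet
        plotter.receive(1, 10.0); // Y arrives; fires once with (1.0, 10.0)
        for (double[] xy : plotter.firings()) System.out.println(xy[0] + ", " + xy[1]);
    }
}
```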
CB> If I simply do not send a token when a missing value comes up, I foresee
CB> major timing problems. For instance, suppose ports A and B are mapped to
CB> input ports X and Y (respectively) of a plotter. Port A sends a token to
CB> X, then B gets a missing value, so it sends nothing. The plotter is now
CB> waiting for its second input. The next record is read, and port A
CB> sends another token to X. This causes an exception. In the other
CB> scenario, A has a missing value in the second record but B does not.
CB> Then, when Y receives B's value from the second record, it gets plotted
CB> against a value of A from a different record. This would be a nightmare
CB> to deal with given the current directors.
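A hypothetical Java sketch of the second scenario, assuming a plotter that pairs tokens from X and Y strictly in arrival order, as with the queueing receivers described above. Each record is {A, B} with null for a missing value. If missing values are simply not sent, A's third value is paired with B's second; if an explicit "missing" marker token were sent instead (modelled here with java.util.Optional, an assumption for illustration), the records stay aligned.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

// Hypothetical model: port A feeds the plotter's input X, port B feeds
// input Y, and the plotter pairs X and Y tokens first-in first-out.
public class MisalignmentDemo {

    static List<String> plottedPoints(Double[][] records, boolean sendMissingMarker) {
        Queue<Optional<Double>> x = new ArrayDeque<>();
        Queue<Optional<Double>> y = new ArrayDeque<>();
        for (Double[] rec : records) {
            send(x, rec[0], sendMissingMarker);
            send(y, rec[1], sendMissingMarker);
        }
        List<String> points = new ArrayList<>();
        while (!x.isEmpty() && !y.isEmpty()) {
            points.add("(" + show(x.remove()) + ", " + show(y.remove()) + ")");
        }
        return points;
    }

    // Either drop a missing value (send nothing) or send an explicit marker.
    private static void send(Queue<Optional<Double>> port, Double v, boolean sendMissingMarker) {
        if (v != null) {
            port.add(Optional.of(v));
        } else if (sendMissingMarker) {
            port.add(Optional.empty());
        }
    }

    private static String show(Optional<Double> v) {
        return v.isPresent() ? v.get().toString() : "missing";
    }

    public static void main(String[] args) {
        Double[][] records = { {1.0, 10.0}, {null, 20.0}, {3.0, 30.0} };
        System.out.println(plottedPoints(records, false)); // [(1.0, 10.0), (3.0, 20.0)]
        System.out.println(plottedPoints(records, true));  // [(1.0, 10.0), (missing, 20.0), (3.0, 30.0)]
    }
}
```

Sending a marker keeps the streams aligned, but only if downstream actors can accept such a token, which leads back to the typing question.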
CB> So, does anyone see something that I'm missing here? What are the needs
CB> of the semantic typing going to be as far as ptolemy goes? Anyone have
CB> a better solution than the three that I've laid out? This is a complex
CB> issue that I need to deal with before I can continue moving forward with
CB> AMS. I don't want to do anything that will hinder future semantic
CB> extensions of ptolemy, and this is just too basic an
CB> infrastructure item to try to hack. If anyone wants to have an IRC chat
CB> about this, I'm on #seek.
CB> Chad Berkley
CB> National Center for
CB> Ecological Analysis
CB> and Synthesis (NCEAS)
CB> berkley at nceas.ucsb.edu
CB> seek-dev mailing list
CB> seek-dev at ecoinformatics.org