[seek-dev] data typing in ptolemy

Bertram Ludaescher ludaesch at sdsc.edu
Thu Oct 9 11:19:04 PDT 2003


This is a very interesting issue and directly related to the semantic 
typing issues Shawn has been working on recently.


Can you directly work with Chad and Matt on this to make sure the
extensions will be compatible with what we've been discussing?

I can't join today's IRC session, but maybe we can have a phone
conf. on this next week. 


>>>>> "CB" == Chad Berkley <berkley at nceas.ucsb.edu> writes:
CB> Hi,
CB> Matt and I had a conversation on IRC the other day that we thought might 
CB> be of interest to those on these lists.
CB> Basically, I am now dealing with typing issues within ptolemy.  Problems 
CB> arise when you get missing values in the data.  Ptolemy's type hierarchy 
CB> does not allow missing values in data tokens, so Matt and I were 
CB> talking about extending the ptolemy typing system to allow missing 
CB> values.  It occurred to us that the typing system will also need to be 
CB> extended to allow for semantic typing in the future.
CB> The type class hierarchy currently looks like the following:
CB>                Token
CB>                  |
CB>       --------------------------
CB>       |                        |
CB> ScalarToken             AbstractConvertableToken
CB>         |                               |
CB> ---------------...*              -----------------
CB>   |            |                 |               |
CB> DoubleToken  IntToken         BooleanToken  StringToken
CB> *Note that ScalarToken also includes LongToken and ComplexToken.
CB> In addition to this Token hierarchy (Tokens are the means by which you 
CB> pass data between actors over ports) there is also a port typing 
CB> hierarchy implemented in the class BaseType.  BaseType is the means by 
CB> which you actually specify a port's type.  It looks like this:
CB>                                BaseType
CB>                                   |
CB>       ---------------------------------------------------------....
CB>       |             |             |          |          |
CB> BooleanType   ComplexType   GeneralType   IntType   DoubleType ....*
CB> * BaseType also includes EventType, LongType, NumericalType, ObjectType, 
CB> ScalarType, StringType, UnknownType, UnsignedByteType
CB> Basically, in order to extend this typing system, we must extend both of 
CB> these hierarchies since Tokens are the means by which data is transferred 
CB> between ports and BaseTypes are the means by which you allow (or 
CB> disallow) a port to accept different types of data.
CB> Extending the hierarchy
CB> -----------------------
CB> There are two different ways that I see to extend the hierarchy.  The 
CB> first is to extend the base class Token with our own tree of token types 
CB> extending from the root of the tree.  This will probably allow us the 
CB> most flexibility in implementing types the way we need to.  However, the 
CB> main drawback I see to doing this is that we would not be able to use 
CB> most existing actors because their ports are typed according to the 
CB> current hierarchy.  I think that one fact pretty much eliminates this 
CB> approach from the options.
CB> The second approach I see is to extend each of the leaf token types. 
CB> For example, extend DoubleToken to ExtendedDoubleToken and add our 
CB> additional functionality there.  This keeps our type system within the 
CB> bounds of the current ptolemy hierarchy but limits our flexibility in 
CB> extension.  We are basically limited to the hierarchy that already 
CB> exists.  It is still unclear to me what the effects of doing this will 
CB> be on existing actors.  For instance, if we extend DoubleToken to allow 
CB> missing values, and an actor with a port of BaseType.DoubleType gets an 
CB> ExtendedDoubleToken, it would still need to be able to handle whatever 
CB> value we assign as a missing value code.  This is problematic, because 
CB> we are then restricted to using an actual double value as a missing 
CB> value code (i.e. -999.999) which we've always maintained was bad data 
CB> practice.  This could also cause problems because the actor cannot 
CB> differentiate -999.999 from a normal value and will operate on it 
CB> normally.
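
The sentinel problem described above can be illustrated with a minimal Java sketch. The classes below are hypothetical stand-ins written for this example, not the real ptolemy.data classes, and the isMissing() flag is an assumption about what an ExtendedDoubleToken might carry. An actor written against the plain DoubleToken has no way to see that flag, so it averages the -999.999 code in with the real data.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a double-valued token (NOT ptolemy.data.DoubleToken).
class DoubleToken {
    private final double value;
    DoubleToken(double value) { this.value = value; }
    double doubleValue() { return value; }
}

// The "extend a leaf type" approach: a missing value still has to expose
// some double, so it falls back to a sentinel code.
class ExtendedDoubleToken extends DoubleToken {
    static final double MISSING_CODE = -999.999;
    private final boolean missing;

    ExtendedDoubleToken(double value) { super(value); this.missing = false; }
    private ExtendedDoubleToken() { super(MISSING_CODE); this.missing = true; }

    static ExtendedDoubleToken missing() { return new ExtendedDoubleToken(); }
    boolean isMissing() { return missing; }
}

class MissingValueSketch {
    // An "existing actor": it only knows the DoubleToken interface.
    static double legacyMean(List<? extends DoubleToken> tokens) {
        double sum = 0.0;
        for (DoubleToken t : tokens) {
            sum += t.doubleValue(); // the sentinel is summed like real data
        }
        return sum / tokens.size();
    }

    // An actor rewritten to understand the extended token.
    static double awareMean(List<ExtendedDoubleToken> tokens) {
        double sum = 0.0;
        int n = 0;
        for (ExtendedDoubleToken t : tokens) {
            if (!t.isMissing()) {
                sum += t.doubleValue();
                n++;
            }
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    public static void main(String[] args) {
        List<ExtendedDoubleToken> column = Arrays.asList(
                new ExtendedDoubleToken(10.0),
                ExtendedDoubleToken.missing(),
                new ExtendedDoubleToken(20.0));
        System.out.println("legacy mean: " + legacyMean(column));
        System.out.println("aware mean:  " + awareMean(column));
    }
}
```

Only the aware actor skips the missing record. Every existing actor would need a similar change before the extended token is safe to send to it.
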
CB> This same problem comes up (but to a lesser extent) when you think about 
CB> extending this system for semantics.  What does an existing actor do 
CB> with semantic information stored in the token?  It can ignore it, but 
CB> that may be detrimental to the analysis.
CB> Another possible option
CB> -----------------------
CB> The other possible solution to the missing value problem is to simply 
CB> not send any data over the port when a missing value is encountered.  I 
CB> have modified the EML ingestion actor to dynamically create one typed 
CB> port for each attribute in the data package.  These ports can then be 
CB> hooked up to other actors.  The data is sent asynchronously and depends 
CB> on the receiving ports to queue the data until all the input data is 
CB> present to run the analysis.
CB> If I simply do not send a token when a missing value comes up, I foresee 
CB> major timing problems.  For instance, port A and port B are mapped to 
CB> input ports X and Y (resp.) of a plotter.  Port A sends a token to X, 
CB> then B gets a missing value.  It sends nothing.  The plotter is then 
CB> waiting for its second input.  The next record is iterated into.  Port A 
CB> sends another token to X.  This causes an exception.  The other scenario 
CB> is, on the second iteration, A is a missing value but B is not.  Then we 
CB> are plotting two values from different records when Y receives data from 
CB> B in the second record.  This would be a nightmare to deal with given 
CB> the current directors.
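
The record misalignment described above can be reproduced with a small Java simulation. This is a sketch under stated assumptions: it uses plain queues rather than Ptolemy receivers or directors, the pairUp name is made up for the example, and it assumes the consumer queues surplus tokens instead of throwing an exception. A null entry stands for a missing value for which no token is sent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SkipTokenSketch {
    // Feeds two columns record by record into the queues of a two-input
    // consumer, which fires whenever both queues hold at least one token.
    // Returns the (x, y) pairs the consumer would plot, as "x,y" strings.
    static List<String> pairUp(Double[] colA, Double[] colB) {
        Deque<Double> x = new ArrayDeque<>();
        Deque<Double> y = new ArrayDeque<>();
        List<String> points = new ArrayList<>();
        for (int rec = 0; rec < colA.length; rec++) {
            if (colA[rec] != null) x.add(colA[rec]); // missing: send nothing
            if (colB[rec] != null) y.add(colB[rec]);
            while (!x.isEmpty() && !y.isEmpty()) {
                points.add(x.remove() + "," + y.remove());
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Record 1: A = 1.0, B missing.  Record 2: A = 2.0, B = 20.0.
        Double[] a = {1.0, 2.0};
        Double[] b = {null, 20.0};
        System.out.println(pairUp(a, b));
    }
}
```

With these two records the consumer pairs A from the first record with B from the second, and A's second value is left waiting in the queue. Nothing in the token stream tells the consumer which record a token came from.
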
CB> So, does anyone see something that I'm missing here?  What are the needs 
CB> of the semantic typing going to be as far as ptolemy goes?  Anyone have 
CB> a better solution than the three that I've laid out?  This is a complex 
CB> issue that I need to deal with before I can continue moving forward with 
CB> AMS.  I don't want to do anything that will hinder the future semantic 
CB> extensions of ptolemy and this is just too much of a basic 
CB> infrastructure item to try to hack.  If anyone wants to have an IRC chat 
CB> about this, I'm on #seek.
CB> chad
CB> -- 
CB> -----------------------
CB> Chad Berkley
CB> National Center for
CB> Ecological Analysis
CB> and Synthesis (NCEAS)
CB> berkley at nceas.ucsb.edu
CB> -----------------------
CB> _______________________________________________
CB> seek-dev mailing list
CB> seek-dev at ecoinformatics.org
CB> http://www.ecoinformatics.org/mailman/listinfo/seek-dev
