[seek-kr] Re: [seek-dev] data typing in ptolemy

Bertram Ludaescher ludaesch at sdsc.edu
Thu Oct 9 11:19:04 PDT 2003


Chad:

This is a very interesting issue and directly related to the semantic 
typing issues Shawn has been working on recently.

Shawn: 

Can you directly work with Chad and Matt on this to make sure the
extensions will be compatible with what we've been discussing?

I can't join today's IRC session, but maybe we can have a phone
conf. on this next week. 

Bertram


>>>>> "CB" == Chad Berkley <berkley at nceas.ucsb.edu> writes:
CB> 
CB> Hi,
CB> Matt and I had a conversation on IRC the other day that we thought might 
CB> be of interest to those on these lists.
CB> 
CB> Basically, I am now dealing with typing issues within ptolemy.  Problems 
CB> arise when you get missing values in the data.  Ptolemy's type hierarchy 
CB> does not allow missing values in a data token, so Matt and I were 
CB> talking about extending the ptolemy typing system to allow missing 
CB> values.  It occurred to us that the typing system will also need to be 
CB> extended to allow for semantic typing in the future.
CB> 
CB> The type class hierarchy currently looks like the following:
CB> 
CB>                Token
CB>                  |
CB>       --------------------------
CB>       |                        |
CB> ScalarToken             AbstractConvertibleToken
CB>         |                               |
CB> ---------------...*              -----------------
CB>   |            |                 |               |
CB> DoubleToken  IntToken         BooleanToken  StringToken
CB> 
CB> *Note that ScalarToken also includes LongToken and ComplexToken.
CB> 
CB> In addition to this Token hierarchy (Tokens are the means by which you 
CB> pass data between actors over ports) there is also a port typing 
CB> hierarchy implemented in the class BaseType.  BaseType is the means by 
CB> which you actually specify a port's type.  It looks like this:
CB> 
CB>                                BaseType
CB>                                   |
CB>       ---------------------------------------------------------....
CB>       |             |             |          |          |
CB> BooleanType   ComplexType   GeneralType   IntType   DoubleType ....*
CB> 
CB> * BaseType also includes EventType, LongType, NumericalType, ObjectType, 
CB> ScalarType, StringType, UnknownType, UnsignedByteType
CB> 
CB> Basically, in order to extend this typing system, we must extend both of 
CB> these hierarchies, since Tokens are the means by which data is transferred 
CB> between ports and BaseTypes are the means by which you allow (or 
CB> disallow) a port to accept different types of data.
CB> 
CB> Extending the hierarchy
CB> -----------------------
CB> There are two different ways that I see to extend the hierarchy.  The 
CB> first is to extend the base class Token with our own tree of token types 
CB> extending from the root of the tree.  This will probably allow us the 
CB> most flexibility in implementing types the way we need to.  However, the 
CB> main drawback I see to doing this is that we would not be able to use 
CB> most existing actors because their ports are typed according to the 
CB> current hierarchy.  I think that one fact pretty much eliminates this 
CB> approach from the options.
CB> 
CB> The second approach I see is to extend each of the leaf token types. 
CB> For example, extend DoubleToken to ExtendedDoubleToken and add our 
CB> additional functionality there.  This keeps our type system within the 
CB> bounds of the current ptolemy hierarchy but limits our flexibility in 
CB> extension.  We are basically limited to the hierarchy that already 
CB> exists.  It is still unclear to me what the effects of doing this will 
CB> be on existing actors.  For instance, if we extend DoubleToken to allow 
CB> missing values, and an actor with a port of BaseType.DoubleType gets an 
CB> ExtendedDoubleToken, it would still need to be able to handle whatever 
CB> value we assign as a missing value code.  This is problematic, because 
CB> we are then restricted to using an actual double value as a missing 
CB> value code (e.g. -999.999), which we've always maintained was bad data 
CB> practice.  This could also cause problems because the actor cannot 
CB> differentiate -999.999 from a normal value and will operate on it 
CB> normally.
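
To make the sentinel problem concrete, here is a minimal Java sketch.  It
does not use any Ptolemy classes: MissingValueSketch, naiveMean and
awareMean are hypothetical names made up for illustration, and
java.util.OptionalDouble merely stands in for a token type that can
represent "missing" outside the range of double values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical illustration only; none of this is Ptolemy API.
final class MissingValueSketch {

    // The missing value code used as an example in the message.
    static final double MISSING_CODE = -999.999;

    // Stands in for an existing actor that knows nothing about missing
    // values: it averages every double it receives, sentinel included.
    static double naiveMean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Missing-aware variant: absent values are skipped, and the result
    // is itself "missing" when no value was present at all.
    static OptionalDouble awareMean(List<OptionalDouble> values) {
        double sum = 0.0;
        int count = 0;
        for (OptionalDouble v : values) {
            if (v.isPresent()) {
                sum += v.getAsDouble();
                count++;
            }
        }
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of(sum / count);
    }

    public static void main(String[] args) {
        double[] coded = {10.0, MISSING_CODE, 20.0};
        List<OptionalDouble> typed = Arrays.asList(
            OptionalDouble.of(10.0), OptionalDouble.empty(), OptionalDouble.of(20.0));
        // The sentinel drags the naive mean far below both real values.
        System.out.println("naive mean: " + naiveMean(coded));
        System.out.println("aware mean: " + awareMean(typed));
    }
}
```

The naive version has no way to tell the code from data, which is the
situation an unmodified actor would be in if it received an extended
token carrying -999.999.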
CB> 
CB> This same problem comes up (but to a lesser extent) when you think about 
CB> extending this system for semantics.  What does an existing actor do 
CB> with semantic information stored in the token?  It can ignore it, but 
CB> that may be detrimental to the analysis.
CB> 
CB> Another possible option
CB> -----------------------
CB> The other possible solution to the missing value problem is to simply 
CB> not send any data over the port when a missing value is encountered.  I 
CB> have modified the EML ingestion actor to dynamically create one typed 
CB> port for each attribute in the data package.  These ports can then be 
CB> hooked up to other actors.  The data is sent asynchronously and depends 
CB> on the receiving ports to queue the data until all the input data is 
CB> present to run the analysis.
CB> 
CB> If I simply do not send a token when a missing value comes up, I foresee 
CB> major timing problems.  For instance, port A and port B are mapped to 
CB> input ports X and Y (resp.) of a plotter.  Port A sends a token to X, 
CB> then B gets a missing value.  It sends nothing.  The plotter is then 
CB> waiting for its second input.  The next record is iterated into.  Port A 
CB> sends another token to X.  This causes an exception.  The other scenario 
CB> is, on the second iteration, A is a missing value but B is not.  Then we 
CB> are plotting two values from different records when Y receives data from 
CB> B in the second record.  This would be a nightmare to deal with given 
CB> the current directors.
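
The misalignment scenario can be sketched in plain Java.  This is a
simplified queue model, not Ptolemy's director semantics: it assumes each
input port buffers tokens and the consumer fires whenever both queues
have a token.  PortAlignmentSketch, skipMissing and sendPlaceholder are
hypothetical names, and a null entry in a record stands for a missing
value in the data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of this is Ptolemy API.
final class PortAlignmentSketch {

    // Strategy 1: send nothing for a missing value.  The two queues drift
    // apart, so values from different records end up paired together.
    static List<String> skipMissing(Double[][] records) {
        Deque<Double> x = new ArrayDeque<>();
        Deque<Double> y = new ArrayDeque<>();
        for (Double[] r : records) {
            if (r[0] != null) x.add(r[0]);
            if (r[1] != null) y.add(r[1]);
        }
        List<String> pairs = new ArrayList<>();
        while (!x.isEmpty() && !y.isEmpty()) {
            pairs.add(x.poll() + "," + y.poll());
        }
        return pairs;
    }

    // Strategy 2: always send a token, using an empty Optional as an
    // explicit "missing" marker.  The queues stay in step, and the
    // consumer can drop any record that has a missing field.
    static List<String> sendPlaceholder(Double[][] records) {
        Deque<Optional<Double>> x = new ArrayDeque<>();
        Deque<Optional<Double>> y = new ArrayDeque<>();
        for (Double[] r : records) {
            x.add(Optional.ofNullable(r[0]));
            y.add(Optional.ofNullable(r[1]));
        }
        List<String> pairs = new ArrayList<>();
        while (!x.isEmpty() && !y.isEmpty()) {
            Optional<Double> a = x.poll();
            Optional<Double> b = y.poll();
            if (a.isPresent() && b.isPresent()) {
                pairs.add(a.get() + "," + b.get());
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Three records; the second one is missing its B value.
        Double[][] records = {{1.0, 10.0}, {2.0, null}, {3.0, 30.0}};
        System.out.println("skip missing:     " + skipMissing(records));
        System.out.println("send placeholder: " + sendPlaceholder(records));
    }
}
```

With the second record missing its B value, the first strategy pairs 2.0
from record two with 30.0 from record three, while the second keeps each
pair within one record.  The placeholder approach still requires a token
type that can carry "missing", which leads back to the question of
extending the hierarchy.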
CB> 
CB> So, does anyone see something that I'm missing here?  What are the needs 
CB> of the semantic typing going to be as far as ptolemy goes?  Anyone have 
CB> a better solution than the three that I've laid out?  This is a complex 
CB> issue that I need to deal with before I can continue moving forward with 
CB> AMS.  I don't want to do anything that will hinder the future semantic 
CB> extensions of ptolemy and this is just too much of a basic 
CB> infrastructure item to try to hack.  If anyone wants to have an IRC chat 
CB> about this, I'm on #seek.
CB> 
CB> chad
CB> 
CB> -- 
CB> -----------------------
CB> Chad Berkley
CB> National Center for
CB> Ecological Analysis
CB> and Synthesis (NCEAS)
CB> berkley at nceas.ucsb.edu
CB> -----------------------
CB> 
CB> _______________________________________________
CB> seek-dev mailing list
CB> seek-dev at ecoinformatics.org
CB> http://www.ecoinformatics.org/mailman/listinfo/seek-dev


