[kepler-dev] Bug 2240: add support for null values to data

Mon Jan 30 20:56:36 PST 2006

One of the things that will come up next week is the null value issue.

It might be interesting to get the conversation going a little early.

Below are two pieces of email from December.

Below are Professor Arne Huseby's comments about missing values:

> I actually implemented support for missing values in my simulation 
> program, Riscue, earlier this fall. Normally, when I run an ordinary 
> simulation, I would never encounter missing data. However, I also use 
> Riscue to process and analyze real statistical data, and such data 
> often come with a lot of missing values.
> 
> When I load a table of statistical data into Riscue, Riscue generates 
> a set of "data nodes", one for each column in the table. I can then 
> do a lot of descriptive statistics using these data nodes, e.g., 
> cumulative distribution plots, histograms, scatter plots, regression 
> analysis, correlation analysis, etc. I can also do postprocessing of 
> the data, i.e., transformations, merging, filtering, etc.
> 
> In addition to using data nodes as input to such things, data nodes 
> can also be used as elements in a model. When I run a simulation on a 
> model containing data nodes, Riscue will sample random values from 
> the values contained in each of the data nodes. In statistical 
> terminology this is usually called "resampling".
> 
> Thus, since I now support data nodes with missing values, I also need 
> to deal with these missing values in the simulations. Say that I 
> e.g., want to simulate a very simple model with only two nodes, A and 
> B, where A is a data node, and B is a node taking the sampled value 
> from A, and transforming this into some other value, using some 
> function f. Thus, when a "good" value, say x, is sampled from A, then 
> B gets the value f(x). If on the other hand, a missing value is 
> sampled from A, there is a question how this should be treated in B. 
> At least in my applications, the natural thing would be to say that B 
> gets a missing value too, i.e., that the missing value state is 
> propagated through the model.
> 
> So, how did I implement support for this?
> 
> One obvious way of dealing with this would be to implement support 
> for missing values throughout my entire library of mathematical 
> functions. With about 400 such classes, this would be a daunting 
> task! Moreover, while a few functions may be able to handle missing 
> values in some meaningful way, most of these functions would not. So 
> it really didn't make sense to go through all the trouble of 
> implementing support for something that in 99.99% of the cases would 
> not be meaningful after all.
> 
> In Riscue each node owns a formula object, which is a tree composed 
> of function objects from my function library. So, in order to 
> calculate its value, which is done using a method called 
> calcObjectValue(), the node asks its formula to do this, at the same 
> time passing a list of input edges to the formula. The formula then 
> uses the values obtained from the input edges in its calculations and 
> responds back to the node with a value. The edges can also have their 
> own formulas, and transform values obtained from their respective 
> input nodes in a similar fashion.
> 
> This design actually provides a very easy way to support missing 
> values. Whenever a request for a value is sent from a formula to some 
> object (i.e., to an edge or a node), using a method called 
> getObjectValue(), two things can happen:
> 
> 1) The object has a valid value which is passed back to the formula.
> 2) The object does not have a valid value, and throws a MissingValueException.
> 
> This exception is NOT handled by the formula object, so this way I 
> avoided having to implement support for this throughout my function 
> library. Instead this is handled by the owner of the formula, i.e., 
> either a node or an edge. Handling this exception is trivial since 
> this simply means assigning a missing value (encoded in some suitable 
> fashion) to the object instead of a valid value.
> 
> So the only (well almost...) changes I needed to make in my code, were:
> 
> (i) Modifying the getObjectValue()-method so that missing values 
> result in a MissingValueException being thrown.
> (ii) Adding a new "catch block" in the calcObjectValue()-method for 
> the nodes and edges, handling the MissingValueException.
> 
> I don't know if this solution has any relevance to your problem, but 
> maybe this will trigger some ideas.
> 
> Arne 
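
To make Arne's description concrete, here is a minimal Java sketch of the
pattern as I read it. It is not Riscue code: SketchNode, SketchFormula and
the data()/derived() factories are names I made up, and using an unchecked
exception and a null Double to encode "missing" are my own assumptions.
Only getObjectValue(), calcObjectValue() and MissingValueException come from
his mail.

```java
// Hypothetical stand-ins for illustration; these are not Riscue classes.
class MissingValueException extends RuntimeException {
    MissingValueException() {
        super("missing value");
    }
}

// A formula pulls values from its input and knows nothing about missing data.
interface SketchFormula {
    double evaluate(SketchNode input);
}

class SketchNode {
    private final SketchNode input;      // null for a data node
    private final SketchFormula formula; // null for a data node
    private Double value;                // null encodes "missing" (assumption)

    private SketchNode(SketchNode input, SketchFormula formula, Double value) {
        this.input = input;
        this.formula = formula;
        this.value = value;
    }

    // A data node holding one sampled value; pass null for a missing value.
    static SketchNode data(Double sampled) {
        return new SketchNode(null, null, sampled);
    }

    // A node whose value is computed from another node by a formula.
    static SketchNode derived(SketchNode input, SketchFormula formula) {
        return new SketchNode(input, formula, null);
    }

    // Change (i): asking for a missing value throws.
    double getObjectValue() {
        if (value == null) {
            throw new MissingValueException();
        }
        return value;
    }

    // Change (ii): the owner of the formula catches the exception and
    // records a missing value; the formula itself never handles it.
    void calcObjectValue() {
        if (formula == null) {
            return; // data nodes keep their sampled value
        }
        try {
            value = formula.evaluate(input);
        } catch (MissingValueException e) {
            value = null;
        }
    }

    boolean isMissing() {
        return value == null;
    }
}
```

With A = SketchNode.data(null) and B = SketchNode.derived(A, in ->
Math.sqrt(in.getObjectValue())), calling B.calcObjectValue() leaves B
missing as well, which is the propagation he describes. The formula needs no
missing-value logic of its own.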

On 12/13/05, I (cxh) wrote:
> 
> Ok, I hacked up the following:
> 
> * Token.java now has nil() and isNil() methods
>   I went with nil because null is a Java keyword.
> 
>   The term "missing" is rather appealing since the Token.toString()
>   method usually prints out "present".  So, it could be modified to
>   print out "missing".  However, I feel that someone is likely to have
>   a parameter named "missing" somewhere.  nil seems safer.
> 
>   This is not cast in stone, comments are welcome.
> 
>   These methods have tests. 
>   
> * data/expr/Constants.java now defines a "nil" constant which
>   is a Token that has the nil() method:
>         ptolemy.data.Token nil = new ptolemy.data.Token();
>         nil.nil();
>         _table.put("nil", nil);
>   Thus, one can now create expressions that have nil in them.
> 
>   The "nil" constant has a test in data/expr/test/PtParser.tcl
> 
> * actor/lib/RemoveNilTokens.java:
>   A new actor that reads its input and discards any nil tokens
>   in the fire() method.  It might be better to do this in prefire().
>   This actor is available in the "More Libraries" -> "Esoteric" section.
> 
>   Note that this actor should not be used in SDF.
>   
>   No tests yet.
> 
> * domains/pn/demo/RemoveNilTokens/RemoveNilTokens.xml
>   A model that uses RemoveNilTokens.
>   I think I'm terminating the PN process poorly.  I get 7 outputs
>   instead of 5.  I could use some help here.
>   Also, I have to explicitly set the type of output of the RemoveNilTokens
>   actor.
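
For readers who do not have the tree checked out, the idea behind nil(),
isNil() and RemoveNilTokens can be sketched in plain Java roughly as below.
This is a standalone illustration, not the Ptolemy code: SketchToken and
RemoveNilSketch are hypothetical names, the Integer payload is an
assumption, and the real actor works on ports rather than on a List.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token; this is not ptolemy.data.Token.
class SketchToken {
    private final Integer payload;
    private boolean nil;

    SketchToken(Integer payload) {
        this.payload = payload;
    }

    // Mark this token as nil (a missing value).
    void nil() {
        nil = true;
    }

    boolean isNil() {
        return nil;
    }

    @Override
    public String toString() {
        return nil ? "nil" : String.valueOf(payload);
    }
}

class RemoveNilSketch {
    // Copy the input stream to the output, discarding nil tokens.
    static List<SketchToken> removeNil(List<SketchToken> input) {
        List<SketchToken> output = new ArrayList<>();
        for (SketchToken token : input) {
            if (!token.isNil()) {
                output.add(token);
            }
        }
        return output;
    }
}
```

Note that the number of outputs depends on the data: three inputs with one
nil give two outputs. My reading is that this data-dependent rate is why the
actor should not be used in SDF, where production and consumption rates are
fixed.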
> 
> 
> The model looks like:
> 
>           Bool Switch
> Ramp----> | 
>           |----------------> RemoveNilTokens ---------> Display
> Const     |                                      |
> that ---> |                                       --> "Code that stops PN"
> produces  _
> nil       ^
>           |
>           |
> Bernoulli -
> 
> 
> So, now we have a straw man in PN to try out.  
> 
> Questions:
> 1) Would something like this meet the needs of the Kepler group?
> 2) Should RemoveNilTokens do something in prefire()?
>    I'm not sure if I can get access to the Token and call Token.isNil()
>    in prefire().
> 3) Is "nil" an ok name?
> 4) How do I get PN to terminate properly after I see 5 non-nil tokens?
> 
> _Christopher
> 
> Edward wrote:
> --------
> 
> 
>     At 08:16 AM 12/13/2005 -0800, Shawn Bowers wrote:
>     >One option for SDF would be to literally "propagate" the missing-valued
>     >tokens, instead of "dropping" them (as in PN).   Then, it seems the
>     >consumption/production rates at least wouldn't be violated.  Doing this
>     >correctly might be a bit tricky, but could probably perform this based on
>     >the token production/consumption rates --
>     
>     In SDF, before any actor is fired, its prefire() method is called.
>     For all our actors that require input data, prefire() will
>     return false if there is no input token, and the actor will not
>     be fired. Consequently, its outputs will also have no token...
>     
>     So SDF already does this propagation...
>     
>     Edward
>     
>     
>     
>     ------------
>     Edward A. Lee
>     Professor, Chair of the EE Division, Associate Chair of EECS
>     231 Cory Hall, UC Berkeley, Berkeley, CA 94720
>     phone: 510-642-0253 or 510-642-0455, fax: 510-642-2845
>     eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal  
>     
> --------
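
Edward's point about prefire() can be illustrated with a small sketch. This
is not Ptolemy code: ScaleSketchActor and SketchScheduler are hypothetical
names, and modeling the ports as simple queues is an assumption made to keep
the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-input, single-output actor; not a Ptolemy class.
class ScaleSketchActor {
    final Deque<Integer> input = new ArrayDeque<>();
    final Deque<Integer> output = new ArrayDeque<>();

    // Ready to fire only if an input token is present.
    boolean prefire() {
        return !input.isEmpty();
    }

    // Consume one token and produce one token.
    void fire() {
        output.add(input.remove() * 2);
    }
}

class SketchScheduler {
    // One iteration: fire the actor only if prefire() allows it.
    // Returns true if the actor fired.
    static boolean iterate(ScaleSketchActor actor) {
        if (!actor.prefire()) {
            return false;
        }
        actor.fire();
        return true;
    }
}
```

If the input queue is empty, iterate() returns false and nothing is written
to the output, so a downstream actor built the same way would not fire
either. In this sketch an absent token propagates as absence, which is how I
read Edward's remark.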