[kepler-dev] Bug 2240: add support for null values to data
cxh at eecs.berkeley.edu
Mon Jan 30 20:56:36 PST 2006
One of the things that will come up next week is the null value issue.
It might be interesting to get the conversation going a little early.
Below are two pieces of email from December.
First, Professor Arne Huseby's comments about missing values:
> I actually implemented support for missing values in my simulation
> program, Riscue, earlier this fall. Normally, when I run an ordinary
> simulation, I would never encounter missing data. However, I also use
> Riscue to process and analyze real statistical data, and such data
> often come with a lot of missing values.
> When I load a table of statistical data into Riscue, Riscue generates
> a set of "data nodes", one for each column in the table. I can then
> do a lot of descriptive statistics using these data nodes, e.g.,
> cumulative distribution plots, histograms, scatter plots, regression
> analysis, correlation analysis, etc. I can also do postprocessing of
> the data, i.e., transformations, merging, filtering etc.
> In addition to using data nodes as input to such things, data nodes
> can also be used as elements in a model. When I run a simulation on a
> model containing data nodes, Riscue will sample random values from
> the values contained in each of the data nodes. In statistical
> terminology this is usually called "resampling".
> Thus, since I now support data nodes with missing values, I also need
> to deal with these missing values in the simulations. Say that I
> e.g., want to simulate a very simple model with only two nodes, A and
> B, where A is a data node, and B is a node taking the sampled value
> from A, and transforming this into some other value, using some
> function f. Thus, when a "good" value, say x, is sampled from A, then
> B gets the value f(x). If on the other hand, a missing value is
> sampled from A, there is a question how this should be treated in B.
> At least in my applications, the natural thing would be to say that B
> gets a missing value too, i.e., that the missing value state is
> propagated through the model.
> So, how did I implement support for this?
> One obvious way of dealing with this would be to implement support
> for missing values throughout my entire library of mathematical
> functions. With about 400 such classes, this would be a daunting
> task! Moreover, while a few functions may be able to handle missing
> values in some meaningful way, most of these functions would not. So
> it really didn't make sense to go through all the trouble of
> implementing support for something that in 99.99% of the cases would
> not be meaningful after all.
> In Riscue each node owns a formula object, which is a tree composed
> of function objects from my function library. So, in order to
> calculate its value, which is done using a method called
> calcObjectValue(), the node asks its formula to do this, at the same
> time passing a list of input edges to the formula. The formula then
> uses the values obtained from the input edges in its calculations and
> responds back to the node with a value. The edges can also have their
> own formulas, and transform values obtained from their respective
> input nodes in a similar fashion.
> This design actually provides a very easy way to support missing
> values. Whenever a request for a value is sent from a formula to some
> object (i.e., to an edge or a node), using a method called
> getObjectValue(), two things can happen:
> 1) The object has a valid value which is passed back to the formula.
> 2) The object does not have a valid value, and throws a MissingValueException.
> This exception is NOT handled by the formula object, so this way I
> avoided having to implement support for this throughout my function
> library. Instead this is handled by the owner of the formula, i.e.,
> either a node or an edge. Handling this exception is trivial since
> this simply means assigning a missing value (encoded in some suitable
> fashion) to the object instead of a valid value.
> So the only (well almost...) changes I needed to make in my code, were:
> (i) Modifying the getObjectValue()-method so that missing values
> result in a MissingValueException being thrown.
> (ii) Adding a new "catch block" in the calcObjectValue()-method for
> the nodes and edges, handling the MissingValueException.
> I don't know if this solution has any relevance to your problem, but
> maybe this will trigger some ideas.
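A minimal sketch of the design Arne describes, in Java. Only the method names calcObjectValue() and getObjectValue() and the exception name MissingValueException come from his description. The SketchNode class, its fields, and the use of null to encode a missing value are assumptions made for illustration, not Riscue code.

```java
import java.util.function.ToDoubleFunction;

// Thrown when a value is requested from an object that has none.
// Unchecked, so formulas can let it pass through without handling it.
class MissingValueException extends RuntimeException {
}

// Hypothetical stand-in for a Riscue node (assumption, not Riscue code).
class SketchNode {
    private final SketchNode input;                     // null for a data node
    private final ToDoubleFunction<SketchNode> formula; // null for a data node
    private Double value;                               // null encodes "missing" here

    private SketchNode(SketchNode input,
                       ToDoubleFunction<SketchNode> formula,
                       Double value) {
        this.input = input;
        this.formula = formula;
        this.value = value;
    }

    // A data node holding one sampled value; pass null for a missing value.
    static SketchNode data(Double sampled) {
        return new SketchNode(null, null, sampled);
    }

    // A node whose value is computed by a formula from one input node.
    static SketchNode transform(SketchNode input,
                                ToDoubleFunction<SketchNode> formula) {
        return new SketchNode(input, formula, null);
    }

    // Change (i): a request for a missing value throws.
    double getObjectValue() {
        if (value == null) {
            throw new MissingValueException();
        }
        return value;
    }

    // Change (ii): the owner of the formula catches the exception
    // and records a missing value for itself.
    void calcObjectValue() {
        if (formula == null) {
            return; // data node: the value was set at construction
        }
        try {
            value = formula.applyAsDouble(input);
        } catch (MissingValueException e) {
            value = null;
        }
    }

    boolean isMissing() {
        return value == null;
    }
}
```

For example, with a data node holding 4.0 and a formula such as in -> Math.sqrt(in.getObjectValue()), the transform node ends up with 2.0. With a missing data value, the formula is abandoned when it asks for its input, and the transform node is marked missing, without the formula containing any missing-value logic.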
On 12/13/05, I (cxh) wrote:
> Ok, I hacked up the following:
> * Token.java now has nil() and isNil() methods
> I went with nil because null is a Java keyword.
> The term "missing" is rather appealing since the Token.toString()
> method usually prints out "present". So, it could be modified to
> print out "missing". However, I feel that someone is likely to have
> a parameter named "missing" somewhere. nil seems safer.
> This is not cast in stone, comments are welcome.
> These methods have tests.
> * data/expr/Constants.java now defines a "nil" constant which
> is a Token that has the nil() method:
> ptolemy.data.Token nil = new ptolemy.data.Token();
> _table.put("nil", nil);
> Thus, one can now create expressions that have nil in them.
> The "nil" constant has a test in data/expr/test/PtParser.tcl
> * actor/lib/RemoveNilTokens.java:
> A new actor that reads its input and discards any nil tokens
> in the fire() method. It might be better to do this in prefire().
> This actor is available in the "More Libraries" -> "Esoteric" section.
> Note that this actor should not be used in SDF.
> No tests yet.
> * domains/pn/demo/RemoveNilTokens/RemoveNilTokens.xml
> A model that uses RemoveNilTokens.
> I think I'm terminating the PN process poorly. I get 7 outputs
> instead of 5. I could use some help here.
> Also, I have to explicitly set the type of the output of RemoveNilTokens.
> The model looks like:
>             Bool Switch
> Ramp---->        |
>                  |----------------> RemoveNilTokens ---------> Display
> Const            |                                     |
> that --->        |                                     --> "Code that stops PN"
> produces         _
> nil              ^
>        Bernoulli -
> So, now we have a straw man in PN to try out.
> 1) Would something like this meet the needs of the Kepler group?
> 2) Should RemoveNilTokens do something in prefire()?
> I'm not sure if I can get access to the Token and call Token.isNil()
> in prefire().
> 3) Is "nil" an ok name?
> 4) How do I get PN to terminate properly after I see 5 non-nil tokens?
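To make the straw man concrete, here is a small self-contained Java sketch of the filtering step. It does not use the Ptolemy II classes: SketchToken and NilFilter are hypothetical stand-ins, and only the method names nil() and isNil() are taken from the description above (their real signatures may differ). The limit parameter is also an assumption, added to mirror the "stop after 5 non-nil tokens" goal. It says nothing about how to terminate a PN process.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token; NOT ptolemy.data.Token.
class SketchToken {
    private static final SketchToken NIL = new SketchToken(null);
    private final Integer value; // null only for the shared nil token

    SketchToken(Integer value) {
        this.value = value;
    }

    // Returns the shared nil token (assumed shape of nil()).
    static SketchToken nil() {
        return NIL;
    }

    // True only for the shared nil token.
    boolean isNil() {
        return this == NIL;
    }

    @Override
    public String toString() {
        return isNil() ? "nil" : String.valueOf(value);
    }
}

// Hypothetical helper that mimics what a nil-removing actor does
// to a finite sequence of tokens.
class NilFilter {
    // Copies non-nil tokens in order, stopping once 'limit' are kept.
    static List<SketchToken> removeNil(List<SketchToken> input, int limit) {
        List<SketchToken> kept = new ArrayList<>();
        for (SketchToken token : input) {
            if (kept.size() >= limit) {
                break;
            }
            if (!token.isNil()) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

Note that the filter produces fewer tokens than it consumes whenever a nil arrives, which is why a fixed-rate domain such as SDF is a poor fit for this kind of actor.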
> Edward wrote:
> At 08:16 AM 12/13/2005 -0800, Shawn Bowers wrote:
> >One option for SDF would be to literally "propagate" the missing-valued
> >tokens, instead of "dropping" them (as in PN). Then, it seems the
> >consumption/production rates at least wouldn't be violated. Doing this
> >correctly might be a bit tricky, but could probably perform this based on
> >the token production/consumption rates --
> In SDF, before any actor is fired, its prefire() method is called.
> For all our actors that require input data, prefire() will
> return false if there is no input token, and the actor will not
> be fired. Consequently, its outputs will also have no token...
> So SDF already does this propagation...
> Edward A. Lee
> Professor, Chair of the EE Division, Associate Chair of EECS
> 231 Cory Hall, UC Berkeley, Berkeley, CA 94720
> phone: 510-642-0253 or 510-642-0455, fax: 510-642-2845
> eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal
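A toy Java sketch of the gating Edward describes: an actor whose prefire() returns false when it has no input is not fired, so it produces no output, and the next actor downstream is starved in turn. SketchActor and iterate() are hypothetical and far simpler than the Ptolemy II actor and director classes; only the prefire() and fire() names follow the message above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-input, single-output actor (not the Ptolemy II API).
class SketchActor {
    final Deque<Integer> input = new ArrayDeque<>();
    final Deque<Integer> output = new ArrayDeque<>();

    // Ready to fire only if an input token is waiting.
    boolean prefire() {
        return !input.isEmpty();
    }

    // Consumes one token and produces one token (here: doubles it).
    void fire() {
        output.add(input.remove() * 2);
    }

    // One step as a scheduler might run it: fire only if prefire() agrees.
    boolean iterate() {
        if (!prefire()) {
            return false;
        }
        fire();
        return true;
    }

    // Moves everything this actor has produced to the next actor's input.
    void sendTo(SketchActor next) {
        next.input.addAll(output);
        output.clear();
    }
}
```

If the first actor in a chain of two has no input, neither actor fires and the second actor's output stays empty. Once the first actor receives a token, both fire in sequence.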