[kepler-dev] Bug 2240: add support for null values to data passing among ports
Edward A. Lee
eal at eecs.berkeley.edu
Tue Dec 13 06:44:34 PST 2005
Hmm... This now leads me to change my mind... Perhaps this needs
to be thought through more carefully. I suspect we would be introducing
a back-door mechanism for unexpected nondeterminism...
Let's not implement it without further discussion...
It is arguable that if you want a well-defined notion of missing tokens,
you should be using SR or DE. Both of these have clean semantics for
absent tokens, and it's already fully supported...
Edward
At 10:01 PM 12/12/2005 -0800, Christopher Brooks wrote:
>I started looking into this; below are some random musings.
>
>Comments are welcome.
>
>I added two methods to Token:
>
>    /** Return true if the token has been set to null.
>     *  @return True if the token has been set to null by calling
>     *  {@link #setToNull()}.
>     *  @see #setToNull()
>     */
>    public boolean isNull() {
>        return _isNull;
>    }
>
>    /** Set the value of this token to null.
>     */
>    public void setToNull() {
>        // It would be nice if this method was called "null()", but
>        // null is reserved in Java (it is the null literal) and
>        // cannot be used as a method name.
>        _isNull = true;
>    }
>
>    // True once setToNull() has been called on this token.
>    private boolean _isNull = false;
>
>
>I'm thinking we should call these tokens "missing" instead of null,
>because null is reserved in Java and a null Token is not null in the usual
>Java sense. Other possibilities are "absent" and "empty", but
>sr.lib.AbsentToken already exists, so absent is out.
>
>I modified IOPort to have
>IOPort.get(int channelIndex, boolean dropNullValues)
>and
>IOPort.get(int channelIndex, int vectorLength, boolean dropNullValues).
>
>Looking at IOPort.get(int channelIndex, boolean dropNullValues),
>which basically loops through the receivers and keeps the first
>non-null token, we have:
>
>    localReceivers = getReceivers();
>    ...
>    Token token = null;
>
>    for (int j = 0; j < localReceivers[channelIndex].length; j++) {
>        Token localToken = localReceivers[channelIndex][j].get();
>
>        if (token == null) {
>            token = localToken;
>        }
>    }
>
>    if (token == null) {
>        throw new NoTokenException(this, "No token to return.");
>    }
>
>So, say we receive a missing token on one of the channels?
>What do we do?
>We could go through the rest of the channels and hopefully find
>a non-null, non-missing token.
>If we find one, then we could return it instead of the
>missing token. This would invisibly drop the missing token on that channel,
>but it assumes that there is a non-missing token to be had.
>
>If we don't find a non-null, non-missing token, then should we return
>the missing token that we found? Or do we throw an exception like
>we used to? Or should we somehow busy-wait? Busy-waiting is gross.
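A minimal sketch of the scan described above, assuming a hypothetical `dropMissing` flag: one token is read from every receiver on the channel (as the existing loop does), the first non-missing token wins, and any missing tokens read along the way are dropped. If only missing tokens turn up, the sketch returns one when the caller can handle it and otherwise throws, as the old code did. `SketchToken`, `SketchReceiver` and `SketchPort` are hypothetical stand-ins, not Ptolemy's `Token`, `Receiver` or `IOPort`, and `NoSuchElementException` stands in for `NoTokenException`.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a Token carrying the proposed "missing" flag.
class SketchToken {
    private final int value;
    private final boolean missing;

    private SketchToken(int value, boolean missing) {
        this.value = value;
        this.missing = missing;
    }

    static SketchToken of(int value) { return new SketchToken(value, false); }

    static SketchToken missing() { return new SketchToken(0, true); }

    boolean isMissing() { return missing; }

    int intValue() { return value; }
}

// Hypothetical stand-in for a Receiver: a FIFO whose get() returns null when empty.
class SketchReceiver {
    final ArrayDeque<SketchToken> queue = new ArrayDeque<>();

    SketchToken get() { return queue.poll(); }
}

class SketchPort {
    // Like the loop in IOPort.get(), read one token from every receiver
    // on the channel, then decide which one to return.
    static SketchToken get(List<SketchReceiver> receivers, boolean dropMissing) {
        SketchToken firstPresent = null;
        SketchToken firstMissing = null;
        for (SketchReceiver receiver : receivers) {
            SketchToken token = receiver.get();
            if (token == null) {
                continue; // This receiver had nothing to offer.
            }
            if (!token.isMissing()) {
                if (firstPresent == null) {
                    firstPresent = token;
                }
            } else if (firstMissing == null) {
                firstMissing = token;
            }
        }
        if (firstPresent != null) {
            return firstPresent; // Missing tokens read above are dropped.
        }
        if (firstMissing != null && !dropMissing) {
            return firstMissing; // Caller has asked to see missing tokens.
        }
        throw new NoSuchElementException("No token to return.");
    }
}
```

Whether a missing-capable caller should instead get the missing token back when a present one is also available is exactly the open question here; this sketch simply picks the present token.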
>
>It would be nice if we could say "OK, wait for a non-missing token to
>show up," but I don't think get() in SDF really handles this;
>it would mess up the balance equations.
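For reference, here is the standard SDF balance equation for a single arc from actor A to actor B, where A produces p tokens per firing, B consumes c tokens per firing, and r_A, r_B are the repetition counts in one schedule iteration. As a worked illustration (not taken from the SDF code): if get() silently discarded k missing tokens in some firing, B would actually consume c + k tokens, with k known only at run time, so there is no fixed c and the schedule cannot be computed statically.

```latex
r_A \, p = r_B \, c
\qquad \text{e.g. } p = c = 1 \;\Rightarrow\; r_A = r_B = 1

\text{If } B \text{ drops } k \text{ missing tokens in a firing:}\quad
r_A \, p = r_B \,(c + k), \qquad k \text{ unknown until run time}
```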
>
>Maybe I need a small test example to work from, like two actors:
>
>RandomMissingRampSource ---> NonStrictTest
>
>RandomMissingRampSource would be a Ramp that would produce Missing tokens
>randomly.
>
>NonStrictTest should work fine here because NonStrictTest ignores
>absent inputs and checks inputs in the postfire() method instead of
>in fire(), as actor.lib.Test does.
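A plain-Java sketch of the proposed test source, with no Ptolemy classes involved: `RandomMissingRamp` is a hypothetical name, a missing value is modeled as an empty `OptionalInt`, and a seeded `java.util.Random` keeps runs reproducible. The `matchesRamp` check mimics the NonStrictTest idea of ignoring absent inputs by comparing only the positions that carry a value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import java.util.Random;

// Hypothetical source: ramp values 0, step, 2*step, ..., each replaced by a
// missing value (OptionalInt.empty()) with probability missingProbability.
class RandomMissingRamp {
    static List<OptionalInt> generate(int count, int step,
            double missingProbability, long seed) {
        Random random = new Random(seed);
        List<OptionalInt> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            if (random.nextDouble() < missingProbability) {
                out.add(OptionalInt.empty());
            } else {
                out.add(OptionalInt.of(i * step));
            }
        }
        return out;
    }

    // Non-strict check: skip missing entries, compare the rest by position.
    static boolean matchesRamp(List<OptionalInt> values, int step) {
        for (int i = 0; i < values.size(); i++) {
            OptionalInt v = values.get(i);
            if (v.isPresent() && v.getAsInt() != i * step) {
                return false;
            }
        }
        return true;
    }
}
```

Comparing by position only works because the sketch keeps a placeholder for each missing value; if missing values were dropped from the stream instead, the surviving values would shift and a position-based check would need to know which ones were skipped.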
>
>Another test would be
>                                   Const
>                                     |
>                                     V
>RandomMissingRampSource ---> MissingCapableAdd -> Test
>
>Where MissingCapableAdd would "Do the right thing" and consume
>missing Tokens and add Const to the non-missing tokens.
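One possible reading of "do the right thing", sketched as a pure function per firing: a present input gets the Const value added, and a missing input is consumed and answered with a missing output. `MissingCapableAdd.fire` and `run` are hypothetical, and emitting a missing output (rather than nothing) is an assumption chosen so that every firing consumes one token and produces one token, keeping the rates fixed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical missing-aware adder; OptionalInt.empty() models a missing token.
class MissingCapableAdd {
    // One firing: consume one input, produce exactly one output.
    static OptionalInt fire(OptionalInt input, int constant) {
        if (!input.isPresent()) {
            return OptionalInt.empty(); // Propagate "missing" downstream.
        }
        return OptionalInt.of(input.getAsInt() + constant);
    }

    // Apply fire() to a whole stream, one output per input.
    static List<OptionalInt> run(List<OptionalInt> inputs, int constant) {
        List<OptionalInt> outputs = new ArrayList<>();
        for (OptionalInt input : inputs) {
            outputs.add(fire(input, constant));
        }
        return outputs;
    }
}
```

If the adder dropped missing inputs instead, the downstream Test actor would see fewer tokens than the source produced, which is the balance-equation problem raised above.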
>
>This all seems to tie in with strictness and where we read data (fire()
>vs. postfire()). Domain-polymorphic actors should update their state
>in postfire(), not fire(), because domains like CT might call
>fire() multiple times per iteration. It seems like what we really need
>is for the director to keep firing "Missing"-capable actors, which read
>"Missing" tokens, until they get a non-missing token, and only then call
>postfire(). I think this would mess with determinism and scheduling?
>
>Also, there are complications with strictness and multiports.
>NonStrictTest says:
> // The Test actor could be extended so that Strictness was a parameter,
> // but that would require some slightly tricky code to handle
> // multiports in a non-strict fashion. The problem is that if
> // we have more than one input channel, and we want to handle
> // non-strict inputs, then we need to keep track of number of
> // tokens we have seen on each channel. Also, this actor does
> // not read inputs until postfire(), which is too late to produce
> // an output, as done by Test.
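A minimal sketch of the per-channel bookkeeping that comment describes, outside Ptolemy: `MultiChannelChecker` is hypothetical, each channel has its own expected sequence and its own count of tokens seen, and an empty `OptionalInt` marks a channel that had no token in that iteration. Checking happens in a `postfire`-style method, mirroring where NonStrictTest reads its inputs.

```java
import java.util.OptionalInt;

// Hypothetical non-strict checker for a multiport with one expected
// sequence per channel.
class MultiChannelChecker {
    private final int[][] expected; // expected[channel][k]: k-th value on channel
    private final int[] seen;       // number of tokens seen on each channel

    MultiChannelChecker(int[][] expected) {
        this.expected = expected;
        this.seen = new int[expected.length];
    }

    // inputs[channel] is empty when that channel had no token this iteration.
    void postfire(OptionalInt[] inputs) {
        for (int channel = 0; channel < inputs.length; channel++) {
            if (!inputs[channel].isPresent()) {
                continue; // Absent: do not advance this channel's count.
            }
            int k = seen[channel];
            if (k >= expected[channel].length) {
                throw new IllegalStateException(
                        "Too many tokens on channel " + channel);
            }
            int value = inputs[channel].getAsInt();
            if (value != expected[channel][k]) {
                throw new IllegalStateException("Channel " + channel
                        + ", token " + k + ": expected "
                        + expected[channel][k] + ", got " + value);
            }
            seen[channel] = k + 1;
        }
    }

    int seen(int channel) {
        return seen[channel];
    }
}
```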
>
>_Christopher
>
>
>
>--------
>
>
> This solution sounds reasonable to me...
>
> Edward
>
> At 06:36 PM 12/12/2005 -0800, Christopher Brooks wrote:
> >Hi Edward,
> >
> >One of the Kepler bugs blocking the Kepler release is:
> >http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2240
> >which is reproduced below.
> >
> >Basically, Kepler needs to handle missing data because not all
> >ecological data sets are complete.
> >
> >The solution to this problem needs to work in PN and SDF.
> >
> >The Synchronous/Reactive (SR) domain has the notion of absent values.
> >For example, domains/sr/lib/Absent.java has this class comment:
> >
> > This actor outputs absent values. That is, it produces no tokens,
> > and it calls the sendClear() method of the output port on each
> > firing.
> >
> >The sendClear() method is the IOPort.java sendClear() method.
> >IOPort also has sendClearInside() and broadcastClear().
> >
> >Do you have any comments?
> >
> >Below is the text from the bug report:
> >--start--
> >Currently Ptolemy and Kepler do not support passing null values
> >(sometimes called missing values) among ports, even though this is
> >common in analytical systems like R and SAS. The concept of null is
> >not even defined in the token types. This causes a real problem for
> >data sources that are sparsely populated, as well as data streams that
> >result from data integration operations that might produce null
> >values. We need to extend the underlying token representation to
> >include a concept of null values and the actor framework to protect
> >existing actors that might not know how to handle null values.
> >Because nulls cannot currently be represented in Kepler, none of the
> >existing actors support them. An exception is thrown whenever a
> >missing value is detected by the EML data source, and workflow
> >execution ceases.
> >
> >Bowers and Jones discussed one possible partial solution to this on
> >IRC, which is summarized here.
> >
> >1) Override the Token base class to support null values by providing
> > two new methods:
> >
> > Token.null() sets the token's value to null
> > boolean Token.isNull() returns true if the token has been set to
> > null
> >
> >2) Override TypedIOPort to add a new method that takes a boolean
> > "dropNull"
> >
> > a) by default this could be set to "true"; then the existing get()
> > method would be reimplemented to call the new one with a default of
> > "true", because we can assume that no existing actor can
> > handle null values, since there is no way to pass null values, and
> > so existing calls to get() will invisibly drop null values.
> >
> > b) if an actor can handle null values, then it passes "false"
> > which indicates that the actor knows how to deal with nulls and
> > wants to receive them
> >
> >so the changes to IOPort (or maybe TypedIOPort) would be:
> >
> >IOPort.get(channelIndex) becomes get(channelIndex, dropNullValues)
> >and
> >IOPort.get(channelIndex, int vectorLength) becomes get(channelIndex,
> >vectorLength, dropNullValues)
> >
> >so, for example, the new implementation of get(channelIndex) would
> >simply call get(channelIndex, true), so existing actors would not even
> >notice the change.
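The delegation proposed in the bug report can be sketched with a hypothetical `DropNullPort`, outside Ptolemy: each channel is a FIFO, an empty `OptionalInt` plays the null/missing token, and the existing-style `get(channelIndex)` and `get(channelIndex, vectorLength)` simply forward with `dropNullValues` set to true. `NoSuchElementException` stands in for Ptolemy's `NoTokenException`; none of this is the actual IOPort code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.OptionalInt;

// Hypothetical port: one FIFO per channel; OptionalInt.empty() = null token.
class DropNullPort {
    private final List<ArrayDeque<OptionalInt>> channels = new ArrayList<>();

    DropNullPort(int width) {
        for (int i = 0; i < width; i++) {
            channels.add(new ArrayDeque<>());
        }
    }

    void put(int channelIndex, OptionalInt token) {
        channels.get(channelIndex).add(token);
    }

    // Existing signature: legacy actors never see a null token.
    OptionalInt get(int channelIndex) {
        return get(channelIndex, true);
    }

    OptionalInt get(int channelIndex, boolean dropNullValues) {
        ArrayDeque<OptionalInt> queue = channels.get(channelIndex);
        while (!queue.isEmpty()) {
            OptionalInt token = queue.poll();
            if (token.isPresent() || !dropNullValues) {
                return token;
            }
            // Null token silently discarded; keep looking.
        }
        throw new NoSuchElementException("No token to return.");
    }

    // Existing vector signature, forwarded the same way.
    OptionalInt[] get(int channelIndex, int vectorLength) {
        return get(channelIndex, vectorLength, true);
    }

    OptionalInt[] get(int channelIndex, int vectorLength, boolean dropNullValues) {
        OptionalInt[] result = new OptionalInt[vectorLength];
        for (int i = 0; i < vectorLength; i++) {
            result[i] = get(channelIndex, dropNullValues);
        }
        return result;
    }
}
```

Note that with dropNullValues set to true, a single call may consume several tokens from the channel, which is the SDF balance-equation concern raised in the discussion above.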
> >--end--
> >
> >_Christopher
>
> ------------
> Edward A. Lee
> Professor, Chair of the EE Division, Associate Chair of EECS
> 231 Cory Hall, UC Berkeley, Berkeley, CA 94720
> phone: 510-642-0253 or 510-642-0455, fax: 510-642-2845
> eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal
>--------
>_______________________________________________
>Kepler-dev mailing list
>Kepler-dev at ecoinformatics.org
>http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev