[kepler-dev] Bug 2240: add support for null values to data passing among ports

Tue Dec 13 06:44:34 PST 2005

Hmm... This now leads me to change my mind... Perhaps this needs
to be more carefully thought through.  I suspect we will be introducing
a back-door mechanism for getting unexpected nondeterminism by this
mechanism... Let's not implement it without further discussion...

It is arguable that if you want a well-defined notion of missing tokens,
you should be using SR or DE.  Both of these have clean semantics for
absent tokens, and it's already fully supported...

Edward

At 10:01 PM 12/12/2005 -0800, Christopher Brooks wrote:
>I started looking into this; below are some random musings.
>
>Comments are welcome.
>
>I added two methods to Token:
>
>     /** Return true if the token has been set to null.
>      *  @return True if the token has been set to null by calling
>      *  {@link #setToNull()}.
>      *  @see #setToNull()
>      */
>     public boolean isNull() {
>         return _isNull;
>     }
>
>     /** Set the value of this token to null.
>      */
>     public void setToNull() {
>         // It would be nice if this method was called "null()", but
>         // null is a reserved Java keyword.
>         _isNull = true;
>     }
>
>
>I'm thinking we should call these tokens "missing" instead of null
>because null is a keyword and a null Token is not null in the usual
>Java sense.  Other possibilities are "absent" and "empty".
>sr.lib.AbsentToken already exists, so absent is out.
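
A minimal sketch of the flag-based idea above, using the "missing" naming. MissingFlagToken, setToMissing() and isMissing() are hypothetical names chosen for illustration; this is a stand-in class, not the real ptolemy.data.Token. It shows that a missing token is still an ordinary Java object (not a null reference) and that reading its value can be made to fail loudly.

```java
// Hypothetical stand-in for a token type; only the missing-value flag is modeled.
class MissingFlagToken {
    private final double value;
    private boolean isMissing = false;

    MissingFlagToken(double value) {
        this.value = value;
    }

    /** Mark this token as carrying no usable value. */
    void setToMissing() {
        isMissing = true;
    }

    /** Return true if setToMissing() has been called on this token. */
    boolean isMissing() {
        return isMissing;
    }

    /** Return the value, or throw if the token was marked missing. */
    double doubleValue() {
        if (isMissing) {
            throw new IllegalStateException("Token is missing.");
        }
        return value;
    }
}
```

One consequence of a mutable flag is that a token shared by several receivers could be marked missing after it was sent; an immutable token would avoid that, at the cost of a different API.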
>
>I modified IOPort to have
>IOPort.get(int channelIndex, boolean dropNullValues),
>and
>IOPort.get(int channelIndex, int vectorLength, boolean dropNullValues).
>
>Looking at IOPort.get(int channelIndex, boolean dropNullValues),
>which basically loops through the receivers and gets the first
>non-null token, we have:
>
>         localReceivers = getReceivers();
>...
>         Token token = null;
>
>         for (int j = 0; j < localReceivers[channelIndex].length; j++) {
>             Token localToken = localReceivers[channelIndex][j].get();
>
>             if (token == null) {
>                 token = localToken;
>             }
>         }
>
>         if (token == null) {
>             throw new NoTokenException(this, "No token to return.");
>         }
>
>So, say we receive a missing token on one of the channels?
>What do we do?
>We could go through the rest of the channels and hopefully find
>a non-null and non-missing token.
>If we find a non-null and non-missing token, then we could return that
>token.  This would invisibly drop the missing token on a channel,
>but it assumes that there is a non-missing token to be had.
>
>If we don't find a non-null and non-missing token, then should we return
>the missing token that we found?  Or, do we throw an exception like
>we used to?  Or, should we somehow busy-wait?  Busy-waiting is gross.
>
>It would be nice if we could say "Ok, wait for a non-missing token to
>show up," but I don't think get() in SDF really handles this;
>it would mess up the balance equations.
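
One way to pin down the choices listed above is the sketch below. It is only one possible policy, not what IOPort does: prefer the first non-missing token, fall back to the first missing token when the caller asked to see them, and otherwise throw as before. ChannelToken and ChannelReader are hypothetical stand-ins, and the receivers of one channel are modeled as a plain list of already-fetched tokens (a null entry means that receiver had nothing).

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a token that may be flagged as missing.
class ChannelToken {
    final int value;
    final boolean missing;

    ChannelToken(int value, boolean missing) {
        this.value = value;
        this.missing = missing;
    }
}

class ChannelReader {
    /**
     * Scan every token received on one channel, as the original loop does.
     * Return the first non-missing token if there is one. Otherwise return
     * the first missing token when dropMissing is false, and throw when
     * dropMissing is true or when nothing was received at all.
     */
    static ChannelToken get(List<ChannelToken> received, boolean dropMissing) {
        ChannelToken firstValue = null;
        ChannelToken firstMissing = null;
        for (ChannelToken token : received) {
            if (token == null) {
                continue; // this receiver had no token
            }
            if (token.missing) {
                if (firstMissing == null) {
                    firstMissing = token;
                }
            } else if (firstValue == null) {
                firstValue = token;
            }
        }
        if (firstValue != null) {
            return firstValue;
        }
        if (firstMissing != null && !dropMissing) {
            return firstMissing;
        }
        throw new NoSuchElementException("No token to return.");
    }
}
```

Note that this sketch never waits: when every token is missing and dropMissing is true, the caller still gets an exception, so the question of what an SDF actor should do in that firing remains open.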
>
>Maybe I need a small test example to work from, like two actors:
>
>RandomMissingRampSource ---> NonStrictTest
>
>RandomMissingRampSource would be a Ramp that would produce Missing tokens
>randomly.
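
A rough sketch of what such a source could look like, outside the actor framework. RandomMissingRampSketch and RampToken are hypothetical names, and two details are assumptions rather than anything decided in this thread: the ramp keeps advancing when a token is marked missing, and the random generator is seeded so that a test run is repeatable.

```java
import java.util.Random;

// Hypothetical stand-in for a token that may be flagged as missing.
class RampToken {
    final int value;
    final boolean missing;

    RampToken(int value, boolean missing) {
        this.value = value;
        this.missing = missing;
    }
}

class RandomMissingRampSketch {
    private final Random random;
    private final double missingProbability;
    private int next = 0;

    /** missingProbability is the chance, in [0, 1], that a firing yields a missing token. */
    RandomMissingRampSketch(long seed, double missingProbability) {
        this.random = new Random(seed);
        this.missingProbability = missingProbability;
    }

    /** Produce exactly one token per firing, so the production rate stays fixed. */
    RampToken fire() {
        int value = next++;
        boolean missing = random.nextDouble() < missingProbability;
        return new RampToken(value, missing);
    }
}
```

Because every firing still produces one token, a source like this keeps a constant rate; only the content of the token varies.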
>
>NonStrictTest should work fine here because NonStrictTest ignores
>absent inputs and checks inputs in the postfire() method instead of
>in fire() like actor.lib.Test.
>
>Another test would be
>                             Const
>                              |
>                              V
>RandomMissingRampSource ---> MissingCapableAdd -> Test
>
>Where MissingCapableAdd would "Do the right thing" and consume
>missing Tokens and add Const to the non-missing tokens.
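
"Do the right thing" is not defined above, so the sketch below picks one reading and labels it as an assumption: a missing input is consumed and a missing token is emitted in its place, while a non-missing input has the constant added. MissingCapableAddSketch and AddToken are hypothetical names; no such actor exists.

```java
// Hypothetical stand-in for a token that may be flagged as missing.
class AddToken {
    final int value;
    final boolean missing;

    private AddToken(int value, boolean missing) {
        this.value = value;
        this.missing = missing;
    }

    static AddToken of(int value) {
        return new AddToken(value, false);
    }

    static AddToken missing() {
        return new AddToken(0, true);
    }
}

class MissingCapableAddSketch {
    /**
     * Consume one input token and produce one output token.
     * Assumption: a missing input propagates as a missing output.
     */
    static AddToken fire(AddToken input, int constant) {
        if (input.missing) {
            return input;
        }
        return AddToken.of(input.value + constant);
    }
}
```

The other obvious reading, producing no output at all for a missing input, would make the number of tokens produced per firing vary, which is exactly what fixed-rate SDF scheduling cannot accommodate.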
>
>This all seems to tie in to strictness and where we read data (fire
>vs. postfire).  Domain polymorphic actors should do processing
>in postfire() not fire() because domains like CT might call
>fire() multiple times.  It seems like what we really need to do
>is have "Missing" capable actors call fire() and read "Missing" tokens
>until they get a non-missing token and then call postfire().
>I think this would mess with determinism and scheduling?
>
>Also, there are complications with strictness and multiports.
>NonStrictTest says:
>     // The Test actor could be extended so that Strictness was a parameter,
>     // but that would require some slightly tricky code to handle
>     // multiports in a non-strict fashion.  The problem is that if
>     // we have more than one input channel, and we want to handle
>     // non-strict inputs, then we need to keep track of number of
>     // tokens we have seen on each channel. Also, this actor does
>     // not read inputs until postfire(), which is too late to produce
>     // an output, as done by Test.
>
>_Christopher
>
>
>
>--------
>
>
>     This solution sounds reasonable to me...
>
>     Edward
>
>     At 06:36 PM 12/12/2005 -0800, Christopher Brooks wrote:
>     >Hi Edward,
>     >
>     >One of the Kepler bugs blocking the Kepler release is:
>     >http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2240
>     >which is reproduced below.
>     >
>     >Basically, Kepler needs to handle missing data because not all
>     >ecological data sets are complete.
>     >
>     >The solution to this problem needs to work in PN and SDF.
>     >
>     >The Synchronous/Reactive (SR) domain has the notion of absent values.
>     >For example, domains/sr/lib/Absent.java has this class comment:
>     >
>     >  This actor outputs absent values.  That is, it produces no tokens,
>     >  and it calls the sendClear() method of the output port on each
>     >  firing.
>     >
>     >The sendClear() method is the IOPort.java sendClear() method.
>     >IOPort also has sendClearInside() and broadcastClear().
>     >
>     >Do you have any comments?
>     >
>     >Below is the text from the bug report:
>     >--start--
>     >Currently ptolemy and kepler do not support passing null values
>     >(sometimes called missing values) among ports, even though this is
>     >common in analytical systems like R and SAS.  The concept of null is
>     >not even defined in the token types. This causes a real problem for
>     >data sources that are sparsely populated, as well as data streams that
>     >result from data integration operations that might produce null
>     >values.  We need to extend the underlying token representation to
>     >include a concept of null values and the actor framework to protect
>     >existing actors that might not know how to handle null values.
>     >Because nulls cannot currently be represented in Kepler, none of the
>     >existing actors support them.  An exception is thrown whenever a
>     >missing value is detected by the EML data source, and workflow
>     >execution ceases.
>     >
>     >Bowers and Jones discussed one possible partial solution to this on
>     >IRC, which is summarized here.
>     >
>     >1) Override the Token base class to support null values by providing
>     >    two new methods:
>     >
>     >    Token.null() sets the token's value to null
>     >    boolean Token.isNull() returns true if the token has been set to
>     >    null
>     >
>     >2) Override TypedIOPort to add a new method that takes a boolean
>     >    "dropNull"
>     >
>     >     a) by default this could be set to "true"; the existing get()
>     >    method would then be reimplemented to call the new one with
>     >    "true", because we can assume that no existing actor can
>     >    handle null values (there is currently no way to pass them), so
>     >    existing calls to get() will invisibly drop null values.
>     >
>     >     b) if an actor can handle null values, then it passes "false"
>     >     which indicates that the actor knows how to deal with nulls and
>     >     wants to receive them
>     >
>     >so the changes to IOPort (or maybe TypedIOPort) would be:
>     >
>     >IOPort.get(channelIndex) becomes get(channelIndex, dropNullValues)
>     >and
>     >IOPort.get(channelIndex, int vectorLength) becomes get(channelIndex,
>     >vectorLength, dropNullValues)
>     >
>     >so, for example, the new implementation of get(channelIndex) would
>     >simply call get(channelIndex, true), so existing actors would not even
>     >notice the change.
>     >--end--
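
The backward-compatibility argument in the bug report can be sketched as an overload that delegates with a default of "true". SketchPort and PortToken are hypothetical stand-ins, not the real IOPort or Token, and a single channel is modeled as a simple queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a token that may be flagged as null/missing.
class PortToken {
    final int value;
    final boolean missing;

    PortToken(int value, boolean missing) {
        this.value = value;
        this.missing = missing;
    }
}

class SketchPort {
    private final Deque<PortToken> queue = new ArrayDeque<>();

    void put(PortToken token) {
        queue.addLast(token);
    }

    /** Legacy signature: existing actors never see missing tokens. */
    PortToken get() {
        return get(true);
    }

    /**
     * New signature. When dropMissing is true, missing tokens are silently
     * discarded until a non-missing token is found; when false, the next
     * token is returned whether or not it is missing.
     */
    PortToken get(boolean dropMissing) {
        while (!queue.isEmpty()) {
            PortToken token = queue.removeFirst();
            if (!token.missing || !dropMissing) {
                return token;
            }
        }
        throw new NoSuchElementException("No token to return.");
    }
}
```

In this sketch a single legacy get() call may remove more than one token from the queue, so the number of tokens consumed per call depends on the data. That is the same rate concern raised earlier in the thread about SDF balance equations.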
>     >
>     >_Christopher
>
>     ------------
>     Edward A. Lee
>     Professor, Chair of the EE Division, Associate Chair of EECS
>     231 Cory Hall, UC Berkeley, Berkeley, CA 94720
>     phone: 510-642-0253 or 510-642-0455, fax: 510-642-2845
>     eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal
>--------
