[kepler-dev] [Ptolemy] Re: Bug 2240: add support for null values to data passing among ports

Tue Dec 13 09:02:43 PST 2005

Shawn writes:
> I would think that the right functionality would be to literally "drop"
> missing values (if the flag in get was set as such); which would mean that
> the loop would treat these tokens with missing values as if they were
> null (and thus, keep "looping").

Right, the problem is that the loop in IOPort.get() is looping through
multiple receivers on one channel, which is what happens when there
is a transparent port.  The method comment on IOPort.get() says:

     *  If the channel has a group with more than one receiver (something
     *  that is possible if this is a transparent port), then this method
     *  calls get() on all receivers, but returns only the first non-null
     *  token returned by these calls.
     *  Normally this method is not used on transparent ports.

So, if get() gets a missing token, then get() is not really the place
to wait for a non-missing token.
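
As a rough sketch of that point: the loop can only choose among the
tokens that the receivers in the group hand back on this one call.  The
Tok type and getFromGroup() below are made-up stand-ins, not Ptolemy II
classes, and returning the missing token when nothing better is
available is just one possible policy.

```java
import java.util.List;

/** Illustration only: stand-in types, not the Ptolemy II classes. */
class ReceiverGroupSketch {

    /** Hypothetical token that carries a "missing" flag. */
    static final class Tok {
        final Integer value;   // null payload when the token is missing
        final boolean missing;

        private Tok(Integer value, boolean missing) {
            this.value = value;
            this.missing = missing;
        }

        static Tok of(int value) { return new Tok(value, false); }
        static Tok missing() { return new Tok(null, true); }
    }

    /**
     * Mimics the shape of the loop in IOPort.get().  Each list entry is
     * what one receiver in the group returned; a null entry means that
     * receiver had nothing.  Returns the first usable token.
     */
    static Tok getFromGroup(List<Tok> group, boolean dropMissing) {
        Tok first = null;
        Tok firstMissing = null;
        for (Tok t : group) {
            if (t == null) {
                continue;                 // receiver returned no token
            }
            if (dropMissing && t.missing) {
                if (firstMissing == null) {
                    firstMissing = t;     // remember it, keep looping
                }
                continue;
            }
            if (first == null) {
                first = t;
            }
        }
        if (first != null) {
            return first;
        }
        // Every receiver gave us null or a missing token.  There is
        // nothing left to loop over, so we cannot "wait" here.
        if (firstMissing != null) {
            return firstMissing;          // assumed policy, see above
        }
        throw new IllegalStateException("No token to return.");
    }
}
```

With dropMissing set, a group that yields a missing token and a token
with value 7 returns the 7.  A group that yields only missing tokens has
nothing else to offer, which is why the waiting has to happen somewhere
other than get().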

For SDF, we could try to wedge something in to the scheduler that
would fire an upstream actor again, but this seems bad.

I'm coming to the opinion that a model with missing token semantics
where we drop tokens is not SDF.  It is either Dynamic Dataflow (DDF)
or possibly PN.
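
A toy illustration of why dropping breaks the fixed rates that SDF
relies on.  Here a null Integer stands in for a missing token and
tokensProduced() is a made-up helper, not anything in Ptolemy II.  The
actor is declared as consuming one token and producing one token per
firing.

```java
/** Illustration only: counts outputs of a nominally 1-in/1-out actor. */
class RateSketch {

    /**
     * Fire the actor once per input token and return the total number
     * of tokens it produced.  A null input stands for a missing token.
     */
    static int tokensProduced(Integer[] inputs, boolean dropMissing) {
        int produced = 0;
        for (Integer in : inputs) {
            if (in == null && dropMissing) {
                continue;   // consumed 1, produced 0: rate is data dependent
            }
            produced++;     // consumed 1, produced 1: matches declared rate
        }
        return produced;
    }
}
```

For the inputs {1, missing, 3, missing}, dropping yields 2 tokens while
propagating yields 4.  A static schedule built from the declared 1:1
rate expects 4, so the dropping variant would starve whatever is
downstream.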

If in SDF we have a missing token and handle it as Steve suggests,
then we are ok.

_Christopher

--------

    I think it is ok to call these "missing" values, although in database
    parlance, they are really referred to as NULL values.

    I would think that the right functionality would be to literally "drop"
    missing values (if the flag in get was set as such); which would mean that
    the loop would treat these tokens with missing values as if they were
    null (and thus, keep "looping").

    It sounds like, however, in SDF, this might mess up the token
    consumption/production rate (i.e., an actor may receive multiple
    missing-valued tokens, and not produce anything).  This shouldn't be a
    problem for PN though, right?

    One option for SDF would be to literally "propagate" the missing-valued
    tokens, instead of "dropping" them (as in PN).   Then, it seems the
    consumption/production rates at least wouldn't be violated.  Doing this
    correctly might be a bit tricky, but we could probably do it based on
    the token production/consumption rates.

    If this is a possible work-around, we might want to change the flag from
    "dropMissingValues" to "ignoreMissingValues" -- with slightly different
    semantics depending on the particular director used.

    Most scientific workflows it seems are really dataflow driven (pipelines),
    and so DE and SR don't seem like the right fit (based on my limited
    knowledge of these domains) ... .

    -shawn

    On Mon, 12 Dec 2005, Christopher Brooks wrote:

    > I started looking in to this, below are some random musings.
    >
    > Comments are welcome.
    >
    > I added two methods to Token:
    >
    >     /** Return true if the token has been set to null.
    >      *  @return True if the token has been set to null by calling
    >      *  {@link #setToNull()}.
    >      *  @see #setToNull()
    >      */
    >     public boolean isNull() {
    >         return _isNull;
    >     }
    >
    >     /** Set the value of this token to null.
    >      */
    >     public void setToNull() {
    >         // It would be nice if this method was called "null()", but
    >         // null is a reserved Java keyword.
    >         _isNull = true;
    >     }
    >
    >
    > I'm thinking we should call these tokens "missing" instead of null
    > because null is a keyword and a null Token is not null in the usual
    > Java sense.  Other possibilities are "absent" and "empty".
    > sr.lib.AbsentToken already exists, so absent is out.
    >
    > I modified IOPort to have
    > IOPort.get(int channelIndex, boolean dropNullValues),
    > and
    > IOPort.get(int channelIndex, int vectorLength, boolean dropNullValues),
    >
    > Looking at IOPort.get(int channelIndex, boolean dropNullValues),
    > which basically loops through the receivers and gets the first
    > non-null token, we have:
    >
    >         localReceivers = getReceivers();
    > ...
    >         Token token = null;
    >
    >         for (int j = 0; j < localReceivers[channelIndex].length; j++) {
    >             Token localToken = localReceivers[channelIndex][j].get();
    >
    >             if (token == null) {
    >                 token = localToken;
    >             }
    >         }
    >
    >         if (token == null) {
    >             throw new NoTokenException(this, "No token to return.");
    >         }
    >
    > So, say we receive a missing token on one of the channels?
    > What do we do?
    > We could go through the rest of the channels and hopefully find
    > a non-null and non-missing token.
    > If we find a non-null and non-missing token, then we could return that
    > token.   This would invisibly drop the missing token on a channel,
    > but it would assume that there is a non-missing token to be had.
    >
    > If we don't find a non-null and non-missing token, then should we return
    > the missing token that we found?  Or, do we throw an exception like
    > we used to?  Or, should we somehow busywait? Busywaiting is gross.
    >
    > It would be nice if we could say "Ok, wait for a non-missing token to
    > show up," but I don't think get() in SDF really handles this;
    > it would mess up the balance equations.
    >
    > Maybe I need a small test example to work from, like two actors:
    >
    > RandomMissingRampSource ---> NonStrictTest
    >
    > RandomMissingRampSource would be a Ramp that would produce Missing tokens
    > randomly.
    >
    > NonStrictTest should work fine here because NonStrictTest ignores
    > absent inputs and checks inputs in the postfire() method instead of
    > in fire() like actor.lib.Test.
    >
    > Another test would be
    >                             Const
    >                              |
    >                              V
    > RandomMissingRampSource ---> MissingCapableAdd -> Test
    >
    > Where MissingCapableAdd would "Do the right thing" and consume
    > missing Tokens and add Const to the non-missing tokens.
    >
    > This all seems to tie in to strictness and where we read data (fire
    > vs. postfire).  Domain polymorphic actors should do processing
    > in postfire() not fire() because domains like CT might call
    > fire() multiple times.  It seems like what we really need to do
    > is have "Missing" capable actors call fire() and read "Missing" tokens
    > until they get a non-missing token and then call postfire().
    > I think this would mess with determinism and scheduling?
    >
    > Also, there are complications with strictness and multiports.
    > NonStrictTest says:
    >     // The Test actor could be extended so that Strictness was a parameter,
    >     // but that would require some slightly tricky code to handle
    >     // multiports in a non-strict fashion.  The problem is that if
    >     // we have more than one input channel, and we want to handle
    >     // non-strict inputs, then we need to keep track of number of
    >     // tokens we have seen on each channel. Also, this actor does
    >     // not read inputs until postfire(), which is too late to produce
    >     // an output, as done by Test.
    >
    > _Christopher
    >
    >
    >
    > --------
    >
    >
    >     This solution sounds reasonable to me...
    >
    >     Edward
    >
    >     At 06:36 PM 12/12/2005 -0800, Christopher Brooks wrote:
    >     >Hi Edward,
    >     >
    >     >One of the Kepler bugs blocking the Kepler release is:
    >     >http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2240
    >     >which is reproduced below.
    >     >
    >     >Basically, Kepler needs to handle missing data because not all
    >     >ecological data sets are complete.
    >     >
    >     >The solution to this problem needs to work in PN and SDF.
    >     >
    >     >The Synchronous/Reactive (SR) domain has the notion of absent values.
    >     >For example, domains/sr/lib/Absent.java has this class comment:
    >     >
    >     >  This actor outputs absent values.  That is, it produces no tokens,
    >     >  and it calls the sendClear() method of the output port on each
    >     >  firing.
    >     >
    >     >The sendClear() method is the IOPort.java sendClear() method.
    >     >IOPort also has sendClearInside() and broadcastClear().
    >     >
    >     >Do you have any comments?
    >     >
    >     >Below is the text from the bug report:
    >     >--start--
    >     >Currently ptolemy and kepler do not support passing null values
    >     >(sometimes called missing values) among ports, even though this is
    >     >common in analytical systems like R and SAS.  The concept of null is
    >     >not even defined in the token types. This causes a real problem for
    >     >data sources that are sparsely populated, as well as data streams that
    >     >result from data integration operations that might produce null
    >     >values.  We need to extend the underlying token representation to
    >     >include a concept of null values and the actor framework to protect
    >     >existing actors that might not know how to handle null values.
    >     >Because nulls cannot currently be represented in Kepler, none of the
    >     >existing actors support them.  An exception is thrown whenever a
    >     >missing value is detected by the EML data source, and workflow
    >     >execution ceases.
    >     >
    >     >Bowers and Jones discussed one possible partial solution to this on
    >     >IRC, which is summarized here.
    >     >
    >     >1) Override the Token base class to support null values by providing
    >     >    two new methods:
    >     >
    >     >    Token.null() sets the token's value to null
    >     >    boolean Token.isNull() returns true if the token has been set to
    >     >    null
    >     >
    >     >2) Override TypedIOPort to add a new method that takes a boolean
    >     >    "dropNull"
    >     >
    >     >     a) by default this could be set to "true" then the existing get()
    >     >    method would be reimplemented to call the new one with a default of
    >     >    "true" because we can assume that no existing actor written can
    >     >    handle null values, since there is no way to pass null values, and
    >     >    so existing calls to get() will invisibly drop null values.
    >     >
    >     >     b) if an actor can handle null values, then it passes "false"
    >     >     which indicates that the actor knows how to deal with nulls and
    >     >     wants to receive them
    >     >
    >     >so the changes to IOPort (or maybe TypedIOPort) would be:
    >     >
    >     >IOPort.get(channelIndex) becomes get(channelIndex, dropNullValues)
    >     >and
    >     >IOPort.get(channelIndex, int vectorLength) becomes get(channelIndex,
    >     >vectorLength, dropNullValues)
    >     >
    >     >so, for example, the new implementation of get(channelIndex) would
    >     >simply call get(channelIndex, true), so existing actors would not even
    >     >notice the change.
    >     >--end--
    >     >
    >     >_Christopher
    >
    >     ------------
    >     Edward A. Lee
    >     Professor, Chair of the EE Division, Associate Chair of EECS
    >     231 Cory Hall, UC Berkeley, Berkeley, CA 94720
    >     phone: 510-642-0253 or 510-642-0455, fax: 510-642-2845
    >     eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal
    > --------
    > _______________________________________________
    > Kepler-dev mailing list
    > Kepler-dev at ecoinformatics.org
    > http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
    >
    _______________________________________________
    Ptolemy maillist  -  Ptolemy at chess.eecs.berkeley.edu
    http://chess.eecs.berkeley.edu/ptolemy/listinfo/ptolemy
--------