[kepler-dev] [Ptolemy] Re: Bug 2240: add support for null values to data passing among ports

Tue Dec 13 14:52:06 PST 2005

We chatted about this at Ptolemy group lunch.
(Edward proposes calling these "nil" tokens as in Lisp's nil and t.)

The bottom line is that dropping missing tokens means that the model
is not SDF.

One thing that would help would be a concrete description of a model
and what you expect to happen.

I propose:
                             Const
                              |
                              V
RandomMissingRampSource ---> MissingCapableAdd -> Test

Where RandomMissingRampSource generates a Ramp with missing tokens
thrown in.

What do you expect as an output?

Say RandomMissingRampSource produces:
    1 2 3 <missing> 4 5
And the Const is 10
Should the output at the Test be the following?
    11 12 13 14 15
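
To make the question concrete, here is a minimal sketch of the two
answers one could imagine.  It is plain Java, not Ptolemy code:
MissingAddSketch and both of its methods are hypothetical names, and a
missing token is modeled as a Java null purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for MissingCapableAdd; not part of Ptolemy. */
final class MissingAddSketch {
    /** Drop missing inputs (modeled as null) and add the constant to the rest. */
    static List<Integer> addDroppingMissing(List<Integer> inputs, int constant) {
        List<Integer> outputs = new ArrayList<>();
        for (Integer input : inputs) {
            if (input != null) {
                outputs.add(input + constant);
            }
        }
        return outputs;
    }

    /** Propagate missing inputs instead: missing + constant = missing. */
    static List<Integer> addPropagatingMissing(List<Integer> inputs, int constant) {
        List<Integer> outputs = new ArrayList<>();
        for (Integer input : inputs) {
            outputs.add(input == null ? null : Integer.valueOf(input + constant));
        }
        return outputs;
    }
}
```

With the input 1 2 3 <missing> 4 5 and a constant of 10, the first
method yields 11 12 13 14 15 and the second yields
11 12 13 <missing> 14 15.  Only the second produces one output per
input.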

It seems like what you really want is either PN, SR, DE, DDF or DT.
If you care about time then DE is the way to go.
If we were in DE, the output might look like:
   t1  t2  t3     t5  t6
   11  12  13     14  15

where t1 is time interval 1, etc.  Note that there is no t4.

Edward suggested that what you may want is a tuple of time stamp or
sequence number and a value.
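
A rough sketch of that tuple idea, assuming a plain (sequence number,
value) pair.  StampedValue and StampedStreamSketch are hypothetical
names, not Ptolemy classes, and null again stands in for a missing
sample.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pair of a sequence number and a value. */
final class StampedValue {
    final int sequenceNumber;
    final int value;

    StampedValue(int sequenceNumber, int value) {
        this.sequenceNumber = sequenceNumber;
        this.value = value;
    }
}

final class StampedStreamSketch {
    /** Stamp each present sample with its 1-based position; skip missing ones. */
    static List<StampedValue> stamp(List<Integer> samples) {
        List<StampedValue> stamped = new ArrayList<>();
        for (int i = 0; i < samples.size(); i++) {
            Integer sample = samples.get(i);
            if (sample != null) {
                stamped.add(new StampedValue(i + 1, sample));
            }
        }
        return stamped;
    }

    /** Add a constant to every value, leaving the sequence numbers untouched. */
    static List<StampedValue> addConstant(List<StampedValue> inputs, int constant) {
        List<StampedValue> outputs = new ArrayList<>();
        for (StampedValue input : inputs) {
            outputs.add(new StampedValue(input.sequenceNumber, input.value + constant));
        }
        return outputs;
    }
}
```

For the input 1 2 3 <missing> 4 5 and a constant of 10, the outputs
carry sequence numbers 1 2 3 5 6, so the gap at position 4 stays
visible downstream even though no missing token is ever sent.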

You could also embed the RandomMissingRampSource in a composite actor
with a different director (DE?) and try to use it in an SDF model.

Since Token now has missing() and isMissing() methods, perhaps
the thing to do is define RandomMissingRampSource and
MissingCapableAdd.
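
As a starting point, the source could look roughly like the sketch
below.  This is an assumption-laden illustration in plain Java:
RandomMissingRampSketch, its generate() method, and the
missingProbability and seed parameters are all hypothetical, and a
real actor would emit tokens from fire() rather than return a list.
The ramp only advances when a value is actually emitted, which matches
the 1 2 3 <missing> 4 5 example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for RandomMissingRampSource; not part of Ptolemy. */
final class RandomMissingRampSketch {
    /**
     * Produce one entry per firing: either the next ramp value or null
     * (standing in for a missing token).  The seed makes runs repeatable.
     */
    static List<Integer> generate(int firings, double missingProbability, long seed) {
        Random random = new Random(seed);
        List<Integer> outputs = new ArrayList<>();
        int next = 1;
        for (int i = 0; i < firings; i++) {
            if (random.nextDouble() < missingProbability) {
                outputs.add(null);      // missing token; the ramp does not advance
            } else {
                outputs.add(next++);    // next ramp value
            }
        }
        return outputs;
    }
}
```

A fixed seed seems worth having here so that a Test actor downstream
can compare against a known sequence.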

I suppose a MissingTransformer actor would be useful: it would read
tokens and discard any missing ones.  This actor would not work in
SDF, because the number of tokens it produces per firing would not
be fixed.
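
The rate problem can be illustrated with a small sketch.  The class
and method names below are hypothetical and this is plain Java rather
than an actual actor; null stands in for a missing token.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for MissingTransformer; not part of Ptolemy. */
final class MissingTransformerSketch {
    /** One firing: consume one token and produce either zero or one token. */
    static List<Integer> fire(Integer input) {
        List<Integer> produced = new ArrayList<>();
        if (input != null) {
            produced.add(input);
        }
        return produced;
    }

    /** Number of tokens produced by each firing over a whole input sequence. */
    static int[] productionCounts(List<Integer> inputs) {
        int[] counts = new int[inputs.size()];
        for (int i = 0; i < inputs.size(); i++) {
            counts[i] = fire(inputs.get(i)).size();
        }
        return counts;
    }

    /** True if every firing produced the same number of tokens. */
    static boolean hasConstantRate(int[] counts) {
        for (int count : counts) {
            if (count != counts[0]) {
                return false;
            }
        }
        return true;
    }
}
```

For 1 2 3 <missing> 4 5 the production counts are 1 1 1 0 1 1, so the
rate depends on the data and cannot be declared up front the way an
SDF schedule needs it to be.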

Another idea is to create an R actor that processes the 
data with missing tokens appropriately.

Comments?

_Christopher

--------

    The problem with this approach -- of adding new "nullable" types and
    requiring each actor to process a missing token explicitly -- is that
    most existing/legacy actors don't process null values and so you
    wouldn't be able to use most data sets with existing actors or in
    existing workflows.  It is my impression that many other systems (R,
    SAS, even SQL) have automated mechanisms to handle null values, some
    in quite fancy ways.

    Our original motivation for changing the basic Token and IOPort
    interface was to not require changes to the type lattice (which aren't
    always obvious when it comes to supporting null values) and to
    automate the handling of null values for actors that don't explicitly
    manage them. We were trying to slip in the ability to handle null
    values, without having to make any changes to existing actors, while
    allowing data sets with null values to still use non-null-aware
    actors.
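
A minimal sketch of that port-level idea, assuming a single channel
backed by a queue.  SketchToken and SketchPort are hypothetical names
and this is not the Ptolemy IOPort API; it only mirrors the
get(channelIndex, dropNullValues) shape described in the bug report.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

/** Hypothetical token; MISSING plays the role of a token set to null. */
final class SketchToken {
    static final SketchToken MISSING = new SketchToken(0, true);
    final int value;
    private final boolean missing;

    private SketchToken(int value, boolean missing) {
        this.value = value;
        this.missing = missing;
    }

    static SketchToken of(int value) {
        return new SketchToken(value, false);
    }

    boolean isMissing() {
        return missing;
    }
}

/** Hypothetical single-channel port; not the Ptolemy IOPort. */
final class SketchPort {
    private final Deque<SketchToken> queue = new ArrayDeque<>();

    void put(SketchToken token) {
        queue.addLast(token);
    }

    /** Legacy entry point: existing actors never see missing tokens. */
    SketchToken get() {
        return get(true);
    }

    /** Null-aware entry point: pass false to receive missing tokens too. */
    SketchToken get(boolean dropMissing) {
        while (!queue.isEmpty()) {
            SketchToken token = queue.removeFirst();
            if (!dropMissing || !token.isMissing()) {
                return token;
            }
        }
        throw new NoSuchElementException("No token to return.");
    }
}
```

A legacy actor calling get() on a queue holding 3 <missing> 4 sees 3
and then 4.  If only missing tokens are left, get() has nothing to
return and throws, which is where the scheduling concern comes in.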

    It seems that SDF is really the problematic case here ...

    -shawn

    Stephen Neuendorffer wrote:
     > I don't think this needs to be handled specially.  To me this has
     > always seemed like the perfect application of union/variant types.
     > The main difference would be that as a variant type, the receiving
     > actor would have to process each 'missing' token explicitly.
     >
     > (I think) an SDF model should propagate 'missing' values and generate
     > missing values, just like any other token.  I tend to agree that if you
     > want a particular definition of something like missing tokens, then
     > you want to give a fair bit of thought to it.  I'm guessing that the
     > semantics of R are to treat it as another data value with
     > anything OP missing = missing.
     >
     > Steve
     >
     >> -----Original Message-----
     >> From: ptolemy-admin at chess.eecs.berkeley.edu
     >> [mailto:ptolemy-admin at chess.eecs.berkeley.edu] On Behalf Of
     >> Edward A. Lee
     >> Sent: Tuesday, December 13, 2005 6:45 AM
     >> To: Christopher Brooks
     >> Cc: kepler-dev at ecoinformatics.org; ptresearch at chess.eecs.berkeley.edu
     >> Subject: [Ptolemy] Re: [kepler-dev] Bug 2240: add support for
     >> null values to data passing among ports
     >>
     >>
     >> Hmm... This now leads me to change my mind... Perhaps this
     >> needs to be more carefully thought through.  I suspect we
     >> will be introducing a back-door mechanism for getting
     >> unexpected nondeterminism by this mechanism... Let's not
     >> implement it without further discussion...
     >>
     >> It is arguable that if you want a well-defined notion of
     >> missing tokens, you should be using SR or DE.  Both of these
     >> have clean semantics for absent tokens, and it's already
     >> fully supported...
     >>
     >> Edward
     >>
     >> At 10:01 PM 12/12/2005 -0800, Christopher Brooks wrote:
     >>> I started looking in to this, below are some random musings.
     >>>
     >>> Comments are welcome.
     >>>
     >>> I added two methods to Token:
     >>>
     >>>     /** Return true if the token has been set to null.
     >>>      *  @return True if the token has been set to null by calling
     >>>      *  {@link #setToNull()}.
     >>>      *  @see #setToNull()
     >>>      */
     >>>     public boolean isNull() {
     >>>         return _isNull;
     >>>     }
     >>>
     >>>     /** Set the value of this token to null.
     >>>      */
     >>>     public void setToNull() {
     >>>         // It would be nice if this method was called "null()", but
     >>>         // null is a reserved Java keyword.
     >>>         _isNull = true;
     >>>     }
     >>>
     >>>
     >>> I'm thinking we should call these tokens "missing" instead of null
     >>> because null is a keyword and a null Token is not null in the usual
     >>> Java sense.  Other possibilities are "absent" and "empty".
     >>> sr.lib.AbsentToken already exists, so absent is out.
     >>>
     >>> I modified IOPort to have
     >>> IOPort.get(int channelIndex, boolean dropNullValues), and
     >>> IOPort.get(int channelIndex, int vectorLength, boolean
     >>> dropNullValues).
     >>>
     >>> Looking at IOPort.get(int channelIndex, boolean dropNullValues),
     >>> which basically loops through the receivers and gets the first
     >>> non-null token, we have:
     >>>
     >>>         localReceivers = getReceivers(); ...
     >>>         Token token = null;
     >>>
     >>>         for (int j = 0; j < localReceivers[channelIndex].length; j++) {
     >>>             Token localToken = localReceivers[channelIndex][j].get();
     >>>             if (token == null) {
     >>>                 token = localToken;
     >>>             }
     >>>         }
     >>>
     >>>         if (token == null) {
     >>>             throw new NoTokenException(this, "No token to return.");
     >>>         }
     >>>
     >>> So, say we receive a missing token on one of the channels.
     >>> What do we do?
     >>> We could go through the rest of the channels and hopefully find a
     >>> non-null and non-missing token.
     >>> If we find a non-null and non-missing token, then we could return
     >>> that token.  This would invisibly drop the missing token on a
     >>> channel, but it would assume that there is a non-missing token to
     >>> be had.
     >>>
     >>> If we don't find a non-null and non-missing token, then should we
     >>> return the missing token that we found?  Or do we throw an
     >>> exception like we used to?  Or should we somehow busy-wait?
     >>> Busy-waiting is gross.
     >>> It would be nice if we could say "OK, wait for a non-missing token
     >>> to show up," but I don't think get() in SDF really handles this;
     >>> it would mess up the balance equations.
     >>>
     >>> Maybe I need a small test example to work from, like two actors:
     >>>
     >>> RandomMissingRampSource ---> NonStrictTest
     >>>
     >>> RandomMissingRampSource would be a Ramp that would produce Missing
     >>> tokens randomly.
     >>>
     >>> NonStrictTest should work fine here because NonStrictTest ignores
     >>> absent inputs and checks inputs in the postfire() method instead
     >>> of in fire() like actor.lib.Test.
     >>>
     >>> Another test would be
     >>>                             Const
     >>>                              |
     >>>                              V
     >>> RandomMissingRampSource ---> MissingCapableAdd -> Test
     >>>
     >>> Where MissingCapableAdd would "Do the right thing" and consume
     >>> missing Tokens and add Const to the non-missing tokens.
     >>>
     >>> This all seems to tie in to strictness and where we read data
     >>> (fire vs. postfire).  Domain polymorphic actors should do
     >>> processing in postfire() not fire() because domains like CT might
     >>> call fire() multiple times.  It seems like what we really need to
     >>> do is have "Missing" capable actors call fire() and read "Missing"
     >>> tokens until they get a non-missing token and then call postfire().
     >>> I think this would mess with determinism and scheduling?
     >>>
     >>> Also, there are complications with strictness and multiports.
     >>> NonStrictTest says:
     >>>     // The Test actor could be extended so that Strictness was a
     >>>     // parameter, but that would require some slightly tricky code
     >>>     // to handle multiports in a non-strict fashion.  The problem is
     >>>     // that if we have more than one input channel, and we want to
     >>>     // handle non-strict inputs, then we need to keep track of number
     >>>     // of tokens we have seen on each channel. Also, this actor does
     >>>     // not read inputs until postfire(), which is too late to produce
     >>>     // an output, as done by Test.
     >>>
     >>> _Christopher
     >>>
     >>>
     >>>
     >>> --------
     >>>
     >>>
     >>>     This solution sounds reasonable to me...
     >>>
     >>>     Edward
     >>>
     >>>     At 06:36 PM 12/12/2005 -0800, Christopher Brooks wrote:
     >>>     >Hi Edward,
     >>>     >
     >>>     >One of the Kepler bugs blocking the Kepler release is:
     >>>     >http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2240
     >>>     >which is reproduced below.
     >>>     >
     >>>     >Basically, Kepler needs to handle missing data because not all
     >>>     >ecological data sets are complete.
     >>>     >
     >>>     >The solution to this problem needs to work in PN and SDF.
     >>>     >
     >>>     >The Synchronous/Reactive (SR) domain has the notion of absent
     >>>     >values.
     >>>     >For example, domains/sr/lib/Absent.java has this class comment:
     >>>     >
     >>>     >  This actor outputs absent values.  That is, it produces no
     >>>     >  tokens, and it calls the sendClear() method of the output port
     >>>     >  on each firing.
     >>>     >
     >>>     >The sendClear() method is the IOPort.java sendClear() method.
     >>>     >IOPort also has sendClearInside() and broadcastClear().
     >>>     >
     >>>     >Do you have any comments?
     >>>     >
     >>>     >Below is the text from the bug report:
     >>>     >--start--
     >>>     >Currently ptolemy and kepler do not support passing null values
     >>>     >(sometimes called missing values) among ports, even though this
     >>>     >is common in analytical systems like R and SAS.  The concept of
     >>>     >null is not even defined in the token types. This causes a real
     >>>     >problem for data sources that are sparsely populated, as well as
     >>>     >data streams that result from data integration operations that
     >>>     >might produce null values.  We need to extend the underlying
     >>>     >token representation to include a concept of null values and the
     >>>     >actor framework to protect existing actors that might not know
     >>>     >how to handle null values.  Because nulls cannot currently be
     >>>     >represented in Kepler, none of the existing actors support them.
     >>>     >An exception is thrown whenever a missing value is detected by
     >>>     >the EML data source, and workflow execution ceases.
     >>>     >
     >>>     >Bowers and Jones discussed one possible partial solution to this
     >>>     >on IRC, which is summarized here.
     >>>     >
     >>>     >1) Override the Token base class to support null values by
     >>>     >    providing two new methods:
     >>>     >
     >>>     >    Token.null() sets the token's value to null
     >>>     >    boolean Token.isNull() returns true if the token has been
     >>>     >    set to null
     >>>     >
     >>>     >2) Override TypedIOPort to add a new method that takes a boolean
     >>>     >    "dropNull"
     >>>     >
     >>>     >     a) by default this could be set to "true" then the existing
     >>>     >     get() method would be reimplemented to call the new one
     >>>     >     with a default of "true" because we can assume that no
     >>>     >     existing actor written can handle null values, since there
     >>>     >     is no way to pass null values, and so existing calls to
     >>>     >     get() will invisibly drop null values.
     >>>     >
     >>>     >     b) if an actor can handle null values, then it passes
     >>>     >     "false" which indicates that the actor knows how to deal
     >>>     >     with nulls and wants to receive them
     >>>     >
     >>>     >so the changes to IOPort (or maybe TypedIOPort) would be:
     >>>     >
     >>>     >IOPort.get(channelIndex) becomes get(channelIndex, dropNullValues)
     >>>     >and
     >>>     >IOPort.get(channelIndex, int vectorLength) becomes
     >>>     >get(channelIndex, vectorLength, dropNullValues)
     >>>     >
     >>>     >so, for example, the new implementation of get(channelIndex)
     >>>     >would simply call get(channelIndex, true), so existing actors
     >>>     >would not even notice the change.
     >>>     >--end--
     >>>     >
     >>>     >_Christopher
     >>>
     >>>     ------------
     >>>     Edward A. Lee
     >>>     Professor, Chair of the EE Division, Associate Chair of EECS
     >>>     231 Cory Hall, UC Berkeley, Berkeley, CA 94720
     >>>     phone: 510-642-0253 or 510-642-0455, fax: 510-642-2845
     >>>     eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal
     >>> --------
     >>> _______________________________________________
     >>> Kepler-dev mailing list
     >>> Kepler-dev at ecoinformatics.org
     >>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
     >>
     >> _______________________________________________
     >> Ptolemy maillist  -  Ptolemy at chess.eecs.berkeley.edu
     >> http://chess.eecs.berkeley.edu/ptolemy/listinfo/ptolemy
     >>
     >>
     >
     > _______________________________________________
     > Kepler-dev mailing list
     > Kepler-dev at ecoinformatics.org
     > http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
--------