[kepler-dev] Bug 2240: add support for null values to data passing among ports

Christopher Brooks cxh at eecs.berkeley.edu
Mon Dec 12 22:01:45 PST 2005


I started looking into this; below are some random musings.

Comments are welcome.

I added two methods to Token:

    /** Return true if the token has been set to null.
     *  @return True if the token has been set to null by calling
     *  {@link #setToNull()}.
     *  @see #setToNull()
     */
    public boolean isNull() {
        return _isNull;
    }

    /** Set the value of this token to null.
     */
    public void setToNull() {
        // It would be nice if this method was called "null()", but
        // null is a reserved Java keyword.
        _isNull = true;
    }


I'm thinking we should call these tokens "missing" instead of null
because null is a keyword and a null Token is not null in the usual
Java sense.  Other possibilities are "absent" and "empty".  
sr.lib.AbsentToken already exists, so absent is out.

I modified IOPort to have
IOPort.get(int channelIndex, boolean dropNullValues)
and
IOPort.get(int channelIndex, int vectorLength, boolean dropNullValues).

Looking at IOPort.get(int channelIndex, boolean dropNullValues),
which basically loops through the receivers and gets the first
non-null token, we have:

        localReceivers = getReceivers();
...
        Token token = null;

        for (int j = 0; j < localReceivers[channelIndex].length; j++) {
            Token localToken = localReceivers[channelIndex][j].get();

            if (token == null) {
                token = localToken;
            }
        }

        if (token == null) {
            throw new NoTokenException(this, "No token to return.");
        }

So, say we receive a missing token on one of the channels.
What do we do?
We could go through the rest of the channels and hopefully find
a non-null and non-missing token.
If we find a non-null and non-missing token, then we could return that
token.  This would invisibly drop the missing token on a channel,
but it would assume that there is a non-missing token to be had.

If we don't find a non-null and non-missing token, then should we return
the missing token that we found?  Or do we throw an exception like
we used to?  Or should we somehow busy-wait?  Busy-waiting is gross.
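
To make the options above concrete, here is a minimal sketch of one
possible policy.  It is not the IOPort code: SketchToken, DropNullSketch
and select() are hypothetical stand-ins, and IllegalStateException stands
in for NoTokenException.  The sketch assumes that when every token is
missing we hand back the first missing token rather than throwing, which
is only one of the choices listed above.

```java
import java.util.List;

/** Hypothetical stand-in for a token; not the real ptolemy.data.Token. */
class SketchToken {
    private final int value;
    private final boolean missing;

    private SketchToken(int value, boolean missing) {
        this.value = value;
        this.missing = missing;
    }

    static SketchToken of(int value) {
        return new SketchToken(value, false);
    }

    static SketchToken missing() {
        return new SketchToken(0, true);
    }

    boolean isNull() {
        return missing;
    }

    int intValue() {
        return value;
    }
}

class DropNullSketch {
    /**
     * Pick the token to return from what the receivers of one channel
     * produced.  A Java null entry means that receiver had no token.
     * Assumption: if dropNullValues is true but only missing tokens
     * were seen, the first missing token is returned instead of throwing.
     */
    static SketchToken select(List<SketchToken> received,
            boolean dropNullValues) {
        SketchToken firstAny = null;      // first token of any kind
        SketchToken firstPresent = null;  // first non-missing token

        for (SketchToken token : received) {
            if (token == null) {
                continue;
            }
            if (firstAny == null) {
                firstAny = token;
            }
            if (firstPresent == null && !token.isNull()) {
                firstPresent = token;
            }
        }

        if (firstAny == null) {
            // Stand-in for NoTokenException in the real code.
            throw new IllegalStateException("No token to return.");
        }
        if (dropNullValues && firstPresent != null) {
            return firstPresent;
        }
        return firstAny;
    }
}
```

With dropNullValues set to false the caller sees whatever arrived first,
missing or not, which matches the idea that such a caller knows how to
deal with missing tokens.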

It would be nice if we could say "Ok, wait for a non-missing token to
show up," but I don't think get() in SDF really handles this;
it would mess up the balance equations.

Maybe I need a small test example to work from, like two actors:

RandomMissingRampSource ---> NonStrictTest

RandomMissingRampSource would be a Ramp that would produce Missing tokens
randomly.  

NonStrictTest should work fine here because NonStrictTest ignores
absent inputs and checks inputs in the postfire() method instead of
in fire() like actor.lib.Test.
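
A rough sketch of what the RandomMissingRampSource described above might
compute, outside of any actor framework.  The class name
RandomMissingRampSketch, the fire() signature, the seed and the
missingProbability parameter are all hypothetical, and a Java null
Integer stands in for a Missing token.  The sketch also assumes the ramp
keeps advancing when a value is replaced by a Missing token, which is a
guess about the intended behavior.

```java
import java.util.Random;

class RandomMissingRampSketch {
    private final Random random;
    private final double missingProbability;
    private int next = 0;

    /** A fixed seed keeps a test run reproducible. */
    RandomMissingRampSketch(long seed, double missingProbability) {
        this.random = new Random(seed);
        this.missingProbability = missingProbability;
    }

    /**
     * Return the next ramp value, or null to stand in for a Missing
     * token.  Assumption: the ramp advances on every firing, so a
     * missing firing leaves a gap in the sequence.
     */
    Integer fire() {
        int value = next++;
        if (random.nextDouble() < missingProbability) {
            return null;
        }
        return value;
    }
}
```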

Another test would be
                            Const
                             |
                             V
RandomMissingRampSource ---> MissingCapableAdd -> Test

Where MissingCapableAdd would "Do the right thing" and consume 
missing Tokens and add Const to the non-missing tokens.
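
One reading of "Do the right thing" is sketched below.  The class
MissingCapableAddSketch and its addConst() method are hypothetical, a
Java null Integer stands in for a missing token, and the sketch assumes
that a missing input is consumed without producing any output.
Forwarding the missing token downstream would be an equally plausible
choice.

```java
import java.util.ArrayList;
import java.util.List;

class MissingCapableAddSketch {
    /**
     * Add a constant to every non-missing input.  Assumption: a missing
     * input (null here) is consumed and produces no output.
     */
    static List<Integer> addConst(List<Integer> rampOutputs, int constant) {
        List<Integer> results = new ArrayList<>();
        for (Integer value : rampOutputs) {
            if (value == null) {
                continue; // consume the missing token, emit nothing
            }
            results.add(value + constant);
        }
        return results;
    }
}
```

Under this reading the adder produces fewer outputs than it consumes
inputs whenever a token is missing, so the downstream Test would have to
be written against the shortened sequence.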

This all seems to tie in to strictness and where we read data (fire
vs. postfire).  Domain polymorphic actors should do processing
in postfire() not fire() because domains like CT might call
fire() multiple times.  It seems like what we really need to do
is have "Missing" capable actors call fire() and read "Missing" tokens
until they get a non-missing token and then call postfire(). 
I think this would mess with determinism and scheduling?

Also, there are complications with strictness and multiports.
NonStrictTest says:
    // The Test actor could be extended so that Strictness was a parameter,
    // but that would require some slightly tricky code to handle
    // multiports in a non-strict fashion.  The problem is that if
    // we have more than one input channel, and we want to handle
    // non-strict inputs, then we need to keep track of number of
    // tokens we have seen on each channel. Also, this actor does
    // not read inputs until postfire(), which is too late to produce
    // an output, as done by Test.

_Christopher



--------

    
    This solution sounds reasonable to me...
    
    Edward
    
    At 06:36 PM 12/12/2005 -0800, Christopher Brooks wrote:
    >Hi Edward,
    >
    >One of the Kepler bugs blocking the Kepler release is:
    >http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2240
    >which is reproduced below.
    >
    >Basically, Kepler needs to handle missing data because not all
    >ecological data sets are complete.
    >
    >The solution to this problem needs to work in PN and SDF.
    >
    >The Synchronous/Reactive (SR) domain has the notion of absent values.
    >For example, domains/sr/lib/Absent.java has this class comment:
    >
    >  This actor outputs absent values.  That is, it produces no tokens,
    >  and it calls the sendClear() method of the output port on each
    >  firing.
    >
    >The sendClear() method is the IOPort.java sendClear() method.
    >IOPort also has sendClearInside() and broadcastClear().
    >
    >Do you have any comments?
    >
    >Below is the text from the bug report:
    >--start--
    >Currently ptolemy and kepler do not support passing null values
    >(sometimes called missing values) among ports, even though this is
    >common in analytical systems like R and SAS.  The concept of null is
    >not even defined in the token types. This causes a real problem for
    >data sources that are sparsely populated, as well as data streams that
    >result from data integration operations that might produce null
    >values.  We need to extend the underlying token representation to
    >include a concept of null values and the actor framework to protect
    >existing actors that might not know how to handle null values.
    >Because nulls cannot currently be represented in Kepler, none of the
    >existing actors support them.  An exception is thrown whenever a
    >missing value is detected by the EML data source, and workflow
    >execution ceases.
    >
    >Bowers and Jones discussed one possible partial solution to this on
    >IRC, which is summarized here.
    >
    >1) Override the Token base class to support null values by providing
    >    two new methods:
    >
    >    Token.null() sets the token's value to null
    >    boolean Token.isNull() returns true if the token has been set to
    >    null
    >
    >2) Override TypedIOPort to add a new method that takes a boolean
    >    "dropNull"
    >
    >     a) by default this could be set to "true" then the existing get()
    >    method would be reimplemented to call the new one with a default of
    >    "true" because we can assume that no existing actor written can
    >    handle null values, since there is no way to pass null values, and
    >    so existing calls to get() will invisibly drop null values.
    >
    >     b) if an actor can handle null values, then it passes "false"
    >     which indicates that the actor knows how to deal with nulls and
    >     wants to receive them
    >
    >so the changes to IOPort (or maybe TypedIOPort) would be:
    >
    >IOPort.get(channelIndex) becomes get(channelIndex, dropNullValues)
    >and
    >IOPort.get(channelIndex, int vectorLength) becomes get(channelIndex,
    >vectorLength, dropNullValues)
    >
    >so, for example, the new implementation of get(channelIndex) would
    >simply call get(channelIndex, true), so existing actors would not even
    >notice the change.
    >--end--
    >
    >_Christopher
    
    ------------
    Edward A. Lee
    Professor, Chair of the EE Division, Associate Chair of EECS
    231 Cory Hall, UC Berkeley, Berkeley, CA 94720
    phone: 510-642-0253 or 510-642-0455, fax: 510-642-2845
    eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal  
--------

