[kepler-dev] [Ptolemy] Re: Bug 2240: add support for null values to data passing among ports
Shawn Bowers
sbowers at ucdavis.edu
Tue Dec 13 10:18:42 PST 2005
The problem with this approach -- of adding new "nullable" types and
requiring each actor to process a missing token explicitly -- is that
most existing/legacy actors don't process null values and so you
wouldn't be able to use most data sets with existing actors or in
existing workflows. It is my impression that many other systems (R,
SAS, even SQL) have automated mechanisms to handle null values, some
in quite fancy ways.
Our original motivation for changing the basic Token and IOPort
interface was to not require changes to the type lattice (which aren't
always obvious when it comes to supporting null values) and to
automate the handling of null values for actors that don't explicitly
manage them. We were trying to slip in the ability to handle null
values, without having to make any changes to existing actors, while
allowing data sets with null values to still use non-null-aware
actors.
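
To make the intent concrete, here is a minimal sketch of the default-drop
behavior. SketchToken and SketchPort are hypothetical stand-ins invented
for illustration, not the real ptolemy.data.Token or IOPort; only
isNull(), get(), and the dropNullValues flag come from the proposal in
the bug report, and IllegalStateException is used in place of
NoTokenException.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a token that carries a "null" flag. */
final class SketchToken {
    private final double value;
    private final boolean isNull;

    private SketchToken(double value, boolean isNull) {
        this.value = value;
        this.isNull = isNull;
    }

    static SketchToken of(double value) { return new SketchToken(value, false); }
    static SketchToken missing() { return new SketchToken(Double.NaN, true); }

    boolean isNull() { return isNull; }
    double doubleValue() { return value; }
}

/** Hypothetical single-channel port (assumption: not the real IOPort). */
final class SketchPort {
    private final Deque<SketchToken> queue = new ArrayDeque<>();

    void put(SketchToken token) { queue.addLast(token); }

    /** Legacy entry point: existing actors keep calling this unchanged. */
    SketchToken get() { return get(true); }

    /** Null-aware entry point: pass false to receive null tokens as well. */
    SketchToken get(boolean dropNullValues) {
        while (!queue.isEmpty()) {
            SketchToken token = queue.removeFirst();
            if (!dropNullValues || !token.isNull()) {
                return token;
            }
            // Otherwise the null token is silently discarded.
        }
        throw new IllegalStateException("No token to return.");
    }
}
```

A legacy actor calling get() never sees a null token, while a null-aware
actor calls get(false). Note that the legacy path may consume more than
one token per call, which does not sit well with fixed consumption rates.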
It seems that SDF is really the problematic case here ...
-shawn
Stephen Neuendorffer wrote:
> I don't think this needs to be handled specially. To me this has
> always seemed like the perfect application of union/variant types.
> The main difference would be that as a variant type, the receiving
> actor would have to process each 'missing' token explicitly.
>
> (I think) an SDF model should propagate 'missing' values and generate
> missing values, just like any other token. I tend to agree that if you
> want a particular definition of something like missing tokens, then
> you want to give a fair bit of thought to it. I'm guessing that the
> semantics of R are to treat it as another data value with
> anything OP missing = missing.
>
> Steve
>
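
The "anything OP missing = missing" rule can be sketched as follows. This
is only an illustration: MissingOps is a hypothetical helper, and
java.util.OptionalDouble stands in for a union/variant token type; it is
not how Ptolemy tokens are implemented.

```java
import java.util.OptionalDouble;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical helper: binary operations that propagate "missing". */
final class MissingOps {
    private MissingOps() {}

    /** Apply op to a and b; the result is missing if either operand is. */
    static OptionalDouble apply(OptionalDouble a, OptionalDouble b,
            DoubleBinaryOperator op) {
        if (!a.isPresent() || !b.isPresent()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(op.applyAsDouble(a.getAsDouble(), b.getAsDouble()));
    }
}
```

With this shape the receiving code has to unwrap the value explicitly,
which is the cost for existing actors discussed above.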
>> -----Original Message-----
>> From: ptolemy-admin at chess.eecs.berkeley.edu
>> [mailto:ptolemy-admin at chess.eecs.berkeley.edu] On Behalf Of
>> Edward A. Lee
>> Sent: Tuesday, December 13, 2005 6:45 AM
>> To: Christopher Brooks
>> Cc: kepler-dev at ecoinformatics.org; ptresearch at chess.eecs.berkeley.edu
>> Subject: [Ptolemy] Re: [kepler-dev] Bug 2240: add support for
>> null values to data passing among ports
>>
>>
>> Hmm... This now leads me to change my mind... Perhaps this
>> needs to be more carefully thought through. I suspect we
>> will be introducing a back-door mechanism for getting
>> unexpected nondeterminism by this mechanism... Let's not
>> implement it without further discussion...
>>
>> It is arguable that if you want a well-defined notion of
>> missing tokens, you should be using SR or DE. Both of these
>> have clean semantics for absent tokens, and it's already
>> fully supported...
>>
>> Edward
>>
>> At 10:01 PM 12/12/2005 -0800, Christopher Brooks wrote:
>>> I started looking into this; below are some random musings.
>>>
>>> Comments are welcome.
>>>
>>> I added two methods to Token:
>>>
>>> /** Return true if the token has been set to null.
>>>  *  @return True if the token has been set to null by calling
>>>  *  {@link #setToNull()}.
>>>  *  @see #setToNull()
>>>  */
>>> public boolean isNull() {
>>>     return _isNull;
>>> }
>>>
>>> /** Set the value of this token to null.
>>>  */
>>> public void setToNull() {
>>>     // It would be nice if this method was called "null()", but
>>>     // null is a reserved Java keyword.
>>>     _isNull = true;
>>> }
>>>
>>>
>>> I'm thinking we should call these tokens "missing" instead of null
>>> because null is a keyword and a null Token is not null in the usual
>>> Java sense. Other possibilities are "absent" and "empty".
>>> sr.lib.AbsentToken already exists, so absent is out.
>>>
>>> I modified IOPort to have
>>> IOPort.get(int channelIndex, boolean dropNullValues), and
>>> IOPort.get(int channelIndex, int vectorLength, boolean
>>> dropNullValues).
>>>
>>> Looking at IOPort.get(int channelIndex, boolean dropNullValues),
>>> which basically loops through the receivers and gets the first
>>> non-null token, we have:
>>>
>>> localReceivers = getReceivers(); ...
>>> Token token = null;
>>>
>>> for (int j = 0; j < localReceivers[channelIndex].length; j++) {
>>>     Token localToken = localReceivers[channelIndex][j].get();
>>>     if (token == null) {
>>>         token = localToken;
>>>     }
>>> }
>>>
>>> if (token == null) {
>>>     throw new NoTokenException(this, "No token to return.");
>>> }
>>>
>>> So, say we receive a missing token on one of the channels?
>>> What do we do?
>>> We could go through the rest of the channels and hopefully find a
>>> non-null and non-missing token.
>>> If we find a non-null and non-missing token, then we could return
>>> that token. This would invisibly drop the missing token on a
>>> channel, but it would assume that there is a non-missing token to
>>> be had.
>>>
>>> If we don't find a non-null and non-missing token, then should we
>>> return the missing token that we found? Or do we throw an exception
>>> like we used to? Or should we somehow busy-wait? Busy-waiting is
>>> gross. It would be nice if we could say "OK, wait for a non-missing
>>> token to show up," but I don't think get() in SDF really handles
>>> this; it would mess up the balance equations.
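
One of the options above (prefer a non-missing token, fall back to the
missing one, otherwise throw) could look roughly like this. ScanToken and
ChannelScan are hypothetical names, the list stands in for tokens already
read from the receivers of one channel, a null entry means that receiver
had no token, and IllegalStateException replaces NoTokenException. It
does not address the busy-wait or balance-equation questions.

```java
import java.util.List;

/** Hypothetical token: either an int value or a "missing" marker. */
final class ScanToken {
    final boolean isMissing;
    final int value;

    private ScanToken(boolean isMissing, int value) {
        this.isMissing = isMissing;
        this.value = value;
    }

    static ScanToken of(int value) { return new ScanToken(false, value); }
    static ScanToken missing() { return new ScanToken(true, 0); }
}

/** Hypothetical helper that picks one token from a channel's receivers. */
final class ChannelScan {
    private ChannelScan() {}

    static ScanToken firstUsable(List<ScanToken> receivedTokens) {
        ScanToken firstMissing = null;
        for (ScanToken token : receivedTokens) {
            if (token == null) {
                continue; // This receiver had nothing to offer.
            }
            if (!token.isMissing) {
                return token; // Prefer a real value; missing ones are dropped.
            }
            if (firstMissing == null) {
                firstMissing = token; // Remember it as a fallback.
            }
        }
        if (firstMissing != null) {
            return firstMissing; // Only missing tokens were found.
        }
        throw new IllegalStateException("No token to return.");
    }
}
```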
>>>
>>> Maybe I need a small test example to work from, like two actors:
>>>
>>> RandomMissingRampSource ---> NonStrictTest
>>>
>>> RandomMissingRampSource would be a Ramp that would produce Missing
>>> tokens randomly.
>>>
>>> NonStrictTest should work fine here because NonStrictTest ignores
>>> absent inputs and checks inputs in the postfire() method instead
>>> of in fire() like actor.lib.Test.
>>>
>>> Another test would be
>>> Const
>>> |
>>> V
>>> RandomMissingRampSource ---> MissingCapableAdd -> Test
>>>
>>> Where MissingCapableAdd would "Do the right thing" and consume
>>> missing Tokens and add Const to the non-missing tokens.
>>>
>>> This all seems to tie in to strictness and where we read data
>>> (fire vs. postfire). Domain polymorphic actors should do processing
>>> in postfire() not fire() because domains like CT might call
>>> fire() multiple times. It seems like what we really need to do is
>>> have "Missing" capable actors call fire() and read "Missing" tokens
>>> until they get a non-missing token and then call postfire().
>>> I think this would mess with determinism and scheduling?
>>>
>>> Also, there are complications with strictness and multiports.
>>> NonStrictTest says:
>>> // The Test actor could be extended so that Strictness was a
>>> // parameter, but that would require some slightly tricky code to
>>> // handle multiports in a non-strict fashion. The problem is that if
>>> // we have more than one input channel, and we want to handle
>>> // non-strict inputs, then we need to keep track of number of
>>> // tokens we have seen on each channel. Also, this actor does
>>> // not read inputs until postfire(), which is too late to produce
>>> // an output, as done by Test.
>>>
>>> _Christopher
>>>
>>>
>>>
>>> --------
>>>
>>>
>>> This solution sounds reasonable to me...
>>>
>>> Edward
>>>
>>> At 06:36 PM 12/12/2005 -0800, Christopher Brooks wrote:
>>> >Hi Edward,
>>> >
>>> >One of the Kepler bugs blocking the Kepler release is:
>>> >http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2240
>>> >which is reproduced below.
>>> >
>>> >Basically, Kepler needs to handle missing data because not all
>>> >ecological data sets are complete.
>>> >
>>> >The solution to this problem needs to work in PN and SDF.
>>> >
>>> >The Synchronous/Reactive (SR) domain has the notion of absent values.
>>> >For example, domains/sr/lib/Absent.java has this class comment:
>>> >
>>> > This actor outputs absent values. That is, it produces no tokens,
>>> > and it calls the sendClear() method of the output port on each
>>> > firing.
>>> >
>>> >The sendClear() method is the IOPort.java sendClear() method.
>>> >IOPort also has sendClearInside() and broadcastClear().
>>> >
>>> >Do you have any comments?
>>> >
>>> >Below is the text from the bug report:
>>> >--start--
>>> >Currently ptolemy and kepler do not support passing null values
>>> >(sometimes called missing values) among ports, even though this is
>>> >common in analytical systems like R and SAS. The concept of null is
>>> >not even defined in the token types. This causes a real problem for
>>> >data sources that are sparsely populated, as well as data streams that
>>> >result from data integration operations that might produce null
>>> >values. We need to extend the underlying token representation to
>>> >include a concept of null values and the actor framework to protect
>>> >existing actors that might not know how to handle null values.
>>> >Because nulls cannot currently be represented in Kepler, none of the
>>> >existing actors support them. An exception is thrown whenever a
>>> >missing value is detected by the EML data source, and workflow
>>> >execution ceases.
>>> >
>>> >Bowers and Jones discussed one possible partial solution to this on
>>> >IRC, which is summarized here.
>>> >
>>> >1) Override the Token base class to support null values by providing
>>> > two new methods:
>>> >
>>> > Token.null() sets the token's value to null
>>> > boolean Token.isNull() returns true if the token has been set to
>>> > null
>>> >
>>> >2) Override TypedIOPort to add a new method that takes a boolean
>>> > "dropNull"
>>> >
>>> > a) by default this could be set to "true" then the existing get()
>>> > method would be reimplemented to call the new one with a default of
>>> > "true" because we can assume that no existing actor written can
>>> > handle null values, since there is no way to pass null values, and
>>> > so existing calls to get() will invisibly drop null values.
>>> >
>>> > b) if an actor can handle null values, then it passes "false"
>>> > which indicates that the actor knows how to deal with nulls and
>>> > wants to receive them
>>> >
>>> >so the changes to IOPort (or maybe TypedIOPort) would be:
>>> >
>>> >IOPort.get(channelIndex) becomes get(channelIndex, dropNullValues)
>>> >and
>>> >IOPort.get(channelIndex, int vectorLength) becomes get(channelIndex,
>>> >vectorLength, dropNullValues)
>>> >
>>> >so, for example, the new implementation of get(channelIndex) would
>>> >simply call get(channelIndex, true), so existing actors would not even
>>> >notice the change.
>>> >--end--
>>> >
>>> >_Christopher
>>>
>>> ------------
>>> Edward A. Lee
>>> Professor, Chair of the EE Division, Associate Chair of EECS
>>> 231 Cory Hall, UC Berkeley, Berkeley, CA 94720
>>> phone: 510-642-0253 or 510-642-0455, fax: 510-642-2845
>>> eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal
>>> --------
>>> _______________________________________________
>>> Kepler-dev mailing list
>>> Kepler-dev at ecoinformatics.org
>>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
>> ------------
>> Edward A. Lee
>> Professor, Chair of the EE Division, Associate Chair of EECS
>> 231 Cory Hall, UC Berkeley, Berkeley, CA 94720
>> phone: 510-642-0253 or 510-642-0455, fax: 510-642-2845
>> eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal
>>
>> _______________________________________________
>> Ptolemy maillist - Ptolemy at chess.eecs.berkeley.edu
>> http://chess.eecs.berkeley.edu/ptolemy/listinfo/ptolemy
>>
>>
>
> _______________________________________________
> Kepler-dev mailing list
> Kepler-dev at ecoinformatics.org
> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev