[kepler-users] token matching mixup due to unmatched boolean switches

Tirath Ramdas tramdas at oci.uzh.ch
Thu Oct 15 05:15:36 PDT 2009


Hi all,

I have a situation which I have been able to address in a "kludgy"  
way, but I wonder if there may be a better way, so I am presenting the  
problem and my proposed solution here for your critique and counter- 
suggestions. I am sure it's a situation that many others have  
encountered. This is with the PN director: I'm not sure how, if at  
all, the DDF director might help. Anyway, here goes...

THE PROBLEM:

Let's say I have a single source of data that has to go through 3  
actors: A, B, and C. They are laid out in a fork/join configuration.  
The DataSource goes to A and B, and the outputs from those two actors  
go to C. In dot notation, this is what my business logic graph looks  
like:

DataSource -> TaskA;
DataSource -> TaskB;
TaskA -> TaskC;
TaskB -> TaskC;

This is a trivial workflow. But now, I want to make my TaskA actor  
more sophisticated and detect failures in the computation (I mean  
business logic failures - not the kind of thing that can be fixed by  
re-execution). I will include in TaskA a boolean switch to detect  
successful jobs and push tokens to the output port only when they are  
good. But the important thing is that one bad piece of data should not  
stop the workflow: the rest of the data gets processed as usual.

The problem is, TaskB never knows when one of TaskA's jobs fails, and
it dutifully pushes all of its tokens through. As a result, TaskC
gets mismatched tokens.

A contrived but indicative example: let's say my DataSource is a Ramp
producing 0 to 9, TaskB is simply an expression that pushes all tokens
through, and TaskA has an "error condition" where a token value of "5"
is considered an error and routed to the error port instead of the
normal output port. TaskC just merges both its inputs into a 2-element
array. A sample Kepler workflow is here: http://pastebin.com/m2faecfb3

What happens is this:

{0, 0}
{1, 1}
{2, 2}
{3, 3}
{4, 4}
{6, 5} <- problem starts here
{7, 6} <-
{8, 7} <-
{9, 8} <-

What I want to see is this:

{0, 0}
{1, 1}
{2, 2}
{3, 3}
{4, 4}
{6, 6} <- desired result
{7, 7} <-
{8, 8} <-
{9, 9} <-
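
For concreteness, here is a tiny plain-Python model of what the PN
channels appear to be doing (illustration only, not Kepler/Ptolemy
code, and all the names are made up); it reproduces the mismatched
output above:

from collections import deque

ramp = list(range(10))                       # DataSource: 0..9

chan_a = deque(x for x in ramp if x != 5)    # TaskA routes the "error" token 5 away
chan_b = deque(ramp)                         # TaskB passes every token through

# TaskC blocks until both channels have a token, then pairs them in
# arrival order -- it cannot know that channel A silently skipped one.
while chan_a and chan_b:
    a, b = chan_a.popleft(), chan_b.popleft()
    print("{%d, %d}" % (a, b))               # {4, 4}, then {6, 5}, {7, 6}, ...

Once token 5 is dropped on TaskA's path, the two FIFO channels are
permanently off by one, which is exactly the misalignment TaskC sees.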

MY KLUDGE:

I hacked around this simply by passing all of TaskB's results through  
"passthrough" ports in TaskA, so that TaskA's condition checking can  
be effectively applied to TaskB's results as well. This is ugly and  
seriously detracts from the business process flow that I want to  
express.
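
In terms of the toy model above, the kludge amounts to letting TaskA's
error check gate both streams (again just a sketch, not Kepler code):

from collections import deque

ramp = list(range(10))
chan_a, chan_b = deque(), deque()

for x in ramp:
    b_result = x                   # TaskB's result, ferried through TaskA's passthrough port
    if x != 5:                     # TaskA's error check now drops the pair together
        chan_a.append(x)
        chan_b.append(b_result)

while chan_a and chan_b:
    print("{%d, %d}" % (chan_a.popleft(), chan_b.popleft()))   # {4, 4}, {6, 6}, {7, 7}, ...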

I don't want to make TaskC responsible for doing TaskB's error  
checking: what if my TaskC is a very generic actor? In practice  
TaskB's error-checking is highly application-specific (grep-ing the  
output of a computational chemistry legacy app).

However, I did consider a more generic approach to exception handling  
that would in fact place the burden on TaskC: I considered the  
possibility of mandating that every actor must output a record which  
contains the output data and also a "predication" [1] field. The  
predication field indicates when the data is valid. Any actor  
receiving tokens only proceeds with the computation if the predication  
fields on all its inputs are set to valid; if even one is invalid,  
the whole lot gets routed to an error bin, but the next lot gets  
processed as though nothing went wrong. Also I vaguely recall reading  
that some other workflow engine does something like this. Anyway, I  
did not proceed with this yet because it sounds like a non-trivial  
amount of work to modify all the actors I use to adhere to this  
behavior.
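
To make that idea concrete, here is a rough plain-Python sketch of
what I have in mind (field names are hypothetical; the real actors
would of course be Kepler actors, not Python functions):

def task_a(x):
    return {"data": x, "valid": x != 5}      # business-logic failure on 5

def task_b(x):
    return {"data": x, "valid": True}        # passthrough, always valid

error_bin = []
for x in range(10):                          # DataSource
    group = [task_a(x), task_b(x)]
    if all(rec["valid"] for rec in group):
        # TaskC only fires when every input record is predicated valid
        print("{%d, %d}" % tuple(rec["data"] for rec in group))
    else:
        error_bin.append(group)              # park the whole lot, keep processing

This produces the desired output ({4, 4} followed by {6, 6} and so
on), with the whole group for token 5 ending up in the error bin.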

Any other ideas?

regards,
-tirath

[1] Borrowing the word "predication" from the Itanium architecture,
where predicated execution is used to avoid branch mispredictions; I'm
not sure who they borrowed the term from.



