[kepler-users] token matching mixup due to unmatched boolean switches
Tirath Ramdas
tramdas at oci.uzh.ch
Thu Oct 15 05:15:36 PDT 2009
Hi all,
I have a situation which I have been able to address in a "kludgy"
way, but I wonder if there may be a better way, so I am presenting the
problem and my proposed solution here for your critique and counter-
suggestions. I am sure it's a situation that many others have
encountered. This is with the PN director: I'm not sure how, if at
all, the DDF director might help. Anyway, here goes...
THE PROBLEM:
Let's say I have a single source of data that has to go through 3
actors: A, B, and C. They are laid out in a fork/join configuration.
The DataSource goes to A and B, and the outputs from those two actors
go to C. In dot notation, this is what my business logic graph looks
like:
digraph {
    DataSource -> TaskA;
    DataSource -> TaskB;
    TaskA -> TaskC;
    TaskB -> TaskC;
}
This is a trivial workflow. But now, I want to make my TaskA actor
more sophisticated and detect failures in the computation (I mean
business logic failures - not the kind of thing that can be fixed by
re-execution). I will include in TaskA a boolean switch to detect
successful jobs and push tokens to the output port only when they are
good. But the important thing is that one bad piece of data should not
stop the workflow: the rest of the data gets processed as usual.
The problem is that TaskB never knows when one of TaskA's jobs fails,
and it dutifully pushes all of its tokens through. As a result, TaskC
pairs up tokens that do not belong together.
A contrived but indicative example: let's say my DataSource is a Ramp
producing 0 to 9, TaskB is simply an expression that pushes all tokens
through, and TaskA has an "error condition" where a token value "5" is
considered an error and routed to the error port instead of the normal
output port. TaskC just merges both its inputs into a 2-element
array. A sample Kepler workflow is here: http://pastebin.com/m2faecfb3
What happens is this:
{0, 0}
{1, 1}
{2, 2}
{3, 3}
{4, 4}
{6, 5} <- problem starts here
{7, 6} <-
{8, 7} <-
{9, 8} <-
What I want to see is this:
{0, 0}
{1, 1}
{2, 2}
{3, 3}
{4, 4}
{6, 6} <- desired result
{7, 7} <-
{8, 8} <-
{9, 9} <-
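The mismatch can be reproduced outside Kepler. The following is only a rough Python sketch, not a Kepler workflow: the functions task_a, task_b and task_c are hypothetical stand-ins for the actors, and it assumes that TaskC pairs tokens purely in arrival order, one from each input per firing.

```python
def task_a(tokens):
    # Hypothetical TaskA: the value 5 is the "error" and never reaches
    # the normal output port.
    return [t for t in tokens if t != 5]


def task_b(tokens):
    # Hypothetical TaskB: passes every token through unchanged.
    return list(tokens)


def task_c(a_tokens, b_tokens):
    # Join that pairs tokens purely by arrival order, one from each input.
    return list(zip(a_tokens, b_tokens))


source = list(range(10))  # stand-in for the Ramp
pairs = task_c(task_a(source), task_b(source))
for a, b in pairs:
    print(f"{{{a}, {b}}}")
```

Once TaskA drops a token, every later pair is shifted by one, and TaskB's last token is left waiting with nothing to pair against.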
MY KLUDGE:
I hacked around this simply by passing all of TaskB's results through
"passthrough" ports in TaskA, so that TaskA's condition checking can
be effectively applied to TaskB's results as well. This is ugly and
seriously detracts from the business process flow that I want to
express.
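In rough Python terms (again only a sketch with hypothetical function names, not the actual Kepler actors), the kludge amounts to letting TaskA carry TaskB's result along and emit both or neither:

```python
def task_b(token):
    # Hypothetical TaskB: passes the token through unchanged.
    return token


def task_a_with_passthrough(token, b_result):
    # Hypothetical TaskA with an extra passthrough input: on failure it
    # emits nothing at all, so TaskB's matching result is dropped too.
    if token == 5:
        return None
    return (token, b_result)


def run(source):
    aligned = []
    for token in source:
        result = task_a_with_passthrough(token, task_b(token))
        if result is not None:
            aligned.append(result)
    return aligned


aligned = run(range(10))
for a, b in aligned:
    print(f"{{{a}, {b}}}")
```

The pairs stay aligned, but TaskB's output now has to flow through TaskA, which is exactly the distortion of the graph described above.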
I don't want to make TaskC responsible for doing TaskA's error
checking: what if my TaskC is a very generic actor? In practice
TaskA's error-checking is highly application-specific (grep-ing the
output of a computational chemistry legacy app).
However, I did consider a more generic approach to exception handling
that would in fact place the burden on TaskC: I considered the
possibility of mandating that every actor must output a record which
contains the output data and also a "predication" [1] field. The
predication field indicates when the data is valid. Any actor
receiving tokens only proceeds with the computation if the predication
fields on all its inputs are set to valid; if even one is invalid,
the whole lot gets routed to an error bin, but the next lot gets
processed as though nothing went wrong. Also I vaguely recall reading
that some other workflow engine does something like this. Anyway, I
have not pursued this yet because modifying every actor I use to
follow this convention sounds like a non-trivial amount of work.
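To make the predication idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the Record type, the fire helper and the error_bin list are hypothetical names and do not correspond to any Kepler or Ptolemy API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Record:
    data: Any
    valid: bool = True  # the "predication" field


def task_a(token: int) -> Record:
    # Hypothetical TaskA: always emits a record, but marks 5 as invalid.
    return Record(token, valid=(token != 5))


def task_b(token: int) -> Record:
    # Hypothetical TaskB: every result is valid.
    return Record(token)


def fire(actor: Callable[..., Any], *inputs: Record) -> Record:
    # Run the actor only if every input record is valid; otherwise pass
    # the whole lot on as one invalid record.
    if all(r.valid for r in inputs):
        return Record(actor(*(r.data for r in inputs)))
    return Record([r.data for r in inputs], valid=False)


def merge(a: Any, b: Any) -> list:
    # Stand-in for TaskC's business logic: build a 2-element array.
    return [a, b]


outputs, error_bin = [], []
for token in range(10):
    result = fire(merge, task_a(token), task_b(token))
    (outputs if result.valid else error_bin).append(result.data)

print(outputs)
print(error_bin)
```

Because every actor emits exactly one record per input, the streams never drift out of step; the cost is that each actor, or some wrapper around it, has to understand the valid field.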
Any other ideas?
regards,
-tirath
[1] Borrowing the word "predication" from Intel's Itanium branch
misprediction handling, not sure who they borrowed the term from.