[kepler-users] Combined Sequential Parallel Workflow

Fri Nov 9 06:04:52 PST 2007

Hi,

OK.  I have managed to get Kepler to build, have modified some code and the
ontology tree, have done a couple dozen workflows, and have written a few
actors.  It’s probably time to get more serious.  I see a lot of
already-developed capabilities in Kepler, but I am having trouble getting
information on how they are used.  Hence let me ask in this forum.

I would like to prototype a workflow for an experiment.  This type of
experiment consists of a number of "shots".  Each shot can be considered a
process that generates a file, and the shots would run in a loop on the
local machine.  [In reality, a shot might correspond to a different rotation
of a sample.]  We could prototype that as a Composite with a file output.  I
would then like to process these files in parallel on a cluster so that the
processing is going on as the next shots are being taken.  We could assume
the processing output would be in a number of files (one for starters) on
the cluster.  We could probably prototype that as a Composite with a file
output, also.  When all of the processing is done, I would like to do more
analysis involving all of the generated files, either on the cluster or
locally.  Again, the prototype is probably a Composite with many inputs and
some outputs.  This cannot happen until all the previous processing is done.
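
To make the control flow concrete, here is a rough sketch of the pattern in
plain Python.  It is not Kepler code and says nothing about which actors or
director would be used; it only illustrates "sequential shots, overlapping
processing, final step after everything is done".  The names take_shot,
process_shot, final_analysis and run_experiment are hypothetical
placeholders, and a local thread pool stands in for the cluster.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from pathlib import Path
import tempfile


def take_shot(index, workdir):
    """Placeholder for one shot: writes a file and returns its path."""
    path = Path(workdir) / f"shot_{index:03d}.dat"
    path.write_text(f"raw data for shot {index}\n")
    return path


def process_shot(shot_path):
    """Placeholder for the per-shot processing job: writes one output file."""
    out_path = shot_path.with_suffix(".out")
    out_path.write_text(shot_path.read_text().upper())
    return out_path


def final_analysis(processed_paths):
    """Placeholder for the final step, which needs every processed file."""
    return [p.name for p in processed_paths]


def run_experiment(n_shots, workdir, max_workers=4):
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for i in range(n_shots):
            # Shots are taken one after another on the local machine.
            shot_file = take_shot(i, workdir)
            # Processing is handed off, so it overlaps with the next shot.
            futures.append(pool.submit(process_shot, shot_file))
        # Barrier: do not continue until every processing job has finished.
        wait(futures)
    processed = [f.result() for f in futures]
    return final_analysis(processed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(run_experiment(3, workdir))
```

The point of the sketch is the barrier before final_analysis: the last step
takes the whole list of processed files as input and cannot start earlier.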

Eventually we would need to deal with a failure somewhere and allow for user
intervention, but for now I would like to develop a simple Kepler workflow
that does the above.

The cluster I have available is on the local internet.  So far, I can run
jobs on it using the SSH to Exec actor.  It is running an implementation of
MPI-2, and I assume that using it to allocate nodes is the best way to go.
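
For illustration, the kind of command I have in mind for each shot looks
roughly like the sketch below.  It only builds the command line and does not
run it.  The host name, program name and process count are made-up
placeholders, and I am assuming the MPI installation provides the usual
mpiexec launcher with the -n option.

```python
import shlex


def build_remote_command(host, n_procs, program, shot_file):
    """Build an ssh command that launches an MPI job for one shot file.

    Every argument is a hypothetical placeholder; adjust for the real cluster.
    """
    remote = " ".join(
        ["mpiexec", "-n", str(n_procs), shlex.quote(program), shlex.quote(shot_file)]
    )
    # ssh passes the remote command as a single string to the remote shell.
    return ["ssh", host, remote]


if __name__ == "__main__":
    print(build_remote_command("cluster.example.org", 4, "./process_shot", "shot_000.dat"))
```

The resulting list could be handed to something like subprocess.run, or the
remote string could be given to an SSH actor as its command.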

So far I can run a loop of simple processes.  I am having some problems with
flow control (i.e., determining when everything is done and then running the
final process), though I have gotten that working to some degree.  I have
mostly used SDF and probably don't understand PN well.  I don't know much
about how to handle the parallelization.  I see a DistributedCompositeActor
and a number of Jobs actors, but have no idea how to use them.  Some guidance,
advice, places to get more information, and most especially examples, would
be great.

Thanks,

        -Ken