[kepler-dev] Threading in kepler
Norbert Podhorszki
pnorbert at cs.ucdavis.edu
Wed Sep 13 07:52:02 PDT 2006
Hi Tristan,
First, to clarify: you have one processing pipeline (as software) built
from several actors, and you process several data packets through it
(i.e., the data processing pipeline as a process). Right?
So you have a workflow like this?
Source --- A --- B --- C
And your data packets can be processed independently of each other?
Solution 1
----------
Ptolemy has a Distributor actor (in Kepler, you can get it with
Tools/Instantiate Component and typing ptolemy.actor.lib.Distributor) that
sends incoming tokens to different output channels in a round-robin
manner. Thus you can build several copies of the pipeline and feed them
by inserting a Distributor between Source and A.
                        /--- A --- B --- C
Source --- Distributor <
                        \--- A --- B --- C
If you want to merge the token stream later, you have two options:
1. Ordered merge with ptolemy.actor.lib.Commutator, the exact counterpart
of Distributor. This actor collects tokens in the same round-robin way,
so you must not drop tokens anywhere (I suppose this is the case for
you). You get the data stream back in the same order in which you split
it at the beginning.
2. ptolemy.domains.pn.kernel.NondeterministicMerge can merge the streams
in whatever order the tokens arrive.
This is a reliable solution, though it looks a bit ugly.
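If it helps, here is a minimal sketch of this topology built directly
with the Ptolemy II Java API. This is my own illustration, not a Kepler
demo: Ramp, Scale and Recorder are just stand-ins for your Source and
A-B-C actors, and firingCountLimit only bounds the test run.

import ptolemy.actor.Manager;
import ptolemy.actor.TypedCompositeActor;
import ptolemy.actor.lib.Commutator;
import ptolemy.actor.lib.Distributor;
import ptolemy.actor.lib.Ramp;
import ptolemy.actor.lib.Recorder;
import ptolemy.actor.lib.Scale;
import ptolemy.domains.pn.kernel.PNDirector;

public class ParallelPipelines {
    public static void main(String[] args) throws Exception {
        TypedCompositeActor top = new TypedCompositeActor();
        top.setName("ParallelPipelines");
        new PNDirector(top, "PN Director");          // one thread per actor

        Ramp source = new Ramp(top, "Source");       // stand-in for your source
        source.firingCountLimit.setExpression("10"); // bound the test run
        Distributor split = new Distributor(top, "Distributor");
        Commutator merge = new Commutator(top, "Commutator");
        Recorder sink = new Recorder(top, "Recorder");

        top.connect(source.output, split.input);
        // Each connect() to a multiport adds one more round-robin
        // channel, so this loop creates two independent pipeline copies.
        for (int i = 0; i < 2; i++) {
            Scale a = new Scale(top, "A" + i);       // stand-ins for A, B, C
            Scale b = new Scale(top, "B" + i);
            Scale c = new Scale(top, "C" + i);
            top.connect(split.output, a.input);
            top.connect(a.output, b.input);
            top.connect(b.output, c.input);
            top.connect(c.output, merge.input);
        }
        top.connect(merge.output, sink.input);

        Manager manager = new Manager(top.workspace(), "manager");
        top.setManager(manager);
        manager.execute();                           // run to completion
    }
}

To merge in arrival order instead, replace the Commutator with
ptolemy.domains.pn.kernel.NondeterministicMerge.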
Solution 2
----------
There is a higher-order actor in Ptolemy,
ptolemy.actor.lib.hoc.MultiInstanceComposite, with which you can
parallelize your pipeline automatically. You can put the pipeline A-B-C
into this composite and set its 'nInstances' parameter to define how many
threads you want. You can use its 'instance' parameter in your workflow if
you need to know which thread is actually processing a given data packet.
Source --- Distributor --- MultiInstanceComposite --- Commutator
MultiInstanceComposite: (multiport)--- A --- B --- C ---(multiport)
Note that you must put a director into the composite. I did not try PN in
it, but PN under PN never worked for me, so you need to use SDF or DDF.
The pipeline then basically becomes sequential (but you have nInstances
independent copies of it), while in Solution 1 the pipelines are still
'pipeline-parallel' under PN.
This solution also works nicely. However, I always run into problems when
I try to stop a running workflow, so you should play with it and test it.
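For completeness, a rough sketch of how the composite itself could be set
up in Java (again my own illustration, with Scale actors standing in for
A-B-C; the outside wiring to the Distributor and Commutator multiports is
as in the diagram above):

import ptolemy.actor.TypedCompositeActor;
import ptolemy.actor.TypedIOPort;
import ptolemy.actor.lib.Scale;
import ptolemy.actor.lib.hoc.MultiInstanceComposite;
import ptolemy.domains.sdf.kernel.SDFDirector;

public class PipelineComposite {
    // Builds the A-B-C pipeline inside a MultiInstanceComposite that
    // replicates itself n times when the model is preinitialized.
    static MultiInstanceComposite build(TypedCompositeActor top, int n)
            throws Exception {
        MultiInstanceComposite mic =
                new MultiInstanceComposite(top, "Pipeline");
        mic.nInstances.setExpression(Integer.toString(n)); // thread count
        new SDFDirector(mic, "SDF Director"); // SDF inside, as advised above

        // External ports of the composite.
        TypedIOPort in = new TypedIOPort(mic, "in", true, false);
        TypedIOPort out = new TypedIOPort(mic, "out", false, true);

        // Stand-ins for the real A-B-C pipeline.
        Scale a = new Scale(mic, "A");
        Scale b = new Scale(mic, "B");
        Scale c = new Scale(mic, "C");
        mic.connect(in, a.input);
        mic.connect(a.output, b.input);
        mic.connect(b.output, c.input);
        mic.connect(c.output, out);
        return mic;
    }
}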
Best regards
Norbert
Norbert Podhorszki
------------------------------------
University of California, Davis
Department of Computer Science
1 Shields Ave, 2236 Kemper Hall
Davis, CA 95616
(530) 754-8188
pnorbert at cs.ucdavis.edu
----------------------------------
On Wed, 13 Sep 2006, Tristan King wrote:
> hi guys,
>
> basically I'm looking for a bit of an explanation of how threading
> works in Kepler, and any ideas on what I might be able to do to solve
> the problem below.
>
> I'm running into a problem where my Kepler workflow isn't able to
> keep up with the data being created from processing (thus my data
> queue is getting full and dying).
>
> My workflow consists of a source that polls a data queue for new
> data and, when new data arrives, sends that data (an XML parcel) to
> the rest of the workflow.
>
> Each data processing pipeline in the workflow is independent of the
> progress of the other pipelines, so new threads could be started to
> perform the processing task.
>
> However, it seems to me that the workflow only runs a single thread
> for each actor. (I'm using the PN Director.)
>
> Is there a way I can spawn a new thread to perform a particular
> pipeline, while leaving my data source in a single thread producing data?
>
> Thanks for any help you can give me.
> --Tristan
> _______________________________________________
> Kepler-dev mailing list
> Kepler-dev at ecoinformatics.org
> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
>