[kepler-dev] Thread safety with Ptolemy’s TypedIOPort

Colin Enticott Colin.Enticott at csse.monash.edu.au
Mon Oct 12 20:42:26 PDT 2009


Hi Edward,

Thank you for replying. Yes, in my case I was treating stopFire() only
as a recommendation, as I wanted change requests to be processed while
the experiment was running.  I did this by releasing control back to
the manager. The focus of the new director is to execute jobs on grid
resources, which tend to take a while. At runtime we increase the
number of actors based on the number of grid resources available.
Waiting for all actors to finish before I can increase the number of
resources is not optimal. I was hoping that all methods that change
the workspace would obtain a write lock, but as I can now see, only
the manager’s thread should have access to the workflow after a
stopFire().

So what do you recommend as a solution? I could process change
requests myself instead of releasing control back to the manager. This
would still allow changes requested by any third-party actor to be
processed. Alternatively, I could skip change requests and modify the
workflow directly while holding a workspace write lock.
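
As a rough sketch of the first option (the director processing change
requests itself), assume a director whose actor threads park at port
operations, much as described for PN in the quoted reply below. None of
the names here (QuiescingDirector, requestChange, checkpoint,
applyPendingChanges, demo) are Ptolemy APIs; they are hypothetical
stand-ins, and the sketch ignores actors that terminate or block for a
long time inside a grid job.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a director that applies queued changes itself. */
class QuiescingDirector {
    private final Object lock = new Object();
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private final int workers;             // number of actor threads
    private int parked = 0;                // threads waiting in checkpoint()
    private boolean stopRequested = false;

    QuiescingDirector(int workers) {
        this.workers = workers;
    }

    /** Queue a change and ask the workers to pause (the stopFire() analogue). */
    void requestChange(Runnable change) {
        synchronized (lock) {
            pending.add(change);
            stopRequested = true;
        }
    }

    /** Each actor thread calls this before every port read or write. */
    void checkpoint() throws InterruptedException {
        synchronized (lock) {
            if (!stopRequested) {
                return;
            }
            parked++;
            lock.notifyAll();              // tell the director we are parked
            while (stopRequested) {
                lock.wait();
            }
            parked--;
        }
    }

    /** Director thread: wait until every worker is parked, apply, resume. */
    void applyPendingChanges() throws InterruptedException {
        synchronized (lock) {
            if (pending.isEmpty()) {
                return;
            }
            while (parked < workers) {
                lock.wait();
            }
            Runnable change;
            while ((change = pending.poll()) != null) {
                change.run();              // no actor thread is firing here
            }
            stopRequested = false;
            lock.notifyAll();              // release the parked workers
        }
    }

    /** Demo: returns how many workers were mid-fire when the change ran. */
    static int demo() {
        QuiescingDirector director = new QuiescingDirector(2);
        AtomicInteger firing = new AtomicInteger();
        AtomicInteger seen = new AtomicInteger(-1);
        Runnable actor = () -> {
            try {
                while (seen.get() < 0) {
                    director.checkpoint();
                    firing.incrementAndGet();   // "fire" the actor
                    firing.decrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread a = new Thread(actor);
        Thread b = new Thread(actor);
        a.start();
        b.start();
        try {
            director.requestChange(() -> seen.set(firing.get()));
            director.applyPendingChanges();
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }
}
```

In this sketch the change only runs once both workers are parked, so
demo() observes zero firing actors. The cost is the one mentioned above:
a worker stuck in a long-running job delays every pending change.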

The problem with both of these solutions is that when I copy an opaque
composite, it performs a type check internally and on the immediately
connected external actors. I guess I will have to make this
environment “safe to do so” by blocking sends while the copy takes place.
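
To make "blocking sends" concrete, here is a minimal sketch that uses
java.util.concurrent's ReentrantReadWriteLock as a stand-in for the
workspace lock. GuardedPort, send and retype are hypothetical names; in
Ptolemy the corresponding calls would be the Workspace methods
getReadAccess()/doneReading() and getWriteAccess()/doneWriting(). The
null assignment only imitates a transient state during type resolution
and is an assumption of the sketch.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical port guarded by a stand-in for the workspace lock. */
class GuardedPort {
    private final ReentrantReadWriteLock workspaceLock =
            new ReentrantReadWriteLock();
    private Class<?> resolvedType;

    GuardedPort(Class<?> initialType) {
        this.resolvedType = initialType;
    }

    /** Sends take the read lock: concurrent with each other, blocked by writers. */
    void send(Object token) {
        workspaceLock.readLock().lock();
        try {
            if (!resolvedType.isInstance(token)) {
                throw new IllegalStateException("Token " + token
                        + " is incompatible with port type: "
                        + resolvedType.getSimpleName());
            }
            // ... deliver the token to the receivers here ...
        } finally {
            workspaceLock.readLock().unlock();
        }
    }

    /** Type changes take the write lock, so no send sees a half-updated port. */
    void retype(Class<?> newType) {
        workspaceLock.writeLock().lock();
        try {
            resolvedType = null;       // transient state during "resolution"
            resolvedType = newType;
        } finally {
            workspaceLock.writeLock().unlock();
        }
    }
}
```

With this arrangement a copy of an opaque composite would run under the
write lock, and any actor calling send() simply waits until it is done.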

Regards,
Colin


2009/10/9 Edward A. Lee <eal at eecs.berkeley.edu>:
>
> Interesting question...
>
> I suspect the problem with your threading director is that it isn't
> respecting the semantics of stopFire().  The contract is that the
> Manager executes change requests only while every actor in the model
> is stopped. Specifically, it does so between iterations of the top-level
> model. So when the top-level director returns from postfire(), the
> Manager assumes it can execute change requests.
>
> The problem is that if you have a thread running independently of the
> top-level director, how does the top-level director know when it is safe
> to return from postfire()?
>
> The key is that stopFire() is called on every actor in the model
> when a change request is registered. In PN, the PN threads respond
> to stopFire() by suspending at the next opportunity (typically a
> read or a write to a port).  Only after all threads have stopped
> does the postfire() method of the director return, allowing the manager
> to execute the change request (or maybe it's the fire() method,
> I forget).
>
> Hope this helps...
>
> Edward
>
>
> Colin Enticott wrote:
>>
>> Hi,
>>
>> Looking into this further (sorry for the delay, I’ve been busy), it looks
>> like all actions on port types obtain a read lock on the workspace. When the
>> manager resolves types, it also obtains a read lock, but makes changes to
>> the port types. Shouldn't it obtain a write lock?
>>
>> Thanks,
>> Colin
>>
>> Colin Enticott wrote:
>>>
>>> Hi,
>>>
>>>
>>> First of all, I didn’t think that multiple threads would use the same
>>> TypedIOPort object, but here’s the story:
>>>
>>>
>>> I’ve been developing a new "threading director" for Ptolemy, the Nimrod/K
>>> TDA director[1], and in about one of every 100 executions of my rigorous
>>> director threading test workflows I get an exception. This exception
>>> happens when an actor sends a token out of a TypedIOPort:
>>>
>>> (Sorry. The exception is from kepler-1.0.0, so the line numbers differ
>>> from the current version, but the functions in question are identical)
>>>
>>>
>>> ptolemy.kernel.util.IllegalActionException: Run-time type checking failed. Token 3 with type int is incompatible with port type: int
>>>  in .Composite.Ramp.output
>>>        at ptolemy.actor.TypedIOPort._checkType(TypedIOPort.java:750)
>>>        at ptolemy.actor.TypedIOPort.send(TypedIOPort.java:472)
>>>        at ptolemy.actor.lib.Ramp.fire(Ramp.java:138)
>>>        at org.monash.nimrod.NimrodDirector.NimrodProcessThread.run(NimrodProcessThread.java:347)
>>>
>>>
>>> The confusing error message is “Token 3 with type int is incompatible
>>> with port type: int”. Looking into this deeper I believe the error message
>>> is being generated after the port state has changed.
>>>
>>>
>>> The function in question is:
>>>
>>>
>>> protected void _checkType(Token token) throws IllegalActionException {
>>>    int compare = TypeLattice.compare(token.getType(), _resolvedType);
>>>
>>>    if ((compare == CPO.HIGHER) || (compare == CPO.INCOMPARABLE)) {
>>>        throw new IllegalActionException(this,
>>>                "Run-time type checking failed. Token " + token
>>>                        + " with type " + token.getType()
>>>                        + " is incompatible with port type: "
>>>                        + getType().toString());
>>>    }
>>> }
>>>
>>>
>>> I suspect the “compare” method is called before the type is set (or
>>> changed) and the error message is generated after the type is set. Looking
>>> for another thread that could be accessing the TypedIOPort, I discovered the
>>> type checking functionality. Listening to the manager, it looks like the
>>> manager is "resolving types" in response to a “ChangeRequest”.
>>>
>>>
>>> Two questions:
>>>
>>>
>>> Should I use “change requests” in this way? None of the changes will
>>> cause any problems with types and so I could directly modify the workflow.
>>> (I initially used change requests as the main thread holds a read lock on
>>> the workspace when the "manager" fires the “director”)
>>>
>>>
>>> And, it looks like TypedIOPort is not thread safe. Does this need to be
>>> fixed? Or should type resolving be completely blocked in a threaded
>>> environment? The documentation suggests it should be left to the director to
>>> decide “when it is safe to perform change requests”, but I cannot see a way
>>> of preventing an actor from doing a type check when sending tokens in a
>>> threaded environment.
>>>
>>>
>>> Also, I don't think it is just with my director. It looks like this issue
>>> will also arise in the PN environment if a workflow modifies itself
>>> while it is running.
>>>
>>>
>>> Thanks,
>>>
>>> Colin
>>>
>>>
>>> [1] Abramson D, Enticott C, Altintas I., "Nimrod/K: Towards Massively
>>> Parallel Dynamic Grid Workflows", IEEE SuperComputing 2008, Austin, Texas
>>> November 2008
>>>
>>>
>>> --
>>> Colin Enticott, Research Scientist, Ph: +61 03 9903 2215
>>> Room H7.26, Level 7, Building H, Monash University Caulfield 3145,
>>> Australia
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Kepler-dev mailing list
>>> Kepler-dev at kepler-project.org
>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>
>>
>
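
The puzzling message in the quoted exception ("type int is incompatible
with port type: int") is consistent with the port type being read twice:
once for the comparison and again when the message is built. The sketch
below imitates that with plain strings, under the assumption that the
port type is briefly unresolved during type resolution. TypeCheckRace
and its methods are hypothetical, and the Runnable argument merely
simulates the other thread's write at the worst possible moment.

```java
/** Hypothetical model of a port whose type another thread may rewrite. */
class TypeCheckRace {
    private volatile String resolvedType = "unknown";

    void setType(String type) {
        resolvedType = type;
    }

    /** Same shape as the quoted method: compare, then re-read the type. */
    String checkRacy(String tokenType, Runnable otherThread) {
        boolean incompatible = !tokenType.equals(resolvedType);
        otherThread.run();  // simulated interleaving: resolution finishes here
        if (incompatible) {
            return "Token with type " + tokenType
                    + " is incompatible with port type: " + resolvedType;
        }
        return null;
    }

    /** Reads the type once, so the message names the type that was compared. */
    String checkSnapshot(String tokenType, Runnable otherThread) {
        String seen = resolvedType;
        boolean incompatible = !tokenType.equals(seen);
        otherThread.run();
        if (incompatible) {
            return "Token with type " + tokenType
                    + " is incompatible with port type: " + seen;
        }
        return null;
    }
}
```

Taking a snapshot does not remove the race; it only makes the error
report the type that actually failed the comparison. Avoiding the
failure still requires keeping sends and type resolution apart.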



-- 
Colin

