[kepler-dev] Thread safety with Ptolemy’s TypedIOPort

Colin Enticott Colin.Enticott at csse.monash.edu.au
Tue Oct 13 05:14:57 PDT 2009


Hi Edward,

Fortunately, my case is simplified and controllable. All tokens going
into an actor that can be copied go to a central place, so copying an
actor will not cause any problems with sending tokens, as it does not
change the token path. The problem is that when I make a copy of the
actor, some actors (like composites) will want their types resolved.

I decided to make the director responsible for processing change
requests and resolving types. That way I can make sure I hold the locks
that stop concurrent sends from racing with type resolution. It seems to
have solved the immediate problem; next I'll have to look into nested
director issues. :-)
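A minimal sketch of this arrangement, assuming the director owns a java.util.concurrent ReentrantReadWriteLock as a stand-in for Ptolemy's Workspace locking. The class and method names (DirectorLockSketch, send, requestChange, processChangeRequests) are hypothetical, and a port type is modelled as a plain string. Worker threads hold the read lock while sending; the director takes the write lock while it applies queued changes, which is where type resolution would also run.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not Kepler/Ptolemy API. Senders share the read lock;
// the director holds the write lock while it applies queued changes (and,
// in a real director, re-resolves types), so no send can observe a port
// type that is part-way through being changed.
final class DirectorLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Queue<Runnable> pendingChanges = new ConcurrentLinkedQueue<>();
    private String resolvedType = "int"; // guarded by lock; stands in for a port type

    /** Worker side: returns true if the token type matches the port type. */
    boolean send(String tokenType) {
        lock.readLock().lock();
        try {
            return tokenType.equals(resolvedType);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Any thread may queue a change; only the director applies it. */
    void requestChange(Runnable change) {
        pendingChanges.add(change);
    }

    /** Only meant to be called from inside a queued change. */
    void setResolvedType(String type) {
        resolvedType = type;
    }

    /** Director side: apply every queued change while senders are excluded. */
    int processChangeRequests() {
        lock.writeLock().lock();
        try {
            int applied = 0;
            Runnable change;
            while ((change = pendingChanges.poll()) != null) {
                change.run();
                applied++;
            }
            // A real director would resolve types here, still holding the write lock.
            return applied;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The design choice is that a sender only waits during the short write-locked window in which changes are applied, instead of waiting for every long-running actor to finish.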

Regards,
Colin

2009/10/13 Edward A. Lee <eal at eecs.berkeley.edu>:
>
> Hi Colin,
>
> I'm not sure how to accomplish what you want, but I am quite sure
> it will not be easy.  Basically, you will be dealing with low-level
> thread programming, which, in my opinion, is almost impossible to get
> to work correctly.  See this paper for more on this:
>
> http://ptolemy.eecs.berkeley.edu/publications/papers/06/problemwithThreads/
>
> Edward
>
>
>
> Colin Enticott wrote:
>>
>> Hi Edward,
>>
>> Thank you for replying. Yes, in my case I was treating stopFire() only
>> as a recommendation, as I wanted change requests to be processed while
>> the experiment was running. I did this by releasing control back to
>> the manager. The focus of the new director is to execute jobs on grid
>> resources, which tend to take a while. At runtime we increase the
>> number of actors based on the number of grid resources available, so
>> waiting for all actors to finish before I can increase the number of
>> resources is not optimal. I was hoping that all methods that change
>> the workspace would obtain a write lock, but as I can now see, only
>> the manager’s thread should have access to the workflow after a
>> stopFire().
>>
>> So what do you recommend as a solution? I could process change
>> requests myself instead of releasing control back to the manager. This
>> would allow changes requested by any third-party actor to be
>> processed. Or I could avoid change requests altogether and modify the
>> workflow directly, using workspace write locks.
>>
>> The problem with both these solutions is that when I copy an opaque
>> composite, it will itself perform a type check internally and on the
>> immediately connected external actors. I guess I will have to make
>> this environment “safe to do so” by blocking sends at this time.
>>
>> Regards,
>> Colin
>>
>>
>> 2009/10/9 Edward A. Lee <eal at eecs.berkeley.edu>:
>>>
>>> Interesting question...
>>>
>>> I suspect the problem with your threading director is that it isn't
>>> respecting the semantics of stopFire().  The contract is that the
>>> Manager executes change requests only while every actor in the model
>>> is stopped. Specifically, it does so between iterations of the top-level
>>> model. So when the top-level director returns from postfire(), the
>>> Manager assumes it can execute change requests.
>>>
>>> The problem is that if you have a thread running independently of the
>>> top-level director, how does the top-level director know when it is safe
>>> to return from postfire()?
>>>
>>> The key is that stopFire() is called on every actor in the model
>>> when a change request is registered. In PN, the PN threads respond
>>> to stopFire() by suspending at the next opportunity (typically a
>>> read or a write to a port).  Only after all threads have stopped
>>> does the postfire() method of the director return, allowing the manager
>>> to execute the change request (or maybe it's the fire() method,
>>> I forget).
>>>
>>> Hope this helps...
>>>
>>> Edward
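The stopFire() contract described above can be sketched with a monitor-based barrier. This is a minimal illustration under stated assumptions, not the PN director's implementation: StopFireBarrier and its methods are hypothetical, each worker thread is assumed to call checkpoint() at every port read or write, and awaitQuiescence() stands in for the point where the director's postfire() would wait before returning control to the manager.

```java
// Hypothetical sketch, not Ptolemy's PN director. Once stopFire() is
// requested, worker threads park at their next checkpoint (a port
// read/write); the director waits until all live workers are parked before
// it lets change requests run, then releases them with resume().
final class StopFireBarrier {
    private int liveWorkers;          // workers that have not exited yet
    private int parked = 0;           // workers currently waiting in checkpoint()
    private boolean stopRequested = false;

    StopFireBarrier(int workers) {
        this.liveWorkers = workers;
    }

    /** Analogue of stopFire(): ask every worker to pause at its next checkpoint. */
    synchronized void stopFire() {
        stopRequested = true;
    }

    /** Called by a worker thread at every port read or write. */
    synchronized void checkpoint() throws InterruptedException {
        if (!stopRequested) {
            return;
        }
        parked++;
        notifyAll(); // the director may be waiting in awaitQuiescence()
        try {
            while (stopRequested) {
                wait();
            }
        } finally {
            parked--;
        }
    }

    /** Called by a worker thread when it finishes for good. */
    synchronized void workerExited() {
        liveWorkers--;
        notifyAll();
    }

    /** Director side: block until every live worker is parked. */
    synchronized void awaitQuiescence() throws InterruptedException {
        while (parked < liveWorkers) {
            wait();
        }
    }

    /** Director side: release the workers once change requests have run. */
    synchronized void resume() {
        stopRequested = false;
        notifyAll();
    }

    synchronized int parkedCount() {
        return parked;
    }
}
```

A director built this way would call stopFire(), block in awaitQuiescence(), let the change requests execute, and then call resume(). Workers that finish must call workerExited(); otherwise awaitQuiescence() would wait for a thread that never reaches another checkpoint.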
>>>
>>>
>>> Colin Enticott wrote:
>>>>
>>>> Hi,
>>>>
>>>> Looking into this further (sorry for the delay, I’ve been busy), it
>>>> looks like all actions on port types obtain a read lock on the
>>>> workspace. When the manager resolves types, it also obtains a read
>>>> lock, but it makes changes to the port types. Shouldn't it obtain a
>>>> write lock?
>>>>
>>>> Thanks,
>>>> Colin
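The question turns on the fact that a read lock is shared. The following illustration uses java.util.concurrent's ReentrantReadWriteLock as an analogy, not Ptolemy's Workspace class, whose own read/write access rules may differ in detail; the class name ReadLockIsShared is hypothetical. While one thread holds only the read lock, as the message describes the manager doing during type resolution, another thread (a sender) can still acquire the read lock and proceed, whereas an attempt to take the write lock is refused.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration with java.util.concurrent, not Ptolemy's Workspace: mutating
// shared state while holding only a read lock does not exclude other
// readers. Only the write lock is exclusive.
final class ReadLockIsShared {
    /** Returns {readerCouldEnter, writerCouldEnter} while another thread holds the read lock. */
    static boolean[] probe() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Plays the "type resolver" that holds only the read lock.
        Thread resolver = new Thread(() -> {
            lock.readLock().lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        resolver.start();
        held.await();

        // A "sending" thread: the read lock is still available to it.
        boolean reader = lock.readLock().tryLock();
        if (reader) {
            lock.readLock().unlock();
        }
        // An exclusive writer is refused while any reader holds the lock.
        boolean writer = lock.writeLock().tryLock();
        if (writer) {
            lock.writeLock().unlock();
        }

        release.countDown();
        resolver.join();
        return new boolean[] {reader, writer};
    }
}
```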
>>>>
>>>> Colin Enticott wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> First of all, I didn’t think that multiple threads would use the same
>>>>> TypedIOPort object, but here’s the story:
>>>>>
>>>>> I’ve been developing a new "threading director" for Ptolemy, the
>>>>> Nimrod/K TDA director[1], and in one of every 100 executions of my
>>>>> rigorous director threading test workflows, I get an exception. This
>>>>> exception happens when an actor sends a token out of a TypedIOPort:
>>>>>
>>>>> (Sorry, the exception is from kepler-1.0.0, so the line numbers differ
>>>>> from the current version, but the functions in question are identical.)
>>>>>
>>>>> ptolemy.kernel.util.IllegalActionException: Run-time type checking
>>>>> failed. Token 3 with type int is incompatible with port type: int
>>>>>   in .Composite.Ramp.output
>>>>>       at ptolemy.actor.TypedIOPort._checkType(TypedIOPort.java:750)
>>>>>       at ptolemy.actor.TypedIOPort.send(TypedIOPort.java:472)
>>>>>       at ptolemy.actor.lib.Ramp.fire(Ramp.java:138)
>>>>>       at org.monash.nimrod.NimrodDirector.NimrodProcessThread.run(NimrodProcessThread.java:347)
>>>>>
>>>>> The confusing part is the error message “Token 3 with type int is
>>>>> incompatible with port type: int”. Looking into this more deeply, I
>>>>> believe the error message is being generated after the port state
>>>>> has changed.
>>>>>
>>>>> The function in question is:
>>>>>
>>>>> protected void _checkType(Token token) throws IllegalActionException {
>>>>>     int compare = TypeLattice.compare(token.getType(), _resolvedType);
>>>>>
>>>>>     if ((compare == CPO.HIGHER) || (compare == CPO.INCOMPARABLE)) {
>>>>>         throw new IllegalActionException(this,
>>>>>                 "Run-time type checking failed. Token " + token
>>>>>                         + " with type " + token.getType()
>>>>>                         + " is incompatible with port type: "
>>>>>                         + getType().toString());
>>>>>     }
>>>>> }
>>>>>
>>>>> I suspect the “compare” call, which reads _resolvedType, runs before
>>>>> the type is set (or changed), and the error message, which calls
>>>>> getType(), is generated after the type is set. Looking for another
>>>>> thread that could be accessing the TypedIOPort, I discovered the
>>>>> type-checking functionality. Listening to the manager, I found that
>>>>> it is "resolving types" in response to a “ChangeRequest”.
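The suspected interleaving can be reproduced deterministically with a simplified stand-in for the port. Everything here is hypothetical (PortSketch, string-valued types, and the Runnable hook that simulates the resolver thread running between the two reads); it is not TypedIOPort. It shows that a check which reads the shared type twice can report "int is incompatible with int", and that reading the type once makes the message agree with the comparison. The snapshot only makes the report consistent: the check itself still ran against an unresolved type, so the underlying race still has to be prevented by locking or by stopping senders during resolution.

```java
// Hypothetical, simplified stand-in for a port (not Ptolemy's TypedIOPort).
// The Runnable "between" simulates another thread changing the type in the
// window between the comparison and the construction of the error message.
final class PortSketch {
    private volatile String resolvedType; // written by a "resolver" thread

    PortSketch(String initial) {
        resolvedType = initial;
    }

    void setResolvedType(String type) {
        resolvedType = type;
    }

    /** Racy: compares against one read of the type, reports a second read. */
    String racyCheck(String tokenType, Runnable between) {
        boolean ok = tokenType.equals(resolvedType); // first read
        between.run();                               // resolver runs here
        return ok ? null : "token type " + tokenType
                + " is incompatible with port type: " + resolvedType; // second read
    }

    /** Consistent: a single read feeds both the comparison and the message. */
    String snapshotCheck(String tokenType, Runnable between) {
        String type = resolvedType; // single read
        between.run();
        return tokenType.equals(type) ? null : "token type " + tokenType
                + " is incompatible with port type: " + type;
    }
}
```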
>>>>>
>>>>> Two questions:
>>>>>
>>>>> Should I use “change requests” in this way? None of the changes will
>>>>> cause any problems with types, so I could directly modify the
>>>>> workflow. (I initially used change requests because the main thread
>>>>> holds a read lock on the workspace when the "manager" fires the
>>>>> “director”.)
>>>>>
>>>>> Second, it looks like TypedIOPort is not thread safe. Does this need
>>>>> to be fixed? Or should type resolution be completely blocked in a
>>>>> threaded environment? The documentation suggests leaving it to the
>>>>> director to decide “when it is safe to perform change requests”, but
>>>>> I cannot see a way of preventing an actor from doing a type check
>>>>> when sending tokens in a threaded environment.
>>>>>
>>>>> Also, I don't think this is specific to my director. It looks like
>>>>> this issue will also arise in the PN domain if the workflow makes
>>>>> changes to itself.
>>>>>
>>>>> Thanks,
>>>>> Colin
>>>>>
>>>>> [1] Abramson D, Enticott C, Altintas I, "Nimrod/K: Towards Massively
>>>>> Parallel Dynamic Grid Workflows", IEEE SuperComputing 2008, Austin,
>>>>> Texas, November 2008.
>>>>>
>>>>>
>>>>> --
>>>>> Colin Enticott, Research Scientist, Ph: +61 03 9903 2215
>>>>> Room H7.26, Level 7, Building H, Monash University Caulfield 3145,
>>>>> Australia
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> _______________________________________________
>>>>> Kepler-dev mailing list
>>>>> Kepler-dev at kepler-project.org
>>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>>>
>>>>
>>
>>
>>
>



-- 
Colin

