[kepler-dev] Thread safety with Ptolemy’s TypedIOPort

Colin Enticott Colin.Enticott at csse.monash.edu.au
Wed Oct 14 00:12:32 PDT 2009


Hi Edward,

My next problem (as predicted) is with nesting directors using
CompositeActors. It looks like the CompositeActor class acquires a
read lock on the workspace before it fires its director. If I have a
workflow that has a long-running composite actor, I cannot acquire a
workspace write lock until that composite finishes. First of all, I am
curious why the composite acquires this read lock. I thought it would
be the responsibility of the director to decide whether it needs a lock
on the workspace. I noticed the PNDirector will release workspace locks
when it sleeps, but the SDF director doesn’t (as it doesn’t sleep). As
the director is responsible for the internals of the composite,
shouldn't it just obtain an internal lock? Has this issue arisen
before, or am I the only one doing reconfiguration with threading?
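
A minimal sketch of the blocking described above. It assumes that
java.util.concurrent's ReentrantReadWriteLock is a fair stand-in for the
workspace lock; the real Ptolemy Workspace has its own locking API, and
the class and method names here (ReadLockDemo, probeWriteLock) are made
up for illustration only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Stand-in model only: ReentrantReadWriteLock plays the role of the
// workspace lock. None of these names come from Ptolemy or Kepler.
class ReadLockDemo {

    /** Returns {write lock obtained during fire, write lock obtained after fire}. */
    static boolean[] probeWriteLock() {
        ReentrantReadWriteLock workspace = new ReentrantReadWriteLock();
        CountDownLatch firing = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);

        // The "composite" holds a read lock for the whole of its fire.
        Thread composite = new Thread(() -> {
            workspace.readLock().lock();
            try {
                firing.countDown();
                finish.await(); // stand-in for a long-running director fire
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                workspace.readLock().unlock();
            }
        });
        composite.start();

        try {
            firing.await();
            boolean during = tryWrite(workspace); // composite is mid-fire
            finish.countDown();
            composite.join();
            boolean after = tryWrite(workspace);  // composite has finished
            return new boolean[] {during, after};
        } catch (InterruptedException e) {
            finish.countDown();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while probing", e);
        }
    }

    /** Non-blocking attempt to take, then immediately release, the write lock. */
    private static boolean tryWrite(ReentrantReadWriteLock lock) {
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    public static void main(String[] args) {
        boolean[] result = probeWriteLock();
        System.out.println("write lock while composite fires: " + result[0]);
        System.out.println("write lock after composite finishes: " + result[1]);
    }
}
```

In this model the writer is refused for as long as the composite's fire
runs and succeeds only once it returns, which is the behaviour that gets
in the way of reconfiguring a running workflow.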

And how would you suggest I solve this issue? I could make changes
directly to the workflow without a write lock. As I mentioned in an
earlier email, the objects I add cannot be used until my director
safely invokes them. Alternatively, when I copy an actor, I could
replace every *CompositeActor with a My*CompositeActor by modifying
the MoML. My versions of the composites would release the read lock
before invoking the fire method of their director. This option sounds
safer, unless some of the directors assume they already hold a read
lock.
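
The second option could look roughly like the sketch below. Again this
is only a model under stated assumptions: ReentrantReadWriteLock stands
in for the workspace lock, and ReleasingCompositeSketch and its fire
method are hypothetical names, not Ptolemy classes. The idea is to drop
whatever read holds the calling thread has, run the director's fire,
and restore the same number of holds afterwards.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a My*CompositeActor; not a Ptolemy class.
class ReleasingCompositeSketch {
    private final ReentrantReadWriteLock workspace;

    ReleasingCompositeSketch(ReentrantReadWriteLock workspace) {
        this.workspace = workspace;
    }

    /** Runs directorFire with this thread's read holds released, then restores them. */
    void fire(Runnable directorFire) {
        int depth = workspace.getReadHoldCount(); // holds owned by this thread
        for (int i = 0; i < depth; i++) {
            workspace.readLock().unlock();
        }
        try {
            directorFire.run();
        } finally {
            // May block here if a writer is active; that is the intended behaviour.
            for (int i = 0; i < depth; i++) {
                workspace.readLock().lock();
            }
        }
    }

    /** True if a writer got in during fire and the caller's read hold was restored. */
    static boolean writerSucceedsAndHoldRestored() {
        ReentrantReadWriteLock workspace = new ReentrantReadWriteLock();
        ReleasingCompositeSketch composite = new ReleasingCompositeSketch(workspace);
        AtomicBoolean acquired = new AtomicBoolean(false);

        workspace.readLock().lock(); // the caller arrives already holding a read lock
        try {
            composite.fire(() -> {
                // A second thread attempts a "reconfiguration" while fire runs.
                Thread writer = new Thread(() -> {
                    if (workspace.writeLock().tryLock()) {
                        acquired.set(true);
                        workspace.writeLock().unlock();
                    }
                });
                writer.start();
                try {
                    writer.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            boolean restored = workspace.getReadHoldCount() == 1;
            return acquired.get() && restored;
        } finally {
            workspace.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("writer got in and read hold restored: "
                + writerSucceedsAndHoldRestored());
    }
}
```

The sketch does nothing about the risk already mentioned: any code
inside the director's fire that relies on the read lock being held would
run unprotected while the holds are released.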

Thoughts?

Regards,
Colin


2009/10/13 Colin Enticott <Colin.Enticott at csse.monash.edu.au>:
> Hi Edward,
>
> Fortunately, my problem is simplified and controllable. All tokens
> going into an actor (that can be copied) will go to a central place.
> So an actor being copied will not cause any problems with sending
> tokens as it does not change the token path. The problem is when I
> make a copy of the actor, some actors (like composites) will want
> their types resolved.
>
> I decided to make the director responsible for processing change
> requests and resolving types. I can make sure I have the locks that
> will stop this from happening. It seems to have solved the immediate
> problem; next I'll have to look into nesting director issues. :-)
>
> Regards,
> Colin
>
> 2009/10/13 Edward A. Lee <eal at eecs.berkeley.edu>:
>>
>> Hi Colin,
>>
>> I'm not sure how to accomplish what you want, but I am quite sure
>> it will not be easy.  Basically, you will be dealing with low-level
>> thread programming, which, in my opinion, is almost impossible to get
>> to work correctly.  See this paper for more on this:
>>
>> http://ptolemy.eecs.berkeley.edu/publications/papers/06/problemwithThreads/
>>
>> Edward
>>
>>
>>
>> Colin Enticott wrote:
>>>
>>> Hi Edward,
>>>
>>> Thank you for replying. Yes, in my case I was treating stopFire() only
>>> as a recommendation, as I wanted change requests to be processed while
>>> the experiment was running.  I did this by releasing control back to
>>> the manager. The focus of the new director is to execute jobs on grid
>>> resources which tend to take a while. At runtime we increase the
>>> number of actors based on the number of grid resources available.
>>> Waiting for all actors to finish before I can increase the number of
>>> resources is not optimal. I was hoping that all methods that change
>>> the workspace would obtain a write lock, but as I can now see, only
>>> the manager’s thread should have access to the workflow after a
>>> stopFire().
>>>
>>> So what do you recommend as a solution? I could process change
>>> requests myself instead of releasing control back to the manager. This
>>> will allow changes to be processed that any third party actor might
>>> request. Or I could avoid change requests and modify the workflow
>>> directly, using workspace write locks.
>>>
>>> The problem with both these solutions is that when I copy an opaque
>>> composite, it will itself perform a type check internally and on the
>>> immediately connected external actors. I guess I will have to make this
>>> environment “safe to do so” by blocking sends at this time.
>>>
>>> Regards,
>>> Colin
>>>
>>>
>>> 2009/10/9 Edward A. Lee <eal at eecs.berkeley.edu>:
>>>>
>>>> Interesting question...
>>>>
>>>> I suspect the problem with your threading director is that it isn't
>>>> respecting the semantics of stopFire().  The contract is that the
>>>> Manager executes change requests only while every actor in the model
>>>> is stopped. Specifically, it does so between iterations of the top-level
>>>> model. So when the top-level director returns from postfire(), the
>>>> Manager assumes it can execute change requests.
>>>>
>>>> The problem is that if you have a thread running independently of the
>>>> top-level director, how does the top-level director know when it is safe
>>>> to return from postfire()?
>>>>
>>>> The key is that stopFire() is called on every actor in the model
>>>> when a change request is registered. In PN, the PN threads respond
>>>> to stopFire() by suspending at the next opportunity (typically a
>>>> read or a write to a port).  Only after all threads have stopped
>>>> does the postfire() method of the director return, allowing the manager
>>>> to execute the change request (or maybe it's the fire() method,
>>>> I forget).
>>>>
>>>> Hope this helps...
>>>>
>>>> Edward
>>>>
>>>>
>>>> Colin Enticott wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Looking into this further (sorry for the delay, I’ve been busy), it
>>>>> looks like all actions on port types obtain a read lock on the
>>>>> workspace. When the manager resolves types, it also obtains a read
>>>>> lock, but makes changes to the port types. Shouldn't it obtain a
>>>>> write lock?
>>>>>
>>>>> Thanks,
>>>>> Colin
>>>>>
>>>>> Colin Enticott wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> First of all, I didn’t think that multiple threads would use the same
>>>>>> TypedIOPort object, but here’s the story:
>>>>>>
>>>>>>
>>>>>> I’ve been developing a new "threading director" for Ptolemy, the
>>>>>> Nimrod/K TDA director [1], and one in every 100 executions of my
>>>>>> rigorous director threading test workflows, I get an exception.
>>>>>> This exception happens when an actor sends a token out of a
>>>>>> TypedIOPort:
>>>>>>
>>>>>> (Sorry. The exception is from kepler-1.0.0, so the line numbers differ
>>>>>> from the current version, but the functions in question are identical.)
>>>>>>
>>>>>>
>>>>>> ptolemy.kernel.util.IllegalActionException: Run-time type checking
>>>>>> failed. Token 3 with type int is incompatible with port type: int
>>>>>>  in .Composite.Ramp.output
>>>>>>       at ptolemy.actor.TypedIOPort._checkType(TypedIOPort.java:750)
>>>>>>       at ptolemy.actor.TypedIOPort.send(TypedIOPort.java:472)
>>>>>>       at ptolemy.actor.lib.Ramp.fire(Ramp.java:138)
>>>>>>       at org.monash.nimrod.NimrodDirector.NimrodProcessThread.run(NimrodProcessThread.java:347)
>>>>>>
>>>>>>
>>>>>> The confusing error message is “Token 3 with type int is incompatible
>>>>>> with port type: int”. Looking into this deeper, I believe the error
>>>>>> message is being generated after the port state has changed.
>>>>>>
>>>>>>
>>>>>> The function in question is:
>>>>>>
>>>>>>
>>>>>> protected void _checkType(Token token) throws IllegalActionException {
>>>>>>   int compare = TypeLattice.compare(token.getType(), _resolvedType);
>>>>>>
>>>>>>   if ((compare == CPO.HIGHER) || (compare == CPO.INCOMPARABLE)) {
>>>>>>       throw new IllegalActionException(this,
>>>>>>               "Run-time type checking failed. Token " + token
>>>>>>                       + " with type " + token.getType()
>>>>>>                       + " is incompatible with port type: "
>>>>>>                       + getType().toString());
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> I suspect the “compare” method is called before the type is set (or
>>>>>> changed) and the error message is generated after the type is set.
>>>>>> Looking for another thread that could be accessing the TypedIOPort,
>>>>>> I discovered the type checking functionality. Listening to the
>>>>>> manager, it looks like the manager is "resolving types" in response
>>>>>> to a “ChangeRequest”.
>>>>>>
>>>>>>
>>>>>> Two questions:
>>>>>>
>>>>>>
>>>>>> Should I use “change requests” in this way? None of the changes will
>>>>>> cause any problems with types and so I could directly modify the
>>>>>> workflow. (I initially used change requests as the main thread holds
>>>>>> a read lock on the workspace when the "manager" fires the
>>>>>> “director”.)
>>>>>>
>>>>>>
>>>>>> And, it looks like TypedIOPort is not thread safe. Does this need to
>>>>>> be fixed? Or should type resolving be completely blocked in a
>>>>>> threaded environment? The documentation suggests it should be left
>>>>>> to the director to decide “when it is safe to perform change
>>>>>> requests”, but I cannot see a way of preventing an actor from doing
>>>>>> a type check when sending tokens in a threaded environment.
>>>>>>
>>>>>>
>>>>>> Also, I don't think it is just with my director. It looks like this
>>>>>> issue will arise in the PN environment, if the workflow makes
>>>>>> changes to itself.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Colin
>>>>>>
>>>>>>
>>>>>> [1] Abramson D, Enticott C, Altintas I, "Nimrod/K: Towards Massively
>>>>>> Parallel Dynamic Grid Workflows", IEEE SuperComputing 2008, Austin,
>>>>>> Texas, November 2008
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Colin Enticott, Research Scientist, Ph: +61 03 9903 2215
>>>>>> Room H7.26, Level 7, Building H, Monash University Caulfield 3145,
>>>>>> Australia
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>> _______________________________________________
>>>>>> Kepler-dev mailing list
>>>>>> Kepler-dev at kepler-project.org
>>>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>>>>
>>>>>
>>>>> --
>>>>> Colin Enticott, Research Scientist, Ph: +61 03 9903 2215
>>>>> Room H7.26, Level 7, Building H, Monash University Caulfield 3145,
>>>>> Australia
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> _______________________________________________
>>>>> Kepler-dev mailing list
>>>>> Kepler-dev at kepler-project.org
>>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
> --
> Colin
>



-- 
Colin

