[kepler-dev] Thread safety with Ptolemy’s TypedIOPort

Edward A. Lee eal at eecs.berkeley.edu
Tue Oct 13 00:24:04 PDT 2009


Hi Colin,

I'm not sure how to accomplish what you want, but I am quite sure
it will not be easy.  Basically, you will be dealing with low-level
thread programming, which, in my opinion, is almost impossible to get
to work correctly.  See this paper for more on this:

http://ptolemy.eecs.berkeley.edu/publications/papers/06/problemwithThreads/

Edward



Colin Enticott wrote:
> Hi Edward,
> 
> Thank you for replying. Yes, in my case I was treating stopFire() only
> as a recommendation, as I wanted change requests to be processed while
> the experiment was running.  I did this by releasing control back to
> the manager. The focus of the new director is to execute jobs on grid
> resources which tend to take a while. At runtime we increase the
> number of actors based on the number of grid resources available.
> Waiting for all actors to finish before I can increase the number of
> resources is not optimal. I was hoping that all methods that change
> the workspace would obtain a write lock, but as I can now see, only
> the manager’s thread should have access to the workflow after a
> stopFire().
> 
> So what do you recommend as a solution? I could process change
> requests myself instead of releasing control back to the manager. This
> would allow changes requested by any third-party actor to be processed.
> Alternatively, I could avoid change requests altogether and modify the
> workflow directly, using workspace write locks.
> 
> The problem with both these solutions is that when I copy an opaque
> composite, it will itself perform a type check internally and on the
> immediately connected external actors. I guess I will have to make this
> environment “safe to do so” by blocking sends at this time.
> 
> Regards,
> Colin
> 
> 
> 2009/10/9 Edward A. Lee <eal at eecs.berkeley.edu>:
>> Interesting question...
>>
>> I suspect the problem with your threading director is that it isn't
>> respecting the semantics of stopFire().  The contract is that the
>> Manager executes change requests only while every actor in the model
>> is stopped. Specifically, it does so between iterations of the top-level
>> model. So when the top-level director returns from postfire(), the
>> Manager assumes it can execute change requests.
>>
>> The problem is that if you have a thread running independently of the
>> top-level director, how does the top-level director know when it is safe
>> to return from postfire()?
>>
>> The key is that stopFire() is called on every actor in the model
>> when a change request is registered. In PN, the PN threads respond
>> to stopFire() by suspending at the next opportunity (typically a
>> read or a write to a port).  Only after all threads have stopped
>> does the postfire() method of the director return, allowing the manager
>> to execute the change request (or maybe it's the fire() method,
>> I forget).
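The contract described above (stopFire() on every actor, threads suspending at their next port read or write, and change requests executed only once all of them are suspended) can be sketched with plain Java monitors. This is a hypothetical illustration, not Ptolemy's API: SafePointController, checkpoint(), applyChangeSafely() and SafePointDemo are invented names, checkpoint() stands in for a port operation, and a single controller thread is assumed to issue changes.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch, not Ptolemy code: models the stopFire() contract.
 * checkpoint() stands in for a port read/write; applyChangeSafely() stands
 * in for the Manager executing a change request. Assumes one controller thread.
 */
class SafePointController {
    private int active;                  // workers that have not finished yet
    private int parked = 0;              // workers suspended in checkpoint()
    private boolean stopRequested = false;

    SafePointController(int workers) { this.active = workers; }

    /** Called by a worker at each "port operation"; suspends while a stop is pending. */
    synchronized void checkpoint() throws InterruptedException {
        if (!stopRequested) return;
        parked++;
        notifyAll();                     // let the controller re-count parked workers
        try {
            while (stopRequested) wait();
        } finally {
            parked--;
        }
    }

    /** Called once by each worker as it exits, so the controller never waits on it. */
    synchronized void workerFinished() {
        active--;
        notifyAll();
    }

    /** Analogue of stopFire() on every actor, then the change, then resuming. */
    synchronized void applyChangeSafely(Runnable change) throws InterruptedException {
        stopRequested = true;
        try {
            while (parked < active) wait();  // wait until every live worker is suspended
            change.run();                    // no worker is between checkpoints here
        } finally {
            stopRequested = false;
            notifyAll();                     // resume the workers
        }
    }
}

class SafePointDemo {
    // Shared "model" state; the invariant a == b must hold between changes.
    static int a = 0, b = 0;

    /** Runs workers alongside a controller; returns how many torn states (a != b) they saw. */
    static int run(int workers, int changes) {
        a = 0;
        b = 0;
        SafePointController ctl = new SafePointController(workers);
        AtomicBoolean done = new AtomicBoolean(false);
        AtomicInteger torn = new AtomicInteger();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    while (!done.get()) {
                        ctl.checkpoint();            // safe point, like a port read/write
                        if (a != b) torn.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    ctl.workerFinished();
                }
            });
            threads[i].start();
        }
        try {
            for (int c = 0; c < changes; c++) {
                // A two-step mutation that would be unsafe if observed midway.
                ctl.applyChangeSafely(() -> { a++; Thread.yield(); b++; });
            }
            done.set(true);
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return torn.get();
    }
}
```

Because applyChangeSafely() waits until every live worker is parked, the two-step update of a and b is never observed half-done. Workers that exit must call workerFinished(); otherwise the controller would wait for them forever.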
>>
>> Hope this helps...
>>
>> Edward
>>
>>
>> Colin Enticott wrote:
>>> Hi,
>>>
>>> Looking into this further (sorry for the delay, I’ve been busy), it looks
>>> like all actions on port types obtain a read lock on the workspace. When the
>>> manager resolves types, it also obtains a read lock, but makes changes to
>>> the port types. Shouldn't it obtain a write lock?
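The point about read locks can be illustrated with java.util.concurrent.locks.ReentrantReadWriteLock, used here only as an analogy for the workspace locks (this sketch does not use or describe Ptolemy's Workspace class). A read lock is shared, so a thread that mutates state while holding only a read lock does not exclude other readers, such as actor threads checking types in send(). LockProbe and GuardedPort are hypothetical names, and GuardedPort compares type names for exact equality as a simplification of a real type lattice.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Analogy only: shows what a read lock does and does not exclude. */
class LockProbe {
    /** Returns {otherReaderGotIn, otherWriterGotIn} while this thread holds a read lock. */
    static boolean[] probeWhileReadLocked() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean[] result = new boolean[2];
        lock.readLock().lock();                 // e.g. the thread resolving types
        try {
            Thread other = new Thread(() -> {   // e.g. an actor thread about to send
                boolean r = lock.readLock().tryLock();
                if (r) lock.readLock().unlock();
                boolean w = lock.writeLock().tryLock();
                if (w) lock.writeLock().unlock();
                result[0] = r;
                result[1] = w;
            });
            other.start();
            other.join();                       // join() makes result visible here
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        } finally {
            lock.readLock().unlock();
        }
        return result;
    }
}

/** Hypothetical port: senders share the read lock; type resolution takes the write lock. */
class GuardedPort {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String resolvedType = "int";

    /** Sender path; exact-match compatibility is a simplification. */
    boolean accepts(String tokenType) {
        lock.readLock().lock();
        try {
            return resolvedType.equals(tokenType);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Resolver path: waits for in-progress checks and blocks new ones meanwhile. */
    void setResolvedType(String newType) {
        lock.writeLock().lock();
        try {
            resolvedType = newType;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With java.util.concurrent, a thread holding the read lock cannot upgrade it to the write lock; it must release the read lock first, or its writeLock().lock() call blocks forever.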
>>>
>>> Thanks,
>>> Colin
>>>
>>> Colin Enticott wrote:
>>>> Hi,
>>>>
>>>>
>>>> First of all, I didn’t think that multiple threads would use the same
>>>> TypedIOPort object, but here’s the story:
>>>>
>>>>
>>>> I’ve been developing a new "threading director" for Ptolemy, the Nimrod/k
>>>> TDA director[1], and one in every 100 executions of my rigorous director
>>>> threading test workflows, I get an exception. This exception happens when an
>>>> actor sends a token out a TypedIOPort:
>>>>
>>>> (Sorry. The exception is from kepler-1.0.0, so the line numbers differ
>>>> from the current version, but the functions in question are identical)
>>>>
>>>>
>>>> ptolemy.kernel.util.IllegalActionException: Run-time type checking
>>>> failed. Token 3 with type int is incompatible with port type: int
>>>>  in .Composite.Ramp.output
>>>>        at ptolemy.actor.TypedIOPort._checkType(TypedIOPort.java:750)
>>>>        at ptolemy.actor.TypedIOPort.send(TypedIOPort.java:472)
>>>>        at ptolemy.actor.lib.Ramp.fire(Ramp.java:138)
>>>>        at org.monash.nimrod.NimrodDirector.NimrodProcessThread.run(NimrodProcessThread.java:347)
>>>>
>>>>
>>>> The confusing part is the message “Token 3 with type int is incompatible
>>>> with port type: int”. Looking into this more deeply, I believe the error
>>>> message is being generated after the port's state has changed.
>>>>
>>>>
>>>> The function in question is:
>>>>
>>>>
>>>> protected void _checkType(Token token) throws IllegalActionException {
>>>>    int compare = TypeLattice.compare(token.getType(), _resolvedType);
>>>>
>>>>    if ((compare == CPO.HIGHER) || (compare == CPO.INCOMPARABLE)) {
>>>>        throw new IllegalActionException(this,
>>>>                "Run-time type checking failed. Token " + token
>>>>                        + " with type " + token.getType()
>>>>                        + " is incompatible with port type: "
>>>>                        + getType().toString());
>>>>    }
>>>> }
>>>>
>>>>
>>>> I suspect the “compare” method is called before the type is set (or
>>>> changed) and the error message is generated after the type is set: the
>>>> comparison reads _resolvedType, while the message calls getType()
>>>> separately. Looking for another thread that could be accessing the
>>>> TypedIOPort, I found the type resolution code. Listening to the manager,
>>>> it looks like the manager is "resolving types" in response to a
>>>> “ChangeRequest”.
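The suspected interleaving can be reproduced deterministically. In this hypothetical sketch, ToyType, ToyPort and the interleave hook are invented for illustration; the enum is a toy ordering rather than Ptolemy's TypeLattice, and it assumes that during resolution the port's type is briefly lower than int (modelled as UNKNOWN). The first method reads the type twice, like the quoted _checkType(); the second reads it once into a local variable.

```java
/** Toy type ordering (assumption: Ptolemy's real TypeLattice is not a simple chain). */
enum ToyType { UNKNOWN, INT, DOUBLE }

/** Hypothetical port reproducing the two separate reads of the type. */
class ToyPort {
    volatile ToyType resolvedType = ToyType.INT;
    Runnable interleave = () -> { };   // test hook: simulates another thread running here

    /** Same shape as the quoted code: compare against one read, report a second read. */
    String checkTypeRacy(ToyType token) {
        boolean tooHigh = token.compareTo(resolvedType) > 0;   // first read
        interleave.run();
        return tooHigh ? token + " is incompatible with port type: " + resolvedType : null; // second read
    }

    /** Reads the type once, so the message reports the value that was compared. */
    String checkTypeSnapshot(ToyType token) {
        ToyType portType = resolvedType;                        // single read
        boolean tooHigh = token.compareTo(portType) > 0;
        interleave.run();
        return tooHigh ? token + " is incompatible with port type: " + portType : null;
    }
}
```

Taking a single snapshot only makes the message truthful (it reports UNKNOWN rather than the puzzling int). The send still races with type resolution, so the check itself must still be kept out of type resolution, for example by the stopFire() mechanism described above.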
>>>>
>>>>
>>>> Two questions:
>>>>
>>>>
>>>> Should I use “change requests” in this way? None of the changes will
>>>> cause any problems with types, so I could modify the workflow directly.
>>>> (I initially used change requests because the main thread holds a read
>>>> lock on the workspace while the manager fires the director.)
>>>>
>>>>
>>>> And it looks like TypedIOPort is not thread-safe. Does this need to be
>>>> fixed? Or should type resolution be blocked completely in a threaded
>>>> environment? The documentation suggests it should be left to the director to
>>>> decide “when it is safe to perform change requests”, but I cannot see a way
>>>> of preventing an actor from doing a type check when sending tokens in a
>>>> threaded environment.
>>>>
>>>>
>>>> Also, I don't think this is specific to my director. It looks like the
>>>> same issue will arise under PN if the workflow modifies itself while it
>>>> is running.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Colin
>>>>
>>>>
>>>> [1] Abramson D, Enticott C, Altintas I., "Nimrod/K: Towards Massively
>>>> Parallel Dynamic Grid Workflows", IEEE SuperComputing 2008, Austin, Texas,
>>>> November 2008.
>>>>
>>>>
>>>> --
>>>> Colin Enticott, Research Scientist, Ph: +61 03 9903 2215
>>>> Room H7.26, Level 7, Building H, Monash University Caulfield 3145,
>>>> Australia
>>>> ------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Kepler-dev mailing list
>>>> Kepler-dev at kepler-project.org
>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>>
>>>
> 
> 
> 