[kepler-dev] Thread safety with Ptolemy’s TypedIOPort

Tue Oct 20 00:31:51 PDT 2009

Hi Edward,

Everything is working well in the new workspace. The last issue
(hopefully) is with effigies. Any actor that was placed into the new
workspace is unable to get an Effigy. That is,
Configuration.findEffigy(toplevel()) returns null.

I guess it is not too important, as many copies of the actor can be
made and showing all windows could get messy. But it might be handy,
and is probably expected by the user.

I tried various combinations of calling .clone(workspace) on the
current Configuration and Effigy classes and also creating new ones,
but I can’t seem to make findEffigy work. Any thoughts?

Thanks,
Colin

2009/10/16 Colin Enticott <Colin.Enticott at csse.monash.edu.au>:
> Hi Edward,
>
> I was working on the same idea of creating a new workspace. Besides
> having to create a new Workspace, a top level TypedCompositeActor, a
> proxy director and copy the original CompositeActor’s parameters
> across, it wasn’t too hard. I already had a special TypedIORelation
> that could splice onto other relations which had no problems with
> crossing the workspace boundaries. It is working well for most things,
> but TypedCompositeActors aren’t resolving properly. I’ll look into
> that next.
>
> Thank you for your help Edward.
>
> Regards,
> Colin
>
>
> 2009/10/15 Edward A. Lee <eal at eecs.berkeley.edu>:
>>
>> Hi Colin,
>>
>> Technically, the composite actor has to acquire a read lock
>> before it can even be sure what director the composite actor
>> contains... What if the director is being changed?
>>
>> Unfortunately, with concurrency problems, it doesn't really
>> work to say "well, that won't happen very often..."
>>
>> The "right" solution would be to have hierarchical locks,
>> rather than the global workspace lock.  This would be very tricky
>> to get right, however, because MoML is expressive enough
>> (and the expression language as well) that references cross
>> levels of the hierarchy (e.g. to variables in scope).
>> So this solution would have to intercept the MoML parsing,
>> do scope analysis, and acquire the appropriate locks.
>>
>> Frankly, this does not look easy to me...
>>
>> A very different approach that might solve your problem
>> would be to have a mechanism for separating a part of the model
>> and running it in a different workspace.  The ModelReference
>> actor in the higher-order actors library might even do this,
>> (I don't recall).  The ThreadedComposite actor in the same
>> library might also be useful. There is a paper on how to use
>> it here:
>>
>> http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-151.html
>>
>> Edward
>>
>>
>>
>> Colin Enticott wrote:
>>>
>>> Hi Edward,
>>>
>>> My next problem (as predicted) is with nesting directors using
>>> CompositeActors. It looks like the CompositeActor class acquires a
>>> read lock on the workspace before it fires its director. If I have a
>>> workflow that has a long running composite actor, I cannot acquire a
>>> workspace write lock until that composite finishes. First of all I am
>>> curious why the composite acquires this read lock? I thought it would
>>> be the responsibility of the director to decide if it needs a lock on
>>> the workspace? I noticed the PNDirector will release workspace locks
>>> when it sleeps, but the SDF doesn’t (as it doesn’t sleep). As the
>>> director is responsible for the internals of the composite, shouldn't
>>> it just obtain an internal lock? Has this issue arisen before, or am I
>>> the only one doing reconfiguration with threading?
>>>
>>> And how would you suggest I solve this issue? I could make changes
>>> directly to the workflow without a write lock. As I mentioned in an
>>> earlier email, the objects I add cannot be used until my director
>>> safely invokes them. Alternatively, when I copy an actor, I could
>>> replace all the *CompositeActor with My*CompositeActor by modifying
>>> the MoML. My version of the composites would release the lock before
>>> invoking the fire method of its director. This option sounds safer,
>>> unless some of the directors assume they already hold a read lock.
>>>
>>> Thoughts?
>>>
>>> Regards,
>>> Colin
>>>
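To make the lock contention described in the message above concrete,
here is a minimal sketch in plain Java. It does not use Ptolemy's
Workspace class: a java.util.concurrent ReentrantReadWriteLock stands in
for the global workspace lock, and the names ReadLockDemo and
writerCanProceed are hypothetical, chosen only for this illustration. It
shows that a writer cannot get in while a "composite" holds the read lock
for the whole of a long fire, and that it can if the read lock is
released around the fire (the idea behind the My*CompositeActor option).

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the scenario in the thread; not Ptolemy code.
class ReadLockDemo {
    // Plays the role of the single global workspace lock.
    static final ReentrantReadWriteLock WORKSPACE_LOCK = new ReentrantReadWriteLock();

    /**
     * Simulates a composite that takes the read lock and then "fires" for a
     * while. During the fire, another thread tries to get the write lock.
     * Returns true if that writer got the lock before giving up.
     */
    static boolean writerCanProceed(boolean releaseReadLockWhileFiring) {
        final boolean[] acquired = new boolean[1];
        Thread writer = new Thread(() -> {
            try {
                // Give up after 200 ms instead of blocking forever.
                if (WORKSPACE_LOCK.writeLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    acquired[0] = true;
                    WORKSPACE_LOCK.writeLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        WORKSPACE_LOCK.readLock().lock();
        try {
            if (releaseReadLockWhileFiring) {
                WORKSPACE_LOCK.readLock().unlock();
            }
            try {
                // The "long-running fire": lasts until the writer is finished.
                writer.start();
                writer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                if (releaseReadLockWhileFiring) {
                    // Restore the lock state the caller expects.
                    WORKSPACE_LOCK.readLock().lock();
                }
            }
        } finally {
            WORKSPACE_LOCK.readLock().unlock();
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("read lock held during fire:     " + writerCanProceed(false));
        System.out.println("read lock released during fire: " + writerCanProceed(true));
    }
}
```

With the read lock held, the writer times out; with it released, the
writer gets through. As noted in the message, releasing the lock is only
safe if nothing invoked during the fire assumes the read lock is still
held, and this sketch does not address that.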
>>>
>>> 2009/10/13 Colin Enticott <Colin.Enticott at csse.monash.edu.au>:
>>>>
>>>> Hi Edward,
>>>>
>>>> Fortunately, my problem is simplified and controllable. All tokens
>>>> going into an actor (that can be copied) will go to a central place.
>>>> So an actor being copied will not cause any problems with sending
>>>> tokens as it does not change the token path. The problem is when I
>>>> make a copy of the actor, some actors (like composites) will want
>>>> their types resolved.
>>>>
>>>> I decided to make the director responsible for processing change
>>>> requests and resolving types. I can make sure I hold the locks that
>>>> will stop this from happening. It seems to have solved the immediate
>>>> problem; next I'll have to look into nesting director issues. :-)
>>>>
>>>> Regards,
>>>> Colin
>>>>
>>>> 2009/10/13 Edward A. Lee <eal at eecs.berkeley.edu>:
>>>>>
>>>>> Hi Colin,
>>>>>
>>>>> I'm not sure how to accomplish what you want, but I am quite sure
>>>>> it will not be easy.  Basically, you will be dealing with low-level
>>>>> thread programming, which, in my opinion, is almost impossible to get
>>>>> to work correctly.  See this paper for more on this:
>>>>>
>>>>>
>>>>> http://ptolemy.eecs.berkeley.edu/publications/papers/06/problemwithThreads/
>>>>>
>>>>> Edward
>>>>>
>>>>>
>>>>>
>>>>> Colin Enticott wrote:
>>>>>>
>>>>>> Hi Edward,
>>>>>>
>>>>>> Thank you for replying. Yes, in my case I was treating stopFire() only
>>>>>> as a recommendation as I wanted change requests to be processed while
>>>>>> the experiment was running.  I did this by releasing control back to
>>>>>> the manager. The focus of the new director is to execute jobs on grid
>>>>>> resources which tend to take a while. At runtime we increase the
>>>>>> number of actors based on the number of grid resources available.
>>>>>> Waiting for all actors to finish before I can increase the number of
>>>>>> resources is not optimal. I was hoping that all methods that change
>>>>>> the workspace would obtain a write lock, but as I can now see, only
>>>>>> the manager’s thread should have access to the workflow after a
>>>>>> stopFire().
>>>>>>
>>>>>> So what do you recommend as a solution? I could process change
>>>>>> requests myself instead of releasing control back to the manager. This
>>>>>> will allow changes to be processed that any third party actor might
>>>>>> request. Or I could skip change requests and modify the workflow
>>>>>> directly, using workspace write locks.
>>>>>>
>>>>>> The problem with both these solutions is that when I copy an opaque
>>>>>> composite, it will itself perform a type check internally and on the
>>>>>> immediately connected external actors. I guess I will have to make
>>>>>> this environment “safe to do so” by blocking sends at this time.
>>>>>>
>>>>>> Regards,
>>>>>> Colin
>>>>>>
>>>>>>
>>>>>> 2009/10/9 Edward A. Lee <eal at eecs.berkeley.edu>:
>>>>>>>
>>>>>>> Interesting question...
>>>>>>>
>>>>>>> I suspect the problem with your threading director is that it isn't
>>>>>>> respecting the semantics of stopFire().  The contract is that the
>>>>>>> Manager executes change requests only while every actor in the model
>>>>>>> is stopped. Specifically, it does so between iterations of the
>>>>>>> top-level model. So when the top-level director returns from
>>>>>>> postfire(), the Manager assumes it can execute change requests.
>>>>>>>
>>>>>>> The problem is that if you have a thread running independently of the
>>>>>>> top-level director, how does the top-level director know when it is
>>>>>>> safe to return from postfire()?
>>>>>>>
>>>>>>> The key is that stopFire() is called on every actor in the model
>>>>>>> when a change request is registered. In PN, the PN threads respond
>>>>>>> to stopFire() by suspending at the next opportunity (typically a
>>>>>>> read or a write to a port).  Only after all threads have stopped
>>>>>>> does the postfire() method of the director return, allowing the
>>>>>>> manager to execute the change request (or maybe it's the fire()
>>>>>>> method, I forget).
>>>>>>>
>>>>>>> Hope this helps...
>>>>>>>
>>>>>>> Edward
>>>>>>>
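The handshake described in the message above (request a stop, let each
thread suspend at its next safe point, and apply the change only once
all of them have stopped) can be sketched in plain Java. This is not
Ptolemy's implementation: the class StopFireDemo and its methods
safePoint, workerDone, executeChange and busyWorkersSeenByChange are
hypothetical names for this illustration, and a simple monitor with a
counter stands in for whatever the PN director actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a stopFire()-style handshake; not Ptolemy code.
class StopFireDemo {
    private final Object monitor = new Object();
    private boolean stopRequested = false;
    private int running; // workers that are neither suspended nor finished

    StopFireDemo(int workers) {
        running = workers;
    }

    /** Workers call this at each safe point, e.g. before a port read or write. */
    void safePoint() throws InterruptedException {
        synchronized (monitor) {
            if (!stopRequested) {
                return;
            }
            running--;            // this worker is now suspended
            monitor.notifyAll();  // let the coordinator re-check the count
            try {
                while (stopRequested) {
                    monitor.wait();
                }
            } finally {
                running++;        // resumed (or interrupted): running again
            }
        }
    }

    /** Workers call this exactly once when they finish. */
    void workerDone() {
        synchronized (monitor) {
            running--;
            monitor.notifyAll();
        }
    }

    /** Coordinator: ask everyone to stop, wait until they have, apply the change. */
    void executeChange(Runnable change) throws InterruptedException {
        synchronized (monitor) {
            stopRequested = true;
            try {
                while (running > 0) {
                    monitor.wait();
                }
                change.run();     // no worker is between safe points here
            } finally {
                stopRequested = false;
                monitor.notifyAll();
            }
        }
    }

    /** Small experiment: returns how many workers were busy when the change ran. */
    static int busyWorkersSeenByChange(int workers, int steps) {
        StopFireDemo demo = new StopFireDemo(workers);
        AtomicInteger busy = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    for (int i = 0; i < steps; i++) {
                        demo.safePoint();
                        busy.incrementAndGet();
                        Thread.sleep(1); // pretend to fire
                        busy.decrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    demo.workerDone();
                }
            });
            threads.add(t);
            t.start();
        }
        int[] seen = {-1};
        try {
            demo.executeChange(() -> seen[0] = busy.get());
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }
}
```

Calling StopFireDemo.busyWorkersSeenByChange(4, 20) returns 0, because
the change only runs after every worker has parked at a safe point or
finished. A thread that never reaches a safe point (a long grid job, for
instance) would hold up the change indefinitely, which is the trade-off
discussed in this thread.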
>>>>>>>
>>>>>>> Colin Enticott wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Looking into this further (sorry for the delay, I’ve been busy), it
>>>>>>>> looks like all actions on port types obtain a read lock on the
>>>>>>>> workspace. When the manager resolves types, it also obtains a read
>>>>>>>> lock, but makes changes to the port types. Shouldn't it obtain a write
>>>>>>>> lock?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Colin
>>>>>>>>
>>>>>>>> Colin Enticott wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> First of all, I didn’t think that multiple threads would use the same
>>>>>>>>> TypedIOPort object, but here’s the story:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I’ve been developing a new "threading director" for Ptolemy, the
>>>>>>>>> Nimrod/K TDA director[1], and in one of every 100 executions of my
>>>>>>>>> rigorous director threading test workflows, I get an exception. This
>>>>>>>>> exception happens when an actor sends a token out a TypedIOPort:
>>>>>>>>>
>>>>>>>>> (Sorry. The exception is from kepler-1.0.0, so the line numbers differ
>>>>>>>>> from the current version, but the functions in question are identical.)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ptolemy.kernel.util.IllegalActionException: Run-time type checking failed. Token 3 with type int is incompatible with port type: int
>>>>>>>>>  in .Composite.Ramp.output
>>>>>>>>>      at ptolemy.actor.TypedIOPort._checkType(TypedIOPort.java:750)
>>>>>>>>>      at ptolemy.actor.TypedIOPort.send(TypedIOPort.java:472)
>>>>>>>>>      at ptolemy.actor.lib.Ramp.fire(Ramp.java:138)
>>>>>>>>>      at org.monash.nimrod.NimrodDirector.NimrodProcessThread.run(NimrodProcessThread.java:347)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The confusing error message is “Token 3 with type int is incompatible
>>>>>>>>> with port type: int”. Looking into this deeper, I believe the error
>>>>>>>>> message is being generated after the port state has changed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The function in question is:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> protected void _checkType(Token token) throws IllegalActionException {
>>>>>>>>>     int compare = TypeLattice.compare(token.getType(), _resolvedType);
>>>>>>>>>
>>>>>>>>>     if ((compare == CPO.HIGHER) || (compare == CPO.INCOMPARABLE)) {
>>>>>>>>>         throw new IllegalActionException(this,
>>>>>>>>>                 "Run-time type checking failed. Token " + token
>>>>>>>>>                         + " with type " + token.getType()
>>>>>>>>>                         + " is incompatible with port type: "
>>>>>>>>>                         + getType().toString());
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I suspect the “compare” method is called before the type is set (or
>>>>>>>>> changed) and the error message is generated after the type is set.
>>>>>>>>> Looking for another thread that could be accessing the TypedIOPort, I
>>>>>>>>> discovered the type checking functionality. Listening to the manager,
>>>>>>>>> it looks like the manager is "resolving types" in response to a
>>>>>>>>> “ChangeRequest”.
>>>>>>>>>
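The interleaving suspected above (the type is read once for the
comparison and read again for the error message, with a change in
between) can be reproduced deterministically in a toy model. The sketch
below is plain Java, not Ptolemy code: CheckTypeRaceDemo, racyCheck,
snapshotCheck and scripted are hypothetical names, types are plain
strings, and a simple equals() stands in for TypeLattice.compare(). A
scripted supplier plays the part of the other thread by returning
"unknown" on the first read and "int" on the second.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical toy model of the double read in _checkType; not Ptolemy code.
class CheckTypeRaceDemo {
    /** Reads the port type twice: once to compare, once to build the message. */
    static String racyCheck(String tokenType, Supplier<String> portType) {
        if (!tokenType.equals(portType.get())) {            // first read
            return "Token with type " + tokenType
                    + " is incompatible with port type: "
                    + portType.get();                       // second read
        }
        return null; // type check passed
    }

    /** Reads the port type once, so the comparison and the message agree. */
    static String snapshotCheck(String tokenType, Supplier<String> portType) {
        String seen = portType.get();                       // single read
        if (!tokenType.equals(seen)) {
            return "Token with type " + tokenType
                    + " is incompatible with port type: " + seen;
        }
        return null; // type check passed
    }

    /** Replays a fixed sequence of values, imitating a concurrent type resolver. */
    static Supplier<String> scripted(String... values) {
        Iterator<String> it = Arrays.asList(values).iterator();
        return it::next;
    }

    public static void main(String[] args) {
        System.out.println(racyCheck("int", scripted("unknown", "int")));
        System.out.println(snapshotCheck("int", scripted("unknown", "int")));
    }
}
```

The racy version reports "incompatible with port type: int" for an int
token, matching the confusing message in the trace, while the snapshot
version reports the value that was actually compared. Note that taking a
snapshot only makes the message honest. The send still fails if it
happens while types are being re-resolved, so it does not remove the
need to keep type resolution and sends from overlapping.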
>>>>>>>>>
>>>>>>>>> Two questions:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Should I use “change requests” in this way? None of the changes will
>>>>>>>>> cause any problems with types and so I could directly modify the
>>>>>>>>> workflow. (I initially used change requests as the main thread holds a
>>>>>>>>> read lock on the workspace when the "manager" fires the “director”.)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And, it looks like TypedIOPort is not thread safe. Does this need to be
>>>>>>>>> fixed? Or should type resolving be completely blocked in a threaded
>>>>>>>>> environment? The documentation suggests it should be left to the
>>>>>>>>> director to decide “when it is safe to perform change requests”, but I
>>>>>>>>> cannot see a way of preventing an actor from doing a type check when
>>>>>>>>> sending tokens in a threaded environment.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also, I don't think it is just with my director. It looks like this
>>>>>>>>> issue will arise in the PN environment, if the workflow makes changes
>>>>>>>>> to itself.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Colin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] Abramson D, Enticott C, Altintas I, "Nimrod/K: Towards Massively
>>>>>>>>> Parallel Dynamic Grid Workflows", IEEE SuperComputing 2008, Austin,
>>>>>>>>> Texas, November 2008
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Colin Enticott, Research Scientist, Ph: +61 03 9903 2215
>>>>>>>>> Room H7.26, Level 7, Building H, Monash University Caulfield 3145,
>>>>>>>>> Australia
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Kepler-dev mailing list
>>>>>>>>> Kepler-dev at kepler-project.org
>>>>>>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Colin Enticott, Research Scientist, Ph: +61 03 9903 2215
>>>>>>>> Room H7.26, Level 7, Building H, Monash University Caulfield 3145,
>>>>>>>> Australia
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Colin
>>>>
>>>
>>>
>>>
>>
>
>
>
> --
> Colin
>

-- 
Colin