[kepler-dev] Thread safety with Ptolemy’s TypedIOPort

Tue Oct 20 08:32:38 PDT 2009

Congratulations on getting this working!  I'm quite impressed...
It's not easy.

This is going to be a tricky one because effigies are
themselves Ptolemy components within a model. This model is
what people call the "model" in model-view-controller architectures.

It seems that each workspace is going to have to have its own
model, with a separate tree of effigies and tableaux. This means,
among other things, that closing the top-level window will not
result in all the windows being closed... The UI could get a bit
awkward...

Edward

Colin Enticott wrote:
> Hi Edward,
> 
> Everything is working well in the new workspace. This last (hopefully)
> issue is with effigies. Any actor that was placed into the new
> workspace is unable to get an Effigy. That is,
> Configuration.findEffigy(toplevel()); returns null.
> 
> I guess it is not too important, as many copies of the actor can be
> made and showing all windows could get messy. But it might be handy,
> and is probably expected by the user.
> 
> I tried various combinations of calling .clone(workspace) on the
> current Configuration and Effigy classes and also creating new ones,
> but I can’t seem to make findEffigy work. Any thoughts?
> 
> Thanks,
> Colin
> 
> 
> 2009/10/16 Colin Enticott <Colin.Enticott at csse.monash.edu.au>:
>> Hi Edward,
>>
>> I was working on the same idea of creating a new workspace. Besides
>> having to create a new Workspace, a top level TypedCompositeActor, a
>> proxy director and copy the original CompositeActor’s parameters
>> across, it wasn’t too hard. I already had a special TypedIORelation
>> that could splice onto other relations which had no problems with
>> crossing the workspace boundaries. It is working well for most things,
>> but TypedCompositeActors aren’t resolving properly. I’ll look into
>> that next.
>>
>> Thank you for your help Edward.
>>
>> Regards,
>> Colin
>>
>>
>> 2009/10/15 Edward A. Lee <eal at eecs.berkeley.edu>:
>>> Hi Colin,
>>>
>>> Technically, the composite actor has to acquire a read lock
>>> before it can even be sure what director the composite actor
>>> contains... What if the director is being changed?
>>>
>>> Unfortunately, with concurrency problems, it doesn't really
>>> work to say "well, that won't happen very often..."
>>>
>>> The "right" solution would be to have hierarchical locks,
>>> rather than the global workspace lock.  This would be very tricky
>>> to get right, however, because MoML is expressive enough
>>> (and the expression language as well) that references cross
>>> levels of the hierarchy (e.g. to variables in scope).
>>> So this solution would have to intercept the MoML parsing,
>>> do scope analysis, and acquire the appropriate locks.
>>>
>>> Frankly, this does not look easy to me...
>>>
>>> A very different approach that might solve your problem
>>> would be to have a mechanism for separating a part of the model
>>> and running it in a different workspace.  The ModelReference
>>> actor in the higher-order actors library might even do this,
>>> (I don't recall).  The ThreadedComposite actor in the same
>>> library might also be useful. There is a paper on how to use
>>> it here:
>>>
>>> http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-151.html
>>>
>>> Edward
>>>
>>>
>>>
>>> Colin Enticott wrote:
>>>> Hi Edward,
>>>>
>>>> My next problem (as predicted) is with nesting directors using
>>>> CompositeActors. It looks like the CompositeActor class acquires a
>>>> read lock on the workspace before it fires its director. If I have a
>>>> workflow that has a long running composite actor, I cannot acquire a
>>>> workspace write lock until that composite finishes. First of all I am
>>>> curious why the composite acquires this read lock? I thought it would
>>>> be the responsibility of the director to decide if it needs a lock on
>>>> the workspace? I noticed the PNDirector will release workspace locks
>>>> when it sleeps, but the SDF doesn’t (as it doesn’t sleep). As the
>>>> director is responsible for the internals of the composite, shouldn't
>>>> it just obtain an internal lock? Has this issue arisen before, or am I
>>>> the only one doing reconfiguration with threading?
>>>>
>>>> And how would you suggest I solve this issue? I could make changes
>>>> directly to the workflow without a write lock. As I mentioned in an
>>>> earlier email, the objects I add cannot be used until my director
>>>> safely invokes them. Alternatively, when I copy an actor, I could
>>>> replace all the *CompositeActor with My*CompositeActor by modifying
>>>> the MoML? My versions of the composites would release the lock before
>>>> invoking the fire method of their directors. This option sounds safer,
>>>> unless some of the directors assume they already hold a read lock.
>>>>
>>>> Thoughts?
>>>>
>>>> Regards,
>>>> Colin
>>>>
>>>>
>>>> 2009/10/13 Colin Enticott <Colin.Enticott at csse.monash.edu.au>:
>>>>> Hi Edward,
>>>>>
>>>>> Fortunately, my problem is simplified and controllable. All tokens
>>>>> going into an actor (that can be copied) will go to a central place.
>>>>> So an actor being copied will not cause any problems with sending
>>>>> tokens as it does not change the token path. The problem is when I
>>>>> make a copy of the actor, some actors (like composites) will want
>>>>> their types resolved.
>>>>>
>>>>> I decided to make the director responsible for processing change
>>>>> requests and resolving types. I can make sure I have the locks that
>>>>> will stop this from happening. It seems to have solved the immediate
>>>>> problem; next I'll have to look into nesting director issues. :-)
>>>>>
>>>>> Regards,
>>>>> Colin
>>>>>
>>>>> 2009/10/13 Edward A. Lee <eal at eecs.berkeley.edu>:
>>>>>> Hi Colin,
>>>>>>
>>>>>> I'm not sure how to accomplish what you want, but I am quite sure
>>>>>> it will not be easy.  Basically, you will be dealing with low-level
>>>>>> thread programming, which in my opinion, is almost impossible to get
>>>>>> to work correctly.  See this paper for more on this:
>>>>>>
>>>>>>
>>>>>> http://ptolemy.eecs.berkeley.edu/publications/papers/06/problemwithThreads/
>>>>>>
>>>>>> Edward
>>>>>>
>>>>>>
>>>>>>
>>>>>> Colin Enticott wrote:
>>>>>>> Hi Edward,
>>>>>>>
>>>>>>> Thank you for replying. Yes, in my case I was treating stopFire() only
>>>>>>> as a recommendation as I wanted change requests to be processed while
>>>>>>> the experiment was running.  I did this by releasing control back to
>>>>>>> the manager. The focus of the new director is to execute jobs on grid
>>>>>>> resources which tend to take a while. At runtime we increase the
>>>>>>> number of actors based on the number of grid resources available.
>>>>>>> Waiting for all actors to finish before I can increase the number of
>>>>>>> resources is not optimal. I was hoping that all methods that change
>>>>>>> the workspace would obtain a write lock, but as I can now see, only
>>>>>>> the manager’s thread should have access to the workflow after a
>>>>>>> stopFire().
>>>>>>>
>>>>>>> So what do you recommend as a solution? I could process change
>>>>>>> requests myself instead of releasing control back to the manager. This
>>>>>>> will allow changes to be processed that any third party actor might
>>>>>>> request. Or I could skip change requests and modify the workflow
>>>>>>> directly, using workspace write locks.
>>>>>>>
>>>>>>> The problem with both these solutions is when I copy an opaque
>>>>>>> composite, it itself will perform a type check internally and on the
>>>>>>> immediate connected external actors. I guess I will have to make this
>>>>>>> environment “safe to do so” by blocking sends at this time.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Colin
>>>>>>>
>>>>>>>
>>>>>>> 2009/10/9 Edward A. Lee <eal at eecs.berkeley.edu>:
>>>>>>>> Interesting question...
>>>>>>>>
>>>>>>>> I suspect the problem with your threading director is that it isn't
>>>>>>>> respecting the semantics of stopFire().  The contract is that the
>>>>>>>> Manager executes change requests only while every actor in the model
>>>>>>>> is stopped. Specifically, it does so between iterations of the
>>>>>>>> top-level model. So when the top-level director returns from
>>>>>>>> postfire(), the Manager assumes it can execute change requests.
>>>>>>>>
>>>>>>>> The problem is that if you have a thread running independently of the
>>>>>>>> top-level director, how does the top-level director know when it is
>>>>>>>> safe to return from postfire()?
>>>>>>>>
>>>>>>>> The key is that stopFire() is called on every actor in the model
>>>>>>>> when a change request is registered. In PN, the PN threads respond
>>>>>>>> to stopFire() by suspending at the next opportunity (typically a
>>>>>>>> read or a write to a port).  Only after all threads have stopped
>>>>>>>> does the postfire() method of the director return, allowing the
>>>>>>>> manager to execute the change request (or maybe it's the fire()
>>>>>>>> method, I forget).
>>>>>>>>
>>>>>>>> Hope this helps...
>>>>>>>>
>>>>>>>> Edward
>>>>>>>>
>>>>>>>>
>>>>>>>> Colin Enticott wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Looking into this further (sorry for the delay, I’ve been busy), it
>>>>>>>>> looks like all actions on port types obtain a read lock on the
>>>>>>>>> workspace. When the manager resolves types, it also obtains a read
>>>>>>>>> lock, but makes changes to the port types. Shouldn't it obtain a
>>>>>>>>> write lock?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Colin
>>>>>>>>>
>>>>>>>>> Colin Enticott wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> First of all, I didn’t think that multiple threads would use the
>>>>>>>>>> same TypedIOPort object, but here’s the story:
>>>>>>>>>>
>>>>>>>>>> I’ve been developing a new "threading director" for Ptolemy, the
>>>>>>>>>> Nimrod/K TDA director[1], and in about one in every 100 executions
>>>>>>>>>> of my rigorous director threading test workflows, I get an
>>>>>>>>>> exception. This exception happens when an actor sends a token out
>>>>>>>>>> a TypedIOPort:
>>>>>>>>>>
>>>>>>>>>> (Sorry. The exception is from kepler-1.0.0, so the line numbers
>>>>>>>>>> differ from the current version, but the functions in question are
>>>>>>>>>> identical)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ptolemy.kernel.util.IllegalActionException: Run-time type checking
>>>>>>>>>> failed. Token 3 with type int is incompatible with port type: int
>>>>>>>>>>  in .Composite.Ramp.output
>>>>>>>>>>      at ptolemy.actor.TypedIOPort._checkType(TypedIOPort.java:750)
>>>>>>>>>>      at ptolemy.actor.TypedIOPort.send(TypedIOPort.java:472)
>>>>>>>>>>      at ptolemy.actor.lib.Ramp.fire(Ramp.java:138)
>>>>>>>>>>      at org.monash.nimrod.NimrodDirector.NimrodProcessThread.run(NimrodProcessThread.java:347)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The confusing error message is “Token 3 with type int is
>>>>>>>>>> incompatible with port type: int”. Looking into this deeper I
>>>>>>>>>> believe the error message is being generated after the port state
>>>>>>>>>> has changed.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The function in question is:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> protected void _checkType(Token token) throws IllegalActionException {
>>>>>>>>>>     int compare = TypeLattice.compare(token.getType(), _resolvedType);
>>>>>>>>>>
>>>>>>>>>>     if ((compare == CPO.HIGHER) || (compare == CPO.INCOMPARABLE)) {
>>>>>>>>>>         throw new IllegalActionException(this,
>>>>>>>>>>                 "Run-time type checking failed. Token " + token
>>>>>>>>>>                         + " with type " + token.getType()
>>>>>>>>>>                         + " is incompatible with port type: "
>>>>>>>>>>                         + getType().toString());
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I suspect the “compare” method is called before the type is set (or
>>>>>>>>>> changed) and the error message is generated after the type is set.
>>>>>>>>>> Looking for another thread that could be accessing the TypedIOPort,
>>>>>>>>>> I discovered the type checking functionality. Listening to the
>>>>>>>>>> manager, it looks like the manager is "resolving types" in response
>>>>>>>>>> to a “ChangeRequest”.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Two questions:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Should I use “change requests” in this way? None of the changes will
>>>>>>>>>> cause any problems with types and so I could directly modify the
>>>>>>>>>> workflow. (I initially used change requests as the main thread
>>>>>>>>>> holds a read lock on the workspace when the "manager" fires the
>>>>>>>>>> “director”)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And, it looks like TypedIOPort is not thread safe. Does this need to
>>>>>>>>>> be fixed? Or should type resolving be completely blocked in a
>>>>>>>>>> threaded environment? The documentation suggests it should be left
>>>>>>>>>> to the director to decide “when it is safe to perform change
>>>>>>>>>> requests”, but I cannot see a way of preventing an actor from doing
>>>>>>>>>> a type check when sending tokens in a threaded environment.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Also, I don't think it is just with my director. It looks like this
>>>>>>>>>> issue will arise in the PN environment, if the workflow makes
>>>>>>>>>> changes to itself.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Colin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1] Abramson D, Enticott C, Altintas I., "Nimrod/K: Towards
>>>>>>>>>> Massively Parallel Dynamic Grid Workflows", IEEE SuperComputing
>>>>>>>>>> 2008, Austin, Texas, November 2008
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Colin Enticott, Research Scientist, Ph: +61 03 9903 2215
>>>>>>>>>> Room H7.26, Level 7, Building H, Monash University Caulfield 3145,
>>>>>>>>>> Australia
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Kepler-dev mailing list
>>>>>>>>>> Kepler-dev at kepler-project.org
>>>>>>>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Colin Enticott, Research Scientist, Ph: +61 03 9903 2215
>>>>>>>>> Room H7.26, Level 7, Building H, Monash University Caulfield 3145,
>>>>>>>>> Australia
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Colin
>>>>>
>>>>
>>>>
>>
>>
>> --
>> Colin
>>
> 
> 
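The race suspected in the quoted _checkType (the comparison reads
_resolvedType, while the message is built later from getType()) can be
illustrated without Ptolemy. What follows is only a minimal sketch under
stated assumptions: types are plain strings, and TypeCheckSketch, checkRacy,
checkSnapshot and changingType are hypothetical names that do not exist in
Ptolemy. The Supplier stands in for a port whose type is changed by another
thread between two reads.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a typed port; not Ptolemy code. */
final class TypeCheckSketch {

    /** Mirrors the quoted _checkType: the port type is read twice. */
    static String checkRacy(String tokenType, Supplier<String> portType) {
        boolean compatible = tokenType.equals(portType.get()); // first read
        if (!compatible) {
            // Second read: may observe a type set by another thread.
            return "Token with type " + tokenType
                    + " is incompatible with port type: " + portType.get();
        }
        return null;
    }

    /** Reads the type once, so the check and the message agree. */
    static String checkSnapshot(String tokenType, Supplier<String> portType) {
        String snapshot = portType.get(); // single read
        if (!tokenType.equals(snapshot)) {
            return "Token with type " + tokenType
                    + " is incompatible with port type: " + snapshot;
        }
        return null;
    }

    /** Simulates a resolver that changes the type between two reads. */
    static Supplier<String> changingType(String before, String after) {
        int[] reads = {0};
        return () -> reads[0]++ == 0 ? before : after;
    }

    public static void main(String[] args) {
        // Reproduces the confusing "type int ... port type: int" message.
        System.out.println(checkRacy("int", changingType("unknown", "int")));
        // Reports the type that actually failed the comparison.
        System.out.println(checkSnapshot("int", changingType("unknown", "int")));
    }
}
```

Reading the type once only makes the error message consistent with the
comparison. It does not make the port thread safe; the check can still fail
if type resolution runs while tokens are being sent.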
> 
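The stopFire() contract described above (threads suspend at the next safe
point, and the change runs only once all of them have stopped) can be
sketched in plain Java. This is an illustration of the idea only, not how
PNDirector or the Manager is implemented; PauseGate, safePoint,
runWhileStopped and demo are hypothetical names, and the sketch assumes a
fixed number of worker threads that all keep reaching their safe points.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of stopFire()-style cooperative suspension. */
final class PauseGate {
    private final int workers;
    private boolean stopRequested;
    private int paused;

    PauseGate(int workers) { this.workers = workers; }

    /** Workers call this at a safe point, e.g. before a port read or write. */
    synchronized void safePoint() throws InterruptedException {
        if (!stopRequested) return;
        paused++;
        notifyAll();                  // tell the coordinator we have stopped
        while (stopRequested) wait();
        paused--;
    }

    /** Requests a stop, waits for every worker, applies the change, resumes. */
    synchronized void runWhileStopped(Runnable change) throws InterruptedException {
        stopRequested = true;
        try {
            while (paused < workers) wait();
            change.run();             // every worker is parked in safePoint()
        } finally {
            stopRequested = false;
            notifyAll();
        }
    }

    /** Returns true if no worker made progress while the change ran. */
    static boolean demo() {
        PauseGate gate = new PauseGate(2);
        AtomicLong steps = new AtomicLong();
        AtomicBoolean running = new AtomicBoolean(true);
        Runnable work = () -> {
            try {
                while (running.get()) {
                    gate.safePoint();
                    steps.incrementAndGet();   // stands in for firing an actor
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        boolean[] quiet = {false};
        try {
            gate.runWhileStopped(() -> {
                long before = steps.get();
                try {
                    Thread.sleep(50);          // the "change request"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                quiet[0] = steps.get() == before;
            });
            running.set(false);
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return quiet[0];
    }

    public static void main(String[] args) {
        System.out.println("workers idle during change: " + demo());
    }
}
```

A worker that never reaches a safe point (for example, one blocked in a long
grid job) would keep runWhileStopped waiting, which is the trade-off
discussed in the thread.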

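The nesting problem raised in the thread (a long-running composite holds a
workspace read lock, so a write lock cannot be acquired until it finishes)
is the usual behaviour of a single global read/write lock. The sketch below
shows that behaviour in isolation. Ptolemy's Workspace has its own locking
implementation; java.util.concurrent's ReentrantReadWriteLock is swapped in
here purely as a stand-in, and ReadLockBlocksWriter and demo are
hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo; uses java.util.concurrent, not Ptolemy's Workspace. */
final class ReadLockBlocksWriter {

    /** Returns {write lock obtained during the fire, obtained after it}. */
    static boolean[] demo() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readerHoldsLock = new CountDownLatch(1);
        CountDownLatch readerMayFinish = new CountDownLatch(1);

        // Stands in for a composite that fires its director under a read lock.
        Thread longFiring = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHoldsLock.countDown();
                readerMayFinish.await();   // the "long-running" fire
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        longFiring.start();

        try {
            readerHoldsLock.await();

            // A would-be model change: the write lock is unavailable for now.
            boolean during = lock.writeLock().tryLock();
            if (during) lock.writeLock().unlock();

            readerMayFinish.countDown();
            longFiring.join();

            // Once the reader has released its lock, the writer can proceed.
            boolean after = lock.writeLock().tryLock();
            if (after) lock.writeLock().unlock();
            return new boolean[] {during, after};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("during fire: " + r[0] + ", after fire: " + r[1]);
    }
}
```

Hierarchical locks or a separate workspace per sub-model, the two
alternatives mentioned in the thread, both avoid this by not sharing one
lock between the long-running part and the part being changed.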