[kepler-dev] Thread safety with Ptolemy’s TypedIOPort

Tue Oct 20 21:56:59 PDT 2009

Hi Edward,

The need for this is low, so I might ignore it. The user can still
have actors with effigies in composites, but they can't make copies
work in parallel (which could get very messy).

Regards,
Colin

2009/10/21 Edward A. Lee <eal at eecs.berkeley.edu>:
>
> Congratulations on getting this working!  I'm quite impressed...
> It's not easy.
>
> This is going to be a tricky one because effigies are
> themselves Ptolemy components within a model. This model is
> what people call the "model" in model-view-controller architectures.
>
> It seems that each workspace is going to have to have its own
> model, with a separate tree of effigies and tableaux. This means,
> among other things, that closing the top-level window will not
> result in all the windows being closed... The UI could get a bit
> awkward...
>
> Edward
>
>
> Colin Enticott wrote:
>>
>> Hi Edward,
>>
>> Everything is working well in the new workspace. This last (hopefully)
>> issue is with effigies. Any actor that was placed into the new
>> workspace is unable to get an Effigy. That is,
>> Configuration.findEffigy(toplevel()); returns null.
>>
>> I guess it is not too important, as many copies of the actor can be
>> made and showing all windows could get messy. But it might be handy
>> and is probably expected by the user.
>>
>> I tried various combinations of calling .clone(workspace) on the
>> current Configuration and Effigy classes and also creating new ones,
>> but I can’t seem to make findEffigy work. Any thoughts?
>>
>> Thanks,
>> Colin
>>
>>
>> 2009/10/16 Colin Enticott <Colin.Enticott at csse.monash.edu.au>:
>>>
>>> Hi Edward,
>>>
>>> I was working on the same idea of creating a new workspace. Besides
>>> having to create a new Workspace, a top level TypedCompositeActor, a
>>> proxy director and copy the original CompositeActor’s parameters
>>> across, it wasn’t too hard. I already had a special TypedIORelation
>>> that could splice onto other relations which had no problems with
>>> crossing the workspace boundaries. It is working well for most things,
>>> but TypedCompositeActors aren’t resolving properly. I’ll look into
>>> that next.
>>>
>>> Thank you for your help Edward.
>>>
>>> Regards,
>>> Colin
>>>
>>>
>>> 2009/10/15 Edward A. Lee <eal at eecs.berkeley.edu>:
>>>>
>>>> Hi Colin,
>>>>
>>>> Technically, the composite actor has to acquire a read lock
>>>> before it can even be sure what director the composite actor
>>>> contains... What if the director is being changed?
>>>>
>>>> Unfortunately, with concurrency problems, it doesn't really
>>>> work to say "well, that won't happen very often..."
>>>>
>>>> The "right" solution would be to have hierarchical locks,
>>>> rather than the global workspace lock.  This would be very tricky
>>>> to get right, however, because MoML is expressive enough
>>>> (and the expression language as well) that references cross
>>>> levels of the hierarchy (e.g. to variables in scope).
>>>> So this solution would have to intercept the MoML parsing,
>>>> do scope analysis, and acquire the appropriate locks.
>>>>
>>>> Frankly, this does not look easy to me...
>>>>
>>>> A very different approach that might solve your problem
>>>> would be to have a mechanism for separating a part of the model
>>>> and running it in a different workspace.  The ModelReference
>>>> actor in the higher-order actors library might even do this
>>>> (I don't recall).  The ThreadedComposite actor in the same
>>>> library might also be useful. There is a paper on how to use
>>>> it here:
>>>>
>>>> http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-151.html
>>>>
>>>> Edward
>>>>
>>>>
>>>>
>>>> Colin Enticott wrote:
>>>>>
>>>>> Hi Edward,
>>>>>
>>>>> My next problem (as predicted) is with nesting directors using
>>>>> CompositeActors. It looks like the CompositeActor class acquires a
>>>>> read lock on the workspace before it fires its director. If I have a
>>>>> workflow that has a long running composite actor, I cannot acquire a
>>>>> workspace write lock until that composite finishes. First of all I am
>>>>> curious why the composite acquires this read lock? I thought it would
>>>>> be the responsibility of the director to decide if it needs a lock on
>>>>> the workspace? I noticed the PNDirector will release workspace locks
>>>>> when it sleeps, but the SDF doesn’t (as it doesn’t sleep). As the
>>>>> director is responsible for the internals of the composite, shouldn't
>>>>> it just obtain an internal lock? Has this issue arisen before, or am I
>>>>> the only one doing reconfiguration with threading?
>>>>>
>>>>> And how would you suggest I solve this issue? I could make changes
>>>>> directly to the workflow without a write lock. As I mentioned in an
>>>>> earlier email, the objects I add cannot be used until my director
>>>>> safely invokes them. Alternatively, when I copy an actor, I could
>>>>> replace all the *CompositeActor with My*CompositeActor by modifying
>>>>> the MoML? My version of the composites would release the lock before
>>>>> invoking the fire method of its director. This option sounds safer,
>>>>> unless some of the directors assume they already hold a read lock.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Regards,
>>>>> Colin
>>>>>
>>>>>
>>>>> 2009/10/13 Colin Enticott <Colin.Enticott at csse.monash.edu.au>:
>>>>>>
>>>>>> Hi Edward,
>>>>>>
>>>>>> Fortunately, my problem is simplified and controllable. All tokens
>>>>>> going into an actor (that can be copied) will go to a central place.
>>>>>> So an actor being copied will not cause any problems with sending
>>>>>> tokens as it does not change the token path. The problem is when I
>>>>>> make a copy of the actor, some actors (like composites) will want
>>>>>> their types resolved.
>>>>>>
>>>>>> I decided to make the director responsible for processing change
>>>>>> requests and resolving types. I can make sure I hold the locks that
>>>>>> will stop this from happening. It seems to have solved the immediate
>>>>>> problem; next I'll have to look into nesting director issues. :-)
>>>>>>
>>>>>> Regards,
>>>>>> Colin
>>>>>>
>>>>>> 2009/10/13 Edward A. Lee <eal at eecs.berkeley.edu>:
>>>>>>>
>>>>>>> Hi Colin,
>>>>>>>
>>>>>>> I'm not sure how to accomplish what you want, but I am quite sure
>>>>>>> it will not be easy.  Basically, you will be dealing with low-level
>>>>>>> thread programming, which in my opinion, is almost impossible to get
>>>>>>> to work correctly.  See this paper for more on this:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://ptolemy.eecs.berkeley.edu/publications/papers/06/problemwithThreads/
>>>>>>>
>>>>>>> Edward
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Colin Enticott wrote:
>>>>>>>>
>>>>>>>> Hi Edward,
>>>>>>>>
>>>>>>>> Thank you for replying. Yes, in my case I was treating stopFire()
>>>>>>>> only as a recommendation, as I wanted change requests to be
>>>>>>>> processed while the experiment was running.  I did this by
>>>>>>>> releasing control back to the manager. The focus of the new
>>>>>>>> director is to execute jobs on grid resources, which tend to take
>>>>>>>> a while. At runtime we increase the number of actors based on the
>>>>>>>> number of grid resources available. Waiting for all actors to
>>>>>>>> finish before I can increase the number of resources is not
>>>>>>>> optimal. I was hoping that all methods that change the workspace
>>>>>>>> would obtain a write lock, but as I can now see, only the
>>>>>>>> manager’s thread should have access to the workflow after a
>>>>>>>> stopFire().
>>>>>>>>
>>>>>>>> So what do you recommend as a solution? I could process change
>>>>>>>> requests myself instead of releasing control back to the manager.
>>>>>>>> This will allow changes to be processed that any third-party actor
>>>>>>>> might request. Or, without using change requests, I could directly
>>>>>>>> modify the workflow using workspace write locks.
>>>>>>>>
>>>>>>>> The problem with both these solutions is that when I copy an opaque
>>>>>>>> composite, it will itself perform a type check internally and on
>>>>>>>> the immediately connected external actors. I guess I will have to
>>>>>>>> make this environment “safe to do so” by blocking sends at this
>>>>>>>> time.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Colin
>>>>>>>>
>>>>>>>>
>>>>>>>> 2009/10/9 Edward A. Lee <eal at eecs.berkeley.edu>:
>>>>>>>>>
>>>>>>>>> Interesting question...
>>>>>>>>>
>>>>>>>>> I suspect the problem with your threading director is that it isn't
>>>>>>>>> respecting the semantics of stopFire().  The contract is that the
>>>>>>>>> Manager executes change requests only while every actor in the
>>>>>>>>> model is stopped. Specifically, it does so between iterations of
>>>>>>>>> the top-level model. So when the top-level director returns from
>>>>>>>>> postfire(), the Manager assumes it can execute change requests.
>>>>>>>>>
>>>>>>>>> The problem is that if you have a thread running independently of
>>>>>>>>> the top-level director, how does the top-level director know when
>>>>>>>>> it is safe to return from postfire()?
>>>>>>>>>
>>>>>>>>> The key is that stopFire() is called on every actor in the model
>>>>>>>>> when a change request is registered. In PN, the PN threads respond
>>>>>>>>> to stopFire() by suspending at the next opportunity (typically a
>>>>>>>>> read or a write to a port).  Only after all threads have stopped
>>>>>>>>> does the postfire() method of the director return, allowing the
>>>>>>>>> manager to execute the change request (or maybe it's the fire()
>>>>>>>>> method, I forget).
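
The stop-then-change contract described above can be sketched in isolation. This is only a rough illustration using plain Java monitors, not Ptolemy code: QuiescentChange, safePoint() and applyWhenStopped() are hypothetical names, and the sketch assumes every worker thread keeps reaching a safe point (a worker that exits without parking would block the coordinator).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical coordinator: a change runs only while all workers are parked. */
final class QuiescentChange {
    private final Object lock = new Object();
    private final int workers;
    private int parked = 0;
    private boolean stopRequested = false;

    QuiescentChange(int workers) { this.workers = workers; }

    /** Called by a worker at a safe point (for example before a port read or write). */
    void safePoint() throws InterruptedException {
        synchronized (lock) {
            if (!stopRequested) return;
            parked++;
            lock.notifyAll();                  // tell the coordinator this worker stopped
            while (stopRequested) lock.wait(); // stay parked until the change is done
            parked--;
        }
    }

    /** Called by the coordinator: request a stop, wait for all workers, change, resume. */
    void applyWhenStopped(Runnable change) throws InterruptedException {
        synchronized (lock) {
            stopRequested = true;
            while (parked < workers) lock.wait();
            change.run();                      // every worker is parked in safePoint() here
            stopRequested = false;
            lock.notifyAll();
        }
    }

    int parkedCount() {
        synchronized (lock) { return parked; }
    }

    /** Starts n workers, applies one change, and returns how many were parked during it. */
    static int demo(int n) throws InterruptedException {
        QuiescentChange qc = new QuiescentChange(n);
        AtomicBoolean done = new AtomicBoolean(false);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (!done.get()) {
                        qc.safePoint();
                        Thread.yield();        // stand-in for firing an actor
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        int[] parkedDuringChange = new int[1];
        qc.applyWhenStopped(() -> parkedDuringChange[0] = qc.parkedCount());
        done.set(true);
        for (Thread t : threads) t.join();
        return parkedDuringChange[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("workers parked during change: " + demo(3));
    }
}
```

The point of the sketch is that the coordinator does not return control until it has counted every worker as stopped, which is the piece a director with free-running threads would need to supply.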
>>>>>>>>>
>>>>>>>>> Hope this helps...
>>>>>>>>>
>>>>>>>>> Edward
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Colin Enticott wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Looking into this further (sorry for the delay, I’ve been busy),
>>>>>>>>>> it looks like all actions on port types obtain a read lock on the
>>>>>>>>>> workspace. When the manager resolves types, it also obtains a
>>>>>>>>>> read lock, but makes changes to the port types. Shouldn't it
>>>>>>>>>> obtain a write lock?
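
Why a read lock does not protect a mutation can be shown with a generic readers-writer lock. This is a small sketch under an assumption: java.util.concurrent's ReentrantReadWriteLock stands in for the workspace lock, which has its own implementation in Ptolemy, and ReadLockSketch is a hypothetical name.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BooleanSupplier;

/** Probes which lock requests a second thread can obtain while this one holds a lock. */
final class ReadLockSketch {

    /** Runs the probe on a second thread and returns its result. */
    static boolean onOtherThread(BooleanSupplier probe) throws InterruptedException {
        boolean[] result = new boolean[1];
        Thread t = new Thread(() -> result[0] = probe.getAsBoolean());
        t.start();
        t.join(); // join() makes the other thread's write to result visible here
        return result[0];
    }

    static boolean tryRead(ReentrantReadWriteLock rw) {
        boolean got = rw.readLock().tryLock();
        if (got) rw.readLock().unlock();
        return got;
    }

    static boolean tryWrite(ReentrantReadWriteLock rw) {
        boolean got = rw.writeLock().tryLock();
        if (got) rw.writeLock().unlock();
        return got;
    }

    /** Returns {reader admitted under read lock, writer admitted under read lock,
     *  reader admitted under write lock}. */
    static boolean[] probe() throws InterruptedException {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        boolean[] out = new boolean[3];

        rw.readLock().lock();
        try {
            out[0] = onOtherThread(() -> tryRead(rw));  // a second reader gets in
            out[1] = onOtherThread(() -> tryWrite(rw)); // a writer is kept out
        } finally {
            rw.readLock().unlock();
        }

        rw.writeLock().lock();
        try {
            out[2] = onOtherThread(() -> tryRead(rw));  // readers are kept out
        } finally {
            rw.writeLock().unlock();
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = probe();
        System.out.println("reader admitted while read lock held:  " + r[0]);
        System.out.println("writer admitted while read lock held:  " + r[1]);
        System.out.println("reader admitted while write lock held: " + r[2]);
    }
}
```

Since a second reader is admitted while the first holds the read lock, any state changed under a read lock can be observed mid-change by another reader; only the write lock excludes them.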
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Colin
>>>>>>>>>>
>>>>>>>>>> Colin Enticott wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> First of all, I didn’t think that multiple threads would use the
>>>>>>>>>>> same TypedIOPort object, but here’s the story:
>>>>>>>>>>>
>>>>>>>>>>> I’ve been developing a new "threading director" for Ptolemy, the
>>>>>>>>>>> Nimrod/K TDA director[1], and in one of every 100 executions of
>>>>>>>>>>> my rigorous director threading test workflows, I get an
>>>>>>>>>>> exception. This exception happens when an actor sends a token
>>>>>>>>>>> out of a TypedIOPort:
>>>>>>>>>>>
>>>>>>>>>>> (Sorry. The exception is from kepler-1.0.0, so the line numbers
>>>>>>>>>>> differ from the current version, but the functions in question
>>>>>>>>>>> are identical.)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ptolemy.kernel.util.IllegalActionException: Run-time type checking
>>>>>>>>>>> failed. Token 3 with type int is incompatible with port type: int
>>>>>>>>>>>  in .Composite.Ramp.output
>>>>>>>>>>>     at ptolemy.actor.TypedIOPort._checkType(TypedIOPort.java:750)
>>>>>>>>>>>     at ptolemy.actor.TypedIOPort.send(TypedIOPort.java:472)
>>>>>>>>>>>     at ptolemy.actor.lib.Ramp.fire(Ramp.java:138)
>>>>>>>>>>>     at org.monash.nimrod.NimrodDirector.NimrodProcessThread.run(NimrodProcessThread.java:347)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The confusing error message is “Token 3 with type int is
>>>>>>>>>>> incompatible with port type: int”. Looking into this deeper, I
>>>>>>>>>>> believe the error message is being generated after the port state
>>>>>>>>>>> has changed.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The function in question is:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> protected void _checkType(Token token) throws IllegalActionException {
>>>>>>>>>>>     int compare = TypeLattice.compare(token.getType(), _resolvedType);
>>>>>>>>>>>
>>>>>>>>>>>     if ((compare == CPO.HIGHER) || (compare == CPO.INCOMPARABLE)) {
>>>>>>>>>>>         throw new IllegalActionException(this,
>>>>>>>>>>>                 "Run-time type checking failed. Token " + token
>>>>>>>>>>>                         + " with type " + token.getType()
>>>>>>>>>>>                         + " is incompatible with port type: "
>>>>>>>>>>>                         + getType().toString());
>>>>>>>>>>>     }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I suspect the “compare” method is called before the type is set
>>>>>>>>>>> (or changed) and the error message is generated after the type is
>>>>>>>>>>> set. Looking for another thread that could be accessing the
>>>>>>>>>>> TypedIOPort, I discovered the type checking functionality.
>>>>>>>>>>> Listening to the manager, it looks like the manager is
>>>>>>>>>>> "resolving types" in response to a “ChangeRequest”.
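
The suspected interleaving can be reproduced deterministically outside Ptolemy. The following is a minimal sketch, not the library's code: SketchPort, checkRacy() and checkSnapshot() are hypothetical names, plain strings stand in for Ptolemy types, and the "between" hook simulates another thread (such as a type resolver) running in the gap between the comparison and the message.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a typed port; strings stand in for types. */
final class SketchPort {
    private final AtomicReference<String> resolvedType = new AtomicReference<>("unknown");

    void setType(String type) {
        resolvedType.set(type);
    }

    /**
     * Racy shape: the type is read once for the comparison and again for the
     * message, so a change in between makes the message contradict the check.
     */
    String checkRacy(String tokenType, Runnable between) {
        boolean compatible = tokenType.equals(resolvedType.get()); // first read
        between.run();                                             // simulated interleaving
        if (!compatible) {
            return "Token with type " + tokenType
                    + " is incompatible with port type: " + resolvedType.get(); // second read
        }
        return "ok";
    }

    /** Snapshot shape: one read feeds both the comparison and the message. */
    String checkSnapshot(String tokenType, Runnable between) {
        String seen = resolvedType.get();                          // single read
        boolean compatible = tokenType.equals(seen);
        between.run();                                             // simulated interleaving
        if (!compatible) {
            return "Token with type " + tokenType
                    + " is incompatible with port type: " + seen;
        }
        return "ok";
    }

    public static void main(String[] args) {
        SketchPort racy = new SketchPort();
        // prints: Token with type int is incompatible with port type: int
        System.out.println(racy.checkRacy("int", () -> racy.setType("int")));

        SketchPort snap = new SketchPort();
        // prints: Token with type int is incompatible with port type: unknown
        System.out.println(snap.checkSnapshot("int", () -> snap.setType("int")));
    }
}
```

Reading the type once only makes the message consistent with the comparison that failed. It does not stop the type from changing while a token is being sent, so it would help diagnosis rather than fix the underlying race.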
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Two questions:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Should I use “change requests” in this way? None of the changes
>>>>>>>>>>> will cause any problems with types and so I could directly modify
>>>>>>>>>>> the workflow. (I initially used change requests as the main
>>>>>>>>>>> thread holds a read lock on the workspace when the "manager"
>>>>>>>>>>> fires the “director”.)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> And, it looks like TypedIOPort is not thread safe. Does this need
>>>>>>>>>>> to be fixed? Or should type resolving be completely blocked in a
>>>>>>>>>>> threaded environment? The documentation suggests it should be
>>>>>>>>>>> left to the director to decide “when it is safe to perform change
>>>>>>>>>>> requests”, but I cannot see a way of preventing an actor from
>>>>>>>>>>> doing a type check when sending tokens in a threaded environment.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Also, I don't think it is just with my director. It looks like
>>>>>>>>>>> this issue will arise in the PN environment, if the workflow
>>>>>>>>>>> makes changes to itself.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Colin
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1] Abramson D., Enticott C., Altintas I., "Nimrod/K: Towards
>>>>>>>>>>> Massively Parallel Dynamic Grid Workflows", IEEE SuperComputing
>>>>>>>>>>> 2008, Austin, Texas, November 2008
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Colin Enticott, Research Scientist, Ph: +61 03 9903 2215
>>>>>>>>>>> Room H7.26, Level 7, Building H, Monash University Caulfield 3145,
>>>>>>>>>>> Australia
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Kepler-dev mailing list
>>>>>>>>>>> Kepler-dev at kepler-project.org
>>>>>>>>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>>>>>>>>>

-- 
Colin