[kepler-dev] [Ptolemy] Re: [Bug 3693] - MultiInstanceComposite actor deadlocks sometimes

Daniel Crawl crawl at sdsc.edu
Mon Jan 5 11:04:05 PST 2009


Hi Edward,

Thanks for looking into this problem. After updating my sources, I
no longer get a deadlock when running the original model. However,
if I add a NondeterministicMerge actor, it eventually deadlocks
(see attached model).

Thanks,

  --dan


Edward A. Lee wrote:
>
> I have checked in a fix to the problem that MultiInstanceComposite
> with PN would sometimes deadlock.
>
> Interestingly, the problem was broader, and the deadlock could have
> occurred pretty much any time we had more than one thread obtaining
> write permission on the workspace.  This was quite rare, since
> in most applications it would only be the UI.  However, conceivably
> we could have gotten deadlock during preinitialize if the user tried
> to make editing changes at the same time that the preinitialize method
> of some higher-order actor was trying to modify the model.
>
> In the case of MultiInstanceComposite, it modifies the model in its
> wrapup method, and when you have more than one instance of
> MultiInstanceComposite, the deadlock was quite likely to happen.
>
> Edward
>
>
> Edward A. Lee wrote:
>>
>> I have a diagnosis of this problem, and I believe I have
>> a fix, but as usual with threads, I'm not fully confident
>> in the solution.  If this sounds reasonable to you, that
>> would increase my confidence.
>>
>> The MultiInstanceComposite apparently triggers a bug
>> because its wrapup() method acquires write access to the
>> workspace. In your model, multiple threads will be simultaneously
>> trying to acquire write access, something that is fairly rare
>> in uses of Ptolemy.  This is why we see the bug only with
>> uses of MultiInstanceComposite.
>>
>> The problem is in the use of the Workspace wait(Object obj) method.
>> What this method does is release any read permissions that
>> the calling thread has on the workspace, call obj.wait(),
>> reacquire the read permissions, and return.
>>
>> The problem is that almost everywhere in the tree where
>> this is called, the call is inside a synchronized block,
>> something like this:
>>
>>   synchronized(obj) {
>>      ...
>>      _workspace.wait(obj);
>>      ...
>>   }
>>
>> The problem occurs when the wait(Object obj) method tries
>> to reacquire read permissions.  At that point, it holds
>> a lock on obj, and blocks until the workspace grants
>> read permission.
>>
>> If there is a thread waiting for write permission, the read
>> permission is not granted.  The problem occurs when another
>> thread tries to get a lock on obj while holding read or write
>> permission on the workspace. Deadlock.
>>
>>
>>
>> I think that the fix is that a thread that calls
>> wait(Object obj) should not hold a lock on obj when it makes
>> that call... This is counterintuitive to Java programmers,
>> because generally you _have to_ hold the lock to call wait().
>> Indeed, inside wait(Object obj), it acquires the lock, but
>> the key is that it releases that lock before it tries to
>> reacquire read permissions, thus preventing the deadlock
>> if the calling thread does not already hold the lock.
>>
>> I believe this is correct because wait(Object obj) will
>> release any lock on obj anyway for an indeterminate amount
>> of time while obj.wait() is called.  Thus, no calling method
>> can really assume the lock is held across the call
>> to wait(Object obj).
>>
>>
>> Edward
>>
>> Christopher Brooks wrote:
>>> Hi Edward,
>>> Here's a MultiInstanceComposite model that hangs for me.
>>> I've attached a Ptolemy version.
>>>
>>> The model has PN on the outside with SDF inside the
>>> MultiInstanceComposite.  The MultiInstanceComposite has
>>> no actors, just a link between the ports, which is rather odd.
>>>
>>> _Christopher
>>>
>>> bugzilla-daemon at ecoinformatics.org wrote:
>>>> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=3693
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ------- Comment #4 from crawl at sdsc.edu  2008-12-05 11:12 -------
>>>> I was able to reproduce the deadlock in Jianwu's workflow on:
>>>>
>>>> Windows XP, java 1.6.0_11, Kepler 1.0.0
>>>> Mac, java 1.5.0_16, both Kepler 1.0.0 and head
>>>> _______________________________________________
>>>> Kepler-dev mailing list
>>>> Kepler-dev at kepler-project.org
>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>
>   
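
The behavior of Workspace wait(Object obj) described in Edward's diagnosis above (release the caller's read permissions, call obj.wait(), reacquire the read permissions, return), together with his fix (the method takes and releases the lock on obj itself, so the caller does not hold it), can be sketched in Java. This is a minimal sketch under stated assumptions, not Ptolemy's actual Workspace: MiniWorkspace and its other methods are hypothetical names, and the sketch assumes reentrant per-thread read permission, a single non-reentrant writer, and a writer-preference policy in which a waiting writer blocks any thread that holds no read permission.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified workspace (not Ptolemy's Workspace code): per-thread
 * reentrant read permission, one non-reentrant writer, and writer preference.
 */
class MiniWorkspace {
    private final Map<Thread, Integer> readDepth = new HashMap<>();
    private Thread writer = null;
    private int waitingWriters = 0;

    public synchronized void getReadAccess() throws InterruptedException {
        Thread me = Thread.currentThread();
        // Writer preference (assumed): a thread holding no read permission waits
        // while another thread writes or is waiting to write.
        while ((writer != null && writer != me)
                || (writer != me && waitingWriters > 0 && !readDepth.containsKey(me))) {
            wait();
        }
        readDepth.merge(me, 1, Integer::sum);
    }

    public synchronized void doneReading() {
        Thread me = Thread.currentThread();
        int depth = getReadDepth();
        if (depth == 0) throw new IllegalStateException("no read permission held");
        if (depth == 1) readDepth.remove(me); else readDepth.put(me, depth - 1);
        notifyAll();
    }

    public synchronized void getWriteAccess() throws InterruptedException {
        Thread me = Thread.currentThread();
        waitingWriters++;
        try {
            // Wait until no other thread holds read or write permission.
            while (writer != null || otherReaders(me)) {
                wait();
            }
            writer = me;
        } finally {
            waitingWriters--;
            notifyAll();
        }
    }

    public synchronized void doneWriting() {
        writer = null;
        notifyAll();
    }

    public synchronized int getReadDepth() {
        return readDepth.getOrDefault(Thread.currentThread(), 0);
    }

    private boolean otherReaders(Thread me) {
        for (Thread t : readDepth.keySet()) {
            if (t != me) return true;
        }
        return false;
    }

    /** Releases the caller's read permissions, waits on obj, then re-acquires them. */
    public void wait(Object obj) throws InterruptedException {
        int depth;
        synchronized (this) {
            depth = getReadDepth();
            readDepth.remove(Thread.currentThread());
            notifyAll(); // a writer blocked by these reads may now proceed
        }
        try {
            synchronized (obj) {
                obj.wait();
            } // the lock on obj is released here, BEFORE read permission is re-acquired
        } finally {
            reacquire(depth);
        }
    }

    // Blocks while a writer is active or waiting; a caller still holding the lock
    // on obj at this point is what allowed the deadlock.
    private synchronized void reacquire(int depth) {
        if (depth == 0) return;
        Thread me = Thread.currentThread();
        boolean interrupted = false;
        while ((writer != null && writer != me) || waitingWriters > 0) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true; // finish re-acquiring, then restore the flag
            }
        }
        readDepth.put(me, depth);
        if (interrupted) me.interrupt();
    }
}
```

Because the monitor on obj is exited before reacquire() can block, a thread that holds read permission and needs that monitor can still proceed, and so can the queued writer behind it.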

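The cycle in Edward's diagnosis (the waiting thread holds the lock on obj while it reacquires read permission, the read is refused because a writer is queued, the writer waits for a second thread to give up its read permission, and that thread waits for the lock on obj) can be reproduced with standard java.util.concurrent classes. This is a model, not Ptolemy code: it assumes a fair ReentrantReadWriteLock approximates the workspace's writer preference, and it substitutes a ReentrantLock for the monitor on obj so that tryLock timeouts report the cycle instead of hanging. DeadlockCycle and cycleForms are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of the deadlock cycle; not Ptolemy code. */
class DeadlockCycle {
    /** True if the waiting thread and the reader both stay blocked (the cycle forms). */
    static boolean cycleForms(boolean waiterHoldsObj) throws InterruptedException {
        // Fair mode: a thread without read permission is refused while a writer is queued.
        ReentrantReadWriteLock workspace = new ReentrantReadWriteLock(true);
        ReentrantLock obj = new ReentrantLock(); // stands in for the monitor on obj
        CountDownLatch readHeld = new CountDownLatch(1), go = new CountDownLatch(1);
        CountDownLatch readerTried = new CountDownLatch(1), waiterTried = new CountDownLatch(1);
        boolean[] readerGotObj = {false};

        // Reader: holds read permission, then needs the lock on obj.
        Thread reader = new Thread(() -> {
            workspace.readLock().lock();
            try {
                readHeld.countDown();
                go.await();
                readerGotObj[0] = obj.tryLock(200, TimeUnit.MILLISECONDS);
                if (readerGotObj[0]) obj.unlock();
                readerTried.countDown();
                // If blocked, keep holding read permission until the waiter has tried.
                if (!readerGotObj[0]) waiterTried.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                workspace.readLock().unlock();
            }
        });
        // Writer: requests write permission and queues behind the reader.
        Thread writer = new Thread(() -> {
            workspace.writeLock().lock();
            workspace.writeLock().unlock();
        });

        reader.start();
        readHeld.await();
        if (waiterHoldsObj) obj.lock(); // waiter is still inside synchronized (obj) { ... }
        writer.start();
        while (!workspace.hasQueuedThread(writer)) Thread.yield();
        go.countDown();
        // The waiter has released its reads and now tries to re-acquire one.
        boolean waiterGotRead = workspace.readLock().tryLock(1000, TimeUnit.MILLISECONDS);
        waiterTried.countDown();
        readerTried.await(); // keep obj locked until the reader has tried for it
        if (waiterGotRead) workspace.readLock().unlock();
        if (waiterHoldsObj) obj.unlock();
        reader.join();
        writer.join();
        return !waiterGotRead && !readerGotObj[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter holds obj: cycle = " + cycleForms(true));
        System.out.println("waiter releases obj first: cycle = " + cycleForms(false));
    }
}
```

Running main is expected to report a cycle in the first case and none in the second, where the waiting thread does not hold obj while it reacquires read permission, which is the change the fix makes.
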
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mic-merge-pt.xml
Type: text/xml
Size: 4250 bytes
Desc: not available
URL: <http://mercury.nceas.ucsb.edu/kepler/pipermail/kepler-dev/attachments/20090105/63d83fd7/attachment.xml>

