[kepler-dev] [Ptolemy] Re: [Bug 3693] - MultiInstanceComposite actor deadlocks sometimes

Edward A. Lee eal at eecs.berkeley.edu
Tue Jan 6 16:07:11 PST 2009


Yep, I can replicate this one...
I'll take a look...

Edward


Daniel Crawl wrote:
> 
> Hi Edward,
> 
> Thanks for looking into this problem. After updating my sources, I
> no longer get a deadlock when running the original model. However,
> if I add a NondeterministicMerge actor, it eventually deadlocks
> (see attached model).
> 
> Thanks,
> 
>  --dan
> 
> 
> Edward A. Lee wrote:
>>
>> I have checked in a fix to the problem that MultiInstanceComposite
>> with PN would sometimes deadlock.
>>
>> Interestingly, the problem was broader, and the deadlock could have
>> occurred pretty much any time we had more than one thread obtaining
>> write permission on the workspace.  This was quite rare, since
>> in most applications only the UI obtains write permission.  However,
>> we could conceivably have gotten deadlock during preinitialize if the
>> user tried to make editing changes at the same time that the
>> preinitialize method of some higher-order actor was trying to modify
>> the model.
>>
>> In the case of MultiInstanceComposite, it modifies the model in its
>> wrapup method, and when you have more than one instance of
>> MultiInstanceComposite, the deadlock was quite likely to happen.
>>
>> Edward
>>
>>
>> Edward A. Lee wrote:
>>>
>>> I have a diagnosis of this problem, and I believe I have
>>> a fix, but as usual with threads, I'm not fully confident
>>> in the solution.  If this sounds reasonable to you, that
>>> would increase my confidence.
>>>
>>> The MultiInstanceComposite apparently triggers a bug
>>> because its wrapup() method acquires write access to the
>>> workspace. In your model, multiple threads will be simultaneously
>>> trying to acquire write access, something that is fairly rare
>>> in uses of Ptolemy.  This is why we see the bug only with
>>> uses of MultiInstanceComposite.
>>>
>>> The problem is in the use of the Workspace method wait(Object obj).
>>> What this method does is release any read permissions that
>>> the calling thread has on the workspace, call obj.wait(),
>>> reacquire the read permissions, and return.
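A minimal sketch of the permission scheme and of wait(Object obj) as described here. It uses a hypothetical SimpleWorkspace class, not Ptolemy's actual Workspace, and assumes, as the message explains, that read permission is withheld from a new reader while a writer is waiting. The demo has one thread hold two nested read permissions and call wait(lock) without holding lock's monitor; a writer can get in while that thread waits, and the nesting depth is restored afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a workspace with read/write permissions.
// NOT Ptolemy's Workspace. New readers yield to waiting writers (writer preference).
class SimpleWorkspace {
    private final Map<Thread, Integer> readDepth = new HashMap<>(); // reader -> nesting depth
    private int waitingWriters = 0;
    private Thread writer = null;

    synchronized void getReadAccess() throws InterruptedException {
        Thread me = Thread.currentThread();
        // A thread already reading may nest; a new reader waits while a writer is queued.
        while (writer != null || (waitingWriters > 0 && !readDepth.containsKey(me))) {
            wait();
        }
        readDepth.merge(me, 1, Integer::sum);
    }

    synchronized void doneReading() {
        Thread me = Thread.currentThread();
        int depth = readDepth.get(me) - 1;
        if (depth == 0) readDepth.remove(me); else readDepth.put(me, depth);
        notifyAll();
    }

    synchronized void getWriteAccess() throws InterruptedException {
        waitingWriters++;
        try {
            while (writer != null || !readDepth.isEmpty()) {
                wait();
            }
            writer = Thread.currentThread();
        } finally {
            waitingWriters--;
            notifyAll(); // readers blocked only by this queued writer may retry
        }
    }

    synchronized void doneWriting() {
        writer = null;
        notifyAll();
    }

    synchronized int currentReadDepth() {
        return readDepth.getOrDefault(Thread.currentThread(), 0);
    }

    private synchronized int releaseAllReadAccess() {
        Integer depth = readDepth.remove(Thread.currentThread());
        notifyAll(); // a waiting writer may now proceed
        return depth == null ? 0 : depth;
    }

    // Release all read permissions, call obj.wait(), then reacquire them.
    // Not synchronized on the workspace, so other threads can use it meanwhile.
    void wait(Object obj) throws InterruptedException {
        int depth = releaseAllReadAccess();
        synchronized (obj) { // re-entrant if the caller already holds obj's lock
            obj.wait();
        }
        // If the CALLER holds obj's lock, it is still held during this loop.
        for (int i = 0; i < depth; i++) getReadAccess();
    }
}

class SimpleWorkspaceDemo {
    // Returns the read depth the waiting thread holds after wait(Object) returns.
    static int run() throws InterruptedException {
        SimpleWorkspace ws = new SimpleWorkspace();
        Object lock = new Object();
        int[] restored = {-1};
        Thread reader = new Thread(() -> {
            try {
                ws.getReadAccess();
                ws.getReadAccess();          // nested read access, depth 2
                ws.wait(lock);               // caller does not hold lock's monitor
                restored[0] = ws.currentReadDepth();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        ws.getWriteAccess();   // granted only when no thread holds read permission
        ws.doneWriting();
        while (reader.isAlive()) { // repeat, since an early notify could be missed
            synchronized (lock) { lock.notifyAll(); }
            Thread.sleep(5);
        }
        reader.join();
        return restored[0];
    }
}
```

Because the caller here does not hold lock's monitor, a notifyAll() sent before the reader reaches obj.wait() would be missed; the demo just repeats the notification until the reader finishes, which is a testing convenience rather than a recommended pattern.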
>>>
>>> The problem is that almost everywhere in the tree where
>>> this is called, the call is inside a synchronized block,
>>> something like this:
>>>
>>>   synchronized(obj) {
>>>      ...
>>>      _workspace.wait(obj);
>>>      ...
>>>   }
>>>
>>> The problem occurs when the wait(Object obj) method tries
>>> to reacquire read permissions.  At that point, the calling thread
>>> still holds the lock on obj, and it blocks until the workspace
>>> grants read permission.
>>>
>>> If any thread is waiting for write permission, the read
>>> permission is not granted.  Deadlock results when another
>>> thread, while holding read or write permission on the workspace,
>>> tries to get the lock on obj: that thread blocks on the first one,
>>> which in turn never gets its read permission back.
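The deadlock described above can be checked mechanically with a wait-for graph, in which an edge from u to v means u cannot proceed until v acts, and a cycle means deadlock. The sketch below is a generic depth-first cycle finder applied to three hypothetical threads (names are illustrative, not taken from the model): A is inside wait(obj) reacquiring read permission while still holding obj's lock, W is waiting for write permission, and C holds read permission and tries to enter synchronized (obj).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Wait-for graph: an edge u -> v means "thread u cannot proceed until thread v acts".
class WaitForGraph {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addEdge(String waiter, String blocker) {
        edges.computeIfAbsent(waiter, k -> new ArrayList<>()).add(blocker);
        edges.computeIfAbsent(blocker, k -> new ArrayList<>());
    }

    // Returns the threads on one cycle (a deadlock), or an empty list if there is none.
    List<String> findCycle() {
        Set<String> finished = new HashSet<>();
        for (String start : edges.keySet()) {
            List<String> cycle = dfs(start, new ArrayList<>(), finished);
            if (!cycle.isEmpty()) return cycle;
        }
        return Collections.emptyList();
    }

    private List<String> dfs(String node, List<String> path, Set<String> finished) {
        int onPath = path.indexOf(node);
        if (onPath >= 0) return new ArrayList<>(path.subList(onPath, path.size()));
        if (finished.contains(node)) return Collections.emptyList();
        path.add(node);
        for (String next : edges.get(node)) {
            List<String> cycle = dfs(next, path, finished);
            if (!cycle.isEmpty()) return cycle;
        }
        path.remove(path.size() - 1);
        finished.add(node);
        return Collections.emptyList();
    }
}

// Hypothetical scenario for the read-permission case.
class Bug3693Scenario {
    static WaitForGraph build(boolean callerHoldsObjLock) {
        WaitForGraph g = new WaitForGraph();
        g.addEdge("A", "W"); // A's read request is refused while writer W is waiting
        g.addEdge("W", "C"); // W needs C to give up its read permission
        if (callerHoldsObjLock) {
            g.addEdge("C", "A"); // C blocks on obj's lock, which A still holds
        }
        return g;
    }
}
```

With the lock held, findCycle() reports A, W, C. Dropping the C to A edge models the proposed fix: if obj's lock is not held while A reacquires read permission, C can enter its synchronized block and, assuming it eventually releases its read permission, W and then A can proceed.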
>>>
>>>
>>>
>>> I think that the fix is that a thread that calls
>>> wait(Object obj) should not hold a lock on obj when it makes
>>> that call... This is counterintuitive to Java programmers,
>>> because generally you _have to_ hold the lock to call wait().
>>> Indeed, wait(Object obj) itself acquires the lock on obj, but
>>> the key is that it releases that lock before it tries to
>>> reacquire read permissions.  That prevents the deadlock,
>>> provided the calling thread does not already hold the lock.
>>>
>>> I believe this is correct because wait(Object obj) will
>>> release any lock on obj anyway for an indeterminate amount
>>> of time while obj.wait() is called.  Thus, no calling method
>>> can really assume the lock is held across the call
>>> to wait(Object obj).
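The observation that obj's lock is released during obj.wait() can be checked on its own, without any workspace. In this sketch (class and field names are hypothetical), one thread holds obj's lock around a guarded wait, yet the value guarded by obj differs before and after the wait, because a second thread updated it while the monitor was released. This is the property the argument above relies on.

```java
// Hypothetical demo: holding obj's lock around a wait does not keep the state
// that obj guards unchanged, because obj.wait() releases the monitor.
class MonitorReleaseDemo {
    private final Object obj = new Object();
    private int value = 0;           // guarded by obj
    private boolean waiting = false; // guarded by obj
    private boolean changed = false; // guarded by obj

    // Returns {value seen before obj.wait(), value seen after it returns}.
    int[] run() throws InterruptedException {
        int[] seen = new int[2];
        Thread waiter = new Thread(() -> {
            synchronized (obj) {
                seen[0] = value;
                waiting = true;
                try {
                    while (!changed) {
                        obj.wait(); // the monitor on obj is released here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                seen[1] = value;
            }
        });
        waiter.start();
        // The waiter holds obj's lock from setting 'waiting' until obj.wait()
        // releases it, so seeing waiting == true here means it is inside wait().
        while (true) {
            synchronized (obj) {
                if (waiting) {
                    value = 42;
                    changed = true;
                    obj.notifyAll();
                    break;
                }
            }
            Thread.sleep(1);
        }
        waiter.join();
        return seen;
    }
}
```

The waiter sees 0 before the wait and 42 after it, even though it never left its synchronized (obj) block.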
>>>
>>>
>>> Edward
>>>
>>> Christopher Brooks wrote:
>>>> Hi Edward,
>>>> Here's a MultiInstanceComposite model that hangs for me.
>>>> I've attached a Ptolemy version.
>>>>
>>>> The model has PN on the outside with SDF inside the
>>>> MultiInstanceComposite.  The MultiInstanceComposite has
>>>> no actors, just a link between the ports, which is rather odd.
>>>>
>>>> _Christopher
>>>>
>>>> bugzilla-daemon at ecoinformatics.org wrote:
>>>>> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=3693
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------- Comment #4 from crawl at sdsc.edu  2008-12-05 11:12 -------
>>>>> I was able to reproduce the deadlock in Jianwu's workflow on:
>>>>>
>>>>> Windows XP, java 1.6.0_11, Kepler 1.0.0
>>>>> Mac, java 1.5.0_16, both Kepler 1.0.0 and head
>>>>> _______________________________________________
>>>>> Kepler-dev mailing list
>>>>> Kepler-dev at kepler-project.org
>>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>>
>>   
> 