[kepler-dev] [Bug 3693] - MultiInstanceComposite actor deadlocks sometimes

Edward A. Lee eal at eecs.berkeley.edu
Fri Dec 19 10:04:31 PST 2008


I have a diagnosis of this problem, and I believe I have
a fix, but as usual with threads, I'm not fully confident
in the solution.  I guess if this sounds reasonable, then
it would increase my confidence.

The MultiInstanceComposite apparently triggers a bug
because its wrapup() method acquires write access to the
workspace. In your model, multiple threads will be simultaneously
trying to acquire write access, something that is fairly rare
in uses of Ptolemy.  This is why we see the bug only with
uses of MultiInstanceComposite.

The problem is in the use of the Workspace wait(Object obj) method.
What this method does is release any read permissions that
the calling thread has on the workspace, call obj.wait(),
reacquire the read permissions, and return.
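
To make those three steps concrete, here is a minimal sketch.  It is
not the Ptolemy Workspace code: the class WorkspaceWaitSketch, the
method name waitOn(), the timeout argument (added only so the demo
terminates), and the use of java.util.concurrent's
ReentrantReadWriteLock to model workspace permissions are all
assumptions made for illustration.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical, simplified stand-in for the workspace (not the Ptolemy class). */
class WorkspaceWaitSketch {
    // Assumption: a standard read/write lock models the workspace permissions.
    private final ReentrantReadWriteLock permissions = new ReentrantReadWriteLock();

    void getReadAccess() { permissions.readLock().lock(); }
    void doneReading() { permissions.readLock().unlock(); }
    int readDepth() { return permissions.getReadHoldCount(); }

    /**
     * Models wait(Object obj) as described above. The caller must already
     * hold the monitor on obj, as in the usual synchronized(obj) pattern.
     */
    void waitOn(Object obj, long timeoutMillis) {
        // Step 1: release every read permission held by the calling thread.
        int depth = permissions.getReadHoldCount();
        for (int i = 0; i < depth; i++) {
            permissions.readLock().unlock();
        }
        try {
            // Step 2: obj.wait() gives up the monitor on obj while waiting
            // and takes it back before returning.
            obj.wait(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Step 3: reacquire the read permissions. The calling thread
            // holds the monitor on obj at this point.
            for (int i = 0; i < depth; i++) {
                permissions.readLock().lock();
            }
        }
    }

    /** Takes read access twice, waits briefly, and reports the restored depth. */
    static int demo() {
        WorkspaceWaitSketch ws = new WorkspaceWaitSketch();
        Object obj = new Object();
        ws.getReadAccess();
        ws.getReadAccess();
        synchronized (obj) {
            ws.waitOn(obj, 10);
        }
        return ws.readDepth();
    }

    public static void main(String[] args) {
        System.out.println("read depth after wait: " + demo());
    }
}
```

The point to notice is step 3: the read permissions are requested
while the monitor on obj is still held by the calling thread.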

The problem is that almost everywhere in the tree where
this is called, it is inside a synchronized block,
something like this:

   synchronized(obj) {
      ...
      _workspace.wait(obj);
      ...
   }

The problem occurs when the wait(Object obj) method tries
to reacquire read permissions.  At that point, it holds
a lock on obj, and blocks until the workspace grants
read permission.

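If there is a thread waiting for write permission, the read
permission is not granted.  Deadlock occurs when another
thread tries to get a lock on obj while holding read or write
permission on the workspace: that thread waits for the lock on
obj, while the thread holding that lock waits for a read
permission that cannot be granted until the other thread
releases its own.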



I think that the fix is that a thread that calls
wait(Object obj) should not hold a lock on obj when it makes
that call... This is counterintuitive to Java programmers,
because generally you _have to_ hold the lock to call wait().
Indeed, inside wait(Object obj), it acquires the lock, but
the key is that it releases that lock before it tries to
reacquire read permissions, thus preventing the deadlock
if the calling thread does not already hold the lock.

I believe this is correct because wait(Object obj) will
release any lock on obj anyway for an indeterminate amount
of time while obj.wait() is called.  Thus, no calling method
can really assume the lock is held across the call
to wait(Object obj).
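
A minimal sketch of the proposed fix follows.  Again this is not the
Ptolemy code: FixedWorkspaceWaitSketch, waitOn(), the timeout, and the
heldMonitorWhileReacquiring flag are hypothetical names introduced for
illustration, and ReentrantReadWriteLock is only an assumed model of
workspace permissions.  The caller does not synchronize on obj; the
method takes the lock itself and gives it up before asking for read
permission again.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of the proposed fix (not the Ptolemy class). */
class FixedWorkspaceWaitSketch {
    // Assumption: a standard read/write lock models the workspace permissions.
    private final ReentrantReadWriteLock permissions = new ReentrantReadWriteLock();

    // Records whether the monitor on obj was held while reacquiring read access.
    boolean heldMonitorWhileReacquiring;

    void getReadAccess() { permissions.readLock().lock(); }
    void doneReading() { permissions.readLock().unlock(); }
    int readDepth() { return permissions.getReadHoldCount(); }

    /** The caller is expected NOT to hold the monitor on obj. */
    void waitOn(Object obj, long timeoutMillis) {
        // Release every read permission held by the calling thread.
        int depth = permissions.getReadHoldCount();
        for (int i = 0; i < depth; i++) {
            permissions.readLock().unlock();
        }
        try {
            // Acquire the lock on obj only for the duration of the wait.
            synchronized (obj) {
                obj.wait(timeoutMillis);
            }
            // The lock on obj has been released here, provided the caller
            // did not already hold it.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            heldMonitorWhileReacquiring = Thread.holdsLock(obj);
            // Reacquire read permissions without holding the lock on obj.
            for (int i = 0; i < depth; i++) {
                permissions.readLock().lock();
            }
        }
    }

    /** Calls waitOn() outside any synchronized block and checks the result. */
    static boolean demo() {
        FixedWorkspaceWaitSketch ws = new FixedWorkspaceWaitSketch();
        Object obj = new Object();
        ws.getReadAccess();
        ws.waitOn(obj, 10); // note: no synchronized(obj) around this call
        return !ws.heldMonitorWhileReacquiring && ws.readDepth() == 1;
    }

    public static void main(String[] args) {
        System.out.println("lock released before reacquiring: " + demo());
    }
}
```

If the caller wraps the call in synchronized(obj), the lock is still
held when the read permissions are requested, which is why the fix
depends on changing the call sites as well.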


Edward

Christopher Brooks wrote:
> Hi Edward,
> Here's a MultiInstanceComposite model that hangs for me.
> I've attached a Ptolemy version.
> 
> The model has PN on the outside with SDF inside the
> MultiInstanceComposite.  The MultiInstanceComposite has
> no actors, just a link between the ports, which is rather odd.
> 
> _Christopher
> 
> bugzilla-daemon at ecoinformatics.org wrote:
>> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=3693
>>
>> ------- Comment #4 from crawl at sdsc.edu  2008-12-05 11:12 -------
>> I was able to reproduce the deadlock in Jianwu's workflow on:
>>
>> Windows XP, java 1.6.0_11, Kepler 1.0.0
>> Mac, java 1.5.0_16, both Kepler 1.0.0 and head
>> _______________________________________________
>> Kepler-dev mailing list
>> Kepler-dev at kepler-project.org
>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
> 