[kepler-dev] Memory leak.

Christopher Brooks cxh at eecs.berkeley.edu
Wed Dec 21 14:51:55 PST 2005


Hi Kevin,

I changed the cc line from ptolemy-hackers to ptresearch so as
to reduce the traffic on ptolemy-hackers.  Once we get a solution,
I'll post a summary to ptolemy-hackers.  It seems like there will be
lots of back and forth on this issue.

I'm taking a bunch of email messages and combining them in an odd
order, but this is how I looked at things.

Kevin, you wrote:
> But in the MoMLParser.parse( URL, Reader ) method, it looks like
> this is being held onto.  parser returns a variable called
> _topLevel.  But there is code in the parse() method which assigns
> this (MoMLParser) to an attribute of _topLevel called "_parser".  I
> believe this is where the reference to the MoMLParser is being
> retained.

MoMLParser.parse(URL, Reader) caches itself as an attribute of
_toplevel.  There is quite a bit of confusion here.
1) In MoMLParser, the _parser private variable is a XmlParser.
2) MoMLParser.parse(URL, Reader) uses _parser and creates a 
   NamedObj called _toplevel.
3) MoMLParser.parse(URL, Reader) sets the "_parser" attribute of
   _toplevel to "this", meaning that the "_parser" attribute is
   of type MoMLParser.  "_parser" is a ptolemy.moml.ParserAttribute.

Perhaps we should rename the _parser private variable to _XmlParser so 
as to avoid confusion with the "_parser" attribute of type MoMLParser?

The "_parser" attribute is needed for undo and import.

In November, 2000, Edward wrote:
> I'll be checking in changes that now enable us to put hierarchical
> models safely into the libraries.  I can't check them in now, however,
> because vergil is dead in the water (for me at least).  It doesn't
> start up after I do a cvs update and make clean all.  When I get it
> working again, I'll check in my changes...
> 
> Here's how it works (by example).
> 
> In the signal processing actors, there is a new actor, Sinewave,
> which you can drag into your model.  You get an instance of
> composite actor, and your saved graph will reference the file
> in which that composite is defined.
> 
> To make the model moveable and transportable, it references
> the file with XML command:
> 
> <import source="ptolemy/actor/lib/SignalProcessingActors.xml"/>
> 
> Notice that the path given is neither absolute nor relative.
> It is given relative to the classpath.
> 
> I don't envision this being the only way of referring to other
> files, but I think it will be the most robust way.  The entry
> in the signalprocessing.xml library file looks like this:
> 
> <import source="ptolemy/actor/lib/SignalProcessingActors.xml"/>
> <entity name="Sinewave" class=".SignalProcessingActors.Sinewave"/>
> 
> The way it works is when this gets dropped into your model,
> all imports in the library are first performed in your model
> (there is only one in this case).  Then the entity is cloned
> from the import.
> 
> To get this to work, I introduced a new attribute, ParserAttribute.
> The MoMLParser creates this attribute in a toplevel, and it becomes
> the parser used for all MoMLChangeRequests on that model.
> This ensures that it has access to all imports.
> It will also facilitate implementing undo, whenever we get
> to that...

So, I think we are probably stuck with having the "_parser" attribute.

Kevin writes:
> The second reason I didn't want to spend a whole lot of time fixing it.  
> I pointed out just a little while ago that the microstar parser was 
> fingered as a possible performance problem.  BTW - that analysis was 
> inconclusive, it only stated that the parser consumed a lot of time - 
> but we know it's used a lot too.  I suggest before gutting it (it's a 
> piece of work, I know because I tried to implement MoMLParser as a SAX 
> ContentHandler and couldn't accomplish it), you actually pit two parsers 
> in a simple head to head parsing task.

I took a look at Aelfred.  There are currently two active version. 

The version used in saxon 7.1 and earlier:
http://saxon.sourceforge.net/aelfred.html
Saxon now uses the Crimson parser that is in Java 1.4

There is also a GNU Aelfred parser at
http://www.gnu.org/software/classpathx/jaxp/apidoc/gnu/xml/aelfred2/package-sum
mary.html
This version can be a validating parser.

We don't really need a validating parser, so the GNU version is out.
Also, if we modify the GNU version, we risk encumbering Ptolemy and
Kepler with the GPL.

Unfortunately, it looks like the aelfred version from saxon 7.1 is now
GPL'd.  I think it this was done somewhat accidentally, there are web
pages that list the copyright without the GPL

In principle, we could take the version from saxon 7.1 and port our
fixes to it and see what happens, but I doubt if we will see much
improvement.  Rather than doing that, we should try other parsers.

I did look at the various versions of Aelfred, and none of them
set readBuffer[] to null.  Various versions have a cleanupVariables()
method that releases various arrays, but not readBuffer.

The version of XmlParser we ship has the following comment for
cleanupVariables():
     * Clean up after the parse to allow some garbage collection.
     * Leave around anything that might be useful for queries.
readBuffer[] is not cleared in this version of cleanupVariables().
Thus, I believe that the authors thought readBuffer[] might be 
useful for a query.

The GNU classpath version says:
   * <p> Only one thread at a time may use this parser; since it is
   * private to this package, post-parse cleanup is done by the caller,
   * which MUST NOT REUSE the parser (just null it).

So, bottom line: we should really only call XmlParser once and
then null it.

On 12/14, Matt Jones wrote:
> Regarding Kevin's point (4) -- the aelfred parser is pretty old.  It
> was great circa 1998, but I think xerces has far better performance
> now, even in SAX mode.  They certainly spent a lot of time in
> 2000-2002 working on xerces performance. So you might see some
> improvements by looking at alternative parsers and seeing if any are
> more efficient.

Porting the MoML parser to use the Java 1.4 or 1.5 Jaxp parser, which
is based on Xerces 2.6.2 would be a good thing.  However, it is
probably a big task.  MoMLParser is a very complex class and
could use a redesign from the ground up.


Kevin wrote:
> [Christopher wrote:]
> >Yep, this looks like a leak of some sort.
> >
> >In MoMLParser, _parser is declared as:
> >    private XmlParser _parser = new XmlParser();
> >
> >I'm not sure if setting _parser to null in 1326 of
> >MoMLParser.parse(URL, Reader) is safe or not.  What happens the next
> >time this method is called with the same MoMLParser?  _parser is null
> >and we will crash.  I'm not sure if there is much state in XmlParser,
> >so creating and destroying _parser each time could be ok.
> >  
> >
> I was just testing a hypothesis and didn't want to spend a whole lot of 
> time fixing it correctly.  XmlParser (or it's delegate, I don't recall 
> which) does allocate some stuff including a fairly sizeable char[], but 
> extra work is much better than a 15M leak, imo.
> 
> I've thought of a few fixes.
> 
> One way to fix it would be to wrap the _parser member and the methods 
> which use it directly in an inner class of MoMLParser.  The inner class' 
> members would still have access to the containing class's private data 
> so there wouldn't have to be a bunch of setters to much up the 
> interface.  Then it might be easier to see where the object needs to be 
> created.

I'm not totally sure if this would work with XmlParser.
XmlParser expects us to declare a XmlHandler (in this case MoMLParser)
that has callback methods like MoMLParser.attribute().
Would XmlParser still be able to call these methods in an inner class
of MoMLParser?  I'm not so sure.

I think we'd have to totally rehack XmlParser.  If we are going to do
that, then why not try the parser that ships with Java?

> Another is to allocate the _parser variable in the parse(URL,String)
> method.  Seems like a reasonable thing to do.  But it doesn't look
> all that pretty to have a member variable be allocated/deallocated
> all the time.

At some level, I think we need to allocate and deallocate _parser each
time we call parse().  However, there are many parse() methods. 
Do we really want to call allocate and deallocate _parser for each
changeRequest?

I propose we set it for parse(URL, Reader) which is where all the
parse() methods end up.  This does have the added cost of allocating
and deallocating memory.

It looks like outside of MoMLParser.parse(URL, Reader), we
only call the follow methods on _parser:

_parser.getCurrentElement()  // called once to handle MoMLFilters
_parser.getLineNumber()      // called many times
_parser.getColumnNumber()    // called many times

The problem is that there are public methods that throw exceptions
that invoke getLineNumber() and getColumnNumber().
We could hack these to check for null values and do the right thing.

> Another is to declare _parser as function local, then pass it into all 
> the methods that use it.  This would result in a pretty ugly private 
> api, but would certainly make the use of _parser more explicit.

Again, the issue is that XmlParser is making callbacks to methods
like MoMLParser.attribute() and we can't easily change attribute()
without changing XmlParser (which we could do).

Ok, in summary, I propose that we modify MoMLParser.parse(URL, Reader)
to create and discard _parser.  This would help with the leak in the
short term.  Things like change requests could slower thouhg.

A longer term solution would be to use a modern parser, like
what ships with Java 1.5

Comments?

_Christopher


Kevin wrote:
--------

    
    Edward,
    
    I don't fully understand the situation and defer to you or Christopher 
    for greater details.   Perhaps there is something in the Kepler code 
    which is holding onto a MoMLParser too long.
    
    In kepler there is this code in ActorMetadata.ActorMetadata( InputStream 
    moml):
    
    {
    
     MoMLParser parser = new MoMLParser();
     NamedObj obj = parser.parse( null, moml );
    
    ...
    
    }
    
    So in theory the parser should be local and go out of scope.
    

    
    Kevin
    
    Edward A. Lee wrote:
    
    > At 12:29 AM 12/21/2005 -0800, Christopher Brooks wrote:
    >
    >> Yep, this looks like a leak of some sort.
    >>
    >> In MoMLParser, _parser is declared as:
    >>     private XmlParser _parser = new XmlParser();
    >
    >
    >
    > I'm confused... Wouldn't the leak then actually be
    > in the creation of multiple instances of MoMLParser without
    > deleting them?
    >
--------


More information about the Kepler-dev mailing list