[kepler-dev] Memory leak.

Wed Dec 21 22:12:07 PST 2005

Actually, one simple solution might be to use exactly one XmlParser,
defined as a static private variable.  I don't think it has any model
dependent code at all... We would have to implement mutual exclusion
to prevent multiple threads from accessing it at once, however, but
this is probably fairly easy to do...

About the getlinenumber problem you mention: is this really an issue?
wouldn't a parse exception be dealt with right away? It could be done
inside the mutual exclusion lock (however... some care is necessary if
any reporting is done to the user via swing, since this will now involve
another thread through which multiple threads accessing the parser might
deadlock).

I think it's worth a try...

Edward

At 07:01 PM 12/21/2005 -0600, Kevin Ruland wrote:

>Edward,
>
>I'm responding to Christophers email only to help move the topic to the
>other list and consolidate the information.
>
>I understand the model needs to have a reference to it's MoMLParser for
>other reasons.  But does the MoMLParser need to maintain a reference to
>the XMLParser?  I suspect that once the parsing process is completed the
>state in the XMLParser is no longer useful.  That is why I think the
>_parser member variable could be considered a leak.
>
>Certainly Kepler is currently hanging on to too many references.  Each
>one of the actors Kepler creates (333 of them) takes up quite a bit of
>room and is causing the application to require large volumes of memory
>in order to simply start up.  After the holidays we will certainly be
>discussing how this can be avoided.
>
>Back to the XMLParser objects.  These were not insignificant resources.
>Granted there were 333 of them but they collectively occupied something
>like 15M.    That's what 50k each or so.  This is considerable given the
>state maintained (most of it was a buffer for the input source) is
>extraneous.
>
>I suspect if all the methods in MoMLParser which use the member variable
>_parser are converted to accept the XMLParser as an argument, it will be
>clear that moving the variable from member to function-local is not so bad.
>
>I had outlined a couple of ideas to make it a local variable.
>Christopher has already commented on some of them.  It is certainly a
>sticky situation.  The easiest solution, as he points out is to
>allocate/de-allocate _parser in the parse(URL, Reader) method.  Is
>suspect the performance will take a hit because the underlying parser
>does allocate a number of dynamic variables.  It is certainly the least
>radical solution.
>
>One idea would be to use pooling for the XmlParser objects.  This would
>add complexity but would reduce the allocation time.
>
>To solve the Exception handling "getlinenumber" problem, I suggest doing
>something like cache the parser state when an exception occurs and store
>that in MoMLParser.  Then if somebody needs it get it from MoMLParser
>instead of the XmlParser object.  The other idea is to have the Excption
>hold onto this information and carry it up the stack.
>
>The current parser is not a SAX parser.  It is in practice an event
>driven parser, but doesn't satisfy the SAX api.  But aren't the current
>incarnations of Saxon using the SAX api?  Moving the parsing process to
>pure SAX is not easy but does offer benefits.  But even other SAX
>parsers maintain some state/buffers and have to be released.  So moving
>to a different parser would not actually correct this situation.
>
>Kevin
>
>Edward A. Lee wrote:
>Kevin:
>
>You are right...
>
>When a parser is used to open a model, the model itself acquires an
>instance of ParserAttribute, which contains a reference to the parser.
>Thus, as long as there is an active reference to the model, there
>will be a reference to the parser.  We could perhaps change ParserAttribute
>to have a weak reference to the parser.
>
>I think the real issue is that the model should pass out of scope,
>and maybe it is not being passed out of scope.
>
>Moreover, references to models may persist longer than they should...
>Although I think we made an effort to make sure all references to the
>model were weak references, I'm not sure we did it exactly right...
>Unfortunately, in Java, it seems you can't actually force a weak
>reference to dissociate...
>
>Edward
>
>
>Christopher Brooks wrote:
>
> >Hi Kevin,
> >
> >I changed the cc line from ptolemy-hackers to ptresearch so as
> >to reduce the traffic on ptolemy-hackers.  Once we get a solution,
> >I'll post a summary to ptolemy-hackers.  It seems like there will be
> >lots of back and forth on this issue.
> >
> >I'm taking a bunch of email messages and combining them in an odd
> >order, but this is how I looked at things.
> >
> >Kevin, you wrote:
> >
> >
> >>But in the MoMLParser.parse( URL, Reader ) method, it looks like
> >>this is being held onto.  parser returns a variable called
> >>_topLevel.  But there is code in the parse() method which assigns
> >>this (MoMLParser) to an attribute of _topLevel called "_parser".  I
> >>believe this is where the reference to the MoMLParser is being
> >>retained.
> >>
> >>
> >
> >MoMLParser.parse(URL, Reader) caches itself as an attribute of
> >_toplevel.  There is quite a bit of confusion here.
> >1) In MoMLParser, the _parser private variable is a XmlParser.
> >2) MoMLParser.parse(URL, Reader) uses _parser and creates a
> >   NamedObj called _toplevel.
> >3) MoMLParser.parse(URL, Reader) sets the "_parser" attribute of
> >   _toplevel to "this", meaning that the "_parser" attribute is
> >   of type MoMLParser.  "_parser" is a ptolemy.moml.ParserAttribute.
> >
> >Perhaps we should rename the _parser private variable to _XmlParser so
> >as to avoid confusion with the "_parser" attribute of type MoMLParser?
> >
> >The "_parser" attribute is needed for undo and import.
> >
> >In November, 2000, Edward wrote:
> >
> >
> >>I'll be checking in changes that now enable us to put hierarchical
> >>models safely into the libraries.  I can't check them in now, however,
> >>because vergil is dead in the water (for me at least).  It doesn't
> >>start up after I do a cvs update and make clean all.  When I get it
> >>working again, I'll check in my changes...
> >>
> >>Here's how it works (by example).
> >>
> >>In the signal processing actors, there is a new actor, Sinewave,
> >>which you can drag into your model.  You get an instance of
> >>composite actor, and your saved graph will reference the file
> >>in which that composite is defined.
> >>
> >>To make the model moveable and transportable, it references
> >>the file with XML command:
> >>
> >><import source="ptolemy/actor/lib/SignalProcessingActors.xml"/>
> >>
> >>Notice that the path given is neither absolute nor relative.
> >>It is given relative to the classpath.
> >>
> >>I don't envision this being the only way of referring to other
> >>files, but I think it will be the most robust way.  The entry
> >>in the signalprocessing.xml library file looks like this:
> >>
> >><import source="ptolemy/actor/lib/SignalProcessingActors.xml"/>
> >><entity name="Sinewave" class=".SignalProcessingActors.Sinewave"/>
> >>
> >>The way it works is when this gets dropped into your model,
> >>all imports in the library are first performed in your model
> >>(there is only one in this case).  Then the entity is cloned
> >>from the import.
> >>
> >>To get this to work, I introduced a new attribute, ParserAttribute.
> >>The MoMLParser creates this attribute in a toplevel, and it becomes
> >>the parser used for all MoMLChangeRequests on that model.
> >>This ensures that it has access to all imports.
> >>It will also facilitate implementing undo, whenever we get
> >>to that...
> >>
> >>
> >
> >So, I think we are probably stuck with having the "_parser" attribute.
> >
> >Kevin writes:
> >
> >
> >>The second reason I didn't want to spend a whole lot of time fixing it.
> >>I pointed out just a little while ago that the microstar parser was
> >>fingered as a possible performance problem.  BTW - that analysis was
> >>inconclusive, it only stated that the parser consumed a lot of time -
> >>but we know it's used a lot too.  I suggest before gutting it (it's a
> >>piece of work, I know because I tried to implement MoMLParser as a SAX
> >>ContentHandler and couldn't accomplish it), you actually pit two parsers
> >>in a simple head to head parsing task.
> >>
> >>
> >
> >I took a look at Aelfred.  There are currently two active version.
> >
> >The version used in saxon 7.1 and earlier:
> >http://saxon.sourceforge.net/aelfred.html
> >Saxon now uses the Crimson parser that is in Java 1.4
> >
> >There is also a GNU Aelfred parser at
> >http://www.gnu.org/software/classpathx/jaxp/apidoc/gnu/xml/aelfred2/packa 
> ge-sum
> >mary.html
> >This version can be a validating parser.
> >
> >We don't really need a validating parser, so the GNU version is out.
> >Also, if we modify the GNU version, we risk encumbering Ptolemy and
> >Kepler with the GPL.
> >
> >Unfortunately, it looks like the aelfred version from saxon 7.1 is now
> >GPL'd.  I think it this was done somewhat accidentally, there are web
> >pages that list the copyright without the GPL
> >
> >In principle, we could take the version from saxon 7.1 and port our
> >fixes to it and see what happens, but I doubt if we will see much
> >improvement.  Rather than doing that, we should try other parsers.
> >
> >I did look at the various versions of Aelfred, and none of them
> >set readBuffer[] to null.  Various versions have a cleanupVariables()
> >method that releases various arrays, but not readBuffer.
> >
> >The version of XmlParser we ship has the following comment for
> >cleanupVariables():
> >     * Clean up after the parse to allow some garbage collection.
> >     * Leave around anything that might be useful for queries.
> >readBuffer[] is not cleared in this version of cleanupVariables().
> >Thus, I believe that the authors thought readBuffer[] might be
> >useful for a query.
> >
> >The GNU classpath version says:
> >   * <p> Only one thread at a time may use this parser; since it is
> >   * private to this package, post-parse cleanup is done by the caller,
> >   * which MUST NOT REUSE the parser (just null it).
> >
> >So, bottom line: we should really only call XmlParser once and
> >then null it.
> >
> >On 12/14, Matt Jones wrote:
> >
> >
> >>Regarding Kevin's point (4) -- the aelfred parser is pretty old.  It
> >>was great circa 1998, but I think xerces has far better performance
> >>now, even in SAX mode.  They certainly spent a lot of time in
> >>2000-2002 working on xerces performance. So you might see some
> >>improvements by looking at alternative parsers and seeing if any are
> >>more efficient.
> >>
> >>
> >
> >Porting the MoML parser to use the Java 1.4 or 1.5 Jaxp parser, which
> >is based on Xerces 2.6.2 would be a good thing.  However, it is
> >probably a big task.  MoMLParser is a very complex class and
> >could use a redesign from the ground up.
> >
> >
> >Kevin wrote:
> >
> >
> >>[Christopher wrote:]
> >>
> >>
> >>>Yep, this looks like a leak of some sort.
> >>>
> >>>In MoMLParser, _parser is declared as:
> >>>   private XmlParser _parser = new XmlParser();
> >>>
> >>>I'm not sure if setting _parser to null in 1326 of
> >>>MoMLParser.parse(URL, Reader) is safe or not.  What happens the next
> >>>time this method is called with the same MoMLParser?  _parser is null
> >>>and we will crash.  I'm not sure if there is much state in XmlParser,
> >>>so creating and destroying _parser each time could be ok.
> >>>
> >>>
> >>>
> >>>
> >>I was just testing a hypothesis and didn't want to spend a whole lot of
> >>time fixing it correctly.  XmlParser (or it's delegate, I don't recall
> >>which) does allocate some stuff including a fairly sizeable char[], but
> >>extra work is much better than a 15M leak, imo.
> >>
> >>I've thought of a few fixes.
> >>
> >>One way to fix it would be to wrap the _parser member and the methods
> >>which use it directly in an inner class of MoMLParser.  The inner class'
> >>members would still have access to the containing class's private data
> >>so there wouldn't have to be a bunch of setters to much up the
> >>interface.  Then it might be easier to see where the object needs to be
> >>created.
> >>
> >>
> >
> >I'm not totally sure if this would work with XmlParser.
> >XmlParser expects us to declare a XmlHandler (in this case MoMLParser)
> >that has callback methods like MoMLParser.attribute().
> >Would XmlParser still be able to call these methods in an inner class
> >of MoMLParser?  I'm not so sure.
> >
> >I think we'd have to totally rehack XmlParser.  If we are going to do
> >that, then why not try the parser that ships with Java?
> >
> >
> >
> >>Another is to allocate the _parser variable in the parse(URL,String)
> >>method.  Seems like a reasonable thing to do.  But it doesn't look
> >>all that pretty to have a member variable be allocated/deallocated
> >>all the time.
> >>
> >>
> >
> >At some level, I think we need to allocate and deallocate _parser each
> >time we call parse().  However, there are many parse() methods.
> >Do we really want to call allocate and deallocate _parser for each
> >changeRequest?
> >
> >I propose we set it for parse(URL, Reader) which is where all the
> >parse() methods end up.  This does have the added cost of allocating
> >and deallocating memory.
> >
> >It looks like outside of MoMLParser.parse(URL, Reader), we
> >only call the follow methods on _parser:
> >
> >_parser.getCurrentElement()  // called once to handle MoMLFilters
> >_parser.getLineNumber()      // called many times
> >_parser.getColumnNumber()    // called many times
> >
> >The problem is that there are public methods that throw exceptions
> >that invoke getLineNumber() and getColumnNumber().
> >We could hack these to check for null values and do the right thing.
> >
> >
> >
> >>Another is to declare _parser as function local, then pass it into all
> >>the methods that use it.  This would result in a pretty ugly private
> >>api, but would certainly make the use of _parser more explicit.
> >>
> >>
> >
> >Again, the issue is that XmlParser is making callbacks to methods
> >like MoMLParser.attribute() and we can't easily change attribute()
> >without changing XmlParser (which we could do).
> >
> >Ok, in summary, I propose that we modify MoMLParser.parse(URL, Reader)
> >to create and discard _parser.  This would help with the leak in the
> >short term.  Things like change requests could slower thouhg.
> >
> >A longer term solution would be to use a modern parser, like
> >what ships with Java 1.5
> >
> >Comments?
> >
> >_Christopher
> >
> >
> >Kevin wrote:
> >--------
> >
> >
> >    Edward,
> >
> >    I don't fully understand the situation and defer to you or Christopher
> >    for greater details.   Perhaps there is something in the Kepler code
> >    which is holding onto a MoMLParser too long.
> >
> >    In kepler there is this code in ActorMetadata.ActorMetadata( 
> InputStream
> >    moml):
> >
> >    {
> >
> >     MoMLParser parser = new MoMLParser();
> >     NamedObj obj = parser.parse( null, moml );
> >
> >    ...
> >
> >    }
> >
> >    So in theory the parser should be local and go out of scope.
> >
> >
> >
> >    Kevin
> >
> >    Edward A. Lee wrote:
> >
> >    > At 12:29 AM 12/21/2005 -0800, Christopher Brooks wrote:
> >    >
> >    >> Yep, this looks like a leak of some sort.
> >    >>
> >    >> In MoMLParser, _parser is declared as:
> >    >>     private XmlParser _parser = new XmlParser();
> >    >
> >    >
> >    >
> >    > I'm confused... Wouldn't the leak then actually be
> >    > in the creation of multiple instances of MoMLParser without
> >    > deleting them?
> >    >
> >--------
> >
> >
>
>_______________________________________________
>Kepler-dev mailing list
>Kepler-dev at ecoinformatics.org
>http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev

------------
Edward A. Lee
Professor, Chair of the EE Division, Associate Chair of EECS
231 Cory Hall, UC Berkeley, Berkeley, CA 94720
phone: 510-642-0253 or 510-642-0455, fax: 510-642-2845
eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal