DOM parsing vs serialization

Dan Higgins higgins at nceas.ucsb.edu
Mon Mar 3 09:22:12 PST 2003


Hi All,
    Some added information on my initial comparison of 
parsing/serialization times. I did a few quick comparisons and found 
that simply reading the bytes in a file and doing nothing else takes 
about the same time as reading a serialized file of the same size! In 
other words, it looks like the bottleneck is reading the bytes from 
disk, not parsing the XML or recreating the serialized object. 
Serialized DOMs are 2-3 times larger than the original XML, so they 
take longer to read than the original XML!!
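
For anyone who wants to repeat this kind of comparison, here is a 
minimal Java sketch, not the code used for the numbers above. It 
assumes the JDK's default JAXP parser returns a DOM whose classes are 
Serializable, and the class name, file names, and synthetic document 
are all made up for illustration. Single-shot timings like these are 
noisy (JIT warm-up, disk cache), so treat the output as a rough guide 
only.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DomTimingSketch {

    // Build a small synthetic XML document (a stand-in for a real data file).
    static String sampleXml(int records) {
        StringBuilder sb = new StringBuilder("<dataset>");
        for (int i = 0; i < records; i++) {
            sb.append("<record id=\"").append(i).append("\">value ")
              .append(i).append("</record>");
        }
        return sb.append("</dataset>").toString();
    }

    // Case 1: read the raw bytes of a file and do nothing else.
    static byte[] readRaw(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // Case 2: parse the XML text into a DOM tree.
    static Document parseXml(Path file) throws Exception {
        try (InputStream in = Files.newInputStream(file)) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
    }

    // Write the DOM with standard Java serialization.
    // Assumption: the DOM implementation's node classes are Serializable.
    static void writeSerialized(Document doc, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(doc);
        }
    }

    // Case 3: recreate the DOM from the serialized file.
    static Document readSerialized(Path file) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return (Document) in.readObject();
        }
    }

    // A unit of work to time; the result is returned so it is not optimized away.
    interface Task { Object run() throws Exception; }

    // Elapsed wall-clock time of one run, in nanoseconds.
    static long timeNanos(Task t) throws Exception {
        long start = System.nanoTime();
        t.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("dom-timing");
        Path xml = dir.resolve("doc.xml");
        Path ser = dir.resolve("doc.ser");
        Files.write(xml, sampleXml(100).getBytes(StandardCharsets.UTF_8));
        writeSerialized(parseXml(xml), ser);

        System.out.println("xml bytes: " + Files.size(xml)
                + ", serialized bytes: " + Files.size(ser));
        System.out.println("raw read of .ser (ns): " + timeNanos(() -> readRaw(ser)));
        System.out.println("parse xml (ns):        " + timeNanos(() -> parseXml(xml)));
        System.out.println("deserialize (ns):      " + timeNanos(() -> readSerialized(ser)));
    }
}
```

Comparing the raw read of the serialized file against the full 
deserialize shows how much of the time is plain disk I/O. For a fairer 
picture, run each case several times and compare documents of 
different sizes.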

Dan

Chad Berkley wrote:

>On a slightly related aside, I was looking stuff up in the XPathAPI the
>other day and I came across another class called CachedXPathAPI that
>basically does the same thing, but it doesn't use static methods so it
>doesn't have to load the document every time you run xpath.  It's
>supposed to be faster according to the documentation but it does have a
>warning about it not updating the cached document unless you
>reinstantiate the class.
>
>See the docs here: 
>http://xml.apache.org/xalan-j/apidocs/org/apache/xpath/CachedXPathAPI.html
>
>I started using it in the stuff that I was working on and it does seem
>faster.  I think you just have to be careful if you update the document
>then do another xpath query.
>
>If you already were enlightened or astute enough to know about this
>class, ignore this email :).
>
>chad
>
>On Fri, 2003-02-28 at 15:36, Dan Higgins wrote:
>  
>
>>Hi All,
>>
>>    I did some simple comparisons of the time required to read a 
>>serialized DOM tree (Xerces-J parser) versus the time to create the DOM 
>>by parsing the XML text document. For very small docs, the times were 
>>about the same. However, for XML text docs of ~5K or larger, parsing the 
>>XML is 3-4 or more times faster than reading a serialized version of the 
>>DOM from disk!!! (Also, the serialized file is 3-4 times bigger than 
>>the original XML text.)
>>
>>    It thus looks like my idea of storing EML docs in serialized form on 
>>disk for Morpho is NOT a good one. (Caching the DOM in RAM does help 
>>performance, however.)
>>
>>Dan
>>
>>-- 
>>*******************************************************************
>>Dan Higgins                                  higgins at nceas.ucsb.edu
>>http://www.nceas.ucsb.edu/    Ph: 805-892-2531
>>National Center for Ecological Analysis and Synthesis (NCEAS) 
>>735 State Street - Room 205
>>Santa Barbara, CA 93195
>>*******************************************************************







