[eml-dev] [Bug 3232] New: - EML parser limitations

bugzilla-daemon@ecoinformatics.org bugzilla-daemon at ecoinformatics.org
Thu Apr 17 11:32:27 PDT 2008


http://bugzilla.ecoinformatics.org/show_bug.cgi?id=3232

           Summary: EML parser limitations
           Product: EML
           Version: 2.0.1
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: eml-parser
        AssignedTo: jones at nceas.ucsb.edu
        ReportedBy: mob at icess.ucsb.edu
         QAContact: eml-dev at ecoinformatics.org


This is just for the record. It seems that the EML parser could benefit from an
update, although its current behavior is perfectly legal.

It may be that bug 2054 appeared because the parser that comes with EML does
not use schema-full-checking. My main resource (Walmsley 2002 book) says that
this is the Xerces feature that checks for non-deterministic content models
(which was the error in 2054). That feature doesn't appear to be set in the file
SAXValidate.java, at least not to my untrained eye.
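For illustration, here is a minimal Java sketch of turning that feature on. It assumes the Xerces implementation bundled with the JDK and goes through the standard javax.xml.validation.SchemaFactory, not through whatever SAXValidate.java actually does; the class SchemaFullCheck and its method checkSchema are hypothetical names. The feature URI http://apache.org/xml/features/validation/schema-full-checking is the Xerces name for schema full checking. The AMBIGUOUS sample schema has a classic non-deterministic content model (an optional a followed by a required a), which should be reported as a Unique Particle Attribution violation when full checking is on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: compiles a schema with Xerces full checking enabled.
public class SchemaFullCheck {
    // Xerces feature URI for schema full checking (includes the
    // Unique Particle Attribution, i.e. determinism, check).
    static final String FULL_CHECKING =
            "http://apache.org/xml/features/validation/schema-full-checking";

    // Non-deterministic: on seeing <a>, the validator cannot tell which particle it matches.
    static final String AMBIGUOUS =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='root'><xs:complexType><xs:sequence>"
          + "<xs:element name='a' minOccurs='0'/><xs:element name='a'/>"
          + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Deterministic variant for comparison.
    static final String DETERMINISTIC =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='root'><xs:complexType><xs:sequence>"
          + "<xs:element name='a'/><xs:element name='b' minOccurs='0'/>"
          + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Compiles a schema with full checking on; returns every reported problem (empty = none). */
    static List<String> checkSchema(String xsd) {
        List<String> problems = new ArrayList<>();
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.setFeature(FULL_CHECKING, true);
        } catch (SAXException e) {
            // Not a Xerces-based factory: the check we want is unavailable.
            throw new IllegalStateException("schema full checking not supported", e);
        }
        // Collect diagnostics instead of stopping at the first one.
        factory.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }
            @Override public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
            @Override public void fatalError(SAXParseException e) { problems.add("fatal: " + e.getMessage()); }
        });
        try {
            factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // The factory may still throw after the handler has seen the problem.
            problems.add("exception: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("ambiguous: " + checkSchema(AMBIGUOUS));
        System.out.println("deterministic: " + checkSchema(DETERMINISTIC));
    }
}
```

With a Xerces SAX parser, the same URI would be passed to XMLReader.setFeature alongside the schema-validation features; how that would fit into SAXValidate.java is not shown here.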

Bug 2703 seems to have come about because Xerces does not necessarily load all
the imported schemas. The content model for appinfo and documentation is a
wildcard that can be validated laxly, so it is up to the validator whether to go
looking for element declarations, and it doesn't have to. This behavior is
perfectly legal.
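A small sketch of the lax behavior described above, again in Java using the JDK's bundled validator. The schema is a hypothetical stand-in for EML's appinfo: an xs:any wildcard with processContents="lax", plus one global declaration (count) that the validator does know about. An element with no loaded declaration passes without being checked, while count is validated against its type. A declaration that lives in an imported schema the validator never loaded would behave like unknown in this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo of processContents="lax" on an appinfo-style wildcard.
public class LaxWildcardDemo {
    // "count" is the only element this schema has a global declaration for.
    static final String SCHEMA =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='appinfo'><xs:complexType><xs:sequence>"
          + "<xs:any processContents='lax' minOccurs='0' maxOccurs='unbounded'/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "<xs:element name='count' type='xs:integer'/>"
          + "</xs:schema>";

    /** Validates an instance against SCHEMA; returns the error messages (empty = valid). */
    static List<String> validate(String instance) {
        List<String> errors = new ArrayList<>();
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(SCHEMA)));
            Validator validator = schema.newValidator();
            // Record errors rather than aborting on the first one.
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored */ }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) { errors.add(e.getMessage()); }
            });
            validator.validate(new StreamSource(new StringReader(instance)));
        } catch (SAXException | IOException e) {
            errors.add(e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        // No declaration for <unknown>: lax processing skips it, so this is valid.
        System.out.println(validate("<appinfo><unknown>abc</unknown></appinfo>"));
        // <count> is declared, so lax processing validates it: "abc" is not an integer.
        System.out.println(validate("<appinfo><count>abc</count></appinfo>"));
    }
}
```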

So the parser can detect errors in instance documents, but it does not adequately
catch errors in the schemas themselves. Maybe this was always the intent, but it
was never clearly stated. Or maybe it's a simple matter to add some other Xerces
features, or to incorporate XSV instead, but not being a Java programmer, I don't know.

