[eml-dev] validation of EML documents with qualified and unqualified local elements

Matt Jones jones at nceas.ucsb.edu
Thu Oct 1 14:15:09 PDT 2009


Hi Mike, Jing, and Nathan,

I perused your IRC conversation earlier about validation issues in EML when
you explicitly qualify all of the elements in the instance document.  I
don't think that Nathan's assertion that you can *always* add in a namespace
prefix for an element in an XML document and always have it remain valid is
true.  One must explicitly consider how the XML Schema documents define the
elementFormDefault and attributeFormDefault attributes, as they control how
namespaces are handled on non-global elements and on attributes,
respectively.  Not paying attention to these settings and arbitrarily
applying namespace prefixes to local elements can definitely cause validity
issues, which is my first guess as to why Xerces is rejecting Nathan's
example document (although I haven't seen the document).  Explicitly
qualifying attributes frequently causes these kinds of validity problems:
the default namespace does not apply to unprefixed attributes, so an
attribute declared as unqualified (the default) is in no namespace, and
adding a prefix puts it in a namespace the schema does not expect.  Sooo.... I
could be wrong, but I don't think there is a problem with the validator or
with Metacat -- I think there is a problem with the document, in that it
truly is invalid according to the XML Schema spec because it sets namespaces on
local elements that are declared as 'unqualified'. Just a guess
though, as I haven't seen the document. More details follow if you're
interested.
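To make the attribute point concrete, here is a small sketch using Python's standard xml.etree.ElementTree, which reports every name in its expanded {namespace}local form. The namespace URI urn:example is a made-up placeholder, not EML's, and the sketch only shows how names resolve; it does no schema validation.

```python
import xml.etree.ElementTree as ET

# urn:example is a placeholder namespace, not the real EML namespace.
DEFAULT_NS_DOC = '<root xmlns="urn:example" id="1"><child/></root>'
PREFIXED_ATTR_DOC = '<ex:root xmlns:ex="urn:example" ex:id="1"/>'


def names(xml_text):
    """Return (root tag, child tags, attribute names) as expanded names."""
    root = ET.fromstring(xml_text)
    return root.tag, [child.tag for child in root], sorted(root.attrib)


# The default namespace applies to the unprefixed elements,
# but the unprefixed attribute "id" stays in no namespace.
print(names(DEFAULT_NS_DOC))
# Prefixing the attribute moves it into urn:example, changing its name.
print(names(PREFIXED_ATTR_DOC))
```

A schema whose attribute declaration is unqualified expects the plain name id, so only the first form of the attribute would match it.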

A lot has been written on this topic, but there's a decent short overview of
these issues under the section "Qualified or unqualified" here:
   http://www.oracle.com/technology/pub/articles/srivastava_namespaces.html

From that tutorial, they say:
   "When elementFormDefault is set to *qualified*, it implies that in the
instance of this grammar all the elements must be explicitly qualified,
either by using a prefix or setting a default namespace. An
*unqualified* setting means that only the globally declared elements
*must* be explicitly qualified, and the locally declared elements
*must not* be qualified. Qualifying a local declaration in this case is
an error."

Note in particular the two bold phrases and the last sentence that I quoted.
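Here is a quick way to see what "qualifying a local element" means in terms of names, again sketched with Python's standard ElementTree and a placeholder namespace (urn:example, not EML's). It prints the expanded name of the same local child element under three ways of writing the instance.

```python
import xml.etree.ElementTree as ET

NS = "urn:example"  # placeholder target namespace, not EML's

VARIANTS = {
    # Global root prefixed, local child left unprefixed.
    "prefix on root only": f'<ex:root xmlns:ex="{NS}"><child/></ex:root>',
    # Local child prefixed as well.
    "prefix on child too": f'<ex:root xmlns:ex="{NS}"><ex:child/></ex:root>',
    # A default namespace qualifies every unprefixed element, child included.
    "default namespace": f'<root xmlns="{NS}"><child/></root>',
}


def child_name(xml_text):
    """Expanded {namespace}local name of the root's first child."""
    return ET.fromstring(xml_text)[0].tag


for label, text in VARIANTS.items():
    print(f"{label:20} -> {child_name(text)}")
```

If child is declared locally under elementFormDefault="unqualified", a schema validator expects the plain name child, so only the first variant is valid; under "qualified", only the other two are. Note that the default-namespace variant qualifies the child just as surely as an explicit prefix does.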

These defaults can also be overridden using the "form" attribute in the
element definitions in XML Schema.  Keeping track of when it is allowable,
required, or disallowed to include a namespace prefix on an element under
XML Schema is actually an incredibly complex process involving consideration
of at least the following settings: 'xmlns', 'targetNamespace',
'elementFormDefault', 'attributeFormDefault', and 'form'.  This means that
whether or not a prefix is allowed or required or disallowed can change
throughout a document based on how these values are set, and can get
particularly complex when one schema imports another that uses different
qualification settings.
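As a rough illustration of how these settings combine for one element declaration, here is a simplified Python sketch of the XML Schema 1.0 rule. The helper name expected_namespace and its parameters are hypothetical; it ignores xs:import/xs:include, and it leaves out the instance's 'xmlns' declarations, which only control how a name is spelled, not which name the schema expects. Attributes follow the same pattern with attributeFormDefault and their own 'form' attribute.

```python
from typing import Optional


def expected_namespace(is_global: bool,
                       target_namespace: Optional[str],
                       element_form_default: str = "unqualified",
                       form: Optional[str] = None) -> Optional[str]:
    """Namespace an instance element must be in to match its declaration.

    Hypothetical, simplified sketch of the XML Schema 1.0 rule; ignores
    imports/includes. None means "no namespace", i.e. the element must
    NOT be qualified in the instance.
    """
    if is_global:
        # Global (top-level) declarations always take the targetNamespace.
        return target_namespace
    # A 'form' attribute on the local declaration overrides the schema-wide
    # elementFormDefault, whose own default is "unqualified".
    effective_form = form if form is not None else element_form_default
    return target_namespace if effective_form == "qualified" else None


NS = "urn:example"  # placeholder target namespace
print(expected_namespace(True, NS))                      # global element
print(expected_namespace(False, NS))                     # local, all defaults
print(expected_namespace(False, NS, form="qualified"))   # form overrides default
print(expected_namespace(False, NS, "qualified", form="unqualified"))
```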

In the case of EML, mostly we don't set elementFormDefault (so it takes the
XML Schema default of 'unqualified'), except in eml-protocol.xsd,
eml-spatialReference.xsd, and stmml.xsd (which we
inherited).  But 'form' is used in various places to override the default
(grep for 'qualified' in the xsd files).  I'll admit that back in 2002 when
we wrote EML, I didn't have a good understanding of these issues.  Nor did
most people in my opinion.  Some people don't today (or so it seems). In
retrospect, we should have probably paid more careful attention to it.  I
suspect there was no good rationale for why the various module authors set the
defaults as they did. I certainly don't know what that rationale was at the
time, although I remember a number of conversations about local versus
global element definitions that would certainly have been influenced by
this.  This should probably be entered as a bug against EML to clean up and
make sensible for a future release, although I suspect changes in this area
would have broad implications for validity of existing documents, making
transformation from older EML versions to newer versions complicated.

However, it is possible to write valid EML docs by carefully respecting which
sections are qualified and which are unqualified.  My experience thus far has
been that qualifying the root eml element with a prefix (rather than a default
namespace) and then omitting the namespace prefix throughout the rest of the
document is the best default behavior, although I suspect you could still run
into trouble with this rule of thumb.  We generate EML documents from XSLT as
well, and so I know it is
possible to write valid EML documents from an XSLT stylesheet.  For example,
here is a stylesheet used by the NBII to do just that:
   https://code.ecoinformatics.org/code/eml/trunk/lib/eml2tonbii/bdp2eml.xsl
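If you want to check a generated document against that rule of thumb, one quick heuristic is to list every element below the root that ended up in a namespace. The sketch below does this with Python's standard library; the function name qualified_descendants is hypothetical and urn:example is a placeholder. Since EML uses 'form' overrides in places (and embedded content may legitimately use other namespaces), a hit is only something to double-check against the schemas, not proof of invalidity; only a real validator such as Xerces settles that.

```python
import xml.etree.ElementTree as ET
from typing import List


def qualified_descendants(xml_text: str) -> List[str]:
    """Expanded names of all non-root elements that carry a namespace."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter()
            if el is not root and el.tag.startswith("{")]


# Placeholder namespace; substitute the EML namespace your document uses.
GOOD = ('<eml:eml xmlns:eml="urn:example">'
        '<dataset><title>t</title></dataset></eml:eml>')
SUSPECT = ('<eml:eml xmlns:eml="urn:example">'
           '<dataset><eml:title>t</eml:title></dataset></eml:eml>')

print(qualified_descendants(GOOD))
print(qualified_descendants(SUSPECT))
```

A document that puts a default namespace on the root instead of a prefix would flag every element below it, which is consistent with qualifying only the root via a prefix.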

Bottom line: don't just throw namespace prefixes into EML documents --
probably best to only qualify the root eml element.

Hope this helps clarify this situation rather than just muddy it up,

Matt