EML 2.0 and packaging

Wed Mar 20 20:28:22 PST 2002

Hello EML-dev,

Last week Matt Jones, Mark Schildhauer, Chris Jones, Dan Higgins, Jing
Tao and I met to discuss the packaging issue with regard to EML.
Specifically, we discussed whether we should continue to use our current
RDF-like method of linking different EML modules together or whether we
should use namespaces to create a single monolithic EML document with
inherent relationships.  There are good arguments for both methods, which
is why we have been discussing this on and off for the last year and a
half.

The benefits of using the existing RDF-like triples are that 1) they
provide maximum flexibility when additional modules need to be added
(e.g., SOILS metadata), and 2) the information in a module can be reused
by associating it with more than one other module, both within and
between packages.  The downside to the triples method is that the
relationships between modules cannot be predefined using an XML
specification language (XML Schema or DTD); they must be checked using
external software.  It also divides EML up into several different files,
which must be kept together in order to maintain the continuity of the
package.  However, our consensus was that these are not difficult issues
to address.
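
To make the "checked using external software" point concrete, here is a
minimal sketch in Python of what such a check might look like.  The
Triple structure, the module ids, and the "isDescribedBy" relationship
name are all hypothetical and chosen only for illustration; they are not
taken from the EML specification or from any existing tool.

```python
from collections import namedtuple

# Hypothetical in-memory form of one link between two modules.
Triple = namedtuple("Triple", ["subject", "relationship", "object"])


def find_dangling(module_ids, triples):
    """Return triples whose subject or object is not a module in the package."""
    known = set(module_ids)
    return [t for t in triples if t.subject not in known or t.object not in known]


def reuse_counts(triples):
    """Count how many distinct subjects point at each object module."""
    subjects_by_object = {}
    for t in triples:
        subjects_by_object.setdefault(t.object, set()).add(t.subject)
    return {obj: len(subs) for obj, subs in subjects_by_object.items()}


# Hypothetical package: two entity modules share one attribute module.
modules = ["dataset.1", "entity.1", "entity.2", "attribute.1"]
triples = [
    Triple("entity.1", "isDescribedBy", "attribute.1"),
    Triple("entity.2", "isDescribedBy", "attribute.1"),
    Triple("entity.1", "isDescribedBy", "soils.1"),  # soils.1 is not in the package
]

dangling = find_dangling(modules, triples)  # links that point outside the package
shared = reuse_counts(triples)              # how often each module is reused
```

In this sketch, find_dangling would flag the link to the missing soils
module, and reuse_counts would show the attribute module being shared by
two entities, which is the kind of reuse described in point 2 above.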

The arguments for changing EML into a single document structure are that
EML is simpler to transport as one document and that the relationships
between different modules are explicitly defined within that document
structure.  There are two major issues with this approach.  One is that
it is not easy to add additional modules in the future without changing
the schema of EML itself and releasing a new version of EML.  We want to
avoid this version proliferation.  The other is that information is not
normalized; that is, you must reenter identical information in many
places instead of reusing that information from a central location.  An
example of this is attribute information.  If several entities in your
package have the same attribute structure, you must reenter the attribute
information for each entity.  We have gotten extensive feedback from
ecological scientists that our current ability to reuse metadata via
triple pointers is an extremely important feature of EML from a practical
perspective.

When we met, the room was, once again, divided on what to do about this
issue.  It was brought up that there are some alternative approaches for
pointers in XML.  First, the XML ID/IDREF mechanism could be used within
a single XML document, but not between documents.  Second, XPointer could
be used to allow pointers from one XML document to another.  We're sure
there are other possibilities as well.  One of these could possibly allow
us to make EML into a monolithic document without having to denormalize
the information that is being marked up.  However, our consensus was that
these approaches also have their problems, and it was clear that we would
not be gaining anything by switching from our existing approach.  Thus,
we concluded that EML 2.0.0 should continue to use triples to link
modules, as it is the only system with which we have substantial
real-world experience.  We could then revisit the issue in a later
version of EML (3.0?) if it seems appropriate.
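
For anyone who wants to picture the id/ref alternative mentioned above,
here is a rough Python sketch of pointers inside a single document.  The
element and attribute names (package, attributeList, entity,
attributesRef) are made up for illustration and are not part of any EML
schema.  Note also that Python's ElementTree does not enforce ID/IDREF
itself, so the lookup is done by hand here; a validating parser with a
DTD or schema would be needed to have the parser check the references.

```python
import xml.etree.ElementTree as ET

# Hypothetical monolithic document; names are illustrative only.
DOC = """
<package>
  <attributeList id="attrs1">
    <attribute name="site"/>
    <attribute name="temperature"/>
  </attributeList>
  <entity name="plots2000" attributesRef="attrs1"/>
  <entity name="plots2001" attributesRef="attrs1"/>
  <entity name="soils" attributesRef="attrs9"/>
</package>
"""


def resolve_refs(root):
    """Map each entity name to the attribute names it points at, or None if unresolved."""
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    resolved = {}
    for entity in root.iter("entity"):
        target = by_id.get(entity.get("attributesRef"))
        if target is None:
            resolved[entity.get("name")] = None  # pointer has no matching id
        else:
            resolved[entity.get("name")] = [a.get("name") for a in target.iter("attribute")]
    return resolved


root = ET.fromstring(DOC)
links = resolve_refs(root)
```

Here the two plot entities share one attribute list without repeating it,
while the third entity points at an id that does not exist in the
document.  The same lookup cannot reach into a second file, which is the
within-document limitation noted above.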

We would like to hear any input or insights, pro or con, on this
decision.  We need to make a final decision for EML 2.0.0 within the next
week in order to prepare the candidate release drafts for the April
meeting.  Please reply with any comments to the entire eml-dev list.

Thanks,
Chad Berkley