EML 2.0 and packaging

Fri Mar 22 08:58:30 PST 2002

Hi all,

here are my two cents from the opposite end of the spectrum:

It seems to me that this whole discussion keeps going in circles around the
same arguments, and that it comes down to how one weighs the two main ones:

- versioning in monolithic EML: why is this so bad? Everyone has been living
with versions ever since the computer was invented, and before that we
already had book editions and authors who used the same words differently and
had to be cited accordingly. (OK, during my short brush with taxonomy I cursed
about this a lot.) However, it seems to me that no matter how hard we try, we
will not completely avoid versioning, considering that EML is already at
version 2 and this is likely not the last one, even for its core elements.

vs.

- package identifier triples: as you pointed out, they do require proprietary
software. To me this looks a little bit like the Microsoft approach of locking
people into a software package. Commercial and open-source XML editing
software is springing up everywhere, at reasonable cost and with dramatically
increasing quality and capabilities. Why would we want to prevent people from
using these tools by introducing something that they cannot handle?

and:

EML as it stands, with packaging, is not normalized, and it will never be
completely normalized this way: e.g. responsible party and literature are
inserted in so many places within one schema that they cannot be normalized
with packaging alone, only with specific pointers, or with the use of a
database (relational or native XML).

I hope I came up with some new arguments :)

Corinna

-----Original Message-----
From: chad berkley [mailto:berkley at nceas.ucsb.edu]
Sent: Wednesday, March 20, 2002 9:28 PM
To: eml-dev at ecoinformatics.org
Subject: EML 2.0 and packaging

Hello EML-dev,

Last week Matt Jones, Mark Schildhauer, Chris Jones, Dan Higgins, Jing
Tao and I met to discuss the packaging issue with regard to EML.
Specifically, we discussed whether we should continue to use our current
RDF-like method of linking different EML modules together or whether we
should use namespaces to create a single monolithic EML document with
inherent relationships. There are good arguments for both methods, and
hence, we have been discussing this on and off for the last year and a half.

The benefits of using the existing RDF-like triples are that 1) they
provide maximum flexibility when additional modules need to be added
(e.g., SOILS metadata), and 2) the information in a module can be reused
by associating it with more than one other module, both within and between
packages. The downside to the triples method is that the relationships
between modules cannot be predefined using an XML specification language
(XML Schema or DTD); they must be checked using external software. It also
divides EML into several different files, which must be kept together in
order to maintain the continuity of the package. However, our consensus
was that these are not difficult issues to address.
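A minimal Python sketch of the triple approach, under loudly stated assumptions: the module identifiers, module types, relationship names and the ALLOWED table are hypothetical and do not reproduce actual EML 2.0 triple syntax or any existing EML tool. It illustrates two points from the paragraph above: one attribute module is linked to two entities (reuse without copying), and the rules about which modules may be linked live in separate checking code (check_package, a made-up name) rather than in a schema.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """One RDF-like statement linking two module documents (hypothetical form)."""
    subject: str
    relationship: str
    object: str


# Hypothetical package contents: module identifier -> module type.
MODULES: Dict[str, str] = {
    "dataset.1": "dataset",
    "entity.1": "table",
    "entity.2": "table",
    "attribute.1": "attribute",
}

# The same attribute module is linked to two entities: reuse without copying.
TRIPLES: List[Triple] = [
    Triple("entity.1", "isPartOf", "dataset.1"),
    Triple("entity.2", "isPartOf", "dataset.1"),
    Triple("attribute.1", "describes", "entity.1"),
    Triple("attribute.1", "describes", "entity.2"),
]

# Hypothetical allowed (subject type, relationship, object type) combinations.
# A schema validates each module file on its own, so these cross-file rules
# have to be enforced by external code such as check_package() below.
ALLOWED: Set[Tuple[str, str, str]] = {
    ("table", "isPartOf", "dataset"),
    ("attribute", "describes", "table"),
}


def check_package(modules: Dict[str, str], triples: List[Triple]) -> List[str]:
    """Return a list of problems; an empty list means the package is consistent."""
    problems = []
    for t in triples:
        # A triple pointing at a file that is not in the package is dangling.
        missing = [m for m in (t.subject, t.object) if m not in modules]
        if missing:
            problems.append(f"missing module(s) {missing} in {t}")
            continue
        combo = (modules[t.subject], t.relationship, modules[t.object])
        if combo not in ALLOWED:
            problems.append(f"relationship not allowed: {combo}")
    return problems


if __name__ == "__main__":
    print(check_package(MODULES, TRIPLES))  # prints []
    # A package whose files were not kept together:
    broken = TRIPLES + [Triple("attribute.2", "describes", "entity.1")]
    print(check_package(MODULES, broken))
```

The second call reports the dangling reference to attribute.2, which is the "files must be kept together" issue in miniature.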

The arguments for changing EML into a single document structure are that
EML is simpler to transport as one document and that the relationships
between different modules are explicitly defined within that document
structure. There are two major issues with this approach. One is that it
is not easy to add modules in the future without changing the schema of
EML itself and releasing a new version of EML; we want to avoid this
version proliferation. The other is that the information is not
normalized, that is, you must reenter identical information in many
places instead of reusing it from a central location. An example of this
is attribute information: if there are many entities in your package
with the same attribute structure, the user must reenter the attribute
information for each entity. We have gotten extensive feedback from
ecological scientists that our current ability to reuse metadata via
triple pointers is an extremely important feature of EML from a practical
perspective.
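For contrast, a sketch of the denormalized single-document layout, using Python's standard xml.etree.ElementTree. The element and attribute names (eml, dataTable, attribute, unit) and the table names are hypothetical and do not follow the actual EML schema. Three tables share one attribute structure, so the monolithic document carries three copies of it, and a correction has to be applied to every copy.

```python
import xml.etree.ElementTree as ET

# One attribute structure shared by several tables (hypothetical names and units).
SHARED_ATTRIBUTES = [("site", "text"), ("date", "date"), ("biomass", "g/m2")]


def monolithic_document(table_names):
    """Build one document in which every table repeats the attribute definitions."""
    root = ET.Element("eml")
    for name in table_names:
        table = ET.SubElement(root, "dataTable", name=name)
        for attr_name, unit in SHARED_ATTRIBUTES:
            ET.SubElement(table, "attribute", name=attr_name, unit=unit)
    return root


doc = monolithic_document(["plots1999", "plots2000", "plots2001"])
print(len(doc.findall("./dataTable/attribute")))  # 9: three tables x three attributes

# Fixing the biomass unit means touching every copy, not one central definition.
updated = 0
for attr in doc.iter("attribute"):
    if attr.get("name") == "biomass":
        attr.set("unit", "kg/m2")
        updated += 1
print(updated)  # 3
```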

When we met on this issue, the room was, once again, divided on what to
do. It was brought up that there are some alternative approaches for
pointers in XML. First, XML ID/IDREF attributes could be used within a
single XML document, but not between documents. Second, XPointer could be
used to point from one XML document into another. We're sure there are
other possibilities as well. One of these could possibly allow us to make
EML into a monolithic document without having to denormalize the
information that is being marked up. However, our consensus was that
these approaches also have their problems, and it was clear that we would
not gain anything by switching from our existing approach. Thus, we
concluded that EML 2.0.0 should continue to use triples to link modules,
as it is the only system with which we have substantial real-world
experience. We could then revisit the issue in a later version of EML
(3.0?) if it seems appropriate.
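The first alternative, ID/IDREF-style pointers inside one document, can be sketched with Python's standard xml.etree.ElementTree. Note the assumptions: ElementTree does not enforce DTD ID/IDREF types, so the hypothetical resolve_refs() function looks identifiers up by hand, and the id/ref attribute names and element names are illustrative rather than EML markup. References inside the document resolve to a single shared definition, while a reference to an identifier that only exists in another file cannot be resolved, which is the limitation noted above.

```python
import xml.etree.ElementTree as ET

# A single hypothetical document: the attribute list is defined once and
# the tables point to it by identifier instead of repeating it.
DOC = """
<eml>
  <attributeList id="attrs1">
    <attribute name="site"/>
    <attribute name="biomass"/>
  </attributeList>
  <dataTable name="plots1999"><attributeList ref="attrs1"/></dataTable>
  <dataTable name="plots2000"><attributeList ref="attrs1"/></dataTable>
  <!-- attrs9 is defined in a different document, so it cannot be found here -->
  <dataTable name="plots2001"><attributeList ref="attrs9"/></dataTable>
</eml>
"""


def resolve_refs(root):
    """Map each table name to the element its ref points to (None if unresolved)."""
    # Index every element that carries an id, within this one document only.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    resolved = {}
    for table in root.iter("dataTable"):
        ref = table.find("attributeList").get("ref")
        resolved[table.get("name")] = by_id.get(ref)
    return resolved


root = ET.fromstring(DOC)
for table, target in resolve_refs(root).items():
    print(table, "->", "unresolved" if target is None else target.get("id"))
```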

We would like to hear any input or insights, pro or con, on this
decision. We need to make a final decision for EML 2.0.0 within the next
week in order to prepare for the candidate release drafts for the April
meeting. Please reply with any comments to the entire eml-dev list.

Thanks,
Chad Berkley

_______________________________________________
eml-dev mailing list
eml-dev at ecoinformatics.org
http://www.ecoinformatics.org/mailman/listinfo/eml-dev