LTER Network Office thoughts on EML, Packaging, and a metadata specification
Owen Eddins
oeddins at lternet.edu
Mon Apr 1 10:22:18 PST 2002
In the process of developing the harvester design document we discussed
packaging pros and cons in great detail. We'd like to propose that we not
throw out either approach. It seems to us that EML could be structured to
accommodate both. Being able to define EML documents as self-contained for
purposes of distribution, while also supporting normalization of the
metadata with triples or some other referencing scheme for purposes of
storage, would be tremendously valuable, and the arguments on both sides
support a flexible approach.
By way of analogy, look at how 90% of the Microsoft Access Database (another
waste product of the Evil Empire) community uses this product. Most users
create flat, de-normalized tables tightly coupled to Forms, which are
trivial to create, even though Access supports most relational database
features. However easy Access makes it to work with those features, few
users take advantage of them. No one is putting a gun to their head and
telling them that if they want to use Access they have to go to third
normal form. All the arguments laid out in Chad's original email against
putting everything in one document apply equally to why someone should
normalize their Access tables in the first place, but most users will not
follow that model because of the overhead in doing so. I believe there is a
significant potential community out there that could be using and
developing applications with EML, and by forcing the use of triples we are
raising the bar for them, for two reasons:
1) In addition to XML Schema, users will need to understand the concept of
triples and their grammar and syntax.
2) Applications for processing EML will have to accommodate triples in
order to process and share EML.
Central to any determination of how difficult it will be to support
triples and packaging in EML is defining who will be working with EML and
what their range of technical abilities is. The Access analogy is a good one
because someone with no programming background can use this product to
create what is, by historical standards, a very complicated client-server
application without writing a single line of code. So whether it is
difficult to develop applications using triples, or even XML for that
matter, really depends on the tools available and on the technical acumen
of the EML user. We think the first question we need to answer is who the
target user base for EML is going to be. Is the user base
1) exclusively the province of high-end software developers,
2) an increasingly technically savvy group of users who are capable of
manipulating software tools but not of writing code, or
3) ecologists and data managers who want a metadata standard for their
own internal guidelines?
If it is going to be strictly for software developers, then we could create
common libraries, shared among ourselves, that handle packaging and could
be incorporated into our software. If the target user base includes
categories two and three, then we would argue that we are raising the bar
too high for users to work with EML by adding yet another layer of
abstraction with triples and their grammar and syntax. It has been our
observation that among some of the data managers within the LTER Network
the concepts of XML and XML Schema have not been easy to grasp. Adding
triples to EML has not made that any easier.
In terms of storing EML, reassociating modules is clearly an issue and a
strong argument for maintaining triples or adopting a standard like XLink or
RDF. In terms of sharing EML, the reassociation argument breaks down,
because in the act of sharing EML there presumably will not be any
reassociation. It cannot be overstated how important it is to be able, for
example, to change an EML Attribute module that is used a thousand times
only once, when that change applies to every entity whose attribute is
defined by that module; triples are an elegant way of supporting this.
Also, if we are going to support data packaging and triples within EML
then, at a minimum, their precise grammar and syntax should be included in
the EML standard so that everyone wishing to use packaging will be able to
do so. Without this we cannot build common software tools or share EML.
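To make the normalization argument concrete, here is a minimal Python sketch (the message itself proposes no code) of one Attribute module stored once and linked to a thousand entities by triples. The `Triple` type, the module ids such as `attr.temp`, and the relationship label are hypothetical illustrations, not EML's actual triple syntax.

```python
from typing import NamedTuple

# Hypothetical, simplified model of normalized storage. This is NOT the real
# EML triple syntax; it only illustrates "store once, reference many times".
class Triple(NamedTuple):
    subject: str       # id of the module doing the describing
    relationship: str  # hypothetical relationship label
    obj: str           # id of the module being described

DESCRIBES = "provides attribute information for"  # hypothetical label

# The shared Attribute module is stored exactly once, keyed by a made-up id.
modules = {"attr.temp": {"attributeName": "temp", "unit": "celsius"}}

# A thousand entities reuse it through triples instead of private copies.
triples = [Triple("attr.temp", DESCRIBES, f"entity.{i}") for i in range(1000)]


def attributes_for(entity_id):
    """Resolve, at read time, every attribute module linked to an entity."""
    return [modules[t.subject] for t in triples
            if t.relationship == DESCRIBES and t.obj == entity_id]


# A single edit to the stored module...
modules["attr.temp"]["unit"] = "kelvin"

# ...is seen by every entity, because each one resolves to the same module.
units = {attributes_for(f"entity.{i}")[0]["unit"] for i in range(1000)}
print(units)  # {'kelvin'}
```

Because entities hold references rather than copies, a global change is a single edit; the cost is that every reader has to resolve the triples, which is the extra layer of abstraction the message worries about for less technical users.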
We'd like to propose that
1) for purposes of sharing, all of the metadata to be shared be contained
within one document, and EML be restructured to support this; and
2) packaging and triples be supported for purposes of storage because of the
benefits of normalization.
Note that packages could still be shared between originators and
recipients of EML if the software on both ends supports packaging/triples
conforming to the syntax and grammar defined in the EML standard for
triples/packaging. For purposes of sharing, however, the de facto
expectation would be that the EML document is self-contained.
If EML is properly structured, XSLT filters could be written to
1) assemble EML modules and their triples into a self-contained EML
document, and
2) decompose self-contained EML documents into EML modules with triples.
These XSLT filters could be shared among all users of EML because they would
carry out transformations within the EML standard. For example, Metacat
or the Harvester could use a filter to decompose a self-contained EML
document into EML modules whenever it came upon one it wanted to ingest.
This way Morpho will not have to be retooled.
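The two filters above are proposed as XSLT. As a rough illustration of the same round trip, the following sketch uses Python's standard `xml.etree.ElementTree` rather than XSLT. The element names, the `id` attributes, the `isPartOf` relationship, and the single level of nesting are all simplifying assumptions, not the real EML schema or packaging grammar.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stored form: modules kept separately by id, plus triples saying
# which module belongs inside which. Element names are simplified stand-ins,
# not the real EML schema; "isPartOf" is an assumed relationship label.
STORED_MODULES = {
    "ds.1": '<dataset id="ds.1"><title>Stream chemistry</title></dataset>',
    "attr.1": '<attribute id="attr.1"><attributeName>pH</attributeName></attribute>',
}
STORED_TRIPLES = [("attr.1", "isPartOf", "ds.1")]  # (subject, relationship, object)


def assemble(modules, triples, root_id):
    """Filter 1: inline every module linked to root_id into one document."""
    parsed = {mid: ET.fromstring(xml) for mid, xml in modules.items()}
    root = copy.deepcopy(parsed[root_id])
    for subject, relationship, obj in triples:
        if relationship == "isPartOf" and obj == root_id:
            root.append(copy.deepcopy(parsed[subject]))
    return root


def decompose(doc):
    """Filter 2: split a self-contained document into modules plus triples."""
    doc = copy.deepcopy(doc)  # leave the caller's document untouched
    root_id = doc.get("id")
    modules, triples = {}, []
    for child in list(doc):
        child_id = child.get("id")
        if child_id is not None:  # assumption: only id-bearing children are modules
            doc.remove(child)
            modules[child_id] = ET.tostring(child, encoding="unicode")
            triples.append((child_id, "isPartOf", root_id))
    modules[root_id] = ET.tostring(doc, encoding="unicode")
    return modules, triples


# Round trip: stored modules -> self-contained document -> stored modules.
doc = assemble(STORED_MODULES, STORED_TRIPLES, "ds.1")
print(ET.tostring(doc, encoding="unicode"))
modules, triples = decompose(doc)
print(modules == STORED_MODULES and triples == STORED_TRIPLES)
```

A real filter of either kind would depend on the grammar and syntax of triples being fixed in the EML standard, which is the requirement raised earlier in this message.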
And finally, is EML an XML Schema/DTD or is it a specification? EML is an
XML Schema/DTD. The specification, a guideline for metadata management, is
implicit and needs to be made explicit. If we had a well-defined
specification for a metadata standard, then we could have
1) the metadata specification in English, as a guideline for ecologists and
data managers and for the purpose of soliciting community review;
2) an implementation of the metadata specification in XML Schema for
purposes of sharing and storage; and
3) an implementation of the metadata specification in relational database
systems for purposes of storage.
With a metadata specification we think we could meet the needs of all
categories of potential users of EML.
LTER Network Office
801 University Blvd. SE Suite 104
Albuquerque, NM 87106
Phone: 505.272.7319
Fax: 505.272.7080
Email: oeddins at lternet.edu
Web Page: www.lternet.edu