status of EML 2.0
David Blankman
dblankman at lternet.edu
Mon Aug 19 09:18:32 PDT 2002
You can add me to the conference call.
David
Peter McCartney wrote:
> Hi everyone.
>
> It's been a busy summer, what with travel, vacation, and major scores on
> the funding front. While we all probably needed a break, we do need to
> resolve where we are with EML 2.0. I've noticed a trickle of traffic in
> bugzilla on minor points, but it's so small that I suspect I'm not the
> only one who is mulling over the path we should be taking with
> respect to some of the feedback we've been getting. Here's my take on
> the issues, drawn both from our experience working with beta 9 this
> summer and from the workshop. I don't find bugzilla well suited to this
> level of comment, so I will make them here first. I'm willing to put in
> the effort to turn some of these comments into bugs once we have some
> general sense of how to respond to them, or at least agreement that
> they are bugs. I've cc'd this to the LTER IM list so that they can
> confirm whether or not my interpretations of the workshop response are
> fair.
>
> 1) There were a number of issues that Chris and I both felt were
> simple errors in beta 9 when we did our walk-through. The following
> are the most glaring that I recall:
>
> a) there is no recursive link within project to a related
> project description
>
> b) the dataSourceUsed element, which links protocol methodSteps
> to existing eml-datasets from which this dataset was derived, is
> missing
>
> c) there is no recursive link within protocol to reference an
> existing protocol (see separate comments on protocol below)
>
> d) the fixed-width ASCII section of physical doesn't work, nor does
> it support records that span multiple physical lines. We've already
> defined a structure that does this.
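Item 1d concerns fixed-width ASCII records in which one logical record spans several physical lines. As a rough illustration of what such a layout has to capture (which physical line of the record a field sits on, plus its column range), here is a minimal Python sketch. The `FixedField` structure, the 0-based column convention, and the sample data are hypothetical; they are not the structure mentioned above or part of any EML draft.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FixedField:
    """Location of one field inside a multi-line fixed-width record (hypothetical)."""
    name: str
    line: int   # which physical line of the logical record, 0-based
    start: int  # first column, 0-based, inclusive
    end: int    # last column, 0-based, exclusive


def read_records(text: str, lines_per_record: int,
                 fields: List[FixedField]) -> List[Dict[str, str]]:
    """Group physical lines into logical records, then slice out each field."""
    physical = text.splitlines()
    if len(physical) % lines_per_record != 0:
        raise ValueError("line count is not a multiple of lines_per_record")
    records = []
    for i in range(0, len(physical), lines_per_record):
        block = physical[i:i + lines_per_record]
        records.append({f.name: block[f.line][f.start:f.end].strip()
                        for f in fields})
    return records


# Hypothetical two-line layout: site and year on line 0, two measurements on line 1.
FIELDS = [
    FixedField("site", 0, 0, 6),
    FixedField("year", 0, 7, 11),
    FixedField("temp", 1, 0, 6),
    FixedField("depth", 1, 6, 11),
]

SAMPLE = "SITE01 2002\n  12.5  3.1\nSITE02 2002\n   9.0  4.7\n"

if __name__ == "__main__":
    for rec in read_records(SAMPLE, 2, FIELDS):
        print(rec)
```

The point of the sketch is that a single-line column description is not enough: each field needs a line index within the record as well as its columns.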
>
> 2) There are some technical problems with the identifier and keyref
> statements that prevent any instance file from validating. I don't
> understand this aspect of XML very well, so I can't really suggest how
> to fix it or where the problem lies, but I assume it is just a
> technical matter and not a fundamental problem with what we are trying
> to do with references.
>
>
> 3) Literature needs fixing: it doesn't work intuitively with the way
> most of us cite bibliographic information, even after we get some
> robust name-parsing tools written. I've already enumerated the problems
> in bugzilla, so I won't belabor them here. I've had a student writing
> XSLs for various journal formats as well as EndNote conversion, but
> that work is held up waiting for a final version. The fact that the
> network office is investing so much effort in the EndNote export format
> as a means of harvesting bibliographic information is, in my opinion,
> not a good letter of recommendation for eml-literature, so we should
> either fix it or drop it in favor of something simpler.
>
> 4) The decision to record online distribution only as URLs, and only
> as stateless pointers to a single opaque object, will, I fear, force us
> to seriously limit the role of EML in the future development of a
> web-service-based network. That URLs are at best awkward and at
> worst unusable for expressing some types of connections is one
> thing, but it is the lack of support for describing a stateful
> connection that bothers me most. Many LTER sites, not just CAP, are
> attempting to build Internet applications that are metadata driven and
> provide an interface (either direct or web service based) to data
> stored in many different systems, including SDE, SQL, ASCII files, and
> various GIS and hyperspectral formats. While few of us intend to give
> out the stateful connection information to end users directly, many of
> us would like to see the development of server-side tools follow some
> standards so that we might all better share software components.
> Without a standard in EML for describing connection information in a
> usable format, administrators are still forced to develop
> local solutions and then figure out how to relate them to EML. I'd
> hate to see EML perceived as useful for enabling outside institutions
> to build applications around site data but not very useful for sites
> in building their own applications.
>
> 5) The recent traffic on reusable content partly underscores, I think,
> our failure to adequately separate storage and management of metadata
> from its presentation during this design process. The former benefits
> from a high degree of granularity and normalization; the latter
> benefits from just the opposite (assuming the size of the EML document
> is not an issue). The references element is a device to introduce some
> normalization capability within EML to better serve management of
> information at the expense of some convenience in reading it. It's not
> likely to satisfy everyone, since it doesn't allow addressing between
> documents, and this, as well as granularity, will be a perennial
> problem when trying to use EML as both a metadata management
> format and a metadata presentation format. For those of us who
> are dynamically building an EML document from a normalized source such
> as a relational database or collections of independent XML fragments,
> this is far less of an issue: we can choose our own level of granularity
> within our storage systems, and frankly we find it easier to write the
> same information out twice than to go through the hassle of
> creating identifiers and remembering what they are during the entire
> output process. I'd hate to see the issue of references and granularity
> hold up the design process, given that (in my opinion) they aren't
> really necessary at all in order to define EML content (with the one
> exception of key definitions in eml-constraint, which I don't like).
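Point 5 contrasts normalized storage (write something once and point to it by identifier) with presentation (repeat the content wherever it is read). A minimal Python sketch of the mechanical step between the two, assuming an EML-like layout in which a reusable element carries an `id` attribute and a later element points at it through a child `<references>` whose text is that id. The exact element layout and the `denormalize` helper are assumptions for illustration, not the schema under discussion.

```python
import copy
import xml.etree.ElementTree as ET

# Assumed layout (not taken from the schema): a reusable element carries an
# id attribute, and a later element points at it with a <references> child
# whose text is that id.
SAMPLE = """<dataset>
  <creator id="pmc">
    <individualName><surName>McCartney</surName></individualName>
  </creator>
  <contact><references>pmc</references></contact>
</dataset>"""


def denormalize(root: ET.Element) -> ET.Element:
    """Return a copy of root in which each <references> child is replaced by
    copies of the children of the element whose id it names.

    Assumes at most one <references> child per element and that referenced
    elements do not themselves contain <references> (no chains).
    """
    root = copy.deepcopy(root)  # leave the caller's tree untouched
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    for parent in list(root.iter()):  # snapshot: appended children are not revisited
        ref = parent.find("references")
        if ref is None:
            continue
        key = (ref.text or "").strip()
        if key not in by_id:
            raise KeyError(f"dangling reference: {key!r}")
        parent.remove(ref)
        for child in by_id[key]:
            parent.append(copy.deepcopy(child))
    return root


if __name__ == "__main__":
    print(ET.tostring(denormalize(ET.fromstring(SAMPLE)), encoding="unicode"))
```

Expanding references into repeated content is the easy direction; going the other way (spotting repeated content and minting identifiers while writing the document) is where the bookkeeping described above comes in.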
>
> 6) Finally, and most significantly, the response from the workshop
> indicated that the way we have organized project and protocol is at
> odds with the views of most participants. The problem seems to stem
> from the fact that most sites view projects as something that exists
> at a different level from a dataset. While most agreed philosophically
> that there has to be a discrete intellectual activity to produce a
> dataset, few formally recognize this activity. Instead, most see
> certain components (data collection methods, sampling, QAQC, etc.) as
> direct properties of the dataset. What is recognized as a project
> seems to be defined more by administrative or research criteria that
> often are on a higher plane than an individual dataset. A few
> acknowledged that they could live with using the immediate project
> element to record these more dataset-specific items and include a link
> to a higher-order project description, but this was hard to visualize
> at the time because that link was missing in beta 9.
>
> There were also similar problems with protocol. As Tim Bergsma put it
> (in better words than I did), we are trying to use one module to carry
> both prescriptive and descriptive information. In deciding to make
> protocol a resource-level element, we have really made the choice to
> use it as a prescriptive information tool, that is, a way of
> describing standardized protocols independent of any particular data
> collection instance. I even recall at least one person saying that
> their personal interpretation of "protocol" was [...]. As such, the
> information is only peripherally useful for describing the actual
> methods used to produce a specific dataset.
>
> There was also some dissatisfaction with the organization of protocol.
> Many objected to the idea of binding QAQC descriptions to specific
> methodStep descriptions. Again, no one disputed philosophically that
> quality control measures by definition impose control over actions;
> nevertheless, this binding does not match how most sites organize this
> information. Instead, QAQC descriptions are typically stored
> independently of the descriptions of the methodology and cannot be
> easily linked in this way. Finally, as we've seen in recent email
> traffic, there are frustrations with the perennial gray area of
> blending pure content markup (XML) with formatting markup (for
> predominantly textual content).
>
> If I were to suggest changes to beta 9 to best address these responses,
> they might go something like this. I would change eml-project to be
> predominantly a research project description, including staffing,
> funding, publications, and links to higher-level projects. I would
> also leave eml-protocol as a resource module, but make it
> predominantly text based and prescriptive, used only when a procedure
> has been formally worked out and used by many datasets. I would make a
> new module called methods, which I would use in every place that we
> now use protocol. Methods would contain a repeatable methodStep
> element, which in turn would include references to source datasets
> (type eml-dataset), software (type eml-software), instrumentation, and
> any QAQC procedures that can be logically related to those steps.
> Methods would also include optional links to eml-literature and
> eml-protocol as references to formally published or cataloged
> procedures. I would create a new module, researchContext, in which I
> would include the methodological descriptors that directly qualify
> this dataset, such as site description, sampling, and the above methods
> module. Finally, for QAQC information that isn't described under
> methodology but is directly related to specific attributes in the
> data, I would suggest that the data-quality module (in its current
> incarnation as attributeAccuracy, horizontalAccuracy and
> verticalAccuracy) be used as the mechanism for describing both
> data quality and the various control/assurance procedures used to
> arrive at that quality.
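To make the nesting in the proposal above easier to see, here is a rough Python rendering of it as plain data classes. Only the module and element names (researchContext, methods, methodStep, eml-dataset, eml-software, eml-literature, eml-protocol) come from the text; the field names, the use of plain string identifiers for links, and the `source_datasets` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MethodStep:
    """One step of the proposed methods module (hypothetical fields)."""
    description: str
    source_datasets: List[str] = field(default_factory=list)  # ids of eml-dataset resources
    software: List[str] = field(default_factory=list)         # ids of eml-software resources
    instrumentation: List[str] = field(default_factory=list)
    qaqc: List[str] = field(default_factory=list)             # QAQC procedures tied to this step


@dataclass
class Methods:
    """Repeatable methodStep plus optional links to published procedures."""
    steps: List[MethodStep]
    literature: List[str] = field(default_factory=list)  # optional eml-literature links
    protocols: List[str] = field(default_factory=list)   # optional eml-protocol links


@dataclass
class ResearchContext:
    """Descriptors that directly qualify one dataset."""
    site_description: str
    sampling: str
    methods: Methods


def source_datasets(ctx: ResearchContext) -> List[str]:
    """Collect every source dataset cited by any step, in order, without duplicates."""
    seen: List[str] = []
    for step in ctx.methods.steps:
        for ds in step.source_datasets:
            if ds not in seen:
                seen.append(ds)
    return seen


if __name__ == "__main__":
    ctx = ResearchContext(
        site_description="hypothetical site",
        sampling="hypothetical sampling design",
        methods=Methods(steps=[
            MethodStep("collect samples", source_datasets=["ds-a"]),
            MethodStep("merge with climate record", source_datasets=["ds-a", "ds-b"],
                       qaqc=["range check"]),
        ]),
    )
    print(source_datasets(ctx))  # ['ds-a', 'ds-b']
```

In this rendering QAQC hangs off individual steps only where it logically belongs there; attribute-level quality would live in the separate data-quality module, as the proposal suggests.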
>
> These, of course, are pretty extreme changes for a beta 9. I think that
> they probably describe something much more in line with the metadata
> that the current LTER network is producing, but we have to weigh that
> against time-honored software development procedures, which are
> designed to prevent knee-jerk changes like this so late in the game!
> With both momentum from the LTER network workshops and a desire to get
> going on the new ITR(s), we need EML 2.0 out the door soon or we will
> begin to lose our focus, but we also need it to work.
>
> So what's the best course of action? An IRC meeting? A conference call?
> Wait a few days to see who responds to this email?
>
>
> Peter McCartney (peter.mccartney at asu.edu)
> Center for Environmental Studies
> Arizona State University
> 480-965-6791
>