status of EML 2.0

Mon Aug 19 09:18:32 PDT 2002

You can add me to the conference call.

David

Peter McCartney wrote:

> Hi everyone.
>
> It's been a busy summer what with travel, vacation, and major scores on
> the funding front. While we all probably needed a break, we do need to
> resolve where we are with EML 2.0. I've noticed a trickle of traffic in
> Bugzilla on minor points, but it's so small that I suspect I'm not the
> only one who is muddling over the path we should be taking with
> respect to some of the feedback we've been getting. Here's my take on
> the issues, drawn both from our experience working with beta 9 this
> summer and from the workshop. I don't find Bugzilla well suited to this
> level of comment, so I will make them here first. I'm willing to put in
> the effort to turn some of these comments into bugs once we have some
> general sense of how to respond to them, or at least agreement that
> they are bugs. I've cc'd this to the LTER IM list so that they can
> confirm whether or not my interpretations of the workshop response are
> fair.
>
> 1) There were a number of issues that Chris and I both felt were
> simple errors in beta 9 when we did our walk-through. The following
> are the most glaring that I recall:
>
>         a) there is no recursive link within project to a related 
> project description
>         b) the dataSourceUsed element which links protocol methodSteps 
> to existing eml-datasets from which this dataset was derived is missing
>
>         c) there is no recursive link within protocol to reference an 
> existing protocol (see separate comments on protocol below)
>
>         d) the ASCII fixed section of physical doesn't work, nor does it
> support records with multiple physical lines. We've already defined a
> structure that does this.
>
> 2) There are some technical problems with the identifier and keyref
> statements which prevent any instance file from validating. I don't
> understand this aspect of XML very well, so I can't really suggest how
> to fix it or where the problem lies, but I assume it is just a
> technical matter and not a fundamental problem with what we are trying
> to do with references.
>
>  
> 3) Literature needs fixing - it doesn't work intuitively with the way
> most of us cite bibliographic information, even after we get some
> robust name parsing tools written. I've already enumerated the problems
> in Bugzilla, so I won't belabor it here. I've had a student writing
> XSLs for various journal formats as well as EndNote conversion, but
> they are held up waiting for a final version. The fact that the
> network office is investing so much effort into the EndNote export format
> as a means for harvesting bibliographic information is in my opinion
> not a good letter of recommendation for eml-literature, so we should
> fix it or drop it in favor of something simpler.
>
> 4) The decision to record online distribution only using URLs and only 
> as stateless pointers to a single opaque object will, I fear, force us 
> to seriously limit the role of EML in the future development of a web 
> service based network. The fact that URLs are at best awkward and at
> worst not usable for expressing some types of connections is one
> thing, but it is the lack of support for describing a stateful 
> connection that bothers me most. Many LTER sites, not just CAP, are 
> attempting to build internet applications that are metadata driven and 
> provide an interface (either direct or web service based) to data 
> stored in many different systems including SDE, SQL, ASCII files and
> various GIS and hyperspectral formats. While few of us intend to give 
> out the stateful connection information to end users directly, many of 
> us would like to see the development of server-side tools follow some 
> standards so that we might all better share software components. 
> Without a standard in EML for describing connection information in a
> usable format, the result is that administrators are still forced to
> develop local solutions and then figure out how to relate them to EML. I'd
> hate to see EML perceived as useful for enabling outside institutions 
> to build applications around site data but not very useful for sites 
> in building their own applications.
>
> 5) The recent traffic on reusable content partly underscores, I think,
> our failure to adequately separate storage and management of metadata 
> from its presentation during this design process. The former benefits 
> from a high degree of granularity and normalization, the latter 
> benefits from just the opposite (assuming size of the eml document is 
> not an issue). The references element is a device to introduce some 
> normalization capability within EML to better serve management of 
> information at the expense of some convenience in reading it. It's not
> likely to satisfy everyone since it doesn't allow addressing between
> documents and this, as well as granularity, will be a perennial 
> problem when trying to use EML to serve as both a metadata management 
> format as well as a metadata presentation format. For those of us that 
> are dynamically building an EML document from a normalized source such 
> as a relational database or collections of independent XML fragments,
> this is far less of an issue: we can choose our own level of granularity
> within our storage systems and frankly find it easier to write the 
> same information out twice rather than going through the hassle of 
> creating identifiers and remembering what they are during the entire 
> output process. I'd hate to see the issue of references and granularity
> hold up the design process given that (in my opinion) they aren't
> really necessary at all in order to define EML content (with the one
> exception of key definitions in eml-constraint, which I don't like).
>
> 6) Finally, and most significantly, the response from the workshop
> indicated that how we have organized project and protocol is at odds
> with the views of most participants. The problem seems to stem from the
> fact that most sites view projects as something that exists at a
> different level from a dataset. While most agreed philosophically that
> there has to be a discrete intellectual activity to produce a dataset,
> few make any formal recognition of this activity. Instead, most see
> certain components (data collection methods, sampling, QAQC, etc.) as
> direct properties of the dataset. What is recognized as a project seems to be
> defined more by administrative or research criteria that often are on 
> a higher plane than an individual dataset. A few acknowledged that 
> they could live with using the immediate project element to record 
> these more dataset-specific items and include a link to a higher-order 
> project description, but this was hard to visualize at the time 
> because that link was missing in beta 9.
>
> There were also similar problems with protocol. As Tim Bergsma put it
> (in better words than I did), we are trying to use one module to carry
> both prescriptive and descriptive information. In deciding to make
> protocol a resource-level element, we have really made the choice to
> use it as a prescriptive information tool - that is, a way of
> describing standardized protocols independent of any particular data
> collection instance. I even recall at least one person saying that
> their personal interpretation of "protocol" was  As such, the
> information is only peripherally useful for describing the actual
> methods used to produce a specific dataset.
>
> There was also some dissatisfaction with the organization of protocol.
> Many objected to the idea of binding QAQC descriptions to specific
> methodStep descriptions. Again, no one argued on philosophical grounds
> against the idea that quality control measures by definition impose
> control over actions; nevertheless, this binding does not agree with how
> most organize this information. Instead, QAQC descriptions are typically
> stored independent of the descriptions of the methodology and cannot be
> easily linked in this way. Finally, as we've seen in recent email
> traffic, there are frustrations with the perennial gray area of
> blending pure content markup (XML) with formatting markup (for
> predominantly textual content).
>
> If I were to suggest changes to beta 9 to best address these responses,
> they might go something like this. I would change eml-project to be
> predominantly a research project description including staffing,
> funding, publications, and links to higher level projects. I would
> also leave eml-protocol as a resource module, but make it
> predominantly text based and prescriptive, used only when a procedure
> has been formally worked out and used by many datasets. I would make a
> new module called methods, which I would use in every place that we
> now use protocol. Methods would contain a repeatable methodStep
> element, which in turn would include references to source datasets
> (type eml-dataset), software (type eml-software), instrumentation, and
> any QAQC procedures that can be logically related to those steps.
> Methods would also include optional links to eml-literature and
> eml-protocol as references to formally published or cataloged
> procedures. I would create a new module researchContext in which I
> would include the methodological descriptors that directly qualify
> this dataset, like site description, sampling, and the above methods
> module. Finally, for QAQC information that isn't described under
> methodology but is directly related to specific attributes in the
> data, I would suggest using the data-quality module (in its current
> incarnations as attributeAccuracy, horizontalAccuracy and
> verticalAccuracy) as the mechanism for describing both
> data quality and the various control/assurance procedures used to
> arrive at that quality.
>
> These of course are pretty extreme changes for a beta 9. I think that
> they probably describe something much more in line with the metadata that
> the current LTER network is producing, but we have to weigh that
> against time-honored software development procedures which are
> designed to prevent knee-jerk changes like this so late in the game!
> With both momentum from the LTER network workshops and a desire to get
> going on the new ITR(s), we need EML 2.0 out the door soon or we will
> begin to lose our focus, but we also need it to work.
>
> So what's the best course of action? An IRC meeting? A conference call?
> Wait a few days to see who responds to this email?
>
>
> Peter McCartney (peter.mccartney at asu.edu)
> Center for Environmental Studies
> Arizona State University
> 480-965-6791
>