Provisional GCE-LTER EML available, comments appreciated

Fri Oct 17 20:58:46 PDT 2003

David (and the eml-dev crew),

I finished shaking out a few minor issues with my scripts today, and I'm now able to dynamically generate valid "discovery plus"-level EML from our metadata database for all data sets in the GCE catalog (http://gce-lter.marsci.uga.edu/lter/asp/db/data_catalog.asp). To see it, click on any data set and look for the 'Provisional GCE Metadata in EML 2.0 format' link in the Metadata section. I followed the level 2.5 recommendations in the best practices doc-in-progress pretty closely, and included our data set accession number, major version, and minor version in the "packageId" attribute. I'd appreciate any comments or advice on improving the implementation, but please ignore the redundant <deliveryPoint> entry for creators; I just need to tweak a database view to fix that.
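
For anyone curious how the accession and version numbers can share one packageId, here is a minimal sketch in Python. The "gce" prefix, the dot-separated layout, and the accession value are hypothetical, not the actual GCE scheme; the only assumptions about EML are the 2.0.0 namespace URI and the packageId and system attributes on the root element.

```python
import xml.etree.ElementTree as ET

# EML 2.0.0 namespace URI for the root element.
EML_NS = "eml://ecoinformatics.org/eml-2.0.0"


def make_package_id(accession: str, major: int, minor: int,
                    prefix: str = "gce") -> str:
    """Join accession and version numbers into one dotted identifier.

    The prefix and layout are hypothetical; only the idea of packing all
    three values into packageId comes from the post.
    """
    if "." in accession:
        # A dot inside the accession would make the id ambiguous to split.
        raise ValueError(f"accession must not contain '.': {accession!r}")
    return f"{prefix}.{accession}.{major}.{minor}"


def eml_root(package_id: str, system: str = "gce") -> ET.Element:
    """Create an empty eml:eml root carrying packageId and system."""
    ET.register_namespace("eml", EML_NS)
    return ET.Element(f"{{{EML_NS}}}eml",
                      {"packageId": package_id, "system": system})


if __name__ == "__main__":
    pid = make_package_id("ACC100", 2, 1)  # hypothetical accession number
    print(pid)  # gce.ACC100.2.1
    print(ET.tostring(eml_root(pid), encoding="unicode"))
```

Rejecting dots in the accession keeps the identifier splittable back into its three parts.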

Adding the EML links to our data set detail pages isn't very useful for end users at this point; it was just a convenient way to let others access and experiment with our provisional EML documents, and it will help increase the visibility of this effort at our site. Let me know how you want to proceed with getting our EML into the Metacat at LNO. Generating EML live from our metadata database is working very well, so I'll probably continue along that track, although file caching could be added if that would be easier.
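
If file caching turns out to be preferable, one simple approach is to key each cached file on the accession and version numbers, so a new data set version never serves stale EML. The sketch below assumes a hypothetical generate function standing in for the live database query; it is not part of the current GCE scripts.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical signature: (accession, major, minor) -> EML document string.
EmlGenerator = Callable[[str, int, int], str]


def cached_eml(cache_dir: Path, accession: str, major: int, minor: int,
               generate: EmlGenerator) -> str:
    """Return cached EML for one data set version, generating it on a miss."""
    # Version numbers are part of the file name, so publishing a new
    # version creates a new cache entry instead of reusing an old one.
    path = cache_dir / f"{accession}.{major}.{minor}.xml"
    if path.exists():
        return path.read_text(encoding="utf-8")
    xml = generate(accession, major, minor)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(xml, encoding="utf-8")
    return xml


if __name__ == "__main__":
    calls = []

    def fake_generate(accession: str, major: int, minor: int) -> str:
        # Stand-in for the live database query and EML serialization.
        calls.append((accession, major, minor))
        return f'<eml packageId="{accession}.{major}.{minor}"/>'

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        cached_eml(cache, "ACC100", 2, 1, fake_generate)
        cached_eml(cache, "ACC100", 2, 1, fake_generate)  # read from disk
        print(len(calls))  # 1
```

If metadata can change without a version bump (fixing the redundant deliveryPoint, for example), the affected files would need to be deleted by hand or regenerated.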

FYI, here are a few specific issues I ran into:

1) Some method description text in our database includes high-bit (non-ASCII) characters such as degree and micron symbols, which were throwing validation errors under a UTF-8 encoding declaration. Changing the encoding to ISO-8859-1 fixed the validation problem, but is that acceptable as a general practice? Or is there a way to escape individual characters without converting the whole document to UTF-16 (which would require a different programming approach)?

2) I included dataset/methods/sampling to incorporate the sampling design information we store, but the required studyExtent element isn't an exact semantic match for our sampling design field. I will probably have to revise the way we store this information in the future to comply with EML.

3) We store a lot of granular information about instrumentation (nested under method steps), which I wanted to include, but it wasn't clear how best to format it in instrumentation elements. For now I've kludged it by concatenating the various fields, delimiting sections with parentheses and semicolons.

4) I plan to include taxonomicCoverage information as well (our data sets are linked to our taxonomic database), but I need to study the EML implementation a bit more. Pointers to specific examples would be helpful.

5) To make our EML rollout more manageable, I'm considering offering an EML-optimized comma-delimited format as an additional static or dynamically generated data format for all tabular data sets. I'm not sure I can accurately (or, more importantly, usefully) describe some of our customizable data formats in EML (e.g. MATLAB arrays and matrices), so focusing on one EML-friendly format will simplify the problem space and speed up implementation of entity- and attribute-level metadata. (Whether we continue to provide custom data formats that aren't described in EML down the road depends on where this whole effort goes within LTER.)
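
On item 1: standard XML lets any character be written as a numeric character reference (e.g. &#176; for the degree sign), and a document that is pure ASCII is also valid UTF-8, so escaping only the high-bit characters should let the UTF-8 declaration stay. A minimal Python sketch, assuming the database returns method text as ISO-8859-1 bytes; the description/para wrapper is an illustrative fragment, not a complete EML document, and this has not been run through the EML validator.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_ascii_xml_text(value: str) -> str:
    # Escape &, <, > first, then turn every non-ASCII character into a
    # numeric character reference such as &#176; (degree sign).
    return escape(value).encode("ascii", "xmlcharrefreplace").decode("ascii")


def method_description_xml(raw: bytes) -> bytes:
    # Assumption: method text comes out of the database as Latin-1 bytes.
    text = raw.decode("latin-1")
    body = to_ascii_xml_text(text)
    doc = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           f"<description><para>{body}</para></description>")
    # The result is pure ASCII, which is also valid UTF-8.
    return doc.encode("utf-8")


if __name__ == "__main__":
    xml_bytes = method_description_xml(b"Filtered at 0.45 \xb5m, stored at 4 \xb0C")
    print(xml_bytes.decode("ascii"))
    # Parsing resolves the references back to the original characters.
    para = ET.fromstring(xml_bytes).find("para")
    print(para.text == "Filtered at 0.45 \u00b5m, stored at 4 \u00b0C")  # True
```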
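
On item 2: assuming the EML 2.0 methods module requires sampling to contain studyExtent (either a coverage or a text description) followed by samplingDescription, one possible mapping puts the stored sampling design text in samplingDescription and gives studyExtent a short prose description of the study area and period. A Python sketch with hypothetical text values:

```python
import xml.etree.ElementTree as ET


def add_para(parent: ET.Element, tag: str, text: str) -> ET.Element:
    """Append <tag><para>text</para></tag>, a simple text-block form."""
    element = ET.SubElement(parent, tag)
    ET.SubElement(element, "para").text = text
    return element


def sampling_element(extent_text: str, design_text: str) -> ET.Element:
    sampling = ET.Element("sampling")
    # studyExtent is required; here it holds a prose description
    # (a coverage element is the other option).
    extent = ET.SubElement(sampling, "studyExtent")
    add_para(extent, "description", extent_text)
    # The stored sampling-design field maps to samplingDescription.
    add_para(sampling, "samplingDescription", design_text)
    return sampling


if __name__ == "__main__":
    s = sampling_element(
        "Hypothetical extent: intertidal marsh sites, 2000-2003.",
        "Hypothetical design: monthly cores at fixed stations.",
    )
    print(ET.tostring(s, encoding="unicode"))
```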
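
On item 3: if instrumentation is a repeatable plain-text element within a method step (an assumption about the EML 2.0 schema), each instrument could get its own entry with a consistent "label: value" layout instead of one concatenated string. The instrument field names below are hypothetical, since the post does not show the GCE table structure.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical instrument fields; not the actual GCE schema.
DETAIL_FIELDS = ["serial_number", "accuracy", "calibration_date"]


def instrumentation_text(inst: Dict[str, str]) -> str:
    """Format one instrument as 'Maker Model (label: value; label: value)'."""
    name = " ".join(inst[k] for k in ("manufacturer", "model") if inst.get(k))
    details = [f"{k.replace('_', ' ')}: {inst[k]}"
               for k in DETAIL_FIELDS if inst.get(k)]
    return f"{name} ({'; '.join(details)})" if details else name


def method_step(description: str, instruments: List[Dict[str, str]]) -> ET.Element:
    """Build a methodStep with one <instrumentation> element per instrument."""
    step = ET.Element("methodStep")
    desc = ET.SubElement(step, "description")
    ET.SubElement(desc, "para").text = description
    for inst in instruments:
        ET.SubElement(step, "instrumentation").text = instrumentation_text(inst)
    return step


if __name__ == "__main__":
    step = method_step("Deploy sonde at each station.", [
        {"manufacturer": "ExampleCo", "model": "X1", "serial_number": "42"},
    ])
    print(ET.tostring(step, encoding="unicode"))
```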
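
On item 4: assuming taxonomicCoverage holds nested taxonomicClassification elements, each with taxonRankName, taxonRankValue, and optional commonName, a lineage pulled from a taxonomic database can be turned into the nested structure with a simple loop. The lineage below is illustrative only, and whether the species rank value should be the epithet or the full binomial is left open here.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def taxonomic_coverage(lineage: List[Tuple[str, str]],
                       common_name: Optional[str] = None) -> ET.Element:
    """Nest one taxonomicClassification per (rank, value), highest rank first."""
    coverage = ET.Element("taxonomicCoverage")
    parent = coverage
    for rank, value in lineage:
        node = ET.SubElement(parent, "taxonomicClassification")
        ET.SubElement(node, "taxonRankName").text = rank
        ET.SubElement(node, "taxonRankValue").text = value
        parent = node
    # Attach the common name to the most specific rank, if there is one.
    if common_name and parent is not coverage:
        ET.SubElement(parent, "commonName").text = common_name
    return coverage


if __name__ == "__main__":
    cov = taxonomic_coverage(
        [("Kingdom", "Plantae"), ("Family", "Poaceae"),
         ("Genus", "Spartina"), ("Species", "alterniflora")],
        common_name="smooth cordgrass",
    )
    print(ET.tostring(cov, encoding="unicode"))
```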
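
On item 5: one benefit of a single EML-friendly format is that the file writer and its physical-format description can come from the same code and never drift apart. A Python sketch, assuming EML 2.0's textFormat uses numHeaderLines, attributeOrientation, and simpleDelimited/fieldDelimiter; the "NaN" missing-value code is hypothetical and would also need to be declared per attribute (e.g. via missingValueCode).

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Sequence, Union

MISSING_CODE = "NaN"  # hypothetical site-wide missing-value code
Value = Optional[Union[str, int, float]]


def write_eml_csv(columns: Sequence[str], rows: Iterable[Sequence[Value]]) -> str:
    """One header line of attribute names, comma-delimited, newline-terminated."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow(MISSING_CODE if v is None else v for v in row)
    return buf.getvalue()


def text_format_fragment(num_header_lines: int = 1,
                         delimiter: str = ",") -> ET.Element:
    """Physical-format description matching what write_eml_csv produces."""
    fmt = ET.Element("textFormat")
    ET.SubElement(fmt, "numHeaderLines").text = str(num_header_lines)
    ET.SubElement(fmt, "attributeOrientation").text = "column"
    simple = ET.SubElement(fmt, "simpleDelimited")
    ET.SubElement(simple, "fieldDelimiter").text = delimiter
    return fmt


if __name__ == "__main__":
    print(write_eml_csv(["Date", "Temp_C"],
                        [["2003-10-17", 21.5], ["2003-10-18", None]]), end="")
    print(ET.tostring(text_format_fragment(), encoding="unicode"))
```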

Regards,

Wade Sheldon