[eml-dev] EML question

Fri Jun 27 14:49:59 PDT 2008

Hi Gail,

I agree with Margaret's comments on the LTER EML best practice recommendations (I recall I wrote that section, as the only person populating taxonomicCoverage in LTER at that point). EML was still fairly untested in 2004 and the trade-offs between using the taxonomicCoverage tree and data tables were hard to anticipate (and perhaps still are).

If you are unclear on the difference between the "list" style and "tree" style of tag nesting (based on what Callie quoted), you can use our taxonomic database web application to generate species lists as EML 2.0.1 documents in either style at: http://gce-lter.marsci.uga.edu/public/app/all_species_lists.asp
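
As a rough illustration of the two styles, here is a minimal Python sketch. It is not the code behind our web application. The sample lineages and the function names (add_classification, build_list_style, build_tree_style) are hypothetical, and it assumes "list" means one complete nested chain per species while "tree" means shared higher taxa are merged into a single node.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample lineages, ordered from higher rank to lower rank.
LINEAGES = [
    [("genus", "Spartina"), ("species", "Spartina alterniflora")],
    [("genus", "Spartina"), ("species", "Spartina patens")],
]


def add_classification(parent, rank, value):
    """Append one taxonomicClassification node under parent and return it."""
    node = ET.SubElement(parent, "taxonomicClassification")
    ET.SubElement(node, "taxonRankName").text = rank
    ET.SubElement(node, "taxonRankValue").text = value
    return node


def build_list_style(lineages):
    """One self-contained nested chain per species; shared taxa are repeated."""
    coverage = ET.Element("taxonomicCoverage")
    for lineage in lineages:
        parent = coverage
        for rank, value in lineage:
            parent = add_classification(parent, rank, value)
    return coverage


def build_tree_style(lineages):
    """Shared higher taxa are merged, so a genus node can hold several species."""
    coverage = ET.Element("taxonomicCoverage")
    seen = {}  # lineage prefix -> element already created for that prefix
    for lineage in lineages:
        parent = coverage
        prefix = ()
        for rank, value in lineage:
            prefix += ((rank, value),)
            if prefix not in seen:
                seen[prefix] = add_classification(parent, rank, value)
            parent = seen[prefix]
    return coverage


list_style = build_list_style(LINEAGES)
tree_style = build_tree_style(LINEAGES)
print(ET.tostring(list_style, encoding="unicode"))
print(ET.tostring(tree_style, encoding="unicode"))
```

In the list output the genus node appears once per species, so each chain can be copied into another document on its own. In the tree output the genus appears once with both species nested beneath it.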

At GCE LTER we use the "list" style (without tag nesting within common taxa) for our data sets regardless of the number of species referenced. That is easy for us because the taxonomicCoverage is generated automatically from our taxonomic database for all referenced species, so it takes no extra effort. We then include species codes in the primary data tables, with the codes defined in the attribute metadata. However, in EML 2.0.1 we can't link the codes to the taxonomicCoverage nodes, so there is no linkage from the data to the taxonomic details. That may argue for the secondary table approach Margaret uses (where you can even define a foreign key relationship between entities using the EML "constraint" module).
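
If you go the secondary table route, it is worth checking that every species code in the data table actually appears in the species table before declaring a foreign key. Below is a minimal Python sketch of that check. The table contents, column names, codes, and the unmatched_codes function are all hypothetical, and it does not touch EML itself.

```python
import csv
import io

# Hypothetical species lookup table (the "secondary table").
SPECIES_CSV = """code,genus,species
SPAL,Spartina,Spartina alterniflora
SPPA,Spartina,Spartina patens
"""

# Hypothetical primary data table that references species by code.
DATA_CSV = """plot,species_code,stem_count
1,SPAL,42
2,SPPA,17
3,JURO,8
"""


def unmatched_codes(data_text, lookup_text,
                    fk_column="species_code", pk_column="code"):
    """Return the sorted codes used in the data table but absent from the lookup table."""
    known = {row[pk_column] for row in csv.DictReader(io.StringIO(lookup_text))}
    used = {row[fk_column] for row in csv.DictReader(io.StringIO(data_text))}
    return sorted(used - known)


print(unmatched_codes(DATA_CSV, SPECIES_CSV))  # prints ['JURO']
```

An empty result means the code column can safely be described as a foreign key to the species table.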

On whether to include higher-level taxa, the key advantage, as Margaret said, is support for metadata searches. As for how many end users actually search for data by taxonomic terms, perhaps Matt et al. can answer based on the Metacat search logs.

Regards,

Wade Sheldon
GCE-LTER

Margaret O'Brien wrote:
> Hi Gail -
> Adding to what Callie told you, I have seen several ways to include 
> taxonomic information in EML. By the way, the document that Callie 
> referenced is not really precise in its recommendations, partly because 
> in 2004 there were not a large number of rich EML files to learn from. 
> It is in need of an update, and somewhat specific to LTER needs, but if 
> you are interested in seeing how one group uses EML, I can get a copy to 
> you.
> 
> We often put taxonomic information in a data table as you have 
> suggested. This is the simplest method when the list is long, or is 
> already included in the table to be published. If a dataset is concerned 
> with only a few species, then we include a taxonomicCoverage tree with 
> all the ranks labeled. The flexibility of EML means that you could 
> include any (or all) ranks, or just the unique binomial. The entire 
> binomial should be included as one string, according to the rules of 
> binomial nomenclature. So this form is recommended:
> <taxonomicClassification>
>   <taxonRankName>genus</taxonRankName>
>   <taxonRankValue>Macrocystis</taxonRankValue>
>   <taxonomicClassification>
>     <taxonRankName>species</taxonRankName>
>     <taxonRankValue>Macrocystis pyrifera</taxonRankValue>
>   </taxonomicClassification>
> </taxonomicClassification>
> 
> but not
> <taxonomicClassification>
>   <taxonRankName>genus</taxonRankName>
>   <taxonRankValue>Macrocystis</taxonRankValue>
>   <taxonomicClassification>
>     <taxonRankName>species</taxonRankName>
>     <taxonRankValue>pyrifera</taxonRankValue>
>   </taxonomicClassification>
> </taxonomicClassification>
> 
> Ideally, it would be great to get all the taxonomic info into the 
> metadata so that it can be effectively searched. This can be impractical 
> though, and if many taxa are included, the metadata can be quite 
> extensive. I have cc'd the EML development group with your question, in 
> case any others want to chime in. Please let this group know of your 
> experiences using EML -
> Regards,
> Margaret O'Brien
> 
> ========================
> Margaret O'Brien
> Information Management
> Santa Barbara Coastal LTER 
> Marine Science Institute
> University of California
> Santa Barbara, CA  93106-6150
> 
> 805-893-2071
> mob at icess.ucsb.edu
> http://sbc.lternet.edu
> ========================
> 
> 
> 
> Callie Bowdish wrote:
>> Hi Gail,
>>
>> Here is a section out of the LTER emlbestpractices_oct2004.doc. I 
>> think the phrases "organisms relevant to the study" and "broader 
>> taxonomic searches" are helpful things to keep in mind when making 
>> decisions on how much taxonomic information to include. It is also 
>> considered important to include the Classification System or authority 
>> that was used for naming when possible. Archived data is designed to 
>> last for a long time, so the ability to find something that does not 
>> seem important now may be valuable in the future. That is also 
>> a good reason to put some thought into including the Classification 
>> System and choosing which taxa to include in the EML document.
>>
>> "<taxonomicCoverage> The <taxonomicCoverage> element (see Example 2.1) 
>> should be used to document taxonomic information for all organisms 
>> relevant to the study. Genus, species name binomial and common name 
>> should always be included, but higher level taxa should also be 
>> included whenever possible to support broader taxonomic searches. 
>> Blocks of <taxonomicClassification> elements should be hierarchically 
>> nested within a single <taxonomicCoverage> element as illustrated in 
>> Example 2.1 rather than repeated at the same level. The 
>> <generalTaxonomicCoverage> element should be included to describe the 
>> general procedure of how the taxonomy was determined (keys used, 
>> etc.), should include a general textual description of all flora/fauna 
>> in the study (scope), as well as how finely grained the taxonomy is 
>> broken down to – for example “family” or “genus and species.”
>>
>> Note that elements within common <taxonRankName> entries can be 
>> combined in the hierarchy to create a taxonomic “tree” (not 
>> illustrated), but this practice may impede combining and re-using 
>> <taxonomicClassification> information from multiple documents and is 
>> not generally recommended for data set documentation."
>>
>> I have also cc'd Matt Jones at NCEAS and Margaret who is an LTER 
>> information manager to see if they have any comments or insight into 
>> your "best practice" question.
>>
>> Callie
>>
>>
>> Gail Steinhart wrote:
>>> Hi Callie,
>>>
>>> We're wondering if there is a "best practice" when it comes to 
>>> specifying taxonomic coverage in EML. We have some data sets where 
>>> there are a couple of dozen species (fish), and others where there 
>>> might be hundreds (phytoplankton). In most cases we have or can make 
>>> (without too much effort) a complete table of species and upload that 
>>> as a data table, but is that overkill? Would it be better to simply 
>>> specify a higher taxon (phytoplankton rather than all of the 
>>> species)? Can you offer any advice on that?
>>>
>>> Thanks,
>>> Gail
>>>
>>>
>>>
>>> Gail Steinhart
>>> Research Data & Environmental Sciences Librarian
>>> Albert R. Mann Library
>>> Cornell University
>>> Ithaca, NY 14853
>>>
>>> Phone: 607-255-7251
>>> Fax: 607-255-0318
>>> E-mail: GSS1 at cornell.edu
>>>

-- 
______________________________________________________________________________

Wade M. Sheldon
GCE-LTER Information Manager/SIMO Database Administrator
School of Marine Programs
University of Georgia
Athens, GA 30602-3636
Email: sheldon at uga.edu
WWW: http://gce-lter.marsci.uga.edu/public/app/personnel_bios.asp?id=wsheldon