[eml-dev] EML question

Mon Jun 30 11:42:32 PDT 2008

Hi Gail,

I recently revisited the taxonomy table upload feature in Morpho to see 
what it could do. It looks like it works best when you handle one rank 
at a time, such as all species. Each rank (Kingdom, Phylum, Class, Order, 
Family, Genus, Species) would be done as a separate data import. So you 
could point Morpho at a previously imported table that has a list of 
species in one of its columns and then select the species column. Morpho 
will then generate the EML for the Taxonomic Coverage for all of the 
species in that column. My impression is that the table import feature 
will work well if all you want to do is import a single rank at a time. 
This would still save a lot of typing, but I do not think it would 
preserve any connection between the ranks that may be part of the table.
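
To make the single-rank idea concrete, here is a minimal sketch in Python. It is not Morpho's implementation, only an illustration of turning one column of names into a flat set of taxonomicClassification entries. The function name coverage_from_column and the sample column are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def coverage_from_column(values, rank="species"):
    """Build a flat taxonomicCoverage element from one column of names.

    Hypothetical helper for illustration; every entry gets the same rank,
    so no relationship between ranks is recorded.
    """
    coverage = ET.Element("taxonomicCoverage")
    seen = set()
    for raw in values:
        name = raw.strip()
        # Skip empty cells and names that were already emitted.
        if not name or name in seen:
            continue
        seen.add(name)
        tc = ET.SubElement(coverage, "taxonomicClassification")
        ET.SubElement(tc, "taxonRankName").text = rank
        ET.SubElement(tc, "taxonRankValue").text = name
    return coverage


# Illustrative column, as it might appear in an imported data table.
species_column = ["Macrocystis pyrifera", "Macrocystis pyrifera",
                  "Egregia menziesii", ""]
coverage = coverage_from_column(species_column)
xml_text = ET.tostring(coverage, encoding="unicode")
print(xml_text)
```

Because each entry carries only its own rank, a genus imported in a second pass would sit beside the species rather than above it, which is the lost connection between ranks mentioned above.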

Callie

Gail Steinhart wrote:
> Thanks everyone, for your helpful suggestions.
>
> I think where the species lists are not too long, we can manage to get 
> them (and higher tax. levels) into the metadata, and will aim for 
> that. Where we have too many (just a phytoplankton data set, 
> probably), we'll decide how far down the taxonomic tree we can go in 
> terms of getting that info into the metadata, and upload a table that 
> lists all the species and the codes used in the data set. For phyto we 
> have a species list but I know of no easy way to get the entire 
> hierarchy above that from the list (does anyone?), since we don't have 
> time to look up/record this info for possibly hundreds of species. 
> Lacking that I think we are shooting for a compromise between 
> including as much info as we can, and actually completing metadata 
> records.
>
> Best,
> Gail
>
> At 01:22 AM 6/28/2008, David Blankman wrote:
>> Hi Gail,
>>
>> I wanted to add an observation that may be covered in LTER EML best 
>> practice recommendations, but one that I think is worth noting. 
>> People rarely search at the same taxonomic level that is represented by 
>> the data. Generally data is presented at the species level, but 
>> taxonomic searches are usually done higher up on the tree.
>>
>> The searching rank differs both for the taxonomic group and domain of 
>> the researcher.  For example, most "plant" people are rarely 
>> interested in distinctions above the level of Family. The same is 
>> probably not the case for people interested in insects or for 
>> invertebrates in general. People interested in mammals on the other 
>> hand, probably are interested in distinctions further down the tree.
>>
>> I don't know if there is any clear rule for what rank is most likely 
>> to be searched for any different taxonomic group, but there may be 
>> some general guidelines.
>>
>> I say all of this in order to recommend including documentation above 
>> the species level. Clearly the more of the taxonomic tree that is 
>> included the greater is the likelihood that your data will be found 
>> by someone's taxonomic search.
>>
>> Wade, as usual, has made life easier by providing a variety of 
>> different trees. If, on the other hand, you want to provide a minimal 
>> set of taxonomic coverage, find out from the domain scientists what 
>> their search heuristics are.
>>
>> David Blankman
>> Director of Information Management, Israel LTER/Ma'arag
>> Mitrani Department of Desert Ecology
>> Jacob Blaustein Desert Research Institute
>> Ben Gurion University
>> Midreshet Ben Gurion, 84990 Israel
>> 972-54-685-9345 (cell)
>> 1-505-349-5680 (Skype)
>>
>>
>>
>> On Sat, Jun 28, 2008 at 12:49 AM, Wade Sheldon <sheldon at uga.edu 
>> <mailto:sheldon at uga.edu>> wrote:
>>
>>     Hi Gail,
>>
>>     I agree with Margaret's comments on the LTER EML best practice
>>     recommendations (I recall I wrote that section, as the only
>>     person populating taxonomicCoverage in LTER at that point). EML
>>     was still fairly untested in 2004 and the trade-offs between
>>     using the taxonomicCoverage tree and data tables were hard to
>>     anticipate (and perhaps still are).
>>
>>     If you are unclear on the difference between the "list" style and
>>     "tree" style of tag nesting (based on what Callie quoted), you
>>     can use our taxonomic database web application to generate
>>     species lists as EML 2.01 documents with either implementation
>>     at: http://gce-lter.marsci.uga.edu/public/app/all_species_lists.asp
>>
>>     At GCE LTER we use the "list" style (without tag nesting within
>>     common taxa) for our data sets regardless of the number of
>>     references, but that's easy for us because the taxonomicCoverage
>>     is automatically generated from our taxonomic database for all
>>     referenced species, so it's no more effort. We then include
>>     species codes in the primary data tables with codes defined in
>>     the attribute metadata. However in EML 2.01 we can't link the
>>     codes to the taxonomicCoverage nodes anyway, so there's no
>>     linkage to the taxonomic details. That may argue for the
>>     secondary table approach Margaret uses (where you can even define
>>     a foreign key relationship between entities using the EML
>>     "constraint" module).
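
As I read the "list" versus "tree" distinction above, the list style repeats the full chain of ranks for every species, while the tree style merges entries that share a higher taxon. A minimal Python sketch of both follows. The helper names (list_style, tree_style) and the sample lineages are assumptions for illustration only; this is not the GCE application's code, and the lineages should be checked against a taxonomic authority before reuse.

```python
import xml.etree.ElementTree as ET


def _add_rank(parent, rank, value):
    """Append one taxonomicClassification (rank + value) under parent."""
    node = ET.SubElement(parent, "taxonomicClassification")
    ET.SubElement(node, "taxonRankName").text = rank
    ET.SubElement(node, "taxonRankValue").text = value
    return node


def list_style(lineages):
    """One independent nested chain per species; shared taxa are repeated."""
    coverage = ET.Element("taxonomicCoverage")
    for lineage in lineages:
        parent = coverage
        for rank, value in lineage:
            parent = _add_rank(parent, rank, value)
    return coverage


def tree_style(lineages):
    """Shared higher taxa are emitted once and reused as parents."""
    coverage = ET.Element("taxonomicCoverage")
    index = {}  # lineage prefix -> element already created for it
    for lineage in lineages:
        parent = coverage
        prefix = ()
        for rank, value in lineage:
            prefix += ((rank, value),)
            if prefix not in index:
                index[prefix] = _add_rank(parent, rank, value)
            parent = index[prefix]
    return coverage


# Illustrative lineages (highest rank first).
lineages = [
    [("family", "Laminariaceae"), ("genus", "Macrocystis"),
     ("species", "Macrocystis pyrifera")],
    [("family", "Laminariaceae"), ("genus", "Laminaria"),
     ("species", "Laminaria farlowii")],
]

print(ET.tostring(list_style(lineages), encoding="unicode"))
print(ET.tostring(tree_style(lineages), encoding="unicode"))
```

The list style is larger, but each species block is self-contained, which is what makes it easy to combine blocks from several documents.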
>>
>>     As for whether to include higher level taxa or not, the key
>>     advantage as Margaret said is to support metadata searches. As
>>     for how many end-users search for data based on taxonomic terms,
>>     perhaps Matt et al. can answer based on Metacat search logs.
>>
>>     Regards,
>>
>>     Wade Sheldon
>>     GCE-LTER
>>
>>
>>     Margaret O'Brien wrote:
>>     > Hi Gail -
>>     > Adding to what Callie told you, I have seen several ways to include
>>     > taxonomic information in EML. By the way, the document that Callie
>>     > referenced is not really precise in its recommendations, partly
>>     > because in 2004 there were not a large number of rich EML files to
>>     > learn from. It is in need of an update, and somewhat specific to
>>     > LTER needs, but if you are interested in seeing how one group uses
>>     > EML, I can get a copy to you.
>>     >
>>     > We often put taxonomic information in a data table as you have
>>     > suggested. This is the simplest method when the list is long, or is
>>     > already included in the table to be published. If a dataset is
>>     > concerned with only a few species, then we include a
>>     > taxonomicCoverage tree with all the ranks labeled. The flexibility
>>     > of EML means that you could include any (or all) ranks, or just the
>>     > unique binomial. The entire binomial should be included as one
>>     > string, according to the rules of binomial nomenclature. So this
>>     > form is recommended:
>>     > <taxonomicClassification>
>>     > <taxonRankName>genus</taxonRankName>
>>     > <taxonRankValue>Macrocystis</taxonRankValue>
>>     > <taxonomicClassification>
>>     > <taxonRankName>species</taxonRankName>
>>     > <taxonRankValue>Macrocystis pyrifera</taxonRankValue>
>>     > </taxonomicClassification>
>>     > </taxonomicClassification>
>>     >
>>     > but not
>>     > <taxonomicClassification>
>>     > <taxonRankName>genus</taxonRankName>
>>     > <taxonRankValue>Macrocystis</taxonRankValue>
>>     > <taxonomicClassification>
>>     > <taxonRankName>species</taxonRankName>
>>     > <taxonRankValue>pyrifera</taxonRankValue>
>>     > </taxonomicClassification>
>>     > </taxonomicClassification>
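
The recommended form quoted above can be generated mechanically. Here is a minimal Python sketch, assuming the input is a plain two-word binomial; the function name classification_for_binomial is hypothetical, and the sketch does not handle subspecies, authorities, or hybrid names.

```python
import xml.etree.ElementTree as ET


def classification_for_binomial(binomial):
    """Nest species inside genus, keeping the full binomial as the
    species value (hypothetical helper for illustration)."""
    parts = binomial.split()
    if len(parts) < 2:
        raise ValueError("expected a binomial such as 'Genus species'")
    genus = parts[0]

    outer = ET.Element("taxonomicClassification")
    ET.SubElement(outer, "taxonRankName").text = "genus"
    ET.SubElement(outer, "taxonRankValue").text = genus

    inner = ET.SubElement(outer, "taxonomicClassification")
    ET.SubElement(inner, "taxonRankName").text = "species"
    # The whole binomial, not just the specific epithet.
    ET.SubElement(inner, "taxonRankValue").text = " ".join(parts)
    return outer


node = classification_for_binomial("Macrocystis pyrifera")
print(ET.tostring(node, encoding="unicode"))
```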
>>     >
>>     > Ideally, it would be great to get all the taxonomic info into the
>>     > metadata so that it can be effectively searched. This can be
>>     > impractical though, and if many taxa are included, the metadata can
>>     > be quite extensive. I have cc'd the EML development group with your
>>     > question, in case any others want to chime in. Please let this
>>     > group know of your experiences using EML -
>>     > Regards,
>>     > Margaret O'Brien
>>     >
>>     > ========================
>>     > Margaret O'Brien
>>     > Information Management
>>     > Santa Barbara Coastal LTER
>>     > Marine Science Institute
>>     > University of California
>>     > Santa Barbara, CA  93106-6150
>>     >
>>     > 805-893-2071
>>     > mob at icess.ucsb.edu <mailto:mob at icess.ucsb.edu>
>>     > http://sbc.lternet.edu <http://sbc.lternet.edu/>
>>     > ========================
>>     >
>>     >
>>     >
>>     > Callie Bowdish wrote:
>>     >> Hi Gail,
>>     >>
>>     >> Here is a section out of the LTER emlbestpractices_oct2004.doc. I
>>     >> think the phrases "organisms relevant to the study" and "broader
>>     >> taxonomic searches" are helpful things to keep in mind when making
>>     >> decisions on how much taxonomic information to include. It is also
>>     >> considered important to include the Classification System or
>>     >> authority that was used for naming when possible. Archived data is
>>     >> designed to last for a long time, so the ability to find something
>>     >> that may not seem so important currently may be valuable in the
>>     >> future. That is also a good reason to put some thought into
>>     >> including the Classification System and choosing what taxa to
>>     >> include in the EML document.
>>     >>
>>     >> "<taxonomicCoverage> The <taxonomicCoverage> element (see Example
>>     >> 2.1) should be used to document taxonomic information for all
>>     >> organisms relevant to the study. Genus, species name binomial and
>>     >> common name should always be included, but higher level taxa should
>>     >> also be included whenever possible to support broader taxonomic
>>     >> searches. Blocks of <taxonomicClassification> elements should be
>>     >> hierarchically nested within a single <taxonomicCoverage> element
>>     >> as illustrated in Example 2.1 rather than repeated at the same
>>     >> level. The <generalTaxonomicCoverage> element should be included to
>>     >> describe the general procedure of how the taxonomy was determined
>>     >> (keys used, etc.), should include a general textual description of
>>     >> all flora/fauna in the study (scope), as well as how finely grained
>>     >> the taxonomy is broken down to - for example "family" or "genus and
>>     >> species."
>>     >>
>>     >> Note that elements within common <taxonRankName> entries can be
>>     >> combined in the hierarchy to create a taxonomic "tree" (not
>>     >> illustrated), but this practice may impede combining and re-using
>>     >> <taxonomicClassification> information from multiple documents and
>>     >> is not generally recommended for data set documentation."
>>     >>
>>     >> I have also cc'd Matt Jones at NCEAS and Margaret who is an LTER
>>     >> information manager to see if they have any comments or insight
>>     >> into your "best practice" question.
>>     >>
>>     >> Callie
>>     >>
>>     >>
>>     >> Gail Steinhart wrote:
>>     >>> Hi Callie,
>>     >>>
>>     >>> We're wondering if there is a "best practice" when it comes to
>>     >>> specifying taxonomic coverage in EML. We have some data sets where
>>     >>> there are a couple of dozen species (fish), and others where there
>>     >>> might be hundreds (phytoplankton). In most cases we have or can
>>     >>> make (without too much effort) a complete table of species and
>>     >>> upload that as a data table, but is that overkill? Would it be
>>     >>> better to simply specify a higher taxon (phytoplankton rather than
>>     >>> all of the species)? Can you offer any advice on that?
>>     >>>
>>     >>> Thanks,
>>     >>> Gail
>>     >>>
>>     >>>
>>     >>>
>>     >>> Gail Steinhart
>>     >>> Research Data & Environmental Sciences Librarian
>>     >>> Albert R. Mann Library
>>     >>> Cornell University
>>     >>> Ithaca, NY 14853
>>     >>>
>>     >>> Phone: 607-255-7251
>>     >>> Fax: 607-255-0318
>>     >>> E-mail: GSS1 at cornell.edu <mailto:GSS1 at cornell.edu>
>>     >>>
>>     > _______________________________________________
>>     > Eml-dev mailing list
>>     > Eml-dev at ecoinformatics.org <mailto:Eml-dev at ecoinformatics.org>
>>     >
>>     http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>
>>
>>     --
>>     ______________________________________________________________________________
>>
>>     Wade M. Sheldon
>>     GCE-LTER Information Manager/SIMO Database Administrator
>>     School of Marine Programs
>>     University of Georgia
>>     Athens, GA 30602-3636
>>     Email: sheldon at uga.edu <mailto:sheldon at uga.edu>
>>     WWW:
>>     http://gce-lter.marsci.uga.edu/public/app/personnel_bios.asp?id=wsheldon
>>
>>
>>
>>
>>
>>
>>
>> -- 
>> Nature is trying very hard to make us succeed, but nature does not 
>> depend on us. We are not the only experiment.
>> - R. Buckminster Fuller
>>
>> If I am not for myself, then who will be for me? If I am for myself 
>> alone, then who am I? If not now, when?
>> - Rabbi Hillel 
>
>
>
> Gail Steinhart
> Research Data & Environmental Sciences Librarian
> Albert R. Mann Library
> Cornell University
> Ithaca, NY 14853
>
> Phone: 607-255-7251
> Fax: 607-255-0318
> E-mail: GSS1 at cornell.edu
>