[seek-kr-sms] Re: [SEEK-Taxon] Question about EML

Robert A. Morris ram at cs.umb.edu
Fri Mar 5 21:11:00 PST 2004


Nice exposition Matt!

There is a slight further complication in this particular dataset, which 
I take to be SA003 from the Andrews LTER, described at
http://www.fsl.orst.edu/lter/data/abstract.cfm?dbcode=SA003&topnav=97
I can only guess at this without the EML document (and maybe with it), 
but it seems like a good guess since the paragraph quoted by Shawn is in 
the "Field Methods" metadata of that URL, and the data snippets match 
SA003. If I'm wrong, press delete now :-)

The problem begins with the statement in the "Design Methods" metadata 
for SA003 that the list "represents the collective observations of many 
people over 20- plus years, but should not be viewed as either current 
or complete."

Problem part 2 (executive summary): in the data there are no dates 
assigned to the observations and in the metadata no meaning assigned to 
"collective observations".

Problem part 2 (elaboration): SA003 was compiled in 1995, according to 
its metadata. Alas, records in SA003 do not carry any date(s) at which 
the observation(s) represented by the record was(were) made. Thus, for 
example, suppose the first record happens to represent a claim of a wood 
duck observed in 1977. If, in 1977 there were another authority for 
deriving scientific names from common names, and if that authority did 
not assign Aix sponsa as in SA003, then it might or might not be that 
the wood duck represents the same concept as had an observer contributed 
the datum in 1995. Also, even were there such an authority and it gave 
Aix sponsa, it is possible that a taxonomic revision caused the concept 
of Aix sponsa to change between 1977 and 1991. Without a date on the 
origin of the primary key (here the common name) it is quite difficult 
to compare this claim of the occurrence of a wood duck with another such 
claim in another data set. Probably the only hope is the SEEK 
probabilistic approach, but I wonder how SEEK would in this case 
represent the complete lack of knowledge of when the observation was 
made. For example, I doubt that it is a good idea to assume that within 
the "20-plus years", all time intervals of the same size are 
equiprobable. [But wait, Bayes Rule might actually save the duck fat 
here, as it usually does. Maybe this part of the question is outside the 
scope of this list and somebody from SEEK could just point me at the 
probabilistic model? ]
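
To make the worry about equiprobable intervals concrete, here is a toy sketch in Python. It is not the SEEK model, which I have not seen. Everything in it is an assumption made up for illustration: the 1975-1995 span standing in for the "20-plus years", the 1983 revision year (not a claim about any real taxon), and both priors. It only shows that the probability that a record means today's concept depends on the prior chosen for the unknown observation year.

```python
from fractions import Fraction


def p_same_concept(prior, same_given_year):
    """Marginalize over the unknown observation year.

    prior: dict mapping year -> unnormalized weight.
    same_given_year: function year -> P(name denotes the same concept | year).
    """
    total = sum(prior.values())
    return sum(prior[y] / total * same_given_year(y) for y in prior)


# Assumed span of the "20-plus years" of observations (hypothetical).
years = range(1975, 1996)


def same_given_year(year):
    # Hypothetical: a revision in 1983 changed the concept, so earlier
    # observations refer to a different concept.
    return Fraction(1) if year >= 1983 else Fraction(0)


# Prior 1: every year equally likely.
uniform = {y: Fraction(1) for y in years}

# Prior 2 (hypothetical): observer effort grew linearly over time.
recent_heavy = {y: Fraction(y - 1974) for y in years}

print(p_same_concept(uniform, same_given_year))
print(p_same_concept(recent_heavy, same_given_year))
```

The two priors give different answers for the same undated record, so the choice of prior is doing real work and would have to be stated somewhere in the metadata or the model.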

This is a generic problem with checklists which are in this way---and I 
suppose many other ways---different from specimen records.


BTW, Aix sponsa (L.) seems to have no synonyms according to ITIS,
so Aix sponsa's concept seems not to have changed between whenever the 
critter was observed and 1991.  But what about the mapping between 
common and scientific name? I'm told that only for birds are there 
widely accepted authorities for assigning common names to scientific 
names. For other groups, this mapping is the problem that dare not speak 
its name.
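
One way to picture the mapping problem is to key each lookup on the pair (common name, authority) rather than on the common name alone. The Python below is only a sketch under my own assumptions: the NameUsage type, the MAPPINGS table and the candidates function are hypothetical names, and the "hypothetical older checklist" entry with its scientific name is invented purely to show what happens when a second authority disagrees. The two USFWS entries are taken from the SA003 snippet quoted below.

```python
from typing import NamedTuple, Optional, Set


class NameUsage(NamedTuple):
    common_name: str
    authority: str  # source that maps the common name to a scientific name


MAPPINGS = {
    NameUsage("wood duck", "USFWS Checklist of Vertebrates, 1991"): "Aix sponsa",
    NameUsage("great egret", "USFWS Checklist of Vertebrates, 1991"): "Ardea alba",
    # Invented entry, for illustration only; not a real checklist or name.
    NameUsage("great egret", "hypothetical older checklist"): "Exemplum fictum",
}


def candidates(common_name: str, authority: Optional[str] = None) -> Set[str]:
    """Return every scientific name the common name could map to.

    With no authority given, all authorities are considered, so the
    result may contain more than one name.
    """
    return {
        sci_name
        for usage, sci_name in MAPPINGS.items()
        if usage.common_name == common_name
        and (authority is None or usage.authority == authority)
    }


print(candidates("great egret"))
print(candidates("great egret", "USFWS Checklist of Vertebrates, 1991"))
```

Without the authority the lookup can only return a set of candidates; with it the answer is a single name. That is the information a record needs to carry if it is to be aligned with a concept.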

BTW.2. Systematists might say: If it walks like a wood duck, and quacks 
like a wood duck, then it's a wood duck.


-- Bob Morris

Matt Jones wrote:

> Hi Shawn,
> 
> [Matt's excellent exposition omitted] 
> Shawn Bowers wrote:
> 
>>
>> Hi,
>>
>> I recently found this statement in the methods section of an EML 
>> document:
>>
>>       "Nomenclature for common names follow the 1987 edition of the
>>     National Geographic Society's field guide, 'Bird's of North
>>     America'. Species codes used are those of the American
>>     Ornithologist's Union. The USFWS Checklist OF Vertebrates, 1991,
>>     was used to quantify scientific names from the common names."
>>
>> Can anyone help me interpret what these two sentences mean, and how I 
>> might use the Taxon-group work to "understand/resolve" the actual 
>> species references in a dataset based on the above sentence? Here is a 
>> snippet from the corresponding dataset with only those columns 
>> that refer to something "taxonomic". (Note that there are actually 23 
>> columns in the dataset and about 165 rows.)
>>
>>
>> class  tax_order      family    sci_name        aoucode  commonname
>> -----  ---------      ------    --------        -------  ----------
>> aves   anseriformes   anatidae  aix sponsa      wodu     wood duck
>> aves   apodiformes    apodidae  chaetura vauxi  vasw     vauxs wift
>> aves   ciconiiformes  ardeidae  ardea alba      greg     great egret
>> ...
>>
>> I am particularly interested in understanding the relationship between 
>> the concept XML schema and its use for "registering" or "mapping" 
>> this data set to information captured in the concept work.  For 
>> example, if I want to search for datasets based on taxonomic concepts.
>>
>> (You have to be patient with me because I am clueless about these 
>> issues), but it seems like the common name and aoucode represent 
>> redundant information: the aoucode is some kind of convention for 
>> representing the common name, and the class/tax_order/family/sci_name 
>> uniquely identifies the common name? How would one align this dataset 
>> with an instantiated taxon concept schema -- in particular, what 
>> information would need to be available in the instantiated concept 
>> schema?
>>
>> Any help is greatly appreciated,
>>
>> Shawn

-- 
Robert A. Morris
Professor of Computer Science
UMASS-Boston
http://www.cs.umb.edu/~ram
phone (+1)617 287 6466


