[seek-kr-sms] Re: [SEEK-Taxon] Question about EML
Robert A. Morris
ram at cs.umb.edu
Fri Mar 5 21:11:00 PST 2004
Nice exposition Matt!
There is a slight further complication in this particular dataset, which
I take to be SA003 from the Andrews LTER, described at
http://www.fsl.orst.edu/lter/data/abstract.cfm?dbcode=SA003&topnav=97
I can only guess at this without the EML document (and maybe with it),
but it seems like a good guess since the paragraph quoted by Shawn is in
the "Field Methods" metadata of that URL, and the data snippets match
SA003. If I'm wrong, press delete now :-)
The problem begins with the statement in the "Design Methods" metadata
for SA003 that the list "represents the collective observations of many
people over 20- plus years, but should not be viewed as either current
or complete."
Problem part 2 (executive summary): in the data there are no dates
assigned to the observations and in the metadata no meaning assigned to
"collective observations".
Problem part 2 (elaboration): SA003 was compiled in 1995, according to
its metadata. Alas, records in SA003 do not carry any date(s) at which
the observation(s) represented by the record was (were) made. Thus, for
example, suppose the first record happens to represent a claim of a wood
duck observed in 1977. If, in 1977, there were another authority for
deriving scientific names from common names, and if that authority did
not assign Aix sponsa as in SA003, then the wood duck might or might not
represent the same concept as it would had an observer contributed
the datum in 1995. Also, even were there such an authority and it gave
Aix sponsa, it is possible that a taxonomic revision caused the concept
of Aix sponsa to change between 1977 and 1991. Without a date on the
origin of the primary key (here the common name) it is quite difficult
to compare this claim of the occurrence of a wood duck with another such
claim in another data set. Probably the only hope is the SEEK
probabilistic approach, but I wonder how SEEK would in this case
represent the complete lack of knowledge of when the observation was
made. For example, I doubt that it is a good idea to assume that within
the "20-plus years", all time intervals of the same size are
equiprobable. [But wait, Bayes' rule might actually save the duck fat
here, as it usually does. Maybe this part of the question is outside the
scope of this list and somebody from SEEK could just point me at the
probabilistic model?]
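
To make the equiprobability worry concrete, here is a minimal sketch in Python. It is not the SEEK model, which I have not seen. Everything numeric in it is an assumption for illustration: the 1975-1995 observation window, the hypothetical concept revision in 1983, the "effort grows over time" prior, and the evidence that a contributor was only active from 1980 onward.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {year: w / total for year, w in weights.items()}


def posterior(prior, likelihood):
    """Bayes' rule: P(year | evidence) is proportional to
    P(evidence | year) * P(year)."""
    return normalize({year: p * likelihood(year) for year, p in prior.items()})


def prob_on_or_after(dist, year):
    """Probability that the observation was made in `year` or later."""
    return sum(p for y, p in dist.items() if y >= year)


# Assumed window standing in for the "20-plus years" ending at compilation.
YEARS = range(1975, 1996)
# Hypothetical year in which the concept behind the name was revised.
REVISION_YEAR = 1983

# Prior 1: every year equally likely.
uniform = normalize({y: 1.0 for y in YEARS})
# Prior 2 (assumption): observer effort grew linearly over the window.
effort = normalize({y: float(y - 1974) for y in YEARS})

p_uniform = prob_on_or_after(uniform, REVISION_YEAR)
p_effort = prob_on_or_after(effort, REVISION_YEAR)

# Hypothetical evidence: the contributor was only active from 1980 onward.
updated = posterior(uniform, lambda y: 1.0 if y >= 1980 else 0.0)
p_updated = prob_on_or_after(updated, REVISION_YEAR)

print(f"uniform prior:        {p_uniform:.3f}")
print(f"effort-weighted:      {p_effort:.3f}")
print(f"uniform + evidence:   {p_updated:.3f}")
```

The three numbers differ noticeably, which is the point: the chance that an undated record was made under the post-revision concept depends heavily on the prior chosen over years and on any side evidence about when the observers were active.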
This is a generic problem with checklists, which are in this way (and, I
suppose, many other ways) different from specimen records.
BTW, Aix sponsa (L.) seems to have no synonyms according to ITIS,
so Aix sponsa's concept seems not to have changed between whenever the
critter was observed and 1991. But what about the mapping between
common and scientific name? I'm told that only for birds are there
widely accepted authorities for assigning common names to scientific
names. For other groups, this mapping is the problem that dare not speak
its name.
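
One way to picture what a record would need to carry is sketched below in Python. This is an illustration only, not the SEEK concept schema: the `NameUsage` type, its fields, and the `known_same_concept` helper are hypothetical names, and the rule "same scientific name plus same mapping authority implies same concept" is a simplifying assumption. The second record stands for an imaginary other dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameUsage:
    """A name as used in one dataset record (hypothetical structure)."""
    common_name: str
    sci_name: str
    authority: Optional[str]      # source used to map common -> scientific name
    observed_year: Optional[int]  # None when the record carries no date


def known_same_concept(a: NameUsage, b: NameUsage) -> bool:
    """True only when the metadata alone lets us equate the two usages.

    False means "unknown", not "different": resolving it would need a
    concept-level comparison, synonymy data, or a probabilistic model.
    """
    if a.authority is None or b.authority is None:
        return False
    return a.sci_name == b.sci_name and a.authority == b.authority


# The SA003 wood duck: authority from the quoted metadata, no observation date.
sa003 = NameUsage("wood duck", "Aix sponsa",
                  "USFWS Checklist of Vertebrates, 1991", None)
# An imaginary record from another checklist with no stated authority.
other = NameUsage("wood duck", "Aix sponsa", None, 1977)

print(known_same_concept(sa003, sa003))  # same name, same authority
print(known_same_concept(sa003, other))  # authority missing: cannot tell
```

Note that even the first comparison says nothing about whether the observer in the field applied the common name the way the 1987 field guide does; the missing date leaves that open.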
BTW.2. Systematists might say: If it walks like a wood duck, and quacks
like a wood duck, then it's a wood duck.
-- Bob Morris
Matt Jones wrote:
> Hi Shawn,
>
> [Matt's excellent exposition omitted]
> Shawn Bowers wrote:
>
>>
>> Hi,
>>
>> I recently found this statement in the methods section of an EML
>> document:
>>
>> "Nomenclature for common names follow the 1987 edition of the
>> National Geographic Society's field guide, 'Bird's of North
>> America'. Species codes used are those of the American
>> Ornithologist's Union. The USFWS Checklist OF Vertebrates, 1991,
>> was used to quantify scientific names from the common names."
>>
>> Can anyone help me interpret what these two sentences mean, and how I
>> might use the Taxon-group work to "understand/resolve" the actual
>> species references in a dataset based on the above sentence? Here is a
>> snippet from the corresponding dataset with only those columns
>> that refer to something "taxonomic". (Note that there are actually 23
>> columns in the dataset and about 165 rows.)
>>
>>
>> class  tax_order      family    sci_name        aoucode  commonname
>> -----  -------------  --------  --------------  -------  -----------
>> aves   anseriformes   anatidae  aix sponsa      wodu     wood duck
>> aves   apodiformes    apodidae  chaetura vauxi  vasw     vauxs wift
>> aves   ciconiiformes  ardeidae  ardea alba      greg     great egret
>> ...
>>
>> I am particularly interested in understanding the relationship between
>> the concept XML schema and its use for "registering" or "mapping"
>> this data set to information captured in the concept work. For
>> example, if I want to search for datasets based on taxonomic concepts.
>>
>> (You have to be patient with me because I am clueless about these
>> issues), but it seems like the common name and aoucode represent
>> redundant information: the aoucode is some kind of convention for
>> representing the common name, and the class/tax_order/family/sci_name
>> uniquely identifies the common name? How would one align this dataset
>> with an instantiated taxon concept schema -- in particular, what
>> information would need to be available in the instantiated concept
>> schema?
>>
>> Any help is greatly appreciated,
>>
>> Shawn
>>
>>
>>
>>
>>
>> _______________________________________________
>> seek-taxon mailing list
>> seek-taxon at ecoinformatics.org
>> http://www.ecoinformatics.org/mailman/listinfo/seek-taxon
>
>
>
> _______________________________________________
> seek-kr-sms mailing list
> seek-kr-sms at ecoinformatics.org
> http://www.ecoinformatics.org/mailman/listinfo/seek-kr-sms
--
Robert A. Morris
Professor of Computer Science
UMASS-Boston
http://www.cs.umb.edu/~ram
phone (+1)617 287 6466