[seek-kr-sms] Re: [SEEK-Taxon] Question about EML

Shawn Bowers bowers at sdsc.edu
Tue Mar 9 09:08:52 PST 2004


Matt, Robert, Jessie, and Nico,

Thanks for all the feedback on this question.  I really appreciate it 
and have a better understanding of the taxon concept approach.

I don't have questions at this time, but may in the future :)

Thanks a lot!

Shawn

Kennedy, Jessie wrote:
> Hi Matt/Shawn
> 
> I think Matt gave a very good summary of the situation - enough detail to
> get the point across without missing anything vital to the understanding of
> the problem. I didn't read anything that needs "correcting" - only, as Matt
> says, we could expand more into related issues like type specimens and
> exceptions to the rules, but I don't think that is useful just now.
> 
> From Shawn's point of view - I guess the thing to note is that when we've
> agreed on the model of concepts to use then the relevant bit of EML will be
> changed. Eventually the EML section will allow ecologists to register
> concepts (for our purposes at the moment - a full scientific name e.g. Aus
> bus L. with the reference to the publication the concept was described in).
> It is possible that we will need to develop tools to help the ecologist mark
> up their data with concepts. So part of the XML Schema will be incorporated
> into EML.
> Also, I believe that at the last Taxon group meeting we agreed to ignore
> common names for the time being. So an ecologist would only give the
> scientific name and publication (as a mechanism to identify the organism
> they recorded in the field) - they wouldn't give the full hierarchy from
> Kingdom down to species (which is really classificatory information or a
> means to access names/concepts which would be held in the SEEK DB).
> 
> If you have any specific questions, maybe I can help while Matt is busy....
> 
> Jessie
> 
> 
> 
>>-----Original Message-----
>>From: Matt Jones [mailto:jones at nceas.ucsb.edu]
>>Sent: 06 March 2004 03:18
>>To: Shawn Bowers
>>Cc: seek-taxon at ecoinformatics.org; seek-kr-sms at ecoinformatics.org
>>Subject: Re: [SEEK-Taxon] Question about EML
>>
>>
>>Hi Shawn,
>>
>>I am not a taxonomist, and I have only a superficial view of these
>>issues, but I think I can help clarify with a highly simplified
>>exposition of the problem and of how concept info helps to resolve it.
>>I'm going to mostly ignore the idea of type specimens, even though it
>>is actually central to the discussion. It'll still be a tome, even
>>though simplistic.  Others on seek-taxon can point out my mistakes :)
>>and hopefully clarify the utility of the approach.
>>
>>--- Taxonomy ---
>>Taxonomists collect specimens from the field and use them to classify
>>clusters of organisms into groups at various levels (aka Ranks) in a
>>hierarchy (e.g., at the Species rank).  These groups are generally
>>defined by a description of the characters from the specimens that can
>>be used to distinguish the groups from each other.  The suite of
>>characters used to distinguish one group from another need not be the
>>same for every pair of groups (e.g., overall_length might distinguish
>>species A from species B but number_of_Hairs might distinguish species
>>B from species C).  So, the taxonomists have a "concept" of the group
>>in mind when they write the description of the species in a manuscript
>>-- the description is the manifestation of the taxonomic concept the
>>person had in mind.  The taxonomist who first writes the description
>>that defines a concept can be called the "concept author".  There are
>>specific rules about how to create the scientific name when a
>>taxonomist wants to create a new grouping (aka concept).  Upon first
>>creation of a concept A with name a, the person creating the name a is
>>usually called the name author and is credited with the 'discovery'.
>>Taxonomists generally try to preserve this precedence, but don't worry
>>too much about the concept author.  You might see a species written as
>>'Acer rubrum L.'; the 'L.' stands for Linnaeus, who is the name author
>>(there are many abbreviations used).  For the very first definition of
>>a concept and name, the name author and the concept author are the
>>same.  Later on, once the concept has been revised, they are not the
>>same.  So far so good.
>>
>>Taxonomists usually disagree about how to classify (e.g., what
>>characters are important), and so they want to change the concepts as
>>time progresses and new information surfaces.  They generally try to
>>distinguish the new concepts from some existing concepts by splitting
>>or lumping various concepts using new descriptions based on the
>>earlier descriptions.  The nomenclature rules for those groups say how
>>the new concepts should be named.  Usually, if a concept (A) with name
>>(a) is split into two new concepts (A' and A''), then one of those
>>retains the original name (a) and the other gets a new name (a').  So
>>now there are 3 concepts in existence (A, A', and A''), but only two
>>names (a and a').  Thus, the name 'a' actually can be used to refer to
>>two distinct concepts (A and A') with distinct definitions.  The name
>>author for 'a' is still the same, but the concept author is different
>>for A and A'.
>>
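
The split described above can be sketched as a small data model. This is only an illustration under stated assumptions: the `Concept` class, its field names, and the placeholder references ("ref-1", "ref-2") are hypothetical and are not taken from any SEEK or EML schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A taxonomic concept: a name as defined in one specific reference."""
    concept_id: str  # hypothetical identifier, e.g. "A"
    name: str        # scientific name attached to the concept
    reference: str   # placeholder for the publication holding the description


# Concept A (name a) is split into A' (keeps name a) and A'' (new name a').
concepts = [
    Concept("A", "a", "ref-1"),
    Concept("A'", "a", "ref-2"),
    Concept("A''", "a'", "ref-2"),
]

# Index concepts by name to expose the ambiguity: three concepts, two names.
concepts_by_name = defaultdict(list)
for concept in concepts:
    concepts_by_name[concept.name].append(concept.concept_id)

for name, ids in concepts_by_name.items():
    print(name, "->", ids)
```

The name alone is not a usable key here; only the pair of name and reference picks out a single concept.
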
>>So the current situation is that one name can refer to many concepts,
>>AND that one concept can have many names.  Quite ambiguous.  There are
>>millions of species concepts in most views of things, and many of them
>>have been revised multiple times over a several-hundred-year history
>>of classification.  Also, the species are organized into higher level
>>ranks (e.g., genera), and these have the same name/concept issues as
>>the lowest level ranks.  Egad.
>>
>>--- Biology using taxonomy ---
>>Biologists use scientific names to identify organisms in the field and
>>elsewhere.  When they collect data, they learn to identify species
>>according to the descriptions of the species, usually provided in a
>>field guide or other authority.  Thus, if you know the name that the
>>biologist used to identify an organism and the reference that contains
>>the description of the concept that that name refers to, you have a
>>good idea of exactly what concept the biologist thought the organism
>>was.  Unfortunately, most biologists do not write down the
>>authoritative reference that they were using to identify species,
>>instead providing the name only in their data sets (and sometimes they
>>provide the name author, especially for plants).  Thus, a biologist
>>who references name 'a' in a dataset in 1950 might be referring to a
>>different taxonomic concept than another biologist who references name
>>'a' later, say in 2000.  So if you were to do a retrospective
>>analysis of the properties of 'a' over time without resolving the
>>concepts that 'a' refers to, you'd be comparing apples and oranges
>>(or, more likely, apples and McIntosh apples).  In studies like
>>biodiversity studies, this could result in inflation or deflation of
>>changes in species abundance simply based on changes in the
>>predominant view of species classifications.
>>
>>--- Taxon databases ---
>>Typical taxonomic databases today use the taxon name as a surrogate
>>for the concept, and so are actually just lists of names.  The
>>seek-taxon group (along with others in the taxonomic community) is
>>proposing that a better approach is to explicitly model the
>>distinction between taxonomic concepts and taxonomic names by
>>specifying a unique identifier for every concept ever created, based
>>on the reference in which it was described (yep, BIG task), and then
>>associating with it the many names that have been used to refer to
>>that concept.  They also propose that relationships among concepts can
>>be mapped (e.g., that concept A defines a superset of concept A').
>>The relationships between concepts recognized now are: congruent,
>>includes, included in, overlaps, excludes (see
>>http://www.bgbm.org/BioDivInf/Projects/MoreTax/standard_liste_en.htm
>>for details).
>>
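
One way to picture the five relationship types listed above is with plain set comparisons. The sketch below assumes, purely for illustration, that each concept can be modeled as the set of specimens it covers. That is a simplification: in practice these relationships would be asserted by taxonomists, not computed. The function name and the specimen numbers are hypothetical.

```python
def relationship(a: frozenset, b: frozenset) -> str:
    """Classify how concept `a` relates to concept `b` (set-based sketch)."""
    if a == b:
        return "congruent"
    if a > b:    # a is a proper superset of b
        return "includes"
    if a < b:    # a is a proper subset of b
        return "included in"
    if a & b:    # some, but not all, members are shared
        return "overlaps"
    return "excludes"


# Hypothetical specimen sets: A is split cleanly into A' and A''.
A = frozenset({1, 2, 3, 4})
A1 = frozenset({1, 2})       # stands in for A'
A2 = frozenset({3, 4})       # stands in for A''
other = frozenset({2, 3})    # a concept that cuts across the split

print(relationship(A, A1))
print(relationship(A1, A))
print(relationship(A1, A2))
print(relationship(A1, other))
```
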
>>--- Using taxon concepts ---
>>OK, so to your example data set.  Your dataset has a series of
>>taxonomic names at various ranks.  If you take the 'sci_name' column
>>as representing the species rank, you have a dataset that identifies
>>only a name, not a concept.  This alone does not unambiguously tell
>>you what the biologist was referring to.  The metadata provides
>>references to the authorities that they used for species
>>identification, and so (theoretically at least :), you could find each
>>name in those references and get an exact concept definition that the
>>biologist meant.  Of course, this requires matching the name and
>>reference in the metadata to a corresponding name/concept in the
>>seek-taxon concept database.
>>
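
The matching step just described, from a name plus a reference to a concept, might look roughly like the lookup below. Everything here is an assumption made for illustration: the `REGISTRY` dictionary stands in for the seek-taxon concept database, the concept identifiers are invented placeholders, and the normalization rule (lowercase, collapse whitespace) is just one plausible choice.

```python
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(text.lower().split())


# Hypothetical registry keyed by (scientific name, reference).
# The identifiers are placeholders, not real concept IDs.
REGISTRY = {
    (normalize("aix sponsa"),
     normalize("USFWS Checklist of Vertebrates, 1991")): "concept-0001",
    (normalize("ardea alba"),
     normalize("USFWS Checklist of Vertebrates, 1991")): "concept-0002",
}


def resolve(sci_name: str, reference: str) -> Optional[str]:
    """Return the concept ID for a name as used in a reference, if known."""
    return REGISTRY.get((normalize(sci_name), normalize(reference)))


print(resolve("Aix  sponsa", "USFWS Checklist OF Vertebrates, 1991"))
print(resolve("aix sponsa", "some other checklist"))
```

A name that is found under a different reference deliberately fails to resolve, since the same name in another reference may denote a different concept.
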
>>Once you've done this, you have more ability to reason about the
>>relationships among data items in different data sets.  For example,
>>if I want to search for data about name 'a', a taxon concept
>>resolution service might tell me that the name has been used to
>>represent concepts A and A' at different times, and that searching for
>>*all* names for both A and A' might be what the user wants (a type of
>>query expansion).  Thus a better query can be defined
>>semi-automatically.  Also, if a user wants to combine two datasets
>>from different times that use taxonomic names, the concept database
>>might be used to reason about the relationships between concepts used
>>in the different data sources (e.g., that the name 'a' used in dataset
>>1 refers to concept A but the name 'a' used in dataset 2 refers to A',
>>and so the data MAY not be comparable).  I say MAY because this is an
>>extremely subjective and subtle decision -- it really depends on what
>>kinds of measurements the scientist is comparing and why they are
>>comparing them.  So scientific judgement will be critical here.
>>Sometimes we'll be able to tell a nice tidy relationship among
>>concepts (for congruence, superset, subset) but other times it might
>>be ambiguous (intersection, disjunction).  Finally, when we know only
>>a taxon name and have no info about the concept, some contextual
>>information about the data may allow us to assign a probabilistic
>>estimate of the name representing each of a series of concepts.  For
>>example, if the data contained name 'a' and was collected in 1900, and
>>the only concept that used name 'a' in 1900 was A, then we might
>>assign a high probability that 'a' represented A in that data.  Later
>>on, we might see that 'a' is used in data collected in 2004, and we
>>might know that everyone has thought that A' and A'' are the right
>>concepts to use, so no data has referred to A since 1930, so there is
>>a high probability that 'a' references A'.  Part of the seek-taxon
>>work is to work on probabilistic approaches to making these judgements
>>based on various corpora.
>>
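
The query expansion and the date-based reasoning above can be sketched together. This is a toy model under stated assumptions: the `Usage` records, the start year 1850, and the synonym 'b' are hypothetical, and the year filter is only a crude stand-in for the probabilistic estimate mentioned in the text (it returns candidates, not probabilities). Only the years 1900, 1930, and 2004 come from the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Usage:
    """A period during which a name was used for a concept (hypothetical)."""
    name: str
    concept_id: str
    first_year: int
    last_year: Optional[int]  # None means the usage is still current


USAGES = [
    Usage("a", "A", 1850, 1930),   # start year is an invented placeholder
    Usage("b", "A", 1850, 1930),   # invented synonym for concept A
    Usage("a", "A'", 1930, None),
    Usage("a'", "A''", 1930, None),
]


def expand_query(name: str) -> List[str]:
    """All names ever used for any concept that `name` has referred to."""
    concept_ids = {u.concept_id for u in USAGES if u.name == name}
    return sorted({u.name for u in USAGES if u.concept_id in concept_ids})


def candidates(name: str, year: int) -> List[str]:
    """Concepts that `name` could plausibly denote in data from `year`."""
    return sorted(
        u.concept_id
        for u in USAGES
        if u.name == name
        and u.first_year <= year
        and (u.last_year is None or year <= u.last_year)
    )


print(expand_query("a"))
print(candidates("a", 1900))
print(candidates("a", 2004))
```

When `candidates` returns more than one concept (as it does for the changeover year in this toy data), the name alone cannot settle the question and some further evidence or judgement is needed.
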
>>To address your last question (what needs to be in a concept schema),
>>you might want to review the current concept schema that the
>>seek-taxon group (and J. Kennedy in particular) has been developing
>>(it's in CVS).
>>
>>Hope this has helped.  It certainly took a while to write, even though
>>I know it's pretty sloppy in some places.  Unfortunately, I'm in an
>>NSF site review all next week so won't be able to respond to any
>>questions, but I hope the rest of the seek-taxon group can follow up
>>and correct my mistakes, and then I will try to respond during the
>>week after next.
>>
>>Cheers,
>>Matt
>>
>>Shawn Bowers wrote:
>>
>>>Hi,
>>>
>>>I recently found this statement in the methods section of an EML
>>>document:
>>>
>>>      "Nomenclature for common names follow the 1987 edition of the
>>>    National Geographic Society's field guide, 'Bird's of North
>>>    America'. Species codes used are those of the American
>>>    Ornithologist's Union. The USFWS Checklist OF Vertebrates, 1991,
>>>    was used to quantify scientific names from the common names."
>>>
>>>Can anyone help me interpret what these two sentences mean, and how I
>>>might use the Taxon-group work to "understand/resolve" the actual
>>>species references in a dataset based on the above sentence? Here is
>>>a snippet from the corresponding dataset with only those columns that
>>>refer to something "taxonomic". (Note that there are actually 23
>>>columns in the dataset and about 165 rows.)
>>>
>>>
>>>class  tax_order      family    sci_name        aoucode  commonname
>>>-----  ---------      ------    --------        -------  ----------
>>>aves   anseriformes   anatidae  aix sponsa      wodu     wood duck
>>>aves   apodiformes    apodidae  chaetura vauxi  vasw     vauxs wift
>>>aves   ciconiiformes  ardeidae  ardea alba      greg     great egret
>>>...
>>>
>>>I am particularly interested in understanding the relationship
>>>between the concept XML schema and its use for "registering" or
>>>"mapping" this data set to information captured in the concept work,
>>>for example, if I want to search for datasets based on taxonomic
>>>concepts.
>>>
>>>(You have to be patient with me because I am clueless about these
>>>issues), but it seems like the common name and aoucode represent
>>>redundant information: the aoucode is some kind of convention for
>>>representing the common name, and the class/tax_order/family/sci_name
>>>uniquely identifies the common name? How would one align this dataset
>>>with an instantiated taxon concept schema -- in particular, what
>>>information would need to be available in the instantiated concept
>>>schema?
>>>
>>>Any help is greatly appreciated,
>>>
>>>Shawn
>>>
>>>
>>>
>>>
>>>
>>>_______________________________________________
>>>seek-taxon mailing list
>>>seek-taxon at ecoinformatics.org
>>>http://www.ecoinformatics.org/mailman/listinfo/seek-taxon
>>
>>
>>



