[seek-kr-sms] taxonomists teaching computers what ecologists should know

Nico M. Franz franz at nceas.ucsb.edu
Fri Apr 16 11:15:15 PDT 2004

 > I sent this out yesterday, but have now subscribed to KR-SMS. You might get 
the original e-mail once it's gone through approval (unless our moderator 
wisely rejects it). Nico

Hi there:

    I'm going off on a bit of a tangent here; this is for anyone who is 
involved in the SEEK KR & SMS working groups. As you may know, the Taxon WG 
is moving towards the so-called "concept approach". Having realized that 
the same taxonomic name can mean different things through time (in terms of 
organisms included) and different names can mean the same thing, some in 
our field have proposed to treat --names as they occur in a particular 
"circumscription reference"-- as separate entries in a taxonomic database. 
These "concept" entries can then be related to each other as "congruent", 
"overlapping", "including", "excluding", etc.
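
A minimal sketch of what such entries might look like, assuming a "concept" is simply a name paired with the reference it occurs in. The class names (Concept, Assertion, Relation) and the taxa and references are hypothetical placeholders, not an existing schema or real data.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relation types named in the text; a real schema may need more."""
    CONGRUENT = "congruent"
    OVERLAPPING = "overlapping"
    INCLUDING = "including"
    EXCLUDING = "excluding"


@dataclass(frozen=True)
class Concept:
    name: str       # taxonomic name as it appears in print
    reference: str  # the "circumscription reference" it appears in


@dataclass(frozen=True)
class Assertion:
    source: Concept
    relation: Relation
    target: Concept


# Hypothetical placeholder data, not real taxa or publications.
old = Concept("Aus bus", "Author A 1900")
new = Concept("Aus bus", "Author B 2000")
split = Concept("Aus cus", "Author B 2000")

assertions = [
    Assertion(old, Relation.INCLUDING, new),
    Assertion(old, Relation.INCLUDING, split),
    Assertion(new, Relation.EXCLUDING, split),
]

# Same name, different reference: two separate database entries.
print(old.name == new.name, old == new)
```

The point of the sketch is only that identity rests on the (name, reference) pair rather than on the name alone, so the relations attach to concepts and not to names.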

    A question that was largely left unanswered at the beginning of this 
development is: "what's a circumscription reference?" Does it suffice to 
provide a list of properties ("diagnosis"), or a list of constituents 
(species in a genus)? We're doing a bit of both at this point, depending on 
what the databases have to offer.

    But, in a forward-looking sense, the "concept approach" really has 
something entirely different and much more ambitious to offer. Taxonomic 
databases offer the possibility to establish and update connections among 
taxonomic views that are currently represented in physically separate and 
"uneditable" (historical) print publications. In a database, one can easily 
realign these views "next to each other" and e.g. establish connections 
among disparate views in old and new works. Right now that would hardly 
merit going through the tedious print publication process. One can also 
establish connections among the "primary literature" and field guides that 
ecologists would use.

    Furthermore, by "atomizing" the use of the same name through time into 
the individual instances in which the name was attached to a particular 
"circumscription reference", there's a way to circumscribe the differences 
and similarities among meanings as they've been applied to the same name in 
a more explicit way; a way that "non-experts" may be able to understand. We 
(the taxonomists) may be able to fill in the implicit assumptions necessary 
to understand the historical literature. We could use the "concept 
approach" to annotate our interpretations of historical uses (meanings) of 
names, and also be more explicit when we publish new "concepts". Making 
explicit how much these new "concepts" agree with and differ from previous 
ones could become a standard task of descriptive taxonomy.

    I guess I'm partly thinking about this because you (SEEK KR & SMS) are 
involved in "teaching" the meaning of (ecology-relevant) concepts to 
computers. That has to do with weighing "same" against "different", and 
making a judgement, right? I'm guessing that with ontologies this issue 
gets sorted out from the start, but maybe there are also probabilistic 
approaches.

    I'm interested in understanding how much taxonomists could exploit the 
"concept approach" to encode in databases similarities and differences in 
meaning that names by themselves can't convey. For example, "concepts" that 
carry no "significant" difference in meaning shouldn't in my view have the 
same independent "status" in a database as "concepts" that stand for unique 
meanings. Are there retrospective rules to detect "concept sameness" and 
"concept difference" that taxonomists can apply, encode in the database, 
and thus help ecologists understand the meanings of names in the historical 
taxonomic literature?
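
One candidate rule, sketched under a strong assumption: that each "concept" comes with an explicit list of constituents (e.g. the species placed in a genus), so that two concepts can be compared as sets. The function name relate, the label "included in", and the example members are hypothetical; concepts defined only by a diagnosis would need a different test.

```python
def relate(a, b):
    """Compare two concepts by their constituent sets (an assumed rule)."""
    a, b = frozenset(a), frozenset(b)
    if a == b:
        return "congruent"
    if a > b:            # a holds everything in b, plus more
        return "including"
    if a < b:            # inverse case; this label is not from the text
        return "included in"
    if a & b:            # some shared constituents, neither contains the other
        return "overlapping"
    return "excluding"   # no shared constituents


# Hypothetical constituent lists for one genus name in three references.
genus_1900 = {"sp1", "sp2", "sp3"}
genus_1950 = {"sp1", "sp2"}
genus_2000 = {"sp2", "sp3", "sp4"}

for pair in [(genus_1900, genus_1950), (genus_1900, genus_2000)]:
    print(relate(*pair))
```

A rule like this only captures membership; whether a difference in membership is "significant" would still be the taxonomist's call.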

    Of course there's the alternative strategy to try to teach computers to 
"reason" among names without any additional annotation from experts. We'll 
probably need and have a bit of both.

    I actually have a question! Dan Higgins kindly lent me a book "Sowa: 
Knowledge Representation". Any comments on that reference? If you happen to 
know of any other useful introductory references in this area (meaning of 
words, differences among humans and computers, difficulties in making 
implicit human meanings explicit and encoding them in computer language, 
etc.), I'd be most grateful for any suggestions. Are there topics that 
you've worked with that bear strong resemblances to the issues we face in 
biological taxonomy? I'm always scared of spending too much time reinventing 
the wheel.

    What I want to work towards is a document of "best practice", i.e. how 
a taxonomist should interpret and enter "concepts" in the historical 
taxonomic literature in such a way that the similarities and differences 
among them are largely "understood" by the computer. At the end of this 
top-down approach, ecologists will get more from the taxonomic database than 
they would from the literature. The semantic capabilities of the electronic 
medium (not just its storage functions) would be used.

    "Worst practice" (suboptimal, to be polite) would be simply to enter 
every printed work "as is". This lifts the burden of understanding and 
interpreting the newly digitized "concept" information from taxonomists, 
and also makes them a bit useless.

    I should add that even though there are many (many) taxonomic databases 
out there already on the internet, only very few (e.g. a list of German 
mosses) rigorously employ the "concept approach." I'd say over 95% deal 
with names, at least when it comes to synonymy and parent/child relations. 
Almost the entire print literature deals with names in that sense. There's 
a rough-and-ready way to achieve the name-to-concept transfer process (just 
hang an "according to" onto every name), but as I said, that would make 
suboptimal use of taxonomists' expertise and fail to encode meaning 
differences through time so that computers can "know" them. In my view 
we're very much at the beginning of this development.

    Hope this wasn't a waste of your time!



Nico M. Franz
National Center for Ecological Analysis and Synthesis
735 State Street, Suite 300
Santa Barbara, CA 93101

Phone: (805) 966-1677; Fax: (805) 892-2510; E-mail: franz at nceas.ucsb.edu
Website: http://www.cals.cornell.edu/dept/entomology/wheeler/Franz/Nico.html  