[seek-kr-sms] taxonomists teaching computers what ecologists should know
Nico M. Franz
franz at nceas.ucsb.edu
Fri Apr 16 11:15:15 PDT 2004
I sent this out yesterday, but I have now subscribed to KR-SMS. You might get
the original e-mail once it's gone through approval (unless our moderator
wisely rejects it). Nico
Hi there:
I'm going off on a bit of a tangent here; this is for anyone who is
involved in the SEEK KR & SMS working groups. As you may know, the Taxon WG
is moving towards the so-called "concept approach". Having realized that
the same taxonomic name can mean different things through time (in terms of
organisms included) and different names can mean the same thing, some in
our field have proposed to treat --names as they occur in a particular
"circumscription reference"-- as separate entries in a taxonomic database.
These "concept" entries can then be related to each other as "congruent",
"overlapping", "including", "excluding", etc.
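To make the idea of separate "concept" entries concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class names (Concept, Assertion, Relation), the placeholder genus name "Aus" and the two references are all made up, and no existing database schema is implied.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The relationship types named in the text (the list is open-ended)."""
    CONGRUENT = "congruent"
    OVERLAPPING = "overlapping"
    INCLUDING = "including"
    EXCLUDING = "excluding"


@dataclass(frozen=True)
class Concept:
    """A name as it occurs in one particular circumscription reference."""
    name: str
    reference: str  # hypothetical citation string


@dataclass(frozen=True)
class Assertion:
    """An expert's statement relating one concept to another."""
    source: Concept
    relation: Relation
    target: Concept


# Two entries sharing one name but tied to different (made-up) references.
old = Concept("Aus", "Author A 1900")
new = Concept("Aus", "Author B 2000")
link = Assertion(new, Relation.INCLUDING, old)

print(old == new)           # False: same name, separate concept entries
print(link.relation.value)  # including
```

The point of the sketch is that the name alone no longer identifies an entry; the pair of name and reference does, and relationships are stored as separate statements between entries.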
A question that was largely left unanswered at the beginning of this
development is: "what's a circumscription reference?" Does it suffice to
provide a list of properties ("diagnosis"), or a list of constituents
(species in a genus)? We're doing a bit of both at this point, depending on
what databases have to offer.
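When a circumscription is given as a list of constituents, the relationship between two concepts could in principle be read off by comparing the lists as sets. The following Python sketch shows that comparison. The function name relate, the label "included in" and the species lists are assumptions for illustration; it also assumes that the constituent names mean the same thing in both works, which is exactly what may not hold in practice, and it says nothing about circumscriptions given as diagnoses.

```python
def relate(a: frozenset, b: frozenset) -> str:
    """Compare two concepts by their constituent lists (sets of members)."""
    if a == b:
        return "congruent"
    if a > b:          # a is a proper superset of b
        return "including"
    if a < b:          # a is a proper subset of b
        return "included in"
    if a & b:          # some shared members, neither contains the other
        return "overlapping"
    return "excluding"


# Made-up species lists for one genus name in two hypothetical works.
genus_1900 = frozenset({"bus", "cus"})
genus_2000 = frozenset({"bus", "cus", "dus"})

print(relate(genus_2000, genus_1900))                 # including
print(relate(genus_1900, frozenset({"cus", "eus"})))  # overlapping
```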
But, in a forward-looking sense, the "concept approach" really has
something entirely different and much more ambitious to offer. Taxonomic
databases offer the possibility to establish and update connections among
taxonomic views that are currently represented in physically separate and
"uneditable" (historical) print publications. In a database, one can easily
realign these views "next to each other" and e.g. establish connections
among disparate views in old and new works. Right now that would hardly
merit going through the tedious print publication process. One can also
establish connections among the "primary literature" and field guides that
ecologists would use.
Furthermore, by "atomizing" the use of the same name through time into
the individual instances in which the name was attached to a particular
"circumscription reference", there's a way to circumscribe the differences
and similarities among meanings as they've been applied to the same name in
a more explicit way; a way that "non-experts" may be able to understand. We
(the taxonomists) may be able to fill in the implicit assumptions necessary
to understand the historical literature. We could use the "concept
approach" to annotate our interpretations of historical uses (meanings) of
names, and also be more explicit when we publish new "concepts". Making
explicit how much these new "concepts" agree with and differ from previous
ones is something that could become a standard task of descriptive taxonomy.
I guess I'm partly thinking about this because you (SEEK KR & SMS) are
involved in "teaching" the meaning of (ecology-relevant) concepts to
computers. That has to do with weighing "same" against "different", and
making a judgement, right? I'm guessing that ontologies make this problem an
issue that gets sorted out from the start, but maybe there are also
probabilistic approaches.
I'm interested in understanding how much taxonomists could exploit the
"concept approach" to encode in databases similarities and differences in
meaning that names by themselves can't convey. For example, "concepts" that
carry no "significant" difference in meaning shouldn't in my view have the
same independent "status" in a database as "concepts" that stand for unique
meanings. Are there retrospective rules to detect "concept sameness" and
"concept difference" that taxonomists can apply, encode in the database,
and thus help ecologists understand the meanings of names in the historical
taxonomic literature?
Of course there's the alternative strategy to try to teach computers to
"reason" among names without any additional annotation from experts. We'll
probably need and have a bit of both.
I actually have a question! Dan Higgins kindly lent me a book "Sowa:
Knowledge Representation". Any comments on that reference? If you happen to
know of any other useful introductory references in this area (meaning of
words, differences among humans and computers, difficulties in making
implicit human meanings explicit and encoding them in computer language,
etc.), I'd be most grateful for any suggestions. Are there topics that
you've worked with that bear strong resemblances to the issues we face in
biological taxonomy? I'm always scared of spending too much time reinventing
the wheel.
What I want to work towards is a document of "best practice", i.e. how
a taxonomist should interpret and enter "concepts" in the historical
taxonomic literature in such a way that the similarities and differences
among them are largely "understood" by the computer. At the end of this
top-down approach, ecologists will get more from the taxonomic database than
they would from the literature. The semantic capabilities of the electronic
medium (not just its storage functions) would be used.
"Worst practice" (suboptimal, to be polite) would be simply to enter
every printed work "as is". This lifts the burden of understanding and
interpreting the newly digitized "concept" information from taxonomists,
and also makes them a bit useless.
I should add that even though there are many (many) taxonomic databases
out there already on the internet, only very few (e.g. a list of German
mosses) rigorously employ the "concept approach." I'd say over 95% deal
with names, at least when it comes to synonymy and parent/child relations.
Almost the entire print literature deals with names in that sense. There's
a rough-and-ready way to achieve the name-to-concept transfer process (just
hang an "according to" onto every name), but as I said, that would make
suboptimal use of taxonomists' expertise and fail to encode meaning
differences through time so that computers can "know" them. In my view
we're very much at the beginning of this development.
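The rough-and-ready transfer mentioned above can be sketched in a few lines of Python, which also shows what it leaves out. The function name hang_according_to, the names and the references are hypothetical and used for illustration only.

```python
def hang_according_to(names, reference):
    """Turn bare names into concept labels by appending the source work."""
    return [f"{name} according to {reference}" for name in names]


# Hypothetical names and references, for illustration only.
old_work = hang_according_to(["Aus bus", "Aus cus"], "Author A 1900")
new_work = hang_according_to(["Aus bus"], "Author B 2000")

print(old_work[0])  # Aus bus according to Author A 1900
print(new_work[0])  # Aus bus according to Author B 2000
# The two labels are distinct entries, but nothing records whether they are
# congruent, overlapping, and so on; that judgement still needs an expert.
```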
Hope this wasn't a waste of your time!
Cheers,
Nico
Nico M. Franz
National Center for Ecological Analysis and Synthesis
735 State Street, Suite 300
Santa Barbara, CA 93101
Phone: (805) 966-1677; Fax: (805) 892-2510; E-mail: franz at nceas.ucsb.edu
Website: http://www.cals.cornell.edu/dept/entomology/wheeler/Franz/Nico.html