[seek-kr-sms] taxonomists teaching computers what ecologists should know

Joseph Goguen goguen at cs.ucsd.edu
Wed Apr 21 21:21:24 PDT 2004

Hello!  A very interesting post!  I'd like to suggest that
people on this list might want to read *Sorting Things Out*
by Bowker and Star (MIT 1999), which is a deep sociological
study of categories and classification.  Many of their case
studies have a biological flavor, including the International
Classification of Diseases, the Nursing Intervention Classification
and the color classification system used in apartheid South Africa.

I am afraid that this kind of work makes Sowa's approach look
rather naive.  It also raises some interesting ethical issues.



> From: "Nico M. Franz" <franz at nceas.ucsb.edu>
> Cc: bowers at sdsc.edu, thau at learningsite.com
> Date: Fri, 16 Apr 2004 11:15:15 -0700
> I sent this out yesterday, but now I subscribed to KR-SMS. You might get 
> the original e-mail once it's gone through approval (unless our moderator 
> wisely rejects it). Nico
> Hi there:
>     I'm going off on a bit of a tangent here; this is for anyone who is 
> involved in the SEEK KR & SMS working groups. As you may know the Taxon WG 
> is moving towards the so-called "concept approach". Having realized that 
> the same taxonomic name can mean different things through time (in terms of 
> organisms included) and different names can mean the same thing, some in 
> our field have proposed to treat --names as they occur in a particular 
> "circumscription reference"-- as separate entries in a taxonomic database. 
> These "concept" entries can then be related to each other as "congruent", 
> "overlapping", "including", "excluding", etc.
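
The "concept" entries described above could be sketched roughly as follows. This is only an illustration, not the SEEK Taxon WG's actual schema: the class and field names (`TaxonConcept`, `according_to`, `ConceptRelation`) and the sample names and references are hypothetical placeholders. The four relation labels are the ones named in the post.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relation labels taken from the post; a real system may need more."""
    CONGRUENT = "congruent"
    OVERLAPPING = "overlapping"
    INCLUDING = "including"
    EXCLUDING = "excluding"


@dataclass(frozen=True)
class TaxonConcept:
    name: str          # the taxonomic name as published
    according_to: str  # the circumscription reference it occurs in


@dataclass(frozen=True)
class ConceptRelation:
    source: TaxonConcept
    relation: Relation
    target: TaxonConcept


# Hypothetical placeholder data, not real taxa or publications.
old = TaxonConcept("Aus", "Author A 1900")
new = TaxonConcept("Aus", "Author B 1950")
other = TaxonConcept("Bus", "Author C 1980")

relations = [
    # Same name, but the later usage covers more organisms.
    ConceptRelation(new, Relation.INCLUDING, old),
    # Different name, same meaning.
    ConceptRelation(other, Relation.CONGRUENT, old),
]

# The same name in two references yields two distinct entries.
print(old == new, old.name == new.name)
```

Keying each entry on the pair (name, reference) rather than on the name alone is what allows one name to carry several meanings through time.
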
>     A question that was largely left unanswered at the beginning of this 
> development is: "what's a circumscription reference?" Does it suffice to 
> provide a list of properties ("diagnosis"), or a list of constituents 
> (species in a genus)? We're doing a bit of both at this point, depending on 
> what databases have to offer.
>     But, in a forward-looking sense, the "concept approach" really has 
> something entirely different and much more ambitious to offer. Taxonomic 
> databases offer the possibility to establish and update connections among 
> taxonomic views that are currently represented in physically separate and 
> "uneditable" (historical) print publications. In a database, one can easily 
> realign these views "next to each other" and e.g. establish connections 
> among disparate views in old and new works. Right now that would hardly 
> merit going through the tedious print publication process. One can also 
> establish connections among the "primary literature" and field guides that 
> ecologists would use.
>     Furthermore, by "atomizing" the use of the same name through time into 
> the individual instances in which the name was attached to a particular 
> "circumscription reference", there's a way to circumscribe the differences 
> and similarities among meanings as they've been applied to the same name in 
> a more explicit way; a way that "non-experts" may be able to understand. We 
> (the taxonomists) may be able to fill in the implicit assumptions necessary 
> to understand the historical literature. We could use the "concept 
> approach" to annotate our interpretations of historical uses (meanings) of 
> names, and also be more explicit when we publish new "concepts". Making 
> explicit how much these new "concepts" agree with and differ from previous 
> ones is something that could become a standard task of descriptive taxonomy.
>     I guess I'm partly thinking about this because you (SEEK KR & SMS) are 
> involved in "teaching" the meaning of (ecology-relevant) concepts to 
> computers. That has to do with weighing "same" against "different", and 
> making a judgement, right? I'm guessing that ontologies make this problem an 
> issue that gets sorted out from the start, but maybe there are also 
> probabilistic approaches.
>     I'm interested in understanding how much taxonomists could exploit the 
> "concept approach" to encode in databases similarities and differences in 
> meaning that names by themselves can't convey. For example, "concepts" that 
> carry no "significant" difference in meaning shouldn't in my view have the 
> same independent "status" in a database as "concepts" that stand for unique 
> meanings. Are there retrospective rules to detect "concept sameness" and 
> "concept difference" that taxonomists can apply, encode in the database, 
> and thus help ecologists understand the meanings of names in the historical 
> taxonomic literature?
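
One candidate for such a retrospective rule, assuming each concept is circumscribed by a list of constituents (e.g. the species in a genus), is a plain set comparison. The sketch below is a simplification: the function name `compare` is hypothetical, the "included in" label is an addition for the inverse of "including", and concepts defined only by a diagnosis would need a different treatment.

```python
def compare(a: frozenset, b: frozenset) -> str:
    """Relate concept a to concept b by their constituent sets."""
    if a == b:
        return "congruent"
    if a > b:            # a is a proper superset of b
        return "including"
    if a < b:            # a is a proper subset of b
        return "included in"
    if a & b:            # some, but not all, constituents shared
        return "overlapping"
    return "excluding"   # no constituents in common


# Hypothetical constituent lists, not real species.
earlier = frozenset({"sp1", "sp2"})
later = frozenset({"sp1", "sp2", "sp3"})
print(compare(later, earlier))
```

A rule like this only detects sameness of membership. Whether a difference is "significant" in the sense asked about above would still be a taxonomist's judgement.
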
>     Of course there's the alternative strategy to try to teach computers to 
> "reason" among names without any additional annotation from experts. We'll 
> probably need and have a bit of both.
>     I actually have a question! Dan Higgins kindly lent me a book "Sowa: 
> Knowledge Representation". Any comments on that reference? If you happen to 
> know of any other useful introductory references in this area (meaning of 
> words, differences among humans and computers, difficulties in making 
> implicit human meanings explicit and encoding them in computer language, 
> etc), I'd be most grateful for any suggestions. Are there topics that 
> you've worked with that bear strong resemblances to the issues we face in 
> biological taxonomy? I'm always scared of spending too much time 
> reinventing the wheel.
>     What I want to work towards is a document of "best practice", i.e. how 
> a taxonomist should interpret and enter "concepts" in the historical 
> taxonomic literature in such a way that the similarities and differences 
> among them are largely "understood" by the computer. At the end of this 
> top-down approach, ecologists will get more from the taxonomic database than 
> they would from the literature. The semantic capabilities of the electronic 
> medium (not just its storage functions) would be used.
>     "Worst practice" (suboptimal, to be polite) would be simply to enter 
> every printed work "as is". This lifts the burden of understanding and 
> interpreting the newly digitized "concept" information from taxonomists, 
> and also makes them a bit useless.
>     I should add that even though there are many (many) taxonomic databases 
> out there already on the internet, only very few (e.g. a list of German 
> mosses) rigorously employ the "concept approach." I'd say over 95% deal 
> with names, at least when it comes to synonymy and parent/child relations. 
> Almost the entire print literature deals with names in that sense. There's 
> a rough-and-ready way to achieve the name-to-concept transfer process (just 
> hang an "according to" onto every name), but as I said, that would make 
> suboptimal use of taxonomists' expertise and fail to encode meaning 
> differences through time so that computers can "know" them. In my view 
> we're very much at the beginning of this development.
>     Hope this wasn't a waste of your time!
> Cheers,
> Nico
> Nico M. Franz
> National Center for Ecological Analysis and Synthesis
> 735 State Street, Suite 300
> Santa Barbara, CA 93101
> Phone: (805) 966-1677; Fax: (805) 892-2510; E-mail: franz at nceas.ucsb.edu
> Website: http://www.cals.cornell.edu/dept/entomology/wheeler/Franz/Nico.html  
