[seek-kr-sms] taxonomists teaching computers what ecologists should know
Joseph Goguen
goguen at cs.ucsd.edu
Wed Apr 21 21:21:24 PDT 2004
Hello! A very interesting post! I'd like to suggest that
people on this list might want to read *Sorting Things Out*
by Bowker and Star (MIT 1999), which is a deep sociological
study of categories and classification. Many of their case
studies have a biological flavor, including the International
Classification of Diseases, the Nursing Intervention Classification
and the color classification system used in apartheid South Africa.
I am afraid that this kind of work makes Sowa's approach look
rather naive. It also raises some interesting ethical issues.
Cheers,
joseph
> From: "Nico M. Franz" <franz at nceas.ucsb.edu>
> Cc: bowers at sdsc.edu, thau at learningsite.com
> Date: Fri, 16 Apr 2004 11:15:15 -0700
>
> I sent this out yesterday, but have now subscribed to KR-SMS. You might get
> the original e-mail once it's gone through approval (unless our moderator
> wisely rejects it). Nico
>
> Hi there:
>
> I'm going off on a bit of a tangent here; this is for anyone who is
> involved in the SEEK KR & SMS working groups. As you may know the Taxon WG
> is moving towards the so-called "concept approach". Having realized that
> the same taxonomic name can mean different things through time (in terms of
> organisms included) and different names can mean the same thing, some in
> our field have proposed to treat --names as they occur in a particular
> "circumscription reference"-- as separate entries in a taxonomic database.
> These "concept" entries can then be related to each other as "congruent",
> "overlapping", "including", "excluding", etc.
>
> A question that was largely left unanswered at the beginning of this
> development is: "what's a circumscription reference?" Does it suffice to
> provide a list of properties ("diagnosis"), or a list of constituents
> (species in a genus)? We're doing a bit of both at this point, depending on
> what databases have to offer.
>
> But, in a forward-looking sense, the "concept approach" really has
> something entirely different and much more ambitious to offer. Taxonomic
> databases offer the possibility to establish and update connections among
> taxonomic views that are currently represented in physically separate and
> "uneditable" (historical) print publications. In a database, one can easily
> realign these views "next to each other" and e.g. establish connections
> among disparate views in old and new works. Right now that would hardly
> merit going through the tedious print publication process. One can also
> establish connections among the "primary literature" and field guides that
> ecologists would use.
>
> Furthermore, by "atomizing" the use of the same name through time into
> the individual instances in which the name was attached to a particular
> "circumscription reference", there's a way to circumscribe the differences
> and similarities among meanings as they've been applied to the same name in
> a more explicit way; a way that "non-experts" may be able to understand. We
> (the taxonomists) may be able to fill in the implicit assumptions necessary
> to understand the historical literature. We could use the "concept
> approach" to annotate our interpretations of historical uses (meanings) of
> names, and also be more explicit when we publish new "concepts". Making
> explicit how much these new "concepts" agree with and differ from previous
> ones is something that could become a standard task of descriptive taxonomy.
>
> I guess I'm partly thinking about this because you (SEEK KR & SMS) are
> involved in "teaching" the meaning of (ecology-relevant) concepts to
> computers. That has to do with weighing "same" against "different", and
> making a judgement, right? I'm guessing that ontologies make this problem an
> issue that gets sorted out from the start, but maybe there are also
> probabilistic approaches.
>
> I'm interested in understanding how much taxonomists could exploit the
> "concept approach" to encode in databases similarities and differences in
> meaning that names by themselves can't convey. For example, "concepts" that
> carry no "significant" difference in meaning shouldn't in my view have the
> same independent "status" in a database as "concepts" that stand for unique
> meanings. Are there retrospective rules to detect "concept sameness" and
> "concept difference" that taxonomists can apply, encode in the database,
> and thus help ecologists understand the meanings of names in the historical
> taxonomic literature?
>
> Of course there's the alternative strategy to try to teach computers to
> "reason" among names without any additional annotation from experts. We'll
> probably need and have a bit of both.
>
> I actually have a question! Dan Higgins kindly lent me a book "Sowa:
> Knowledge Representation". Any comments on that reference? If you happen to
> know of any other useful introductory references in this area (meaning of
> words, differences among humans and computers, difficulties in making
> implicit human meanings explicit and encoding them in computer language,
> etc), I'd be most grateful for any suggestions. Are there topics that
> you've worked with that bear strong resemblances to the issues we face in
> biological taxonomy? I'm always scared of spending too much time reinventing
> the wheel.
>
> What I want to work towards is a document of "best practice", i.e. how
> a taxonomist should interpret and enter "concepts" in the historical
> taxonomic literature in such a way that the similarities and differences
> among them are largely "understood" by the computer. At the end of this
> top-down approach, ecologists will get more from the taxonomic database than
> they would from the literature. The semantic capabilities of the electronic
> medium (not just its storage functions) would be used.
>
> "Worst practice" (suboptimal, to be polite) would be simply to enter
> every printed work "as is". This lifts the burden of understanding and
> interpreting the newly digitized "concept" information from taxonomists,
> and also makes them a bit useless.
>
> I should add that even though there are many (many) taxonomic databases
> out there already on the internet, only very few (e.g. a list of German
> mosses) rigorously employ the "concept approach." I'd say over 95% deal
> with names, at least when it comes to synonymy and parent/child relations.
> Almost the entire print literature deals with names in that sense. There's
> a rough-and-ready way to achieve the name-to-concept transfer process (just
> hang an "according to" onto every name), but as I said, that would make
> suboptimal use of taxonomists' expertise and fail to encode meaning
> differences through time so that computers can "know" them. In my view
> we're very much at the beginning of this development.
>
> Hope this wasn't a waste of your time!
>
> Cheers,
>
> Nico
>
>
>
> Nico M. Franz
> National Center for Ecological Analysis and Synthesis
> 735 State Street, Suite 300
> Santa Barbara, CA 93101
>
> Phone: (805) 966-1677; Fax: (805) 892-2510; E-mail: franz at nceas.ucsb.edu
> Website: http://www.cals.cornell.edu/dept/entomology/wheeler/Franz/Nico.html
>
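
The "concept approach" described in the quoted message (a name as used in a particular "circumscription reference", with concepts related as "congruent", "overlapping", "including", "excluding", etc.) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class `TaxonConcept`, the function `relate`, the placeholder names "Aus bus" etc., and the choice to circumscribe a concept by its list of constituents rather than by a diagnosis. The label "included in" is added here as the inverse of "including" and is not in the original list. This is not how any SEEK database actually models concepts.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TaxonConcept:
    """A name as used in one circumscription reference (hypothetical model)."""
    name: str                 # the taxonomic name, e.g. a genus name
    according_to: str         # the reference in which the name was used
    members: FrozenSet[str]   # constituents assigned to the name in that reference


def relate(a: TaxonConcept, b: TaxonConcept) -> str:
    """Relate two concepts purely by comparing their constituent sets."""
    if a.members == b.members:
        return "congruent"
    if a.members > b.members:      # a is a proper superset of b
        return "including"
    if a.members < b.members:      # a is a proper subset of b
        return "included in"
    if a.members & b.members:      # some, but not all, constituents shared
        return "overlapping"
    return "excluding"


# Invented placeholder data: same name, two references, different contents.
aus_1900 = TaxonConcept("Aus", "Author A 1900", frozenset({"Aus bus", "Aus cus"}))
aus_1950 = TaxonConcept("Aus", "Author B 1950",
                        frozenset({"Aus bus", "Aus cus", "Aus dus"}))
xus_1950 = TaxonConcept("Xus", "Author B 1950", frozenset({"Aus cus", "Xus yus"}))

print(relate(aus_1950, aus_1900))
print(relate(aus_1900, xus_1950))
```

In this sketch the same name "Aus" yields two separate entries because it occurs in two references, which is the point of the approach. Set comparison only covers the "list of constituents" reading of a circumscription; relating concepts defined by diagnoses would still need a taxonomist's judgement.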