[seek-kr-sms] algorithms and the owlfication of taxon

Wed Oct 26 11:03:43 PDT 2005

Hi all:

I realize we've had this exchange before in some form and it's mostly
about playing catch-up on both sides. It's fun (for me too) to think 
about representing taxonomy in an ontology format and think about the 
potential services and benefits of such a move. I'll try to state my 
current perspective in a way that might be helpful.

For the Taxon group, the main challenge is actually not the 
representation of any single classification with all its components 
("taxa"), their names and properties, their subcomponents, and 
interrelationships (parent/child, etc.). We would gain next to nothing 
from having a single-classification representation function in isolation.

We're also not immediately charged with the task of merging parts of or 
entire (multiple) classifications.

Serguei told me yesterday that one of the main benefits of ontology 
representation is "checking for internal (logical) consistency". 
Discovery and correction of errors, etc. That is most certainly not what 
Taxon is trying to do. We know that any given taxonomy is highly 
idiosyncratic, implicit, and assumptive of a vaguely specified 
background history involving select competent speakers. A single 
taxonomic classification might not only turn out to be false in terms of 
not representing the relationships or properties (composition) of taxa 
correctly (as subsequent and more refined studies are bound to show). 
The classifications are probably also highly inconsistent internally in 
your sense of the word "inconsistent". Meaning, they will mention some
specimens but not all that went into the definition of a species, they 
will mention some species but not all that go into the definition of a 
genus, things will be left out here and there, partially contradict each 
other, and so on.

The issue here is that we are not charged immediately with improving 
this state of affairs, i.e. helping taxonomists be better, more
transparent taxonomists from here on. Taxon has no "normative ambitions" 
in terms of telling scientists how to produce classifications using a 
more complete and formal approach (description logic rules, etc.).

So what is Taxon's mandate? Basically, we're charged with building a 
language and supporting infrastructure that will allow users and (in a 
second phase) machines to make more sense of the semantic similarities 
and differences between the components of multiple existing taxonomic 
classifications - to a higher degree of precision than can be achieved
using name strings and conventional taxonomic synonymy relationships 
alone. We're trying to build tools for taxonomic experts to do "brain 
dumps" on what they know about the classificatory history of their 
groups of expertise but wouldn't be able to express clearly and 
comprehensively without our assistance.

To that end, we need to be able to import fairly decent representations 
of at least two hierarchical classifications into a graphic interface. 
For our purposes those representations do not have to be any more
ontology-compliant than the original classifications (which largely
weren't, I would think).

Then in a second step, we need to provide taxonomic experts with a more 
powerful language than "is a synonym of" in order to assess the semantic 
similarities and differences of elements ("taxa") defined in the two 
classifications. That language will use terms like "is congruent with", 
"excludes", "is less inclusive (taxonomically) than", etc. Those 
assessments require the assessor to be intimately familiar with the 
written and unwritten idiosyncrasies of the two classifications. We're
talking about people here whose lifetime work was exactly that - 
learning the taxonomic history of a specific group as captured in the 
literature and museum collections. And different experts may still come 
up with different judgments when confronted with the same two 
classifications.
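
To make that a bit more concrete, here is a rough Python sketch of what
one such assessment record might look like. Only the three relation
phrases come from the list above; the class names, the "sec." labels, and
the sample concepts and experts are placeholders I made up for
illustration, not an agreed Taxon design.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relation terms quoted in the text; a real vocabulary may differ."""
    CONGRUENT = "is congruent with"
    EXCLUDES = "excludes"
    LESS_INCLUSIVE = "is less inclusive than"


@dataclass(frozen=True)
class Concept:
    """A taxon as defined in one particular classification (hypothetical model)."""
    name: str
    according_to: str  # the classification in which the name is used


@dataclass(frozen=True)
class Assessment:
    """One expert's judgment about how two concepts relate."""
    left: Concept
    relation: Relation
    right: Concept
    assessor: str

    def sentence(self) -> str:
        return (f"{self.left.name} sec. {self.left.according_to} "
                f"{self.relation.value} "
                f"{self.right.name} sec. {self.right.according_to} "
                f"(assessed by {self.assessor})")


# Placeholder concepts from two made-up classifications.
a = Concept("Aus bus", "Classification 1")
b = Concept("Aus bus", "Classification 2")

# Two experts may disagree about the same pair, so both records are kept.
assessments = [
    Assessment(a, Relation.CONGRUENT, b, "Expert X"),
    Assessment(a, Relation.LESS_INCLUSIVE, b, "Expert Y"),
]

for item in assessments:
    print(item.sentence())
```

Keeping the assessor on each record is deliberate: the same name string
in two classifications can receive different judgments from different
experts, and neither judgment has to be thrown away.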

Then once we have those more semantically informative assessments of 
interrelationship, we can reap benefits by constructing more powerful 
searches on biological data, and make more informed choices about when to
integrate the information associated with taxonomic names, or when to 
keep it separate. At that stage we would benefit from being very 
explicit and consistent about how we handle searches and data 
integration steps.
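
As a sketch of what "explicit and consistent" could mean for a search,
the Python below pools data only across concepts that were assessed as
congruent with, or less inclusive than, the concept being searched for.
That pooling rule is my own assumption for illustration, as are the
function name, the concept labels, and the records; it also looks only
one assessment deep and does not chain assessments together.

```python
# Relations under which another concept's data is assumed safe to pool
# into a search for the query concept (an illustrative policy only).
POOLABLE = {"is congruent with", "is less inclusive than"}


def concepts_to_pool(query, assessments):
    """Return the query concept plus concepts whose data may be pooled with it."""
    safe = {query}
    for left, relation, right in assessments:
        if right == query and relation in POOLABLE:
            safe.add(left)
        elif left == query and relation == "is congruent with":
            safe.add(right)  # congruence holds in both directions
    return safe


# Placeholder assessments: (concept, relation, concept).
assessments = [
    ("Aus bus sec. C2", "is congruent with", "Aus bus sec. C1"),
    ("Aus cus sec. C2", "is less inclusive than", "Aus bus sec. C1"),
    ("Aus dus sec. C2", "excludes", "Aus bus sec. C1"),
]

# Placeholder data records keyed by the concept they were identified to.
records = {
    "Aus bus sec. C1": ["record 1"],
    "Aus bus sec. C2": ["record 2"],
    "Aus cus sec. C2": ["record 3"],
    "Aus dus sec. C2": ["record 4"],
}

pool = concepts_to_pool("Aus bus sec. C1", assessments)
hits = sorted(r for concept in pool for r in records.get(concept, []))
print(hits)  # ['record 1', 'record 2', 'record 3']
```

The record attached to the excluded concept stays out of the result,
which is the "keep it separate" case.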

Just for fun I've attached a blurb from Rich Pyle about a particular bit
of taxonomic history concerning a group of fishes. Let me know if this 
was helpful.

Cheers,

Nico

**********
Here's a case in fishes that might meet your needs (family Sparidae):

Pagellus calamus Valenciennes in Cuvier & Valenciennes 1830 was 
described on the basis of four syntypes (MNHN 5565, 5566, A-8101 & 
8664). As of 1966, apparently two of these (5566 & 8664) had been lost 
or destroyed, so only two remained.

Calamus pennatula Guichenot 1868 was apparently based on the same series 
of type specimens as P. calamus -- which means that one would at first 
assume pennatula to be an objective (homotypic) synonym of calamus.

However...

Randall & Caldwell (1966) examined the two existing syntypes of P. 
calamus (MNHN 5565 & A-8101), and discovered that they represented two 
different species. They selected one of them (A-8101) as the lectotype 
of P. calamus, and selected the other (MNHN 5565) as the lectotype of
C. pennatula, thereby preserving both names.

But there's more:

Swainson (1839:171) described the genus-group name Calamus (as a
subgenus of Chrysophrys Quoy & Gaimard 1824), the type species of which 
is Calamus megacephalus Swainson 1839:222 (by monotypy). However,
according to Jordan & Gilbert (1884:18) and Randall & Caldwell 
(1966:36), Swainson used the species epithet "megacephalus" only because 
it was customary at the time to create new species epithets to avoid 
tautonyms, and his "megacephalus" is treated as a junior synonym of 
Pagellus calamus Valenciennes in Cuvier & Valenciennes 1830.

So...here is a case of one series of syntypes, with two different names 
based on that same series of syntypes, and two different species 
represented among that same series. One of those species is the de facto
type species of a genus (although I doubt that anyone would ever split 
the two species into separate genera).

And as if that's not enough....

Randall & Caldwell also describe a similar situation for Pagellus penna
Valenciennes in Cuvier & Valenciennes 1830:209. Among its three existing 
syntypes, two are what are now considered to be Calamus penna (one of 
which Randall & Caldwell designated as the lectotype), and the third is 
identified as C. pennatula.
**********

Serguei Krivov wrote:

> There are many ways to represent biological taxonomies in OWL. The
> main problem here is how to avoid a second-order style of logic, i.e.
> assigning properties to classes rather than specifying properties of
> objects by defining classes. There is a temptation to use OWL as a
> metalanguage of taxonomy rather than as the language of taxonomy (which
> it is intended to be), or, to put it metaphorically, to write an OWL
> interpreter for OWL.
>
> I believe this could be easily avoided. Here is how I would represent 
> the part of taxonomies from Dave’s design document:
>
> Each instance of class species would have attributes hasKingdom,
> hasPhylum, etc. One could also add hasAuthority, hasReference, etc. And
> so we describe species exactly as humans do. Now the question is how
> to say that all Arthropoda are Animals and all Chordata are Animals.
> It is easy in OWL if we use subsumption axioms on anonymous classes:
>
> This states that the anonymous class hasPhylum:Arthropoda (property
> value restriction) is a subclass of the anonymous class
> hasKingdom:Animals. Once the subsumption relation is established, one
> can use an OWL reasoner to check consistency.
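
A small Python sketch of the idea described here, for readers without an
OWL toolchain. For "all Arthropoda are Animals", the class of things with
hasPhylum Arthropoda has to sit inside the class of things with
hasKingdom Animals. The code checks that at the level of individual
records; it is a plain instance check under my own assumptions, not a
description logic reasoner, and the species records are made up (the
third one is wrong on purpose).

```python
# Made-up species records using the attribute names from the message.
species = [
    {"name": "species A", "hasKingdom": "Animals", "hasPhylum": "Arthropoda"},
    {"name": "species B", "hasKingdom": "Animals", "hasPhylum": "Chordata"},
    {"name": "species C", "hasKingdom": "Plants", "hasPhylum": "Arthropoda"},
]

# Each axiom pairs a subclass restriction with a superclass restriction:
# anything matching the first (property, value) must also match the second.
axioms = [
    (("hasPhylum", "Arthropoda"), ("hasKingdom", "Animals")),
    (("hasPhylum", "Chordata"), ("hasKingdom", "Animals")),
]


def violations(individuals, axioms):
    """Return names of individuals that match a subclass but not its superclass."""
    found = []
    for ind in individuals:
        for (sub_prop, sub_val), (sup_prop, sup_val) in axioms:
            if ind.get(sub_prop) == sub_val and ind.get(sup_prop) != sup_val:
                found.append(ind["name"])
    return found


print(violations(species, axioms))  # ['species C']
```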
>
> ciao,
>
> serguei
>
> --------------------------------------------------------------------------------------
>
> Serguei Krivov, Assist. Research Professor,
>
> Computer Science Dept. & Gund Inst. for Ecological Economics,
>
> University of Vermont; 590 Main St. Burlington VT 05405
>
> phone: (802)-656-2978
>
> -----Original Message-----
> From: dave thau [mailto:thau at learningsite.com]
> Sent: Wednesday, October 26, 2005 11:22 AM
> To: Serguei.Krivov at uvm.edu; bertram
> Subject: algorithms and the owlfication of taxon
>
> Hello,
>
> Attached are two documents you may find interesting. The first was the
> first assignment in my algorithms class. The puzzle I described yesterday
> is part II.
>
> Second, when I first started working on SEEK, I tried to pitch OWL as the
> most appropriate representation for the Taxon stuff, but didn't get too
> far. I did a little work doing a couple of representations, and a
> graduate student of Susan Gauch went further in documenting options. This
> dates from about 3 years ago, and we were all just learning OWL DL, so it
> may be poorly informed. But it'll give you a notion of the thinking at
> the time.
>
> Dave
>