[seek-kr-sms] algorithms and the owlfication of taxon
Nico Franz
franz at nceas.ucsb.edu
Wed Oct 26 11:03:43 PDT 2005
Hi all:
I realize we've had this exchange before in some form and it's mostly
about playing catch-up on both sides. It's fun (for me too) to think
about representing taxonomy in an ontology format and think about the
potential services and benefits of such a move. I'll try to state my
current perspective in a way that might be helpful.
For the Taxon group, the main challenge is actually not the
representation of any single classification with all its components
("taxa"), their names and properties, their subcomponents, and
interrelationships (parent/child, etc.). We would gain next to nothing
from having a single-classification representation function in isolation.
We're also not immediately charged with the task of merging parts of or
entire (multiple) classifications.
Serguei told me yesterday that one of the main benefits of ontology
representation is "checking for internal (logical) consistency".
Discovery and correction of errors, etc. That is most certainly not what
Taxon is trying to do. We know that any given taxonomy is highly
idiosyncratic, implicit, and assumptive of a vaguely specified
background history involving select competent speakers. A single
taxonomic classification might not only turn out to be false in terms of
not representing the relationships or properties (composition) of taxa
correctly (as subsequent and more refined studies are bound to show).
The classifications are probably also highly inconsistent internally in
your sense of the word "inconsistent". Meaning, they will mention some
specimens but not all that went into the definition of a species, they
will mention some species but not all that go into the definition of a
genus, things will be left out here and there, partially contradict each
other, and so on.
The issue here is that we are not charged immediately with improving
this state of affairs, i.e. helping taxonomists be better, more
transparent taxonomists from here on. Taxon has no "normative ambitions"
in terms of telling scientists how to produce classifications using a
more complete and formal approach (description logic rules, etc.).
So what is Taxon's mandate? Basically, we're charged with building a
language and supporting infrastructure that will allow users and (in a
second phase) machines to make more sense of the semantic similarities
and differences between the components of multiple existing taxonomic
classifications - to a higher degree of precision than can be achieved
using name strings and conventional taxonomic synonymy relationships
alone. We're trying to build tools for taxonomic experts to do "brain
dumps" on what they know about the classificatory history of their
groups of expertise but wouldn't be able to express clearly and
comprehensively without our assistance.
To that end, we need to be able to import fairly decent representations
of at least two hierarchical classifications into a graphic interface.
For our purposes those representations do not have to be any more
ontology-complying than the original classifications (which largely
weren't I would think).
Then in a second step, we need to provide taxonomic experts with a more
powerful language than "is a synonym of" in order to assess the semantic
similarities and differences of elements ("taxa") defined in the two
classifications. That language will use terms like "is congruent with",
"excludes", "is less inclusive (taxonomically) than", etc. Those
assessments require the assessor to be intimately familiar with the
written and unwritten idiosyncrasies of the two classifications. We're
talking about people here whose lifetime work was exactly that -
learning the taxonomic history of a specific group as captured in the
literature and museum collections. And different experts may still come
up with different judgments when confronted with the same two
classifications.
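
To make this a bit more concrete, here is a minimal sketch of how such assessments might be stored as data. Everything in it is an assumption for illustration: the class names (Relation, Concept, Assessment), the helper disagreements(), and the placeholder concepts are hypothetical and not part of any Taxon design. Only the three relationship terms come from the paragraph above. Recording the assessor with each judgment is one way to keep differing expert opinions side by side.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # The three terms named in the text; a real vocabulary would need more.
    CONGRUENT = "is congruent with"
    EXCLUDES = "excludes"
    LESS_INCLUSIVE = "is less inclusive than"


@dataclass(frozen=True)
class Concept:
    """A taxon as defined in one particular classification (hypothetical model)."""
    name: str            # name string as used in that classification
    classification: str  # which classification defines this usage


@dataclass(frozen=True)
class Assessment:
    """One expert's judgment about how two concepts relate."""
    left: Concept
    relation: Relation
    right: Concept
    assessor: str


def disagreements(assessments):
    """Return the concept pairs for which more than one relation was asserted."""
    by_pair = {}
    for a in assessments:
        by_pair.setdefault((a.left, a.right), set()).add(a.relation)
    return {pair for pair, relations in by_pair.items() if len(relations) > 1}


# Placeholder concepts and experts, not real taxa or people.
a1 = Concept("Aus", "Classification 1")
a2 = Concept("Aus", "Classification 2")
b2 = Concept("Bus", "Classification 2")

assessments = [
    Assessment(a1, Relation.CONGRUENT, a2, assessor="Expert X"),
    Assessment(a1, Relation.LESS_INCLUSIVE, a2, assessor="Expert Y"),
    Assessment(a1, Relation.EXCLUDES, b2, assessor="Expert X"),
]

print(disagreements(assessments))
```

Note that the same name string ("Aus") appears in two concepts here; the relationship is asserted between the concepts, not between the names.
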
Then once we have those more semantically informative assessments of
interrelationship, we can reap benefits by constructing more powerful
searches on biological data, and make more informed choices about when to
integrate the information associated with taxonomic names, or when to
keep it separate. At that stage we would benefit from being very
explicit and consistent about how we handle searches and data
integration steps.
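
As a rough illustration of being explicit about an integration step, the sketch below expands a search for one concept to the other concepts whose data could be pooled with it. The pooling policy is my assumption, not an agreed Taxon rule: congruent concepts are pooled, concepts that are less inclusive than an already pooled concept are pooled, and excluded concepts are never pooled. The function name expand_query and the placeholder concept labels are hypothetical.

```python
CONGRUENT = "is congruent with"
LESS_INCLUSIVE = "is less inclusive than"
EXCLUDES = "excludes"


def expand_query(concept, assessments):
    """Return the set of concepts whose data may be pooled with `concept`.

    `assessments` is a list of (concept_a, relation, concept_b) triples.
    Assumed policy: follow congruence in both directions, and follow
    "a is less inclusive than b" from b down to a, but never upward.
    """
    pooled = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for a, relation, b in assessments:
            candidate = None
            if relation == CONGRUENT and current in (a, b):
                candidate = b if current == a else a
            elif relation == LESS_INCLUSIVE and b == current:
                candidate = a
            if candidate is not None and candidate not in pooled:
                pooled.add(candidate)
                frontier.append(candidate)
    return pooled


# Placeholder labels of the form "name sec. classification", not real taxa.
sample = [
    ("A sec. 1", CONGRUENT, "A sec. 2"),
    ("B sec. 2", LESS_INCLUSIVE, "A sec. 2"),
    ("C sec. 2", EXCLUDES, "A sec. 1"),
]

print(sorted(expand_query("A sec. 1", sample)))
print(sorted(expand_query("B sec. 2", sample)))
```

The asymmetry is deliberate: a search for the broader concept can pick up data filed under a narrower one, but a search for the narrower concept does not pull in everything filed under the broader one.
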
Just for fun I've attached a blurb from Rich Pyle about a particular bit
of taxonomic history concerning a group of fishes. Let me know if this
was helpful.
Cheers,
Nico
**********
Here's a case in fishes that might meet your needs (family Sparidae):
Pagellus calamus Valenciennes in Cuvier & Valenciennes 1830 was
described on the basis of four syntypes (MNHN 5565, 5566, A-8101 &
8664). As of 1966, apparently two of these (5566 & 8664) had been lost
or destroyed, so only two remained.
Calamus pennatula Guichenot 1868 was apparently based on the same series
of type specimens as P. calamus -- which means that one would at first
assume pennatula to be an objective (homotypic) synonym of calamus.
However...
Randall & Caldwell (1966) examined the two existing syntypes of P.
calamus (MNHN 5565 & A-8101), and discovered that they represented two
different species. They selected one of them (A-8101) as the lectotype
of P. calamus, and selected the other (MNHN 5565) as the lectotype of
C. pennatula, thereby preserving both names.
But there's more:
Swainson (1839:171) described the genus-group name Calamus (as a
subgenus of Chrysophrys Quoy & Gaimard 1824), the type species of which
is Calamus megacephalus Swainson 1839:222 (by monotypy). However
according to Jordan & Gilbert (1884:18) and Randall & Caldwell
(1966:36), Swainson used the species epithet "megacephalus" only because
it was customary at the time to create new species epithets to avoid
tautonyms, and his "megacephalus" is treated as a junior synonym of
Pagellus calamus Valenciennes in Cuvier & Valenciennes 1830.
So...here is a case of one series of syntypes, with two different names
based on that same series of syntypes, and two different species
represented among that same series. One of those species is the de facto
type species of a genus (although I doubt that anyone would ever split
the two species into separate genera).
And as if that's not enough....
Randall & Caldwell also describe a similar situation for Pagellus penna
Valenciennes in Cuvier & Valenciennes 1830:209. Among its three existing
syntypes, two are what are now considered to be Calamus penna (one of
which Randall & Caldwell designated as the lectotype), and the third is
identified as C. pennatula.
**********
Serguei Krivov wrote:
> There are many ways to represent biological taxonomies in OWL. The
> main problem here is how to avoid a second order style logic i.e.
> assigning properties to classes rather than specifying properties of
> objects by defining classes. There is temptation to use OWL as
> meta-language of taxonomy rather than as the language of taxonomy (which it
> is intended to be), or say it metaphorically writing OWL interpreter
> for OWL.
>
> I believe this could be easily avoided. Here is how I would represent
> the part of taxonomies from Dave’s design document:
>
> Each instance of class species would have attributes hasKingdom,
> hasPhylum, etc. One could also add hasAuthority, hasReference etc. And
> so we describe species exactly as humans do. Now the question is how
> to say that all Arthropoda are Animals and all Chordata are Animals.
> It is easy in OWL if we use subsumption axioms on anonymous classes:
>
> this states that anonymous class hasPhylum:Arthropoda (property value
> restriction) is subclass of anonymous class hasKingdom:Animals. Now
> when subsumption relation is established one could use an OWL reasoner
> to check consistency.
>
> ciao,
>
> serguei
>
> --------------------------------------------------------------------------------------
>
> Serguei Krivov, Assist. Research Professor,
>
> Computer Science Dept. & Gund Inst. for Ecological Economics,
>
> University of Vermont; 590 Main St. Burlington VT 05405
>
> phone: (802)-656-2978
>
> -----Original Message-----
> From: dave thau [mailto:thau at learningsite.com]
> Sent: Wednesday, October 26, 2005 11:22 AM
> To: Serguei.Krivov at uvm.edu; bertram
> Subject: algorithms and the owlfication of taxon
>
> Hello,
>
> Attached are two documents you may find interesting. The first was the
> first assignment in my algorithms class. The puzzle I described yesterday
> is part II.
>
> Second, when I first started working on SEEK, I tried to pitch OWL as the
> most appropriate representation for the Taxon stuff, but didn't get too
> far. I did a little work doing a couple of representations, and a
> graduate student of Susan Gauch went further in documenting options. This
> dates from about 3 years ago, and we were all just learning OWL DL, so it
> may be poorly informed. But it'll give you a notion of the thinking at
> the time.
>
> Dave
>
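
For reference, the subsumption idea in Serguei's quoted message can be mimicked in plain Python. This is only a sketch under loud assumptions: it is not OWL and does not use a reasoner, it treats the records as closed-world (a missing or different kingdom counts as a violation, which an OWL reasoner would not conclude by default), and the names AXIOMS, violations, and the sample records are hypothetical. Each axiom reads "anything with the first property value must also have the second", so "all Arthropoda are Animals" puts the phylum restriction on the narrower side.

```python
# Each axiom is (narrower restriction, broader restriction);
# each restriction is a (property, value) pair.
AXIOMS = [
    (("hasPhylum", "Arthropoda"), ("hasKingdom", "Animals")),
    (("hasPhylum", "Chordata"), ("hasKingdom", "Animals")),
]


def violations(individuals, axioms=AXIOMS):
    """Return (individual, axiom) pairs where the narrower restriction
    holds for a record but the broader one does not (closed-world check)."""
    found = []
    for name, props in individuals.items():
        for narrower, broader in axioms:
            sub_prop, sub_value = narrower
            sup_prop, sup_value = broader
            if props.get(sub_prop) == sub_value and props.get(sup_prop) != sup_value:
                found.append((name, (narrower, broader)))
    return found


# Placeholder species records, not real data.
species = {
    "species_1": {"hasPhylum": "Arthropoda", "hasKingdom": "Animals"},
    "species_2": {"hasPhylum": "Chordata", "hasKingdom": "Plants"},
}

for name, axiom in violations(species):
    print(name, "violates", axiom)
```
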