[seek-kr-sms] algorithms and the owlfication of taxon

Mon Oct 31 05:50:34 PST 2005

Nico (and all):

Your summary of what TAXON is and isn't trying to do is very helpful!

A quick note: 
NF> The classifications are probably also highly inconsistent internally in 
NF> your sense of the word "inconsistent". Meaning, they will mention some 
NF> specimens but not all that went into the definition of a species, they 
NF> will mention some species but not all that go into the definition of a 
NF> genus, things will be left out here and there, partially contradict each 
NF> other, and so on.

In logic the term 'inconsistent' is quite different from
'incomplete'. Some examples you refer to above seem to indicate that
taxonomies are often incomplete (which is common and often unavoidable
in logic formalizations) and maybe only occasionally inconsistent
(which is much more problematic in logic).

It would be interesting to see to what extent an individual taxonomy
is consistent with another one, or with itself. Also notions of 'relative
completeness' or 'subsumption' might make some sense when applied to
taxonomies. 
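
To make the distinction concrete, here is a minimal sketch in Python. It assumes a taxonomy is just a list of (child, parent) assertions and that every taxon should have exactly one parent; both are simplifying assumptions, and the function name and toy data are invented for illustration only. A taxon with no stated parent is a sign of incompleteness, while a taxon with two different stated parents is an inconsistency under the one-parent assumption.

```python
from collections import defaultdict


def check_taxonomy(assertions, known_taxa, root):
    """Return (missing, conflicts) for a list of (child, parent) assertions.

    missing   -- taxa with no stated parent (incompleteness)
    conflicts -- taxa with more than one stated parent (inconsistency,
                 assuming each taxon has exactly one parent)
    """
    parents = defaultdict(set)
    for child, parent in assertions:
        parents[child].add(parent)
    missing = sorted(t for t in known_taxa if t != root and t not in parents)
    conflicts = {t: sorted(ps) for t, ps in parents.items() if len(ps) > 1}
    return missing, conflicts


# Toy assertions, invented for illustration; not a real classification.
assertions = [
    ("Calamus", "Sparidae"),
    ("Pagellus", "Sparidae"),
    ("Calamus pennatula", "Calamus"),
    ("Calamus calamus", "Calamus"),
    ("Calamus calamus", "Pagellus"),  # second, conflicting parent
]
known_taxa = {
    "Sparidae", "Calamus", "Pagellus",
    "Calamus pennatula", "Calamus calamus",
    "Calamus penna",  # mentioned, but never given a parent
}

missing, conflicts = check_taxonomy(assertions, known_taxa, root="Sparidae")
print("incomplete:", missing)
print("inconsistent:", conflicts)
```

An incomplete taxonomy can still be extended without retracting anything; the conflicting one cannot be repaired without dropping or revising an assertion.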

Here are my concrete questions:

How can we use TAXON within Kepler? Are we "stuck" with the current
use of taxon support in EML, or what can we do beyond that? 
Can we reuse some of the SMS infrastructure of Kepler to deal with
TAXON information? 

For the latter, it might be helpful to capture some of the TAXON
information in a form that could be used by SMS.

Maybe we could drive this discussion by a specific use case that is
realistic both in the use of TAXON and in the use of data analysis
steps... Do we already have such a use case??

Bertram

>>> On Wed, 26 Oct 2005 11:03:43 -0700
>>> Nico Franz <franz at nceas.ucsb.edu> wrote: 
NF> 
NF> Hi all:
NF> I realize we've had this exchange before in some form and it's mostly 
NF> about playing catch-up on both sides. It's fun (for me too) to think 
NF> about representing taxonomy in an ontology format and think about the 
NF> potential services and benefits of such a move. I'll try to state my 
NF> current perspective in a way that might be helpful.
NF> 
NF> For the Taxon group, the main challenge is actually not the 
NF> representation of any single classification with all its components 
NF> ("taxa"), their names and properties, their subcomponents, and 
NF> interrelationships (parent/child, etc.). We would gain next to nothing 
NF> from having a single-classification representation function in isolation.
NF> 
NF> We're also not immediately charged with the task of merging parts of or 
NF> entire (multiple) classifications.
NF> 
NF> Serguei told me yesterday that one of the main benefits of ontology 
NF> representation is "checking for internal (logical) consistency". 
NF> Discovery and correction of errors, etc. That is most certainly not what 
NF> Taxon is trying to do. We know that any given taxonomy is highly 
NF> idiosyncratic, implicit, and assumptive of a vaguely specified 
NF> background history involving select competent speakers. A single 
NF> taxonomic classification might not only turn out to be false in terms of 
NF> not representing the relationships or properties (composition) of taxa 
NF> correctly (as subsequent and more refined studies are bound to show). 
NF> The classifications are probably also highly inconsistent internally in 
NF> your sense of the word "inconsistent". Meaning, they will mention some 
NF> specimens but not all that went into the definition of a species, they 
NF> will mention some species but not all that go into the definition of a 
NF> genus, things will be left out here and there, partially contradict each 
NF> other, and so on.
NF> 
NF> The issue here is that we are not charged immediately with improving 
NF> this state of affairs, i.e. helping taxonomists be better, more 
NF> transparent taxonomists from here on. Taxon has no "normative ambitions" 
NF> in terms of telling scientists how to produce classifications using a 
NF> more complete and formal approach (description logic rules, etc.).
NF> 
NF> So what is Taxon's mandate? Basically, we're charged with building a 
NF> language and supporting infrastructure that will allow users and (in a 
NF> second phase) machines to make more sense of the semantic similarities 
NF> and differences between the components of multiple existing taxonomic 
NF> classifications - to a higher degree of precision than can be achieved 
NF> using name strings and conventional taxonomic synonymy relationships 
NF> alone. We're trying to build tools for taxonomic experts to do "brain 
NF> dumps" on what they know about the classificatory history of their 
NF> groups of expertise but wouldn't be able to express clearly and 
NF> comprehensively without our assistance.
NF> 
NF> To that end, we need to be able to import fairly decent representations 
NF> of at least two hierarchical classifications into a graphic interface. 
NF> For our purposes those representations do not have to be any more 
NF> ontology-complying than the original classifications (which largely 
NF> weren't I would think).
NF> 
NF> Then in a second step, we need to provide taxonomic experts with a more 
NF> powerful language than "is a synonym of" in order to assess the semantic 
NF> similarities and differences of elements ("taxa") defined in the two 
NF> classifications. That language will use terms like "is congruent with", 
NF> "excludes", "is less inclusive (taxonomically) than", etc. Those 
NF> assessments require the assessor to be intimately familiar with the 
NF> written and unwritten idiosyncrasies of the two classifications. We're 
NF> talking about people here whose lifetime work was exactly that - 
NF> learning the taxonomic history of a specific group as captured in the 
NF> literature and museum collections. And different experts may still come 
NF> up with different judgments when confronted with the same two 
NF> classifications.
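
The relationship vocabulary above can be sketched in Python under one strong assumption: that each taxon concept is reduced to the set of specimens it includes. Real assessments are expert judgments and are not computed this way. The function name `relate` and the extra labels "is more inclusive than" and "overlaps" are assumptions added for illustration. The specimen numbers are the MNHN syntypes from the Sparidae example quoted in this message.

```python
def relate(a, b):
    """Name the relationship of concept a to concept b.

    Each concept is given as a collection of specimen identifiers.
    This purely extensional view is a simplification for illustration.
    """
    a, b = frozenset(a), frozenset(b)
    if a == b:
        return "is congruent with"
    if a < b:
        return "is less inclusive than"
    if a > b:
        return "is more inclusive than"
    if a.isdisjoint(b):
        return "excludes"
    return "overlaps"


# Pagellus calamus as described in 1830: four syntypes.
calamus_1830 = {"MNHN 5565", "MNHN 5566", "MNHN A-8101", "MNHN 8664"}
# Calamus pennatula Guichenot 1868: based on the same type series.
pennatula_1868 = set(calamus_1830)
# After Randall & Caldwell (1966) selected separate lectotypes.
calamus_1966 = {"MNHN A-8101"}
pennatula_1966 = {"MNHN 5565"}

print(relate(pennatula_1868, calamus_1830))  # same series of syntypes
print(relate(calamus_1966, calamus_1830))    # narrower concept
print(relate(calamus_1966, pennatula_1966))  # no shared specimens
```

Two names can thus be congruent in one pair of classifications and mutually exclusive in another, which a plain "is a synonym of" cannot express.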
NF> 
NF> Then once we have those more semantically informative assessments of 
NF> interrelationship, we can reap benefits by constructing more powerful 
NF> searches on biological data, and make more informed choices about when to 
NF> integrate the information associated with taxonomic names, or when to 
NF> keep it separate. At that stage we would benefit from being very 
NF> explicit and consistent about how we handle searches and data 
NF> integration steps.
NF> 
NF> Just for fun I've attached a blurb from Rich Pyle about a particular bit 
NF> of taxonomic history concerning a group of fishes. Let me know if this 
NF> was helpful.
NF> 
NF> Cheers,
NF> 
NF> Nico
NF> 
NF> **********
NF> Here's a case in fishes that might meet your needs (family Sparidae):
NF> 
NF> Pagellus calamus Valenciennes in Cuvier & Valenciennes 1830 was 
NF> described on the basis of four syntypes (MNHN 5565, 5566, A-8101 & 
NF> 8664). As of 1966, apparently two of these (5566 & 8664) had been lost 
NF> or destroyed, so only two remained.
NF> 
NF> Calamus pennatula Guichenot 1868 was apparently based on the same series 
NF> of type specimens as P. calamus -- which means that one would at first 
NF> assume pennatula to be an objective (homotypic) synonym of calamus.
NF> 
NF> However...
NF> 
NF> Randall & Caldwell (1966) examined the two existing syntypes of P. 
NF> calamus (MNHN 5565 & A-8101), and discovered that they represented two 
NF> different species. They selected one of them (A-8101) as the lectotype 
NF> of P. calamus, and selected the other (MNHN 5565) as the lectotype of 
NF> C. pennatula, thereby preserving both names.
NF> 
NF> But there's more:
NF> 
NF> Swainson (1839:171) described the genus-group name Calamus (as a 
NF> subgenus of Chrysophrys Quoy & Gaimard 1824), the type species of which 
NF> is Calamus megacephalus Swainson 1839:222 (by monotypy). However 
NF> according to Jordan & Gilbert (1884:18) and Randall & Caldwell 
NF> (1966:36), Swainson used the species epithet "megacephalus" only because 
NF> it was customary at the time to create new species epithets to avoid 
NF> tautonyms, and his "megacephalus" is treated as a junior synonym of 
NF> Pagellus calamus Valenciennes in Cuvier & Valenciennes 1830.
NF> 
NF> So...here is a case of one series of syntypes, with two different names 
NF> based on that same series of syntypes, and two different species 
NF> represented among that same series. One of those species is the de facto 
NF> type species of a genus (although I doubt that anyone would ever split 
NF> the two species into separate genera).
NF> 
NF> And as if that's not enough....
NF> 
NF> Randall & Caldwell also describe a similar situation for Pagellus penna 
NF> Valenciennes in Cuvier & Valenciennes 1830:209. Among its three existing 
NF> syntypes, two are what are now considered to be Calamus penna (one of 
NF> which Randall & Caldwell designated as the lectotype), and the third is 
NF> identified as C. pennatula.
NF> **********
NF> 
NF> Serguei Krivov wrote:
NF> 
>> There are many ways to represent biological taxonomies in OWL. The 
>> main problem here is how to avoid a second-order style of logic, i.e. 
>> assigning properties to classes rather than specifying properties of 
>> objects by defining classes. There is a temptation to use OWL as a 
>> meta-language of taxonomy rather than as the language of taxonomy (which 
>> it is intended to be), or, to say it metaphorically, writing an OWL 
>> interpreter for OWL.
>> 
>> I believe this could be easily avoided. Here is how I would represent 
>> the part of taxonomies from Dave’s design document:
>> 
>> Each instance of class species would have attributes hasKingdom, 
>> hasPhylum, etc. One could also add hasAuthority, hasReference etc. And 
>> so we describe species exactly as humans do. Now the question is how 
>> to say that all Arthropoda are Animals and all Chordata are Animals. 
>> It is easy in OWL if we use subsumption axioms on anonymous classes:
>> 
>> This states that the anonymous class hasPhylum:Arthropoda (property 
>> value restriction) is a subclass of the anonymous class 
>> hasKingdom:Animals. Once the subsumption relation is established, one 
>> could use an OWL reasoner to check consistency.
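
The idea of a subsumption axiom between two property-value restrictions can be approximated in plain Python. This is only a sketch: it does a closed-world check over explicitly listed individuals, which is not what an OWL reasoner does, and it does not reproduce the axiom from the original message. The helper names `has_value` and `violations` and the three toy records are assumptions made up for illustration.

```python
def has_value(prop, value):
    """Anonymous 'class': all individuals whose prop equals value."""
    return lambda individual: individual.get(prop) == value


def violations(individuals, sub, sup):
    """Names of individuals that are in sub but not in sup.

    An empty result means the listed data respects 'sub is a subclass of sup'.
    """
    return [name for name, props in individuals.items()
            if sub(props) and not sup(props)]


# Toy species records, invented for illustration.
species = {
    "Apis mellifera": {"hasKingdom": "Animals", "hasPhylum": "Arthropoda"},
    "Homo sapiens": {"hasKingdom": "Animals", "hasPhylum": "Chordata"},
    "bad record": {"hasKingdom": "Plants", "hasPhylum": "Arthropoda"},
}

arthropods = has_value("hasPhylum", "Arthropoda")
animals = has_value("hasKingdom", "Animals")

# Axiom being checked: hasPhylum:Arthropoda is a subclass of hasKingdom:Animals.
broken = violations(species, arthropods, animals)
print(broken)
```

Properties stay attached to individual species here, and the classes are defined only through those property values, which is the first-order style the message argues for.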
>> 
>> ciao,
>> 
>> serguei
>> 
>> --------------------------------------------------------------------------------------
>> 
>> Serguei Krivov, Assist. Research Professor,
>> Computer Science Dept. & Gund Inst. for Ecological Economics,
>> University of Vermont; 590 Main St. Burlington VT 05405
>> phone: (802)-656-2978
>> 
>> -----Original Message-----
>> From: dave thau [mailto:thau at learningsite.com]
>> Sent: Wednesday, October 26, 2005 11:22 AM
>> To: Serguei.Krivov at uvm.edu; bertram
>> Subject: algorithms and the owlfication of taxon
>> 
>> Hello,
>> 
>> Attached are two documents you may find interesting. The first was the
>> first assignment in my algorithms class. The puzzle I described yesterday
>> is part II.
>> 
>> Second, when I first started working on SEEK, I tried to pitch OWL as the
>> most appropriate representation for the Taxon stuff, but didn't get too
>> far. I did a little work doing a couple of representations, and a
>> graduate student of Susan Gauch went further in documenting options. This
>> dates from about 3 years ago, and we were all just learning OWL DL, so it
>> may be poorly informed. But it'll give you a notion of the thinking at
>> the time.
>> 
>> Dave
>>