[seek-kr-sms] algorithms and the owlfication of taxon
Bertram Ludaescher
ludaesch at ucdavis.edu
Mon Oct 31 05:50:34 PST 2005
Nico (and all):
Your summary of what TAXON is and isn't trying to do is very helpful!
A quick note:
NF> The classifications are probably also highly inconsistent internally in
NF> your sense of the word "inconsistent". Meaning, they will mention some
NF> specimens but not all that went into the definition of a species, they
NF> will mention some species but not all that go into the definition of a
NF> genus, things will be left out here and there, partially contradict each
NF> other, and so on.
In logic the term 'inconsistent' is quite different from
'incomplete'. Some examples you refer to above seem to indicate that
taxonomies are often incomplete (which is common and often unavoidable
in logic formalizations) and maybe only occasionally inconsistent
(which is much more problematic in logic).
It would be interesting to see to what extent an individual taxonomy
is consistent with another one, and with itself. Also notions of 'relative
completeness' or 'subsumption' might make some sense when applied to
taxonomies.
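To make the distinction concrete, here is a minimal sketch in Python. The model is an assumption made for illustration only, not anything TAXON or SMS provides: a taxonomy is a set of (child, parent) assertions, all taxon names are made up, and a strict hierarchy is assumed, so that two different parents for the same child count as an inconsistency.

```python
# Hypothetical toy model: a taxonomy is a set of (child, parent) assertions.
# Assumption: in a strict hierarchy each child has exactly one parent, so two
# different parents for the same child count as a logical inconsistency.

def conflicts(*taxonomies):
    """Return {child: parents} for every child that is given more than one parent."""
    parents = {}
    for taxonomy in taxonomies:
        for child, parent in taxonomy:
            parents.setdefault(child, set()).add(parent)
    return {c: ps for c, ps in parents.items() if len(ps) > 1}

def missing_from(taxonomy, reference):
    """Assertions made by `reference` but not by `taxonomy` (relative incompleteness)."""
    return set(reference) - set(taxonomy)

# Made-up example data.
t1 = {("species_a", "Genus_X"), ("species_b", "Genus_X")}
t2 = {("species_a", "Genus_X")}                            # says less than t1
t3 = {("species_a", "Genus_Y"), ("species_b", "Genus_X")}  # disagrees with t1

print(missing_from(t2, t1))  # what t2 leaves out relative to t1
print(conflicts(t2))         # empty: t2 is consistent with itself
print(conflicts(t1, t2))     # empty: t2 is consistent with t1
print(conflicts(t1, t3))     # t1 and t3 contradict each other on species_a
```

In this sketch t2 is merely incomplete relative to t1 (nothing it says is contradicted), whereas t1 and t3 together are inconsistent.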
Here are my concrete questions:
How can we use TAXON within Kepler? Are we "stuck" with the current
use of taxon support in EML, or what can we do beyond that?
Can we reuse some of the SMS infrastructure of Kepler to deal with
TAXON information?
For the latter, it might be helpful to capture some of the TAXON
information in a form that could be used by SMS.
Maybe we could drive this discussion by a specific use case that is
realistic both in the use of TAXON and in the use of data analysis
steps... Do we already have such a use case??
Bertram
>>> On Wed, 26 Oct 2005 11:03:43 -0700
>>> Nico Franz <franz at nceas.ucsb.edu> wrote:
NF>
NF> Hi all:
NF> I realize we've had this exchange before in some form and it's mostly
NF> about playing catch-up on both sides. It's fun (for me too) to think
NF> about representing taxonomy in an ontology format and think about the
NF> potential services and benefits of such a move. I'll try to state my
NF> current perspective in a way that might be helpful.
NF>
NF> For the Taxon group, the main challenge is actually not the
NF> representation of any single classification with all its components
NF> ("taxa"), their names and properties, their subcomponents, and
NF> interrelationships (parent/child, etc.). We would gain next to nothing
NF> from having a single-classification representation function in isolation.
NF>
NF> We're also not immediately charged with the task of merging parts of or
NF> entire (multiple) classifications.
NF>
NF> Serguei told me yesterday that one of the main benefits of ontology
NF> representation is "checking for internal (logical) consistency".
NF> Discovery and correction of errors, etc. That is most certainly not what
NF> Taxon is trying to do. We know that any given taxonomy is highly
NF> idiosyncratic, implicit, and assumptive of a vaguely specified
NF> background history involving select competent speakers. A single
NF> taxonomic classification might not only turn out to be false in terms of
NF> not representing the relationships or properties (composition) of taxa
NF> correctly (as subsequent and more refined studies are bound to show).
NF> The classifications are probably also highly inconsistent internally in
NF> your sense of the word "inconsistent". Meaning, they will mention some
NF> specimens but not all that went into the definition of a species, they
NF> will mention some species but not all that go into the definition of a
NF> genus, things will be left out here and there, partially contradict each
NF> other, and so on.
NF>
NF> The issue here is that we are not charged immediately with improving
NF> this state of affairs, i.e. helping taxonomists be better, more
NF> transparent taxonomists from here on. Taxon has no "normative ambitions"
NF> in terms of telling scientists how to produce classifications using a
NF> more complete and formal approach (description logic rules, etc.).
NF>
NF> So what is Taxon's mandate? Basically, we're charged with building a
NF> language and supporting infrastructure that will allow users and (in a
NF> second phase) machines to make more sense of the semantic similarities
NF> and differences between the components of multiple existing taxonomic
NF> classifications - to a higher degree of precision than can be achieved
NF> using name strings and conventional taxonomic synonymy relationships
NF> alone. We're trying to build tools for taxonomic experts to do "brain
NF> dumps" on what they know about the classificatory history of their
NF> groups of expertise but wouldn't be able to express clearly and
NF> comprehensively without our assistance.
NF>
NF> To that end, we need to be able to import fairly decent representations
NF> of at least two hierarchical classifications into a graphic interface.
NF> For our purposes those representations do not have to be any more
NF> ontology-complying than the original classifications (which largely
NF> weren't I would think).
NF>
NF> Then in a second step, we need to provide taxonomic experts with a more
NF> powerful language than "is a synonym of" in order to assess the semantic
NF> similarities and differences of elements ("taxa") defined in the two
NF> classifications. That language will use terms like "is congruent with",
NF> "excludes", "is less inclusive (taxonomically) than", etc. Those
NF> assessments require the assessor to be intimately familiar with the
NF> written and unwritten idiosyncrasies of the two classifications. We're
NF> talking about people here whose lifetime work was exactly that -
NF> learning the taxonomic history of a specific group as captured in the
NF> literature and museum collections. And different experts may still come
NF> up with different judgments when confronted with the same two
NF> classifications.
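A rough Python sketch of how such a relationship vocabulary could be grounded, under a strong simplifying assumption: each taxon concept is given as an explicit, complete set of members (specimens or subordinate taxa). As the message itself stresses, real classifications rarely list their members completely and the assessments come from expert judgment, so this only illustrates the terms and does not replace that judgment. The labels "is more inclusive than" and "overlaps" are assumed here to complete the picture and are not named in the text.

```python
def relate(a, b):
    """Name the relationship of concept `a` to concept `b` (both given as member sets)."""
    a, b = set(a), set(b)
    if a == b:
        return "is congruent with"
    if a < b:                       # proper subset
        return "is less inclusive than"
    if a > b:                       # proper superset
        return "is more inclusive than"   # assumed counterpart label
    if a.isdisjoint(b):
        return "excludes"
    return "overlaps"                     # assumed label for partial overlap

# Made-up member identifiers for two hypothetical classifications.
genus_1990 = {"s1", "s2", "s3"}
genus_2005 = {"s1", "s2"}
other_2005 = {"s3", "s4"}

print(relate(genus_2005, genus_1990))
print(relate(genus_2005, other_2005))
print(relate(genus_1990, other_2005))
```

Two experts who circumscribe the same concepts differently would supply different member sets and so get different answers, which matches the caveat above.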
NF>
NF> Then once we have those more semantically informative assessments of
NF> interrelationship, we can reap benefits by constructing more powerful
NF> searches on biological data, and make more informed choices about when to
NF> integrate the information associated with taxonomic names, or when to
NF> keep it separate. At that stage we would benefit from being very
NF> explicit and consistent about how we handle searches and data
NF> integration steps.
NF>
NF> Just for fun I've attached a blurb from Rich Pyle about a particular bit
NF> of taxonomic history concerning a group of fishes. Let me know if this
NF> was helpful.
NF>
NF> Cheers,
NF>
NF> Nico
NF>
NF> **********
NF> Here's a case in fishes that might meet your needs (family Sparidae):
NF>
NF> Pagellus calamus Valenciennes in Cuvier & Valenciennes 1830 was
NF> described on the basis of four syntypes (MNHN 5565, 5566, A-8101 &
NF> 8664). As of 1966, apparently two of these (5566 & 8664) had been lost
NF> or destroyed, so only two remained.
NF>
NF> Calamus pennatula Guichenot 1868 was apparently based on the same series
NF> of type specimens as P. calamus -- which means that one would at first
NF> assume pennatula to be an objective (homotypic) synonym of calamus.
NF>
NF> However...
NF>
NF> Randall & Caldwell (1966) examined the two existing syntypes of P.
NF> calamus (MNHN 5565 & A-8101), and discovered that they represented two
NF> different species. They selected one of them (A-8101) as the lectotype
NF> of P. calamus, and selected the other (MNHN 5565) as the lectotype of
NF> C. pennatula, thereby preserving both names.
NF>
NF> But there's more:
NF>
NF> Swainson (1839:171) described the genus-group name Calamus (as a
NF> subgenus of Chrysophrys Quoy & Gaimard 1824), the type species of which
NF> is Calamus megacephalus Swainson 1839:222 (by monotypy). However
NF> according to Jordan & Gilbert (1884:18) and Randall & Caldwell
NF> (1966:36), Swainson used the species epithet "megacephalus" only because
NF> it was customary at the time to create new species epithets to avoid
NF> tautonyms, and his "megacephalus" is treated as a junior synonym of
NF> Pagellus calamus Valenciennes in Cuvier & Valenciennes 1830.
NF>
NF> So...here is a case of one series of syntypes, with two different names
NF> based on that same series of syntypes, and two different species
NF> represented among that same series. One of those species is the de facto
NF> type species of a genus (although I doubt that anyone would ever split
NF> the two species into separate genera).
NF>
NF> And as if that's not enough....
NF>
NF> Randall & Caldwell also describe a similar situation for Pagellus penna
NF> Valenciennes in Cuvier & Valenciennes 1830:209. Among its three existing
NF> syntypes, two are what are now considered to be Calamus penna (one of
NF> which Randall & Caldwell designated as the lectotype), and the third is
NF> identified as C. pennatula.
NF> **********
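The first part of this case can be written down as data. The following Python sketch uses only the specimen numbers and names given in the account above. The data layout and the helper `objective_synonyms` are hypothetical, and "same type material" is a simplification of what the nomenclatural code actually requires.

```python
# MNHN catalogue numbers of the four syntypes, as listed in the account above.
SYNTYPES = frozenset({"5565", "5566", "A-8101", "8664"})

# Type material per name before Randall & Caldwell (1966): both names rest on
# the same syntype series.
before = {
    "Pagellus calamus": SYNTYPES,
    "Calamus pennatula": SYNTYPES,
}

# After their lectotype designations, each name rests on a single specimen.
after = {
    "Pagellus calamus": frozenset({"A-8101"}),
    "Calamus pennatula": frozenset({"5565"}),
}

def objective_synonyms(types, name_a, name_b):
    """Simplified test: two names count as objective (homotypic) synonyms
    when they are based on exactly the same type material."""
    return types[name_a] == types[name_b]

print(objective_synonyms(before, "Pagellus calamus", "Calamus pennatula"))  # True
print(objective_synonyms(after, "Pagellus calamus", "Calamus pennatula"))   # False
```

The name strings are identical in both states. Only the link from name to type specimen changes, which is exactly the information that name matching alone cannot capture.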
NF>
NF> Serguei Krivov wrote:
NF>
>> There are many ways to represent biological taxonomies in OWL. The
>> main problem here is how to avoid a second order style logic i.e.
>> assigning properties to classes rather than specifying properties of
>> objects by defining classes. There is a temptation to use OWL as a
>> meta-language of taxonomy rather than as the language of taxonomy (which
>> it is intended to be) or, to put it metaphorically, to write an OWL
>> interpreter for OWL.
>>
>> I believe this could be easily avoided. Here is how I would represent
>> the part of taxonomies from Dave's design document:
>>
>> Each instance of class species would have attributes hasKingdom,
>> hasPhylum, etc. One could also add hasAuthority, hasReference etc. And
>> so we describe species exactly as humans do. Now the question is how
>> to say that all Arthropoda are Animals and all Chordata are Animals.
>> It is easy in OWL if we use subsumption axioms on anonymous classes:
>>
>> this states that the anonymous class hasPhylum:Arthropoda (property value
>> restriction) is a subclass of the anonymous class hasKingdom:Animals. Now
>> when the subsumption relation is established one could use an OWL reasoner
>> to check consistency
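The axiom referred to above is not reproduced in the message as archived. As a stand-in, here is a small Python sketch of the idea, not OWL itself and not a real reasoner: each axiom says that the class of individuals with one property value is a subclass of the class of individuals with another. The checker assumes every property is single-valued and that different value names denote different things. Both assumptions would have to be declared explicitly in OWL. The value "Plants" and the individuals are made up.

```python
# Each axiom reads: (property value X) is a subclass of (property value Y).
AXIOMS = [
    (("hasPhylum", "Arthropoda"), ("hasKingdom", "Animals")),
    (("hasPhylum", "Chordata"), ("hasKingdom", "Animals")),
]

def apply_axioms(individual, axioms=AXIOMS):
    """Return (completed description, clashes) for one species instance.

    Missing values are filled in (the description was only incomplete);
    conflicting values are reported (the description is inconsistent).
    """
    result = dict(individual)
    clashes = []
    for (prop, value), (implied_prop, implied_value) in axioms:
        if result.get(prop) != value:
            continue  # axiom does not apply to this individual
        if implied_prop not in result:
            result[implied_prop] = implied_value
        elif result[implied_prop] != implied_value:
            clashes.append((implied_prop, result[implied_prop], implied_value))
    return result, clashes

# Made-up individuals.
incomplete = {"hasPhylum": "Arthropoda"}
inconsistent = {"hasPhylum": "Chordata", "hasKingdom": "Plants"}

print(apply_axioms(incomplete))
print(apply_axioms(inconsistent))
```

The first individual is completed with hasKingdom:Animals and produces no clash. The second produces a clash, which is the kind of error a consistency check is meant to surface.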
>>
>> ciao,
>>
>> serguei
>>
>> --------------------------------------------------------------------------------------
>>
>> Serguei Krivov, Assist. Research Professor,
>> Computer Science Dept. & Gund Inst. for Ecological Economics,
>> University of Vermont; 590 Main St. Burlington VT 05405
>> phone: (802)-656-2978
>>
>> -----Original Message-----
>> From: dave thau [mailto:thau at learningsite.com]
>> Sent: Wednesday, October 26, 2005 11:22 AM
>> To: Serguei.Krivov at uvm.edu; bertram
>> Subject: algorithms and the owlfication of taxon
>>
>> Hello,
>>
>> Attached are two documents you may find interesting. The first was the
>> first assignment in my algorithms class. The puzzle I described yesterday
>> is part II.
>>
>> Second, when I first started working on SEEK, I tried to pitch OWL as the
>> most appropriate representation for the Taxon stuff, but didn't get too
>> far. I did a little work doing a couple of representations, and a
>> graduate student of Susan Gauch went further in documenting options. This
>> dates from about 3 years ago, and we were all just learning OWL DL, so it
>> may be poorly informed. But it'll give you a notion of the thinking at
>> the time.
>>
>> Dave
>>