[seek-kr-sms] OWL - taxonomy

Mon Oct 31 21:32:28 PST 2005

Hi all:

   To continue our OWL-taxonomy exchange I've excerpted some select 
passages from previous e-mails and added some new points at the end.

First, *Bertram* wrote in response to my estimate that taxonomies are 
often inconsistent:

"In logic the term 'inconsistent' is quite different from 'incomplete'. 
Some examples you refer to above seem to indicate that taxonomies are 
often incomplete (which is common and often unavoidable in logic 
formalizations) and maybe only occasionally inconsistent (which is much 
more problematic in logic).

It would be interesting to see to what extent an individual taxonomy is 
consistent with another one, with itself. Also notions of 'relative 
completeness' or 'subsumption' might make some sense when applied to 
taxonomies.

Here are my concrete questions:
How can we use TAXON within Kepler?
Are we "stuck" with the current use of taxon support in EML, or what can 
we do beyond that?
Can we reuse some of the SMS infrastructure of Kepler to deal with TAXON 
information?

For the latter, it might be helpful to capture some of the TAXON 
information in a form that could be used by SMS.

Maybe we could drive this discussion by a specific use case that is 
realistic both in the use of TAXON and in the use of data analysis 
steps... Do we already have such a use case??"

**********

*My response to this:*

- I suppose I could use a helpful, exemplified account of what 
"inconsistent" means in OWL-DL. But see also my discussion below.
- I almost want to punt on the Taxon/Kepler question. I think for the 
moment, and also only if we want to, we should try to see to what extent 
we can represent a single taxonomic classification in OWL/DL in general. 
I think we all sense that something ought to be possible but haven't 
gotten sufficiently specific yet. Once we do, then some useful 
Taxon/Kepler options might emerge.
- Agreed?

**********

*Bertram* added in another e-mail, concerning the issue of real-life 
taxonomic definitions working interchangeably as both classes and instances:

Maybe, or maybe not. Could one not distinguish, e.g., between an 
"element as instance" and "element as class"? Things that hold for the 
former may not hold for the latter and vice versa. We simply distinguish 
between elements/terms/concepts when used at the instance vs. when used 
at the class level.

Let me make a simplifying example: Say you've figured out a way to 
represent all your information in the form of triples (X,Y,Z). If a term 
t occurs in the X position (call it the instance position), it doesn't 
say anything about it occurring in the Z position (call it the class 
position, provided Y has some "class-valued property" say "hasClass").

So we can distinguish between 't as an instance' and 't as a class'. It 
is up to some convention (or axiomatization) to establish a link between 
these two uses of t.
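
As a minimal sketch of the triple idea above (plain Python, not OWL): the terms "t", "a" and "C" are placeholders, and the helper name roles is hypothetical. Only the property name "hasClass" comes from the example. The point is simply that the two uses of a term can be read off separately from its position in the triples.

```python
# Hypothetical triples (X, Y, Z). "hasClass" is the class-valued property
# from the example; "t", "a" and "C" are placeholder terms.
triples = [
    ("t", "hasClass", "C"),  # t in the X (instance) position
    ("a", "hasClass", "t"),  # t in the Z (class) position
]

def roles(term, triples):
    """Return the set of roles in which a term is used across the triples."""
    found = set()
    for x, y, z in triples:
        if y != "hasClass":
            continue  # only class-valued properties tell us about roles
        if x == term:
            found.add("instance")
        if z == term:
            found.add("class")
    return found

print(roles("t", triples))  # t is used in both roles
print(roles("a", triples))  # a is used only as an instance
```

Whether the two uses of t are then identified, or kept apart, would still be up to a separate convention or axiom; the sketch only shows that they are distinguishable.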

Can we express this link that taxonomists make? Do they identify the two 
uses of t? Is there never a distinction between a species (name? 
concept? element?) when used in the instance sense vs. in the class sense?

**********

To which I respond:

YES, something like this has to happen I think. It does in fact happen 
in real life that scientists will "decompose" a multi-natured taxonomic 
definition (with e.g. 1. a type specimen, 2. other included specimens or 
3. species, and 4. also a diagnosis of distinguishing features for the 
organisms). In a particular situation they will refer to and reuse only 
parts of that definition and ignore the rest, even and especially when 
the rest doesn't really fit their current purpose. It is therefore 
necessary, I think, for the "class" aspects and "instance" aspects of a 
taxonomic definition to be optionally dissociable.

**********

I've had a quick look at the "Representing Classes As Property Values" 
document. It might well be on the right track. But my feeling is the 
examples are still too far away from actual practice to help our case. 
My view is that "you guys" know OWL-DL in and out but maybe we should 
get a better understanding of how taxonomic definitions work, and what 
issues are involved. For this purpose I've attached a 2-page PDF with 
three hopefully useful examples.

The first page of the PDF shows a character-by-taxon matrix in which 22 
species (species concepts, strictly speaking) are evaluated for the 
presence or absence of 32 morphological features. Each species here is a 
class (I assume also in OWL-speak) with a list of properties that 
characterize it (and others that it doesn't have). In some cases the 
properties are "not applicable" ("-" in the matrix), e.g. when the 
question is "wing color red or green?" in a species that happens to be 
wingless. Also, some properties have not yet been observed and are 
marked as "?". From the set of properties, and using a 
phylogeny-generating algorithm, the tree (and classification) on the 
second page is inferred.
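
To make the four kinds of matrix entries concrete, here is a rough Python sketch. The two rows are invented for illustration and are NOT taken from the attached PDF; the names State, decode and known_present are hypothetical, and I am assuming the usual "0"/"1" coding for absent/present alongside the "-" and "?" symbols mentioned above.

```python
from enum import Enum

class State(Enum):
    ABSENT = "0"
    PRESENT = "1"
    INAPPLICABLE = "-"  # e.g. wing color in a wingless species
    UNKNOWN = "?"       # not yet observed

# Invented rows for illustration only; not the values from the real matrix.
matrix = {
    "species_A": "10-?",
    "species_B": "1101",
}

def decode(row):
    """Turn a row of matrix symbols into a list of State values."""
    return [State(symbol) for symbol in row]

def known_present(row):
    """Return the 1-based character numbers scored as present."""
    return {i for i, s in enumerate(decode(row), start=1) if s is State.PRESENT}

print(known_present(matrix["species_A"]))
print(known_present(matrix["species_B"]))
```

Keeping "-" and "?" as their own states, rather than folding them into "absent", seems important: neither one says that the species lacks the feature.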

So let us look at example 1 - the genus concept Cyclanthura (sec. Franz 
in this humble scenario). The genus Cyclanthura may be defined by the 15 
species it contains. One could also name any or all of the "subclades" 
(groups of species that can be traced to one and the same origin in the 
tree) and use those to make up the definition. I call this "ostensive 
defining", or defining by "pointing at". The species are instances of 
the genus. I think this is also called nominalism and probably something 
else (in addition) in computer science.

On the other hand, note that the genus Cyclanthura has three distinctive 
features - characters 12, 25, and 27. Those are postulated to have 
jointly evolved at the time of the genus' origin according to the 
distribution of these features in the 15 species and the phylogeny 
algorithm used. So alternatively the genus is defined by those features, 
which I call "intensional defining", or defining by "describing". The 
definition could in principle be understood without any instances listed 
(and vice versa). This is called essentialism (and probably something 
else still in computer science).
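
The two ways of defining the genus might be sketched as follows, again in plain Python rather than OWL. The member names are placeholders (the real genus concept lists 15 species), and the function names are hypothetical; only the character numbers 12, 25 and 27 come from the example.

```python
# Ostensive definition: the genus is the set of things we point at.
# Placeholder names; the real concept lists 15 species.
OSTENSIVE_MEMBERS = frozenset({"species_A", "species_B", "species_C"})

# Intensional definition: the genus is whatever shows the diagnostic
# characters (12, 25 and 27, as named in the example).
DIAGNOSTIC_CHARACTERS = frozenset({12, 25, 27})

def in_genus_ostensive(name):
    """Membership by enumeration: is this species on the list?"""
    return name in OSTENSIVE_MEMBERS

def in_genus_intensional(present_characters):
    """Membership by description: are all diagnostic characters present?"""
    return DIAGNOSTIC_CHARACTERS <= set(present_characters)

print(in_genus_ostensive("species_A"))
print(in_genus_intensional({1, 12, 25, 27}))
print(in_genus_intensional({12, 25}))
```

Note that each function can be evaluated without the other, which is the sense in which the two definitions are dissociable.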

Almost ANY taxonomic classification that is worthwhile representing - 
from Aristotle to today - will have elements showing these two aspects - 
ostensive and intensional. Scientists use them together OR separately in 
a way that helps them make the most sense out of a particular situation. 
They serve different purposes but are both indispensable for representing 
the taxonomic information content.

Let's look at example 2 - Cyclanthura pilosa. This is a species concept 
that works (1) as an instance of Cyclanthura (see above), but also (2) as a 
class which has unique properties by itself. Note that Cyclanthura 
pilosa does NOT have feature 27 present as predicated by the 
higher-level definition (of the genus Cyclanthura). The phylogenetic 
analysis postulates that it has "secondarily lost" that property in the 
course of evolution (what we call "homoplasy"). Question - is this a 
true inconsistency in the sense of OWL-DL? The instance Cyclanthura pilosa was 
supposed to have all distinguishing features of its class Cyclanthura 
but in fact it does not (anymore). Species in particular have been 
called "homeostatic property clusters", meaning that any of their 
supposedly defining features could in fact be missing due to 
evolutionary change, yet the properties in general are still needed to 
define species.
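
One way to see the tension is the following Python sketch, which contrasts a strict reading (every member must show every genus feature) with a "property cluster" reading (a member must show enough of them). The assumptions here are mine: that C. pilosa does show characters 12 and 25, that a threshold of 2 out of 3 is adequate, and the function names. Under the strict reading, asserting both "pilosa is a Cyclanthura" and "pilosa lacks 27" cannot hold together, which as far as I understand is the kind of thing a reasoner would report as inconsistent.

```python
GENUS_FEATURES = frozenset({12, 25, 27})  # genus characters from example 1

def strict_violations(member_features):
    """Features the genus requires but an asserted member lacks."""
    return GENUS_FEATURES - set(member_features)

def cluster_member(features, minimum=2):
    """Property-cluster reading: enough (not all) genus features present.
    The threshold of 2 is an arbitrary assumption for illustration."""
    return len(GENUS_FEATURES & set(features)) >= minimum

# Assumed scoring for C. pilosa: 12 and 25 present, 27 secondarily lost.
pilosa = {12, 25}

print(strict_violations(pilosa))  # non-empty: conflicts with the strict reading
print(cluster_member(pilosa))     # still a member under the cluster reading
```

Which of the two readings a taxonomic definition should be given in OWL-DL is exactly the open question; the sketch only makes the difference explicit.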

And now example 3 - Ganglionus undulatus. This species concept is used 
here in a sense to represent the entire genus concept Ganglionus. This 
is called "exemplar approach" and is of course very common. It is what I 
mean by "incomplete". There are in fact five species of Ganglionus and 
most taxonomists would understand me to know this even if I don't say 
so. Every regionally or temporally restricted taxonomic summary will run 
into this problem of leaving out things known or presumed to exist. Of 
course I can afford to leave out the four other species here because I 
am really only interested in the defining properties of the concept 
Ganglionus, and one instance is sufficient in this case to illustrate them.
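
The difference between "incomplete" and "wrong" in the exemplar case can be sketched like this. The data structure and function names are hypothetical. OWL reasons under the open-world assumption, so a species that is simply not listed is unknown rather than excluded; the second function imitates that by returning None instead of False.

```python
# Only the exemplar is listed; the other four species are left out on purpose.
LISTED = {"Ganglionus": {"Ganglionus undulatus"}}

def is_member_closed_world(genus, species):
    """Closed-world reading: anything not listed is taken to be excluded."""
    return species in LISTED.get(genus, set())

def is_member_open_world(genus, species):
    """Open-world reading: True if listed, otherwise None (unknown)."""
    return True if species in LISTED.get(genus, set()) else None

# "Ganglionus sp. 2" is a placeholder for one of the unlisted species.
print(is_member_closed_world("Ganglionus", "Ganglionus sp. 2"))
print(is_member_open_world("Ganglionus", "Ganglionus sp. 2"))
```

Under the open-world reading the exemplar listing stays incomplete but never becomes false, which matches how most taxonomists would read it.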

**********

In summary, I hope again that this was helpful. *I* would benefit from 
someone in the OWL group telling me what vocabulary is preferred to name 
the most significant aspects associated with a typical taxonomic 
definition (see examples 1, 2, 3). Is my ostensive/intensional 
terminology understandable and acceptable (even if we do not/need not 
use it in the future)? How would *you* express the dual nature of 
taxonomic definitions? Finally, I fully agree with Bertram's assessment 
that class- and instance-characteristics of taxonomic concepts need to 
be combinable as well as flexibly dissociable for us to make significant 
progress. Feel free to ask any questions about this, and let me know if 
it was too little or (more likely) too much at once. What other kinds of 
examples might you find helpful?

Cheers,

Nico

Nico M. Franz, Ph.D.
Postdoctoral Research Fellow
National Center for Ecological Analysis and Synthesis
MSB, Room # 3411, University of California
Santa Barbara, CA  93106-6150

Phone: (805) 893-5934; Fax: (805) 893-8062; E-mail: franz at nceas.ucsb.edu
Website: http://www.nceas.ucsb.edu/~franz/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Cyclanthura-OWL.pdf
Type: application/pdf
Size: 2213316 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/seek-kr-sms/attachments/20051031/a46f5000/Cyclanthura-OWL-0001.pdf