[SEEK-Taxon] Request for Comment from SEEK-TAXON members

Beach, James H beach at ku.edu
Fri Oct 31 13:31:16 PST 2003


Matt and all:

This is for your consideration (and mine) from Dave Thau as a proposed
work scope for the next 2-3 months.  Dave is a member of this list, so
if you want to say anything private please send it to me, though we are
all open about this.  I will ping you all early next week again to see
if you have any thoughts.

Boo!

Jim


 
--------------------------------
James H. Beach
Biodiversity Research Center
University of Kansas
1345 Jayhawk Boulevard
Lawrence, KS 66045, USA
Tel: 785 864-4645, Fax: 785 864-5335
Televideocon: (H.323): 129.237.201.102



-----Original Message-----
From: thau at learningsite.com [mailto:thau at learningsite.com] 
Sent: 31 October, 2003 2:21 PM
To: Beach, James H
Subject: proposal


Hello Jim,

Here's my proposal for work over the next couple of months.  I've
attached a word document for greater legibility.  Let me know what you
think, and if there are other folks to whom I should send this.

Thanks!
Dave.

-----

Proposal for the Next Couple of Months' Work
Dave Thau
10/31/2003

Problem:

When a taxonomic name appears in a data set or publication, it is often
unclear what that name represents.  A name may represent a set of
specimens, character states, and/or other taxonomic concepts.  As that
set, or taxonomic concept, changes from publication to publication, the
meaning of the name changes and multiplies.  This well-known problem
complicates the process of combining data sets which have been tagged
with taxonomic names.

The uncertainty of name meanings within data sets may be reduced in two
places: in the data sets themselves, and in an external name and concept
resolution system, henceforth called a taxonomic concept server.  

It would be best to reduce uncertainty closest to where the data are
recorded.  Tools ensuring that future data sets use taxonomic concept
tags reflecting more precisely defined taxonomic entities are critical
for this task.  For the person tagging the data set, this may be as
simple as saying "Rhynchospora plumosa according to ITIS 2003."  This
string would map onto a unique id representing a more completely
defined version of this taxonomic concept, stored in a repository of
some sort.  Alternatively, the taxonomic concept server could guess at
the intended concept from a name supplied by a user and, if there is a
great deal of uncertainty about the name, could ask the user to select
from several possible definitions.
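
To make the tagging idea concrete, here is a minimal sketch in Python of
how a "name according to source" string might be resolved.  Everything
in it is an assumption made for illustration: the Concept record, the
"concept:NNNN" identifier scheme, the placeholder names and sources, and
the resolve() function are all hypothetical and do not describe any
existing SEEK component.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Concept:
    concept_id: str    # hypothetical identifier scheme
    name: str
    according_to: str  # the publication or database defining the concept


# Hypothetical repository contents; names and sources are placeholders.
CONCEPTS = [
    Concept("concept:0001", "Aus bus", "Checklist A 2003"),
    Concept("concept:0002", "Aus bus", "Checklist B 1998"),
]


def resolve(tag: str) -> List[Concept]:
    """Return the concepts a tag could refer to.

    One result means the tag is unambiguous; several results mean the
    user should be asked to choose; none means the name is unknown.
    """
    name, sep, source = tag.partition(" according to ")
    name, source = name.strip(), source.strip()
    if sep:
        exact = [c for c in CONCEPTS
                 if c.name == name and c.according_to == source]
        if exact:
            return exact
    # No source given (or source not recognised): fall back to every
    # concept that carries this name.
    return [c for c in CONCEPTS if c.name == name]
```

A fully qualified tag returns a single concept, while a bare name
returns every candidate so that the caller can prompt the user.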

For legacy data sets the task is harder. Often nothing is known about
the sources of names used by an ecologist, so the meaning of a data
point labeled "oak" is very difficult to determine.  In this case, it
becomes the task of a taxonomic concept server to add clarity where it
can, and to express the degree of uncertainty about the name. 

In addition to labeling data sets with well defined concepts, a
taxonomic concept server should provide a host of other facilities,
including:

1. Determining the compatibility of two names or concepts with regard
to merging data associated with those names or concepts.
2. Given a concept, finding its possible siblings, parents and children.
3. Given a name, finding good concepts to represent it.
4. Given a concept, finding the names which may be associated with it.
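
As a rough illustration of the lookups in this list (hierarchy
navigation, name to concepts, and concept to names), here is a toy
in-memory sketch in Python.  The class name, method names, identifiers
and placeholder taxon names are all hypothetical; this is not the
existing API, and the compatibility check is left out.

```python
from typing import Dict, List, Optional, Set


class ToyConceptServer:
    """In-memory stand-in for a taxonomic concept server (illustrative only)."""

    def __init__(self) -> None:
        self._parent: Dict[str, Optional[str]] = {}
        self._names: Dict[str, Set[str]] = {}

    def add(self, concept_id: str, names: List[str],
            parent: Optional[str] = None) -> None:
        self._parent[concept_id] = parent
        self._names[concept_id] = set(names)

    def parent(self, concept_id: str) -> Optional[str]:
        return self._parent[concept_id]

    def children(self, concept_id: str) -> List[str]:
        return sorted(c for c, p in self._parent.items() if p == concept_id)

    def siblings(self, concept_id: str) -> List[str]:
        p = self._parent[concept_id]
        if p is None:
            return []
        return [c for c in self.children(p) if c != concept_id]

    def names_for(self, concept_id: str) -> List[str]:
        return sorted(self._names[concept_id])

    def concepts_for(self, name: str) -> List[str]:
        return sorted(c for c, n in self._names.items() if name in n)


# Placeholder data: one genus concept with two species concepts that
# share a name, as happens when a name is redefined between publications.
server = ToyConceptServer()
server.add("c:genus", ["Aus"])
server.add("c:sp1", ["Aus bus"], parent="c:genus")
server.add("c:sp2", ["Aus bus", "Aus cus"], parent="c:genus")
```
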

In all of these cases, in order to use a taxonomic concept, the concept
must be defined in some standard manner, and must be identifiable using
a standard identifier.  There must also be a way to represent different
types and degrees of uncertainty in taxonomic name and concept usage.



Proposal:

I would like to work on four aspects of the SEEK project:

I. Unique Identifiers for Taxonomic Concepts

I would like to investigate different mechanisms for uniquely
identifying taxonomic concepts in SEEK.  Two initial candidates are the
Resource Description Framework (RDF) mechanism for identifying
resources, and the Digital Object Identifier (DOI) mechanism.  I would
like to spend two weeks investigating these two mechanisms, contrasting
them, and proposing a way they could be used in the broader context of
SEEK.  The deliverable for this stage would be a brief paper describing
the two mechanisms, discussing their merits, placing them in the SEEK
context, and providing a recommendation.
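
For a sense of what the two identifier styles look like in practice,
here is a small Python sketch.  RDF identifies resources by URI, and a
DOI has the form "10.<registrant>/<suffix>".  The namespace
http://example.org/seek/concept/ and the DOI prefix 10.9999 used below
are made up for illustration; nothing here reflects a decision about
what SEEK would actually use.

```python
from typing import Tuple
from urllib.parse import quote

# Hypothetical namespace for URI-style concept identifiers.
BASE = "http://example.org/seek/concept/"


def uri_identifier(local_id: str) -> str:
    """Build a URI-style identifier from a local id, escaping unsafe characters."""
    return BASE + quote(local_id, safe="")


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI-style identifier into its (prefix, suffix) parts."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or len(prefix) <= 3 or not suffix:
        raise ValueError(f"not a DOI-style identifier: {doi!r}")
    return prefix, suffix
```

One visible difference is that the URI form carries its own resolution
location, while the DOI form depends on a separate resolver to turn the
prefix and suffix into a location.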

II. Appropriateness and Usefulness of OWL

I would like to spend some more time, perhaps three or four weeks,
investigating the appropriateness and usefulness of OWL as a
representational language, relative to XML and XML Schema, for the
following tasks:

1. Checking the consistency of a taxonomy before it's imported into a
taxonomic concept server
2. As a standard for describing relationships (such as hasSynonym,
hasAuthor, hasSpecimen)
3. As a standard language for outputting descriptions of taxonomic
concepts to other components of SEEK
4. As a way of tagging data sets with relevant taxonomic concepts

The deliverable for this stage would be a demonstration of a consistency
checker for taxonomic concept data providers, some notes on scalability,
a mapping of the Napier schema to OWL, and the beginnings of standards
for tagging data sets and communicating with other components of SEEK.
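
To give a feel for what checking a taxonomy before import could mean,
here is a minimal structural check in plain Python.  It is not OWL
reasoning and is not the checker described above; it only looks for two
simple problems (parents that are never defined, and cycles) in a
hypothetical child-to-parent mapping.  The function name and the input
format are assumptions.

```python
from typing import Dict, List, Optional


def check_taxonomy(parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems found in a child -> parent mapping.

    A root is a node whose parent is None.  An empty list means that
    neither of the two checked problems was found.
    """
    problems: List[str] = []

    # Check 1: every referenced parent must itself be defined.
    for node, parent in parent_of.items():
        if parent is not None and parent not in parent_of:
            problems.append(f"{node}: parent {parent} is not defined")

    # Check 2: walking up from any node must never revisit a node.
    for start in parent_of:
        seen = set()
        node: Optional[str] = start
        while node is not None and node in parent_of:
            if node in seen:
                problems.append(f"{start}: lies on or leads into a cycle")
                break
            seen.add(node)
            node = parent_of[node]

    return problems
```

A reasoner-based approach would presumably catch richer
inconsistencies than this, which is part of what the investigation
would need to establish.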

III. Formalization

Following the investigations described above, I would like to spend some
time, perhaps another three or four weeks, formalizing the types of
interactions between the taxonomic concept server and the rest of SEEK.
This formalization need not involve OWL (should it prove to be
inappropriate) and would include:

1. Helping flesh out the taxonomic concept server API (if that's
necessary) 
2. Helping formally represent relations and operations involved with
taxonomic concepts 
3. Helping work out representations of the degree of uncertainty in
concepts and the different types of equivalence between concept pairs
4. Helping iron out a common representation language for communicating
between the taxonomic concept server and the rest of SEEK

The deliverable for this phase would be a set of specifications
reflecting the four areas described.
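
As one possible starting point for the third item, here is a Python
sketch that treats a concept as a set of specimen ids and derives the
relation between two concepts by set comparison, then attaches a
confidence value to the resulting assertion.  The five relation labels,
the 0-to-1 confidence scale, and all names in the code are assumptions
made for illustration, not an agreed SEEK vocabulary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Relation(Enum):
    CONGRUENT = "congruent"      # same set of specimens
    INCLUDES = "includes"        # left strictly contains right
    INCLUDED_IN = "included in"  # left strictly contained in right
    OVERLAPS = "overlaps"        # some shared specimens, neither contains the other
    EXCLUDES = "excludes"        # no shared specimens


def relate(left: FrozenSet[str], right: FrozenSet[str]) -> Relation:
    """Derive the relation between two concepts given as specimen-id sets."""
    if left == right:
        return Relation.CONGRUENT
    if left > right:
        return Relation.INCLUDES
    if left < right:
        return Relation.INCLUDED_IN
    if left & right:
        return Relation.OVERLAPS
    return Relation.EXCLUDES


@dataclass(frozen=True)
class RelationAssertion:
    left_id: str
    right_id: str
    relation: Relation
    confidence: float  # assumed scale: 0.0 (pure guess) to 1.0 (certain)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
```

Under this sketch, only a congruent pair would be an unconditional
candidate for merging data; the other relations would need a policy
decision, which is where the uncertainty value comes in.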

IV. Concurrent Tasks

In addition to these projects, I'll be more than happy to help out with
XSLT and lend a hand wherever else I'm needed.

Conclusion:

This is an ambitious proposal, but I won't be starting from scratch, and
hopefully I'll be getting some help.  Many parts of these tasks are
already under way.  There is already a list of use cases and an API.
Walter Berendsohn's team and others have done a good deal of work
formalizing taxonomic concept relations.  I have also already done a
fair amount of work relating to OWL and consistency checking.

Furthermore, I have no illusions that I'll be able to finish any of
these projects.  Each of these tasks will involve a great deal of
revision and collaboration with the rest of SEEK.  My main goal will be
to promote dialog in these areas and provide a basis for continued
discussion.

I would be happy to focus on any of the projects described above, should
any of them seem particularly important to investigate in depth.  The
area of uncertainty in the comparison of taxonomic concepts, and within
a taxonomic name, is of particular interest to me, as is the
formalization of different types of equivalence between names and
concepts.

Finally, if implementation is deemed to be more important than
specification, I would gladly take a crack at trying to integrate the
current taxon group demo into a more complete Ptolemy II workflow.
Attempting this would inevitably give rise to the beginnings of the
specifications described above.  To be honest, I have not explored
Ptolemy II to any great extent, but I feel confident that my skill set
would enable me to make some headway along this path and the results
would be more concrete, though perhaps less well thought out.

Thanks for your kind consideration, and I look forward to further
conversation.

Dave Thau.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: proposal.doc
Type: application/msword
Size: 26624 bytes
Desc: proposal.doc
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/seek-taxon/attachments/20031031/0c56c065/proposal.doc

