[seek-kr] RE: [SEEK-Taxon] thoughts and notes from meeting in portugal

Wed Nov 5 14:18:59 PST 2003

I believe that the taxon group has, perhaps, multiple missions.
For the concepts to be merged "validly", it seems to be required
that we develop and test algorithms for

"ResolveConcept" (i.e., given a partially
specified concept from a user or from an EML markup, return the
best matching ConceptIDs within a particular taxonomy)

and other algorithms that compare concepts across taxonomies.
These algorithms will be developed by the taxon WG in conjunction with
ecologists, taxonomists, and other interested domain experts.
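
To make the "ResolveConcept" idea concrete, here is a minimal sketch in
Python.  Everything in it is an assumption for illustration only: the
Concept fields, the function name resolve_concept, the weights and the
threshold are all hypothetical, and plain string similarity stands in
for whatever matching the WG and the domain experts eventually agree on.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass(frozen=True)
class Concept:
    # Hypothetical fields; the real concept schema belongs to the domain experts.
    concept_id: str
    name: str
    authority: str


def _similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_concept(taxonomy: List[Concept], name: str,
                    authority: Optional[str] = None,
                    limit: int = 3, threshold: float = 0.6) -> List[str]:
    """Return up to `limit` concept IDs from one taxonomy, best match first."""
    scored = []
    for concept in taxonomy:
        score = _similarity(name, concept.name)
        if authority is not None:
            # Assumed weighting: the name matters more than the authority.
            score = 0.7 * score + 0.3 * _similarity(authority, concept.authority)
        if score >= threshold:
            scored.append((score, concept.concept_id))
    # Highest score first; ties broken by ID so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [concept_id for _, concept_id in scored[:limit]]


taxonomy = [
    Concept("c1", "Quercus alba", "L. 1753"),
    Concept("c2", "Quercus rubra", "L. 1753"),
    Concept("c3", "Pinus strobus", "L. 1753"),
]
best = resolve_concept(taxonomy, "Quercus albba")  # misspelled on purpose
```

The point of the sketch is only the shape of the call: a partial
specification goes in, a ranked list of IDs from a single taxonomy
comes out.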

However, this does not preclude providing additional APIs that
allow portions of one or more of our taxonomies to be output in
a format that could be input to SMS for reasoning on the raw data.
Depending on which algorithms a user applies and how valid those
algorithms are, this may be less desirable than using our built-in
algorithms.  However, if a user _wishes_ to get a portion
of our concept database in order to reason on it in a non-standard way,
I don't see why we can't provide that as an option.

The backend representation of the concept database is really driven by
efficiency issues and is likely to remain a relational, or possibly a
native XML, database in the future.  The schema for how a concept is
represented is in the hands of our domain experts.  The implementation of
that schema is in the hands of our computing experts.  However, the SMS
group, as one user of our system, should be involved in telling us how
they would like the "raw" taxonomy presented to them.  Right now
OWL seems to be the preferred format, but OWL is just a representation
language.  We need to explore which version of OWL to use and how to
distinguish instances from objects from object relationships, etc.
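
As one possible shape for such a "raw" export, the sketch below writes
a portion of a taxonomy as OWL in RDF/XML.  It is a sketch under stated
assumptions, not a proposal: the function names, the parent_of mapping
and the example.org base URI are hypothetical, and modelling every
concept as an owl:Class with rdfs:subClassOf links is just one of the
options still to be explored.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/taxon#"  # placeholder URI, not a real SEEK namespace


def subtree(parent_of, root):
    """Return `root` plus all of its descendants."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return found


def to_owl(parent_of, root):
    """Serialize the portion of the taxonomy under `root` as RDF/XML."""
    for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
        ET.register_namespace(prefix, uri)
    doc = ET.Element(f"{{{RDF}}}RDF")
    keep = set(subtree(parent_of, root))
    for concept_id in sorted(keep):
        cls = ET.SubElement(doc, f"{{{OWL}}}Class",
                            {f"{{{RDF}}}about": BASE + concept_id})
        parent = parent_of.get(concept_id)
        if parent in keep:  # do not link to anything outside the exported portion
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf",
                          {f"{{{RDF}}}resource": BASE + parent})
    return ET.tostring(doc, encoding="unicode")


# Hypothetical child -> parent table standing in for the concept database.
parent_of = {
    "Quercus": "Fagaceae",
    "Quercus_alba": "Quercus",
    "Quercus_rubra": "Quercus",
    "Pinus": "Pinaceae",
}
owl_xml = to_owl(parent_of, "Quercus")
```

Whether a concept should instead be exported as an instance is exactly
the kind of question the SMS group should weigh in on.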

I see Dave as exploring how to use OWL to provide the desired input
to the SMS group, along with other related issues.

Mapping between taxonomies is a research question for computer science
in general.  It is looking like an area that Joana will be exploring in
her Ph.D. dissertation with me.  That said, there is more than enough
room in such a topic for additional help, and no expectation that any
such exploration will yield results that are accurate enough to be useful
(merely better than what already exists!).

Another large topic that is more SMS related is that the reasoners are
purely symbolic right now.  A huge breakthrough in the NLP community
happened with Church and Hanks, Brill, etc., when they combined
statistical information with traditional parsing and semantics to
produce the most _likely_ interpretations for sentences and thereby
deal with ambiguity.  I think that in future years we will need to find
ways to see how far we can handle taxonomic ambiguity symbolically and
then how this can be improved, if at all, with the inclusion of
learning/statistics/training data.
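
A toy illustration of that two-stage idea, with the symbolic step
narrowing the candidates and the statistical step ranking whatever
ambiguity is left.  All names and data here are hypothetical, the
"symbolic" step is only a lookup standing in for a real reasoner, and
add-one smoothing over usage counts is just one simple choice of
statistic.

```python
from collections import Counter


def symbolic_candidates(name, concepts_by_name):
    """Symbolic step: every concept the name could denote (a plain lookup here)."""
    return sorted(concepts_by_name.get(name, []))


def rank_by_likelihood(candidates, observed_counts):
    """Statistical step: rank candidates by smoothed relative frequency."""
    # Add-one smoothing keeps never-observed concepts from getting probability 0.
    total = sum(observed_counts[c] + 1 for c in candidates)
    ranked = [(c, (observed_counts[c] + 1) / total) for c in candidates]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical data: one name that two concepts share, plus training counts
# of how often each concept was the intended one in marked-up data sets.
concepts_by_name = {"Quercus alba": ["c1", "c7"]}
training_counts = Counter({"c1": 8})

candidates = symbolic_candidates("Quercus alba", concepts_by_name)
ranked = rank_by_likelihood(candidates, training_counts)
```

With these made-up counts the two candidates stay on the list, but the
one seen in the training data is ranked as the more _likely_ reading.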

Susan.