[SEEK-Taxon] thoughts and notes from meeting in Portugal

Fri Oct 31 15:41:51 PST 2003

Howdy everyone,

I spent some time writing up my impressions of what was said during my
little presentation in Portugal last week.  If you're feeling so
inclined, please look this over and let me know if it jibes with what you
remember.  If you have any amendments, let me know.  I'll incorporate them
and commit the whole thing to CVS.

Happy Halloween!
Dave.

Thoughts and remembrances of my presentation to the Taxon Group in
Portugal.

I started out at the end of Oct 22nd by giving a brief introduction to
what the SMS group has been doing.  I discussed GEON and how it was being
used to map data sets onto ontologies and how mapping other ontologies to
the first allows the data to be viewed through the different mapped
ontologies.  I also discussed the paper by Shawn and Bertram on the
generic framework for semantic registration of scientific data.
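
As a rough illustration of the registration idea (not GEON's actual
code or API), the sketch below registers data sets to concepts in one
ontology and then views them through a second ontology that has been
mapped onto the first.  The data set names, the "A:" and "B:" concept
labels, and the view_through helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical registration: data set name -> concept in ontology A.
registrations = {
    "survey_1998.csv": "A:Quercus",
    "plots_2001.csv": "A:Pinus",
}

# Hypothetical mapping: concept in ontology B -> concepts in ontology A.
b_to_a = {
    "B:Oaks": {"A:Quercus"},
    "B:Conifers": {"A:Pinus", "A:Abies"},
}


def view_through(mapping, registrations):
    """Group registered data sets under the concepts of the mapped ontology."""
    # Index the data sets by the ontology-A concept they are registered to.
    by_concept = defaultdict(set)
    for dataset, concept in registrations.items():
        by_concept[concept].add(dataset)

    # For each ontology-B concept, collect data sets from every mapped A concept.
    view = {}
    for b_concept, a_concepts in mapping.items():
        datasets = set()
        for a_concept in a_concepts:
            datasets |= by_concept.get(a_concept, set())
        view[b_concept] = sorted(datasets)
    return view


view = view_through(b_to_a, registrations)
for concept, datasets in view.items():
    print(concept, datasets)
```

The data is registered only once, to ontology A; any further ontology
needs just a mapping onto A to provide another view of the same data.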

The reaction to these projects ranged from "Let's do that!" to "That won't
scale to the taxonomic domain."  My feeling is that blindly running a
description logic classifier on multiple taxonomies of hundreds of
thousands of nodes probably won't scale.  However, it's not apparent to me
that we'd ever need to do that. It may be the case that we can work with
subsets of the taxonomies to keep the scaling issues under control.  This
needs further investigation.
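
One simple way to get such a subset is to cut out the subtree below a
chosen taxon before handing anything to a classifier.  The following is
only a sketch under the assumption that a taxonomy can be held as a
parent-to-children dictionary; the toy taxonomy and the subtree helper
are made up for illustration.

```python
def subtree(children, root):
    """Return the set of nodes at or below root.

    Uses an explicit stack instead of recursion so that very deep
    taxonomies do not hit Python's recursion limit.
    """
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, ()))
    return seen


# Toy taxonomy: parent -> list of children (illustrative only).
children = {
    "Plantae": ["Fagaceae", "Pinaceae"],
    "Fagaceae": ["Quercus", "Fagus"],
    "Pinaceae": ["Pinus"],
    "Quercus": ["Quercus alba", "Quercus rubra"],
}

fagaceae = subtree(children, "Fagaceae")
print(sorted(fagaceae))
```

Whether subsets chosen this way are small enough to reason over, and
still large enough to be useful, is exactly the open question.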

The next time we met, Oct 24th, I gave some simple demos of how Protege
works with OWL, and scratched the surface of what OWL actually
looks like and how it relates to things like RDF and RDF Schema.  Issues
of scalability arose again, especially when talking about using Protege as
a visualization tool.  Another question arose about whether biological
taxonomies really are ontologies, and whether or not an ontology language
is necessary.  This is another topic that needs more investigation.

The biggest bone of contention was over how to do the mapping between
taxonomies.  In GEON, the mappings are fairly simple.  In the taxonomic
world, it's much harder, and trying to figure out a sensible way to do the
mappings is perhaps the hardest part of the problem.  OWL by itself
doesn't help figure out how the mappings should be done.  However, it does
provide an easy way to state mappings in a way which doesn't depend on
knowing the internals of some piece of software.
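
To give a feel for what "stating a mapping" might look like, here is a
small Python sketch that writes two mappings as N-Triples using the
standard owl:equivalentClass and rdfs:subClassOf properties.  The
example.org concept IRIs and the triple helper are hypothetical, and
whether these two properties are the right ones for taxonomic concepts
is an assumption, not something the group decided.

```python
# Standard property IRIs from the OWL and RDF Schema vocabularies.
OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical namespaces for two taxonomies.
TAXONOMY_A = "http://example.org/taxonomyA#"
TAXONOMY_B = "http://example.org/taxonomyB#"


def triple(subject, predicate, obj):
    """Format one statement as an N-Triples line (all three terms are IRIs)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Each mapping is (concept in A, relationship, concept in B).
mappings = [
    (TAXONOMY_A + "Quercus", OWL_EQUIVALENT_CLASS, TAXONOMY_B + "Quercus"),
    (TAXONOMY_A + "Quercus_alba", RDFS_SUBCLASS_OF, TAXONOMY_B + "Quercus"),
]

lines = [triple(*mapping) for mapping in mappings]
print("\n".join(lines))
```

The point is only that the output is plain, tool-independent text; the
hard part, deciding which mappings are correct, is untouched by this.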

It seemed that there was some consensus that OWL would not make a good
internal representation for whatever repository the taxon group was
building.  But creating OWL wrappers for representing information that
might go into the repository and come out of the repository seemed
worthwhile.

The possibility of using OWL (or RDF Schema) to provide unique identifiers
for taxonomic concepts met with some support - the alternative being
Digital Object Identifiers (http://www.doi.org/).  A comparison of these
two would be nice.

We ended with a list of potential next steps (I may have expanded this
list a bit...):

  1.  try doing something like GEON in the taxonomic domain
    a.  register data to one taxonomy - say ITIS
    b.  map ITIS to another taxonomy - say Species 2000
    c.  see how far it can scale, and how useful it is

  2.  check the feasibility of using OWL as a representation for the
        taxonomic concept repository
    a.  outputting query responses in OWL
    b.  OWL as a source of input into the repository

  3.  define the operations on the repository and representation of
        the information in the repository in a way useful to the
        semantic mediation group
    a.  business rules describing legal operations
    b.  formalizing different types of equality among taxa
    c.  formalizing vocabulary such as synonym, pro parte, etc.

  4.  make an ontological representation of the XML schema being built
        to facilitate transformations between XML conforming to the schema
        and an OWL representation

  5.  build tools to show the usefulness of an ontological representation
    a.  consistency checkers for data providers
    b.  navigators to link data sources together

In addition to this list, a few other areas for exploration arose after
lunch:

1.  Looking at ways to share information about how taxa overlap.
2.  Helping out with the various XSLT tasks which will arise
    once we get to pouring data into the repository.
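
As a starting point for talking about overlap and the different types
of equality among taxa, the sketch below classifies the relationship
between two taxon concepts.  It assumes, purely for illustration, that
a concept can be treated as the set of names (or specimens) it
circumscribes; the relation labels and the example concepts are
hypothetical, not an agreed vocabulary.

```python
def relation(a, b):
    """Classify how taxon concept a relates to taxon concept b.

    Each concept is given as a collection of the members it circumscribes.
    """
    a, b = set(a), set(b)
    if a == b:
        return "congruent"
    if a > b:  # a is a proper superset of b
        return "includes"
    if a < b:  # a is a proper subset of b
        return "is included in"
    if a & b:  # some shared members, neither contains the other
        return "overlaps"
    return "disjoint"


# Two hypothetical circumscriptions of the same name by different authors.
concept_1 = {"specimen1", "specimen2", "specimen3"}
concept_2 = {"specimen2", "specimen3", "specimen4"}

print(relation(concept_1, concept_2))
```

A "pro parte" synonym would presumably show up here as "overlaps" or
"is included in", but pinning that down is part of the formalization
work listed under step 3.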