[seek-kr] RE: [SEEK-Taxon] thoughts and notes from meeting in portugal

Kennedy, Jessie J.Kennedy at napier.ac.uk
Tue Nov 4 09:09:57 PST 2003


Hi Everyone, (apologies for the long rambling.....)

I think Dave has captured the main points from the meetings in Portugal. One
thing that maybe hasn't been made clear was our discussion on the relation
between the taxon group and the semantic mediation group, i.e. where does the
taxon group's remit end and the semantic mediation group's remit begin?

I agree with Matt that the taxon information is meta-data about the
ecological data sets. But, the taxonomic information with which to reason
about concept similarity is taxonomic knowledge and really can only be dealt
with by the taxon group if the reasoning is to be valid. Consider the detail
I guess the KR group is going into on the measurement stuff in order to do
any reasoning - to do the same in the taxonomy domain would be too much, I
think. So I guess this means that the taxon group does some mediation on
taxonomic concepts??

In order to do any semantic mediation of taxon concepts you need to
understand the relationships between the taxa sufficiently and be able to
record these relationships in a manner that can be reasoned about. This is
one area where I believe that it would be unreasonable to expect the
semantic mediation group to solve the problems - I think the taxon group
need to investigate and clarify the different ways in which it is meaningful
to compare the similarity or other relationships between taxa and find
some way to represent this knowledge. The way to represent the knowledge or
how the two groups interact is where the taxon group overlaps with SMS. 

At TDWG we heard several presenters discussing the need for a concept based
taxonomy model for the exchange of data (more later) and that there had been
some investigations into quantifying the effects of ignoring concepts and
using names only. This investigation was carried out on mosses and we have
asked Walter Berendsohn if we can have access to the data related to that
project to allow us to look at modelling the concepts as described (still to
write formally describing the purpose for the request etc.). Clearly the
SEEK project cannot consider actually building the relationship information
between all known taxa for use in SEEK, but we need to at least understand
how these relationships are decided upon and investigate whether or not it
will be reasonable to infer relationships between existing taxa and if so on
what basis. This can only be done by taxonomists investigating the issue for
taxa they understand well (like the moss example, and example taxa that, say,
Bob and Nico could work on).

An issue I raised which I would like feedback from other groups on is the
question of which taxa we should focus on in our SEEK scenario. We have been
working on the "fringed filefish" example - but we don't seem to have any
experts on fringed filefish on the taxonomic working group (or none I know
of....). I realise that although fringed filefish is really a placeholder
for any species, it would be more useful for us in the taxon group (I think)
to focus on a species with which we have good taxonomic coverage in terms of
multiple concepts and their treatments with which we could do a realistic
study of the issues involved in having a concept based taxonomy model and
the issues associated with creating relationships between concepts. (Here is
where the mosses data and the caryx for Bob and weevils? for Nico come in.)
As an outcome of this I would see a report that analyses the different
mechanisms that taxonomists use to determine relationships between taxa -
this would be good evidence for the way things are built. This could then be
analysed computationally to determine if we can use this knowledge in any
way to speed up the process, build tools to help or automatically infer
relationships in any way.

With regard to being able to integrate and reason about these different
domains uniformly, I do not believe that we could use something like OWL to
represent many trees of hundreds of thousands of nodes with relationships
between those nodes efficiently - but I am happy to be proven wrong on this.
I think we need a system which will manage the taxonomic concepts and
relationships in general and agree some way of interacting with the SMS so
that we can supply the SMS the information it requires to complete the
semantic mediation between the user's request, the EML and what the Taxonomic
concept server knows about. Dave's point about passing a subset of info to
the SMS might work but I'm not sure how yet - I guess that's why we need some
more investigating.
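
To make the "subset" idea a bit more concrete, here is a minimal Python
sketch, assuming the concept server holds relationships as simple
(concept, relation, concept) triples. The concept ids, relation labels and
the function name `neighbourhood` are all made up for illustration; nothing
here is an agreed SEEK interface. It just pulls out the relationships within
a few hops of one concept, which is the sort of small piece we might hand to
the SMS instead of the whole tree.

```python
from collections import defaultdict, deque

# Hypothetical relationship triples: (concept_a, relation, concept_b).
RELATIONSHIPS = [
    ("A.sec.X", "congruent", "A.sec.Y"),
    ("A.sec.Y", "includes", "B.sec.Y"),
    ("B.sec.Y", "overlaps", "C.sec.Z"),
    ("D.sec.Z", "congruent", "E.sec.W"),
]


def neighbourhood(triples, start, max_hops):
    """Return the triples reachable from `start` within `max_hops` steps,
    treating every relationship as an undirected link."""
    adjacent = defaultdict(list)
    for triple in triples:
        a, _, b = triple
        adjacent[a].append(triple)
        adjacent[b].append(triple)

    seen = {start}
    selected = []
    queue = deque([(start, 0)])
    while queue:
        concept, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for triple in adjacent[concept]:
            if triple not in selected:
                selected.append(triple)
            a, _, b = triple
            other = b if a == concept else a
            if other not in seen:
                seen.add(other)
                queue.append((other, hops + 1))
    return selected


subset = neighbourhood(RELATIONSHIPS, "A.sec.X", max_hops=2)
```

Whether a hop limit is the right way to choose the subset is an open
question; it is only the simplest thing that keeps the size under control.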

For info for those not in Portugal......
I presented the abstract taxonomic concept model (slides on CVS) I
introduced in San Diego as a start to finding a consensus Taxonomic concept
model which we could use in SEEK and which would be of interest/relevance to
the rest of the community. As a result, I've been asked to become chair of
the group which will define the taxonomic concept transfer standard. This
means that we can show outreach for SEEK in terms of getting the rest of the
taxonomic community to buy into what we're doing. GBIF in particular (who
have a similar requirement to SEEK in terms of a names index and concept
resolution server) are prepared to contribute finances towards this effort
as is SEEK. I don't want to go into details now, but think this was a
positive outcome for SEEK from TDWG. Of particular relevance to SEEK will be
the fact that the transfer schema can inform the schema for our concept
repository; my involvement will mean that EML, VegBank and other SEEK
related models will be considered in the standard and should help to ensure
longer term applicability of what we do. It might be hard work but hopefully
of benefit to SEEK....

I've put a few comments embedded into Dave's notes.....
There's lots more to talk about probably but enough for now......
Do we have a conference call organised?
Have we managed to fix a date for January?

Jessie



> -----Original Message-----
> From: thau at learningsite.com [mailto:thau at learningsite.com]
> Sent: 31 October 2003 23:42
> To: seek-taxon at ecoinformatics.org
> Subject: [SEEK-Taxon] thoughts and notes from meeting in portugal
> 
> 
> Howdy everyone,
> 
> I spent some time writing up my impressions of what was said during my
> little presentation in portugal last week.  If you're feeling so
> inclined, please look this over and let me know if it jibes with what you
> remember.  If you have any amendments, let me know.  I'll incorporate them
> and commit the whole thing to CVS.
> 
> Happy Halloween!
> Dave.
> 
> Thoughts and remembrances of my presentation to the Taxon Group in
> Portugal.
>  
> I started out at the end of Oct 22nd by giving a brief introduction on
> what the SMS group has been doing.  I discussed GEON and how it was being
> used to map data sets onto ontologies and how mapping other ontologies to
> the first allows the data to be viewed through the different mapped
> ontologies.  I also discussed the paper by Shawn and Bertram on the
> generic framework for semantic registration of scientific data.
>  
> The reaction to these projects ranged from "Let's do that!" to "That won't
> scale to the taxonomic domain."  My feeling is that blindly running a
> description logic classifier on multiple taxonomies of hundreds of
> thousands of nodes probably won't scale.  However, it's not apparent to me
> that we'd ever need to do that. It may be the case that we can work with
> subsets of the taxonomies to keep the scaling issues under control.  This
> needs further investigation.

agree - we need to investigate what information we could extract from a
concept repository to send to the SMS to include in its mediation process...
>  
> The next time we met, Oct 24th, I gave some simple demos of how Protege
> works with OWL, and slightly scratched the surface of what OWL actually
> looks like and how it relates to things like RDF and RDF Schema.  Issues
> of scalability arose again, especially when talking about using Protege as
> a visualization tool.  Another question arose about whether biological
> taxonomies really are ontologies, and whether or not an ontology language
> is necessary.  This was another topic that needed more investigation.

agree - I'm sure that under some definitions of ontology they are - but
others probably not - a good discussion with Bertram and Shawn here might
help us scope what we mean by ontology for this project and therefore
whether or not we really have taxonomic ontologies. (For info Bertram - I
had some discussion a while back with the Manchester folk who thought that
the taxonomic problem wasn't an ontology problem)

>  
> The biggest bone of contention was over how to do the mapping between
> taxonomies.  In GEON, the mappings are fairly simple.  In the taxonomic
> world, it's much harder, and trying to figure out a sensible way to do the
> mappings is perhaps the hardest part of the problem.  OWL by itself
> doesn't help figure out how the mappings should be done.  However, it does
> provide an easy way to state mappings in a way which doesn't depend on
> knowing the internals of some piece of software.

agree - this is why we need some taxonomic investigation into this.... if it
turns out that OWL and an OWL-like vis tool would help people define these
relationships then that might be a useful outcome. We could look at
translating between OWL representations and the repository representation.

>  
> It seemed that there was some consensus that OWL would not make a good
> internal representation for whatever repository the taxon group was
> building.  But creating OWL wrappers for representing information that
> might go into the repository and come out of the repository seemed
> worthwhile.
>  
agree

> The possibility of using OWL (or RDF Schema) to provide unique identifiers
> for taxonomic concepts met with some support - the alternative being
> Digital Object Identifiers (http://www.doi.org/).  A comparison of these
> two would be nice.
>  

agree - would like to understand the benefits (or not) of this

> We ended with a list of potential next steps (I may have expanded this
> list a bit...):
>  
>   1.  try doing something like GEON in the taxonomic domain
>     a.  register data to one taxonomy - say ITIS
>     b.  map ITIS to another taxonomy - say species 2000
>     c.  see how far it can scale, and how useful it is

not sure how practical this really is - but seeing how far it would scale as
a representation mechanism would be very useful to know.
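
For what it's worth, the three steps might look something like the following
in miniature. This is only a Python sketch under assumptions: the data set
names, the concept ids ("t1:..." standing in for one taxonomy such as ITIS,
"t2:..." for another such as Species 2000) and the function `view_through`
are all hypothetical, and the mapping itself, which is the hard taxonomic
part, is simply given.

```python
# Step a (hypothetical): each data set is registered to one concept in taxonomy 1.
REGISTRATION = {
    "dataset-1": "t1:alpha",
    "dataset-2": "t1:beta",
    "dataset-3": "t1:gamma",
}

# Step b (hypothetical): taxonomy 1 concepts mapped to taxonomy 2 concepts.
# One concept may map to several; a concept with no mapping is simply absent.
MAPPING = {
    "t1:alpha": ["t2:A"],
    "t1:beta": ["t2:A", "t2:B"],
}


def view_through(registration, mapping):
    """Group data sets under the taxonomy 2 concepts they map onto."""
    view = {}
    for dataset, concept in registration.items():
        for target in mapping.get(concept, []):
            view.setdefault(target, []).append(dataset)
    return view


view = view_through(REGISTRATION, MAPPING)

# Data sets that cannot be seen through taxonomy 2 at all.
unmapped = [d for d, c in REGISTRATION.items() if c not in MAPPING]
```

Step c would then be a matter of seeing how this behaves when the two
dictionaries hold hundreds of thousands of entries and the mappings are no
longer simple.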

>    
>   2.  check the feasibility of using OWL as a representation for the
>         taxonomic concept repository
>     a.  outputting query responses in owl
>     b.  owl as a source of input into the repository

input and output yes - scaling would be an issue for the general
representation of the data (but presume you mean in principle, for
representing smaller bits of the data set).
>     
>   3.  define the operations on the repository and representation of
>         the information in the repository in a way useful to the
>         semantic mediation group
>     a.  business rules describing legal operations
>     b.  formalizing different types of equality among taxa
>     c.  formalizing vocabulary such as synonym, pro parte, etc.

agree would all be useful to know.
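
As a starting point for (b), here is a minimal Python sketch of what
"different types of equality" could look like once written down. It assumes
a set-like vocabulary of five relationships between two concepts A and B;
the labels and the function `can_substitute` are illustrative assumptions,
not an agreed model, and how terms such as synonym or pro parte map onto
them is exactly what would need to be worked out.

```python
from enum import Enum


class Relation(Enum):
    """Assumed set-like relationships of concept A to concept B."""
    CONGRUENT = "congruent"      # A and B circumscribe the same thing
    INCLUDES = "includes"        # A is wider than B
    INCLUDED_IN = "included in"  # A is narrower than B
    OVERLAPS = "overlaps"        # A and B share some, but not all, members
    EXCLUDES = "excludes"        # A and B share nothing


# Reading the same relationship from B's side.
INVERSE = {
    Relation.CONGRUENT: Relation.CONGRUENT,
    Relation.INCLUDES: Relation.INCLUDED_IN,
    Relation.INCLUDED_IN: Relation.INCLUDES,
    Relation.OVERLAPS: Relation.OVERLAPS,
    Relation.EXCLUDES: Relation.EXCLUDES,
}


def can_substitute(relation):
    """True if data recorded under A can safely be reported under B,
    i.e. everything in A is also in B."""
    return relation in (Relation.CONGRUENT, Relation.INCLUDED_IN)
```

So data recorded under a narrower concept could be reported under a wider
one, but not the other way round, and an "overlaps" relationship would need
a human (or more information) to decide.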
>      
>   4.  make an ontological representation of the xml schema being built
>         to facilitate transformations between XML conforming to the
>         schema and an OWL representation

might be better when this is worked out a bit more...

>  
>   5.  Build tools to show the usefulness of an ontological representation
>     a.  consistency checkers for data providers
>     b.  navigators to link data sources together
>    

yes - would be useful but requires the data first - will need to chase up
good data sets containing the challenges we face.
                                                           
>                   
> In addition to this list, a few other areas for exploration arose after
> lunch:
>  
> 1.  Looking at ways to share information about how taxa overlap.
> 2.  Helping out with the various XSLT tasks which will arise
>     once we get to pouring data into the repository.
> 
agree again this would be interesting......
> 
another thing that needs to be investigated is the relationship between the
taxonomic concept as it will be defined in the EML data set and those stored
in the concept repository - what information will we need to be able to
determine similarity to different levels of accuracy?

Is it possible to (semi-)automatically mark up the taxonomic coverage
section for EML to encourage the ecologists to provide better data for
mediating on?



> _______________________________________________
> seek-taxon mailing list
> seek-taxon at ecoinformatics.org
> http://www.ecoinformatics.org/mailman/listinfo/seek-taxon
> 


