[obs] Joining DwC, OBOE, PO and PATO

Cam Webb cwebb at oeb.harvard.edu
Tue Oct 26 02:31:19 PDT 2010


Dear colleagues,

I'd like to ask for some suggestions on the semantic modeling of biological 
observations.  I haven't been part of the previous discussions many of you 
have had in other forums and face to face, so there may already be a 
ready-made solution for what I am hoping to do; if so, I would be very 
grateful to be pointed towards it.  If not, I hope this question falls 
within the domain of your interests.

I'm interested in modeling morphological observations of plants in the 
field, as part of an expanding biodiversity inventory and informatics 
project in Indonesia.  Please see: http://phylodiversity.net/xmalesia/ for 
a demo site.  We'll be collecting specimens, images, DNA, and making field 
observations (basic herbarium label data: tree diameter, flower color, 
etc.).  We'd then like to present these online both as a nice GUI-driven 
website and as Linked Data with a SPARQL endpoint.  I haven't 
found a pre-existing RDF template, although Peter de Vries' work (e.g., 
http://lod.geospecies.org/ses/73F2V?format=html) and Steve Baskauf's work 
(e.g., http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf) come 
close.  There is an active discussion on the tdwg-content mailing list 
right now about using Darwin Core in a semantic web context, including 
issues such as adding an Individual class and the best way to treat 
specimens and images.  However, there is little discussion of observations 
on tdwg-content, so I thought I'd bring it up here (apologies to any of 
you who have seen overlapping posts by me on tdwg-content).

So, I'm wondering whether OBOE terms can be used to link DwC concepts to 
OBO ontology terms with PATO qualities.  Perhaps the best way to ask these 
questions is in the context of a specific example.  Here is an attempt 
to model an observation of the fruit color of a particular individual (in 
Turtle):


@prefix oboe: <http://ecoinformatics.org/oboe/oboe.1.0/oboe-core.owl#> .
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix ro: <http://www.obofoundry.org/ro/ro.owl#> .
@prefix pato: <http://purl.org/obo/owl/PATO#> .
@prefix po: <http://purl.org/obo/owl/PO#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://phylodiversity.net/xmalesia/indiv/9>
      a sernec:Individual ;
      sernec:derivativeOccurrence _:blank1 .

_:blank1
      a dwc:Occurrence ;
      dcterms:created "2008-01-01" ;
      dcterms:spatial [
          geo:long "109.95371" ;
          geo:lat "-1.25530" ;
          ] ;
      dcterms:creator "Cam Webb" ;
      dwc:basisOfRecord "HumanObservation" .

# The details of the observation:
[]   a oboe:Observation ;
      oboe:ofEntity [
          a oboe:Entity ;
          ro:part_of _:blank1 ;
          a po:PO_0009001 ;
          ] ;
      oboe:hasMeasurement pato:PATO_0000320 .

po:PO_0009001 rdfs:label "fruit" .
pato:PATO_0000320
     rdfs:label "green" ;
     a oboe:Measurement .

( The network diagram is at: http://phylodiversity.net/cwebb/img/obs-eg.jpg )


The model includes an Individual, its Occurrence at a particular place 
in space and time, and an Observation of a fruit that is part_of the 
Occurrence.


My questions/issues are:

1. *Space-time information* Is this the best way to link the Observation 
to the Individual, i.e., via the Occurrence, or is it better to link the 
Observation directly to the Individual?  In the former case, the 
time-space instance is specified in the Occurrence (as above), in the 
latter, the time-space instance would have to be added via an extra 
oboe:hasContext link from the Observation to another Observation of a 
Temporal Point entity.  The latter way of linking the Individual is less 
satisfying in the context of Darwin Core, which already uses the 
Occurrence for "HumanObservations".
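For comparison, the second option might look roughly like this.  It is a 
sketch only: oboe:hasContext is used as described above, but putting 
dcterms:created and dcterms:spatial on the context Entity is my own 
assumption, not a pattern taken from the OBOE documentation.  Longitude 
uses geo:long, the property name defined by the W3C Basic Geo vocabulary.

```turtle
@prefix oboe: <http://ecoinformatics.org/oboe/oboe.1.0/oboe-core.owl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix pato: <http://purl.org/obo/owl/PATO#> .
@prefix po: <http://purl.org/obo/owl/PO#> .
@prefix ro: <http://www.obofoundry.org/ro/ro.owl#> .

# Sketch only: the fruit is part_of the Individual directly (no Occurrence).
[]   a oboe:Observation ;
      oboe:ofEntity [
          a oboe:Entity, po:PO_0009001 ;
          ro:part_of <http://phylodiversity.net/xmalesia/indiv/9>
          ] ;
      oboe:hasMeasurement pato:PATO_0000320 ;
      # Time and place come from a second, "context" Observation.
      oboe:hasContext [
          a oboe:Observation ;
          # Hypothetical temporal-point Entity carrying date and location.
          oboe:ofEntity [
              a oboe:Entity ;
              dcterms:created "2008-01-01" ;
              dcterms:spatial [ geo:lat "-1.25530" ; geo:long "109.95371" ]
              ]
          ] .
```

The cost is visible here: the space-time information no longer sits in a 
dwc:Occurrence, so Darwin Core consumers would not find it where they 
expect it.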

2. *part_of* If we want to record an observation of part of an organism, 
we could use the ro:part_of property to link the observed part (the 
Entity of the Observation) to the Individual which has that part.  Two 
issues here: i) is it mereologically valid to say that the Occurrence (the 
space-time instance of the continuant Individual) can have a part?  As I 
read the definition of 
ro:part_of:

   ``For continuants: C part_of C' if and only if: given any c that
     instantiates C at a time t, there is some c' such that c' instantiates
     C' at time t, and c *part_of* c' at t.''

I think a fruit_txyz0 is indeed part of plant_txyz0, but I am not 
well-versed in mereology.  ii) Is there a way in the OBO/EQ ontologies to 
say `the observed Entity is the general class of fruits' rather than a 
specific instance of a fruit, which is what I imagine

   [] a po:Fruit ;
      ro:part_of :Individual123 .

means.  This is important because our final observation of the nature of 
the parts of an Individual is usually an average over many instances of 
those parts.
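One possible way to refer to "the fruits of this individual" as a class, 
rather than as a single fruit, is an OWL class expression.  The sketch 
below is an assumption on my part, not an established OBO/EQ convention.  
The class IRI is hypothetical.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix po: <http://purl.org/obo/owl/PO#> .
@prefix ro: <http://www.obofoundry.org/ro/ro.owl#> .

# Hypothetical class: all fruits that are part_of individual 9.
<http://phylodiversity.net/xmalesia/class/fruitOfIndiv9>
      a owl:Class ;
      rdfs:label "fruit of individual 9" ;
      owl:equivalentClass [
          a owl:Class ;
          owl:intersectionOf (
              po:PO_0009001
              [ a owl:Restriction ;
                owl:onProperty ro:part_of ;
                owl:hasValue <http://phylodiversity.net/xmalesia/indiv/9> ]
              )
          ] .
```

An averaged Observation could then point oboe:ofEntity at this class.  In 
OWL 2 that means treating the class as an individual (punning).  Whether 
OBOE intends ofEntity to be used this way would need checking.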

3. *Measurement* In the usage above, I have combined oboe:ofCharacteristic 
and oboe:hasValue into a single PATO quality term `green color'.  Is this 
acceptable usage, within the intentions of OBOE?  I.e., is it fair to 
assert that a pato:Quality is an oboe:Measurement?  This solution is 
satisfying because the popular Entity-Quality model can then be mapped 
directly into OBOE, as above.  Matt Jones mentioned that there was an 
effort underway by participants of SONet, DataONE, and the Data 
Conservancy, and possibly the Plant Trait observations ontology group to 
``try to harmonize many of the existing observations models, including 
OBOE, O+M, and EQ, as well as more traditional models like Darwin Core.'' 
I'm wondering if there are any documents available describing this 
ontology interoperability development process?
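For contrast, keeping OBOE's characteristic and value separate might look 
like the following sketch.  It rests on three assumptions:

- oboe:ofCharacteristic and oboe:hasValue accept PATO classes as objects 
  rather than literals.
- oboe:hasValue is attached to the Measurement.
- PATO_0000014 is the general "color" quality.

All three should be checked against the current OBOE and PATO releases.

```turtle
@prefix oboe: <http://ecoinformatics.org/oboe/oboe.1.0/oboe-core.owl#> .
@prefix pato: <http://purl.org/obo/owl/PATO#> .
@prefix po: <http://purl.org/obo/owl/PO#> .

[]   a oboe:Observation ;
      oboe:ofEntity [ a oboe:Entity, po:PO_0009001 ] ;
      oboe:hasMeasurement [
          a oboe:Measurement ;
          # Assumed: PATO_0000014 is the general "color" quality.
          oboe:ofCharacteristic pato:PATO_0000014 ;
          # The specific value, "green", as in the example above.
          oboe:hasValue pato:PATO_0000320
          ] .
```

This is more verbose than asserting that the PATO term is itself a 
Measurement.  In exchange, "color" stays available as a characteristic 
that can be queried across many observations.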

An alternative to using the OBOE ontology at all is to use a 
phenotype-focussed ontology (i.e., an OBO one emerging from the Phenoscape 
group), in which a pato:quality ro:inheres_in a po:Fruit.  However, I'm not 
sure whether any such terms have yet been published in a form that can be 
used in RDF.  Any updates on this would be valuable.
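In Turtle, that EQ-style pattern might look roughly like the sketch 
below.  It reuses the purl.org/obo/owl namespaces from my example above.  
Whether ro:inheres_in is available under that IRI in an RDF-usable 
release is exactly the open question.

```turtle
@prefix pato: <http://purl.org/obo/owl/PATO#> .
@prefix po: <http://purl.org/obo/owl/PO#> .
@prefix ro: <http://www.obofoundry.org/ro/ro.owl#> .

# An instance of the quality "green" inheres in an instance of "fruit",
# and that fruit is part of the Individual.
[]   a pato:PATO_0000320 ;
      ro:inheres_in [
          a po:PO_0009001 ;
          ro:part_of <http://phylodiversity.net/xmalesia/indiv/9>
          ] .
```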

4. *General*  I would also appreciate any guidance on whether these 
questions are appropriate for a public forum, or whether modeling a 
particular set of data is a `private' enterprise and too full of 
context-dependent decisions.  The options for someone in my position are to design 
a model that represents the data as I see it, coining various new terms 
where needed, or to find/wait for a semantic template, with standardized 
terms, and fit my data into it.  For data re-use (especially LOD 
applications) the latter is preferable, but I don't think we have yet 
reached the stage of an agreed-upon template.  No right or wrong here, but 
your opinions would be valued.

Thanks for your time.

Best wishes,

Cam


  +-------------------------------------------------+
  |  CAMPBELL O. WEBB                               |
  |     Senior Research Scientist                   |
  |  Arnold Arboretum of Harvard University         |
  |  [  Harvard University Herbaria,                |
  |     22 Divinity Ave, Cambridge MA, 02138, USA ] |
  +-------------------------------------------------+
  |  Mail: Kotak Pos 2, Sukadana, Kab. Kayong Utara |
  |     Kalimantan Barat 78852, Indonesia           |
  |  Mobile/SMS: +62-813-9917-7663          (GMT+7) |
  |  Skype: ctenolophon       Twitter: @cmwbb       |
  |  Web/PGP: http://phylodiversity.net/cwebb/      |
  +-------------------------------------------------+


