[seek-kr-sms] SMS Partial Design -- Request for Comments

Tue Apr 6 11:13:27 PDT 2004

I thought I'd share a bit of what we're working on here re: URI
mechanisms for concept identification and format selection:

- we see each data source (eml document, postgres database, rdf
individual collection, ODBC interface, etc.) as a generic "abox" - all
data are instances of a known concept (the abox's abstract model derives
from that of a generic, OWL-inspired ontology).

- each abox has a URI and each "data" item (column, number, etc.) has a
fragment id in it (#id);

- the abox's URI identifies:

	a protocol that allows a server to know how to access it: 
	pg -> 	postgres database with 
	eml ->  EML document
	rdf ->  a collection of RDF individuals
	imt ->  a user session in an IMT server
	ecg -> wraps the ecogrid services

  the rest of the URI identifies access modes, host, port and password;
  the path is interpreted by the wrapping code (e.g. the postgres abox
  loads operators and extensions from XML specs):

	pg://host:port/postgis/mydata#123 connects to a PG database 
	on host:port, loads the postgis extensions and ops (used to
	interpret queries and produce properly formatted literals for
	stuff like polygons), and retrieves object 123

- each abox protocol corresponds to a registered implementation of the
abox abstract model that can make good use of the info in the URI and
knows how to translate a query from the generic IMT constraint into the
host language (the generic IMT constraint looks a bit like RDQL but has
polymorphic literal types and operators, so you can extend it to know
about space etc).
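
To make the URI layout above concrete, here is a minimal Python sketch of
how such a URI could be split into its parts. It is only an illustration
under stated assumptions, not the IMT code: the names AboxRef and
parse_abox_uri are hypothetical, and "dbhost:5432" is a made-up host and
port standing in for the "host:port" placeholder in the example above.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlsplit

# Protocol prefixes as listed in this message; descriptions are paraphrased.
PROTOCOLS = {
    "pg": "postgres database",
    "eml": "EML document",
    "rdf": "collection of RDF individuals",
    "imt": "user session in an IMT server",
    "ecg": "wrapper around the ecogrid services",
}


@dataclass
class AboxRef:
    """Hypothetical holder for the pieces of an abox URI."""
    protocol: str
    host: Optional[str]
    port: Optional[int]
    path: List[str]          # left for the wrapping code to interpret
    fragment: Optional[str]  # the #id of an individual data item


def parse_abox_uri(uri: str) -> AboxRef:
    """Split an abox URI and reject protocols with no registered wrapper."""
    parts = urlsplit(uri)
    if parts.scheme not in PROTOCOLS:
        raise ValueError(f"no abox implementation registered for {parts.scheme!r}")
    return AboxRef(
        protocol=parts.scheme,
        host=parts.hostname,
        port=parts.port,
        path=[segment for segment in parts.path.split("/") if segment],
        fragment=parts.fragment or None,
    )


ref = parse_abox_uri("pg://dbhost:5432/postgis/mydata#123")
```

In this sketch the path segments ("postgis", "mydata") are handed over
untouched, since what they mean is up to each protocol's wrapper.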

So each concept has a URI, each abox is in turn a concept, and you can
query it for concepts that contain the concepts you're searching for in
a specified relationship (e.g. all datasets of a certain semantic type
that contain a space domain in the specified region) - that's how we
keep everything nice and general. 

The query can return the concept in one of many formats - as a lisp-like
list, as RDF, as HTML, or - needless to say - any other format that you
want to plug into the server. You can then import the concept's
representation in your own session if you're connected to an IMA server
(a session is also an abox/ontology) - which creates an implementation
in the server, and allows you to work with it (e.g. run it if it's a
model/pipeline). 
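
As a rough illustration of the pluggable output formats, here is a small
Python sketch of a formatter registry. Everything in it is an assumption
made for the example: the dictionary-based concept representation, the
names formatter and render, and the exact shape of the lisp-like and
HTML output are hypothetical and do not describe what the server emits.

```python
from html import escape
from typing import Callable, Dict

# Hypothetical concept representation: property name -> literal value.
Properties = Dict[str, str]
FORMATTERS: Dict[str, Callable[[str, Properties], str]] = {}


def formatter(name: str):
    """Register a function as the renderer for the given format name."""
    def register(fn):
        FORMATTERS[name] = fn
        return fn
    return register


@formatter("lisp")
def as_lisp(concept_id: str, props: Properties) -> str:
    # Lisp-like list; values are quoted as-is (no escaping in this sketch).
    body = " ".join(f'({key} "{value}")' for key, value in props.items())
    return f"({concept_id} {body})"


@formatter("html")
def as_html(concept_id: str, props: Properties) -> str:
    rows = "".join(
        f"<dt>{escape(key)}</dt><dd>{escape(value)}</dd>"
        for key, value in props.items()
    )
    return f"<h1>{escape(concept_id)}</h1><dl>{rows}</dl>"


def render(concept_id: str, props: Properties, fmt: str) -> str:
    """Render a concept in the requested format, if one is registered."""
    if fmt not in FORMATTERS:
        raise ValueError(f"unknown format {fmt!r}")
    return FORMATTERS[fmt](concept_id, props)


lisp_text = render("mydata", {"type": "dataset"}, "lisp")
html_text = render("mydata", {"type": "dataset"}, "html")
```

Adding another format then amounts to registering one more function
under a new name, without touching the query code.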

A few cents in case it can be relevant to this discussion! I'll be back
soon with comments on Shawn's great contribution. Ciao for now,

ferdinando

On Tue, 2004-04-06 at 13:48, Bertram Ludaescher wrote:
> The qformat={knb,xml} is quite nice.
> 
> Reminds me of what I heard about the LSID (life science id). 
> 
> Do we have in SEEK (EcoGrid?) a simple URI-based mechanism to identify 
> datasets, and URI-components that have different ways to "resolve" the 
> URI (similar to the above knb/xml trick)?
> 
> Bertram
> 
> >>>>> "MJ" == Matt Jones <jones at nceas.ucsb.edu> writes:
> MJ> 
> MJ> Hey Bertram,
> >> Looking at the EML file that Matt pointed us to, I see a lot of
> >> structural information e.g., here
> >> 
> MJ> http://metacat.nceas.ucsb.edu/knb/servlet/metacat?action=read&qformat=knb&docid=knb-lter-gce.23.4&displaymodule=entity&entitytype=dataTable&entityindex=1
> >> 
> >> and some temporal, spatial, and taxonomic coverage.
> >> 
> >> Is that where we should start?
> MJ> 
> MJ> Yeah, that's a good place to start. Note that the URL I sent is 
> MJ> delivering the EML in HTML format, but you can easily switch it to 
> MJ> deliver all of the metadata in XML format by changing "qformat=knb" 
> MJ> to "qformat=xml" in the URL.  The XML format is obviously much 
> MJ> easier to machine parse, and it includes all of the metadata in one 
> MJ> document, rather than breaking it into two or more as the HTML view does.
> MJ> 
> MJ> Matt
> MJ> 
> MJ> Bertram Ludaescher wrote:
> >> Shawn: 
> >> 
> >> First, I wish we all had such nice design documents! Great!
> >> 
> >> Second, I agree with Rich and Matt that we need to make use of EML as
> >> much as we can. Also Rich's idea of "initializing" the semantic
> >> registration mapping from the EML info makes sense to me.
> >> 
> >> So much for the good news ;-)
> >> 
> >> Now I think we need to figure out a way to work with some actual EML
> >> examples and see how those and the "ad-hoc" examples that Shawn has
> >> come up with (or is coming up with) fit into a single framework.
> >> 
> >> The nice thing about Shawn's examples (e.g., in the DILS paper) is
> >> that they are simple and show the approach in principle. 
> >> 
> >> Looking at the EML file that Matt pointed us to, I see a lot of
> >> structural information e.g., here
> >> http://metacat.nceas.ucsb.edu/knb/servlet/metacat?action=read&qformat=knb&docid=knb-lter-gce.23.4&displaymodule=entity&entitytype=dataTable&entityindex=1
> >> 
> >> and some temporal, spatial, and taxonomic coverage.
> >> 
> >> Is that where we should start?
> >> 
> >> Shawn, Rich, Matt?
> >> 
> >> Bertram
> >> 
> >> 
> >> 
> >> 
> >> 
> >>>>>>> "MJ" == Matt Jones <jones at nceas.ucsb.edu> writes:
> >> 
> MJ> 
> MJ> I think it's a great approach, Rich.  EML does indeed have a lot of the 
> MJ> information that you would want in the semantic representation, and 
> MJ> grabbing information out of EML would reveal a lot about 
> MJ> additions/changes to EML that would be useful.  There are a lot of 
> MJ> partial EML documents filled out, but more recently a few pretty 
> MJ> extensively filled out documents.  In particular, there are 181 data 
> MJ> sets in the KNB from the GCE LTER site that have extensive metadata, 
> MJ> including taxonomic, spatial, and temporal coverage, and full metadata on 
> MJ> the data tables.  And they have data available.  Starting with one or 
> MJ> more of these might be useful as a mapping exercise.  For example, this 
> MJ> data set --
> MJ> 
> MJ> http://metacat.nceas.ucsb.edu/knb/servlet/metacat?action=read&qformat=knb&docid=knb-lter-gce.23.4
> MJ> 
> MJ> -- (and others like it) contains abundance data that could be used by 
> MJ> the Garp algorithm if only a researcher could determine that the 
> MJ> relationships between what Garp requires and what is present in the data 
> MJ> set are met.
> MJ> 
> MJ> Matt
> MJ> 
> MJ> Rich Williams wrote:
> MJ> 
> >> 
> >>>> I agree that metadata to semantics is a generic issue, not just an issue for
> >>>> EML.  For example, I expect that we'll find it useful to grab the basic
> >>>> structure (syntax) of an actor from the MoML when semantically describing
> >>>> it.  For now, I think EML is particularly important since it's in use and
> >>>> has significant semantic content.  The ontologies currently provide a basic
> >>>> framework that should be able to handle the mapping, though I'm sure that
> >>>> implementing the mapping will reveal plenty of holes in the details.  I'm
> >>>> ready to work on it if there's consensus that this is an important
> >>>> direction.
> >>>> 
> >>>> Rich
> >>>> 
> >>>> 
> >>>> 
> >>>>> -----Original Message-----
> >>>>> From: Shawn Bowers [mailto:bowers at sdsc.edu]
> >>>>> Sent: Monday, April 05, 2004 9:32 PM
> >>>>> To: Rich Williams
> >>>>> Cc: seek-kr-sms at ecoinformatics.org; Dave Thau; Ilkay Altintas; Joseph
> >>>>> Goguen; Jenny Guilian WANG
> >>>>> Subject: Re: [seek-kr-sms] SMS Partial Design -- Request for Comments
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> This makes sense to me: do the metadata "harvesting" first to build the
> >>>>> initial "template" (or as you say, high-level mapping); then let the
> >>>>> data or service provider fill in the additional mapping as needed.
> >>>>> 
> >>>>> Note also that there may be different types of metadata: EML for
> >>>>> datasets (there are possibly others for datasets, but we seem focused on
> >>>>> EML) and MoML or WSDL for services.  Not sure how much can be obtained
> >>>>> from the service ones.
> >>>>> 
> >>>>> I also wonder if the ontologies are already close to being able to
> >>>>> handle the mapping from the high-level EML.  We should look into this.
> >>>>> 
> >>>>> Thanks,
> >>>>> Shawn
> >>>>> 
> >>>>> Rich Williams wrote:
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>>> Good stuff Shawn!  Here are a few comments on the registration mapping part,
> >>>>>> mainly to do with EML.  I think it's important to leverage the work done on
> >>>>>> EML and integrate it with the semantics.  We need to establish a mapping
> >>>>>> between EML and the OWL ontologies and capture the semantics that are
> >>>>>> implicit in EML.
> >>>>>> 
> >>>>>> I think that a lot of the semantic description of the dataset as a whole
> >>>>>> could be derived from the EML metadata, assuming it is reasonably complete.
> >>>>>> For example, information about the spatial and temporal extent of the
> >>>>>> dataset and about the observed taxa should be in the metadata.  Then rather
> >>>>>> than handing the user an essentially empty mapping, we will have initialized
> >>>>>> the mapping as far as possible from the EML metadata.
> >>>>>> 
> >>>>>> Given a data set with EML metadata, I see a two-stage semantic registration:
> >>>>>> 
> >>>>>> 1)	Automatically create high-level (data set and data table level) RDF
> >>>>>> individuals for an EML-described data set.  They will be useful for allowing
> >>>>>> a high level search of a data set, which can be rejected if there's nothing
> >>>>>> of interest in the RDF individuals before the more detailed semantic
> >>>>>> registration is used.
> >>>>>> 
> >>>>>> 2)	Create a lower-level semantic registration of individual fields in a data
> >>>>>> table.  This will refer to the higher-level EML-based individuals for parts
> >>>>>> of the context that do not change from field to field.  When doing a
> >>>>>> semantic query, these individuals will only need to be instantiated and
> >>>>>> queried if there is a higher-level match (#1 above).
> >>>>>> 
> >>>>>> Given this, in your document, I think it would make sense to re-order the
> >>>>>> sequence proposed, so that step 6 happens before steps 2-5.
> >>>>>> 
> >>>>>> Rich
> >>>> 
> >>>> 
> >>>> _______________________________________________
> >>>> seek-kr-sms mailing list
> >>>> seek-kr-sms at ecoinformatics.org
> >>>> http://www.ecoinformatics.org/mailman/listinfo/seek-kr-sms
> >> 
> MJ> 
> MJ> 
> MJ> 
> MJ> -- 
> MJ> -------------------------------------------------------------------
> MJ> Matt Jones                                     jones at nceas.ucsb.edu
> MJ> http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
> MJ> National Center for Ecological Analysis and Synthesis (NCEAS)
> MJ> University of California Santa Barbara
> MJ> Interested in ecological informatics? http://www.ecoinformatics.org
> MJ> -------------------------------------------------------------------
--