[seek-dev] implement XQuery in EcoGrid and SRB

David Stockwell davids at sdsc.edu
Wed Jun 11 10:40:07 PDT 2003


Matt, others.

I'd like to bring OGC into the discussion. Would not the XML output of a
GetCapabilities call in an OGC WMS be another metadata standard to
incorporate?

For interest, Yang has implemented Dave's WMS client at
http://landscape.sdsc.edu/~yyu/index.html
on a couple of test layers. For speed comparisons, mgv0001 is coarse
(0.167 deg, ~2 MB) and hydro_dem is medium (0.01 deg, ~2 GB).

Now, do we need to install SRB and load the data into it to get the
metadata functions? It would be easier if the ecogrid infrastructure could
just use the results of the GetCapabilities call. Other OGC WMS sites could
then also be registered in the ecogrid.
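
As a rough illustration of what "just using the results of the
GetCapabilities call" might look like, here is a minimal Python sketch that
turns a capabilities document into flat per-layer metadata records. The
embedded XML is a hypothetical, trimmed response modeled on the WMS 1.1.1
layout; only the layer names come from the test layers above, while the
titles and bounding boxes are made up. The function name layer_records and
the record fields are assumptions for illustration, not part of any ecogrid
interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed WMS 1.1.1 capabilities document. Titles and
# bounding boxes are placeholders, not values from a real server.
SAMPLE_CAPABILITIES = """<?xml version="1.0"?>
<WMT_MS_Capabilities version="1.1.1">
  <Service>
    <Name>OGC:WMS</Name>
    <Title>Example test server</Title>
  </Service>
  <Capability>
    <Layer>
      <Title>Test layers</Title>
      <Layer queryable="0">
        <Name>mgv0001</Name>
        <Title>Coarse test layer</Title>
        <LatLonBoundingBox minx="-180" miny="-90" maxx="180" maxy="90"/>
      </Layer>
      <Layer queryable="0">
        <Name>hydro_dem</Name>
        <Title>Medium-resolution test layer</Title>
        <LatLonBoundingBox minx="-180" miny="-90" maxx="180" maxy="90"/>
      </Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>
"""


def layer_records(capabilities_xml):
    """Return one metadata dict per named layer in a capabilities document."""
    root = ET.fromstring(capabilities_xml)
    service_title = root.findtext("Service/Title", default="")
    records = []
    # iter() walks nested layers too; layers without a <Name> child are
    # grouping containers and are skipped.
    for layer in root.iter("Layer"):
        name = layer.findtext("Name")
        if name is None:
            continue
        bbox = layer.find("LatLonBoundingBox")
        records.append({
            "service": service_title,
            "name": name,
            "title": layer.findtext("Title", default=""),
            "bbox": None if bbox is None else tuple(
                float(bbox.get(key))
                for key in ("minx", "miny", "maxx", "maxy")
            ),
        })
    return records


if __name__ == "__main__":
    for record in layer_records(SAMPLE_CAPABILITIES):
        print(record)
```

Records like these could be what a registry stores for each WMS site, in
place of metadata that would otherwise have to be loaded into SRB first.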

Cheers

Matt Jones wrote:

> Bing,
>
> In earlier discussions about the ecogrid query language, we agreed 
> that we wanted to support multiple underlying metadata models.  So, 
> for example, both EML and Darwin Core.  The syntax we proposed before 
> explicitly allowed one to reference the model in the query.  For 
> example, in the listing below, we can explicitly differentiate 
> conditions that match EML models and Dublin Core models:
>
> <egq:query queryId="test.1.1" system="test"
>     xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0alpha1">
>
>   <namespace prefix="eml" space="eml://ecoinformatics.org/eml-2.0.0"/>
>   <namespace prefix="dc"  space="dublinCoreNamespaceURI"/>
>   <title>Soils metadata query</title>
>   <AND>
>     <OR>
>       <condition operator="LIKE" concept="eml:title">%soil%</condition>
>       <condition operator="LIKE" concept="dc:title">%soil%</condition>
>     </OR>
>     <OR>
>       <condition operator="LIKE"
>                        concept="eml:surName">%Jones%</condition>
>       <condition operator="LIKE"
>                        concept="dc:Creator">%Vieglais%</condition>
>     </OR>
>   </AND>
> </egq:query>
>
> I agree with Dave that this is not a feature that we should lose if we 
> decide to move to an XQuery-based syntax.  A single common schema is 
> simply not realistic.  Here's an XQuery example that preserves this 
> capability, directly copied from the XQuery Use Case spec:
>
>   declare namespace xlink = "http://www.w3.org/1999/xlink"
>   <Q4>
>     {
>       for $hr in input()//@xlink:href
>       return <ns>{ $hr }</ns>
>     }
>   </Q4>
>
> We did discuss adding, at a later date, a query translation service 
> that makes use of the SMS to map a query expressed in terms of one 
> namespace (e.g., EML) to a query in another namespace (e.g., Dublin 
> Core), so that repositories that only support metadata in one format 
> might still be able to respond to the query.  But that is independent 
> of the query language design, I think, and something that represents a 
> second phase of design.
>
> On another issue, I think we need a better mechanism than a single 
> 'virtual' document to represent the various repositories.  The EcoGrid 
> registry will presumably have a (hopefully large) list of ecogrid 
> nodes and their corresponding capabilities (e.g., can export Darwin 
> Core records).  This registry should be able to be used to dynamically 
> determine which nodes to include in a query.  So, our query syntax 
> might need to support a mechanism for naming the nodes against which a 
> query should be run.  In our original spec we said the clients would 
> simply decide and invoke the query interface for each node at the 
> right URL. But the "IN" clause of XQuery is very closely related to 
> this.  In XQuery the concept of an "Input Sequence" is implementation 
> defined (see section 2.2 of the XQuery 1.0 spec).  This means that 
> each ecogrid node can decide how to implement the XQuery input 
> sequence functions (fn:input(), fn:collection(), and fn:doc()).  We'd 
> need to explore how these functions would be used to bind nodesets to 
> queries for several types of systems implementing the ecogrid 
> interface (such as srb and metacat and digir).
>
> For example, I think the "fn:collection()" function is closest in 
> spirit to what Bing is trying to accomplish in his example.  One could 
> imagine an XQuery like this:
>
>   declare namespace srb = "SRBMetadataURI"
>   for $e in
>     collection("srb://srb1.sdsc.edu//srb/home/bzhu.sdsc/designDocs")
>   where $e/@srb:objtype = "file"
>     and $e/@srb:time lt xs:date("2002-09-23")
>   return <dataname>{ string($e/@srb:name) }</dataname>
>
> This would allow precise specification of the srb network to hit, 
> which collection to search, and the namespace bindings for metadata 
> query semantics and resultset semantics.
>
> Of course, this introduces substantial implementation overhead for 
> ecogrid implementors.  I'm still not convinced that going with XQuery 
> is a smart thing to do for us at this early stage of EcoGrid design.  
> This will certainly scare off most other implementors (e.g., we'll 
> probably be limited to srb, metacat, xanthoria, and digir 
> implementations, which isn't really our goal).  Our main problem is in 
> finding or designing a query language that doesn't require tremendous 
> rewriting of existing systems to accommodate new features that they 
> don't already support.
>
> Matt
>
> Dave Vieglais wrote:
>
>> Hi Bing,
>>
>> Your argument and examples below assume that all data sources 
>> contributing to the ecogrid shall conform to a common schema, which 
>> seems overly restrictive even though it does make building XQuery 
>> statements a bit easier since you are then working with a single schema.
>>
>> I'm not sure that this is the intent of the ecogrid implementation, 
>> but please correct me if I'm mistaken.
>>
>> regards,
>>   Dave V.
>>
>>
>> Bing Zhu wrote:
>>
>>> Peter and Jing,
>>>
>>> The whole EcoGrid data stored in metacat, SRB, and other systems can be
>>> viewed as a virtual XML document (or a DOM object). This DOM object has
>>> at least two sub-nodes, metacat and srb.
>>>
>>> Since SRB is organized as a hierarchy of collections and
>>> sub-collections, each collection is a subnode in the XML tree under
>>> (/EcoGrid)/srb. (Actually, we can define an XML schema for our EcoGrid
>>> data.) Thus we can implement XQueries in our EcoGrid to act as a common
>>> query engine across different systems (metacat, SRB, etc.).
>>>
>>> I compiled some examples of XQueries to search documents in SRB.
>>>
>>> (1)    Search all design documents stored in the SRB collection
>>>       /home/bzhu.sdsc/designDocs that were created before Sept 23, 2002.
>>>
>>>             for $e in doc("EcoGrid.xml")/srb/home/bzhu.sdsc/designDocs
>>>             where $e/@objtype = "file"
>>>               and $e/@time lt xs:date("2002-09-23")
>>>             return (<datasrc>SRB</datasrc>,
>>>                     <dataname>{ string($e/@name) }</dataname>)
>>>
>>> (2)    Find all datasets stored in SRB whose titles contain "protein" and
>>>       which are owned by Professor John, whose user name is john.
>>>
>>>             for $e in doc("EcoGrid.xml")/srb/home//*
>>>             where $e/@objtype = "file" and
>>>                   contains($e/@name, "protein") and
>>>                   contains($e/@owner, "john")
>>>             return (<datasrc>SRB</datasrc>,
>>>                     <dataname>{ string($e/@name) }</dataname>)
>>>
>>>
>>> Jing, would you provide some examples (or info) regarding searching in
>>> metacat?
>>>
>>> We can also start by designing an XML schema for the whole EcoGrid data
>>> model, based roughly on the following.
>>>
>>> EcoGrid
>>>      Metacat
>>>      SRB
>>>         collection (attribute: objtype, time, owner, …)
>>>              dataset (attribute: objtype, time, owner, size, container, resource, …)
>>>                   user-defined metadata
>>>      other data source
>>>
>>>
>>> Cheers,
>>> Bing
>>>
>>> =====================================================
>>> Bing Zhu
>>> San Diego Supercomputer Center
>>> bzhu at sdsc.edu
>>> (858)534-8373
>>> =====================================================
>>>
>>> _______________________________________________
>>> seek-dev mailing list
>>> seek-dev at ecoinformatics.org
>>> http://www.ecoinformatics.org/mailman/listinfo/seek-dev
>>>
>>
>


-- 
University of California, San Diego
NPACI/SDSC, MC 0505
9500 Gilman Dr, Bldg 109
La Jolla, CA 92093-0505
Tel: 858 8220942
Fax: 858 5345056
Web: http://biodi.sdsc.edu

An inconvenience is an adventure wrongly considered. 
--G. K. Chesterton