[seek-dev] RE: resultset question

Sun Apr 25 06:48:55 PDT 2004

Oops, the DiGIR example should have "name" instead of "path" (the attribute is always "name").

Rod

Peter McCartney wrote:

> is it a typo or a meaningful difference that the metacat and srb 
> examples have a "name" attribute whereas the digir example has a 
> "path" attribute in the <returnField> element?
>  
>  
> Peter McCartney (peter.mccartney at asu.edu)
> Center for Environmental-Studies
> Arizona State University
>  
>
>     -----Original Message-----
>     From: Rod Spears [mailto:rods at ku.edu]
>     Sent: Friday, April 23, 2004 11:00 AM
>     To: Chad Berkley
>     Cc: Peter McCartney; Saritha Bhandarkar; seek-dev; Jing Tao; Matt
>     Jones
>     Subject: Re: [seek-dev] RE: resultset question
>
>     Dave and I spent some time thinking about this and arrived at a
>     similar place as to #4, but took it a little further and changed
>     how the resultset is defined and made a minor change to the query.
>
>     The main issue has to do with the consumers of the resultset
>     coming back from an Ecogrid query.
>
>     How does a consumer interpret the results in a meaningful way?
>     What can be done to help generic consumers and SMS?
>
>     The issue at the moment is that the contents of the <record>
>     element are basically a blob and anything goes. For example:
>     1) Metacat returns a bunch of param elements containing the data.
>     2) DiGIR returns a bunch of namespace-qualified elements
>     containing the data.
>     3) The SRB doesn't even have any data in the record; only the
>     identifier attr is meaningful.
>
>     We need to provide a mechanism for the contents to be interpreted.
>     To do this we will add four things to the existing resultset schema:
>     1) One or more <namespace> elements in the metadata - these will
>     provide the namespace for the new <returnfield> element
>     2) A new element, <returnfield>
>     3) A "name" attribute for the returnfield element (basically the
>     same as Peter's 'xpath' attr), which is a unique name within the
>     record and may be meaningful for wherever the data came from.
>     4) A "type" attribute for the returnfield element that describes
>     the type of data contained in the returnfield
>
>     The most important and powerful part of the new additions is the
>     "type" attr. This enables the value to be interpreted. Most of the
>     time it can be described by a schema definition type, for example
>     "xsi:string" etc. Or it could be a URL that points to a schema
>     definition document. This means the value of the returnfield
>     element could be anything from a string or integer to an entire
>     XML document.
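As a rough illustration of how a consumer might use the "type" attr, here is a minimal Python sketch. The CONVERTERS table and the interpret() helper are hypothetical names, not part of any Ecogrid code, and only the two types that appear in the examples in this thread are handled. The sketch keys on the local part of the type value, whatever prefix is used, and passes anything it does not recognise (including a URL-valued type) through as text.

```python
# Hypothetical converter table: local type name -> Python callable.
CONVERTERS = {
    "string": str,
    "int": int,
}

def interpret(type_attr, text):
    """Convert the text of a returnfield according to its type attr.

    Unknown types (for example a URL pointing at a schema document)
    are returned unchanged as text.
    """
    local = type_attr.split(":")[-1]
    convert = CONVERTERS.get(local)
    return convert(text) if convert else text

longitude = interpret("xsi:int", "121")
species = interpret("xsi:string", "PEROMYSCUS LEUCOPUS NOVEBORACENSIS")
```

A generic consumer could then sort or compare values by their real type instead of treating everything as a string.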
>
>     (Note that the namespace attr has been removed from the record
>     element)
>
>     The new namespace attrs in the metadata provide a way for the
>     value of the name attr and the type attr to be interpreted.
>
>     Here is an example of a Metacat resultset that is returned today:
>     <rs:resultset system="http://knb.ecoinformatics.org"
>       resultsetId="eml.001"
>       xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
>       xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
>       <resultsetMetadata>
>         <sendTime>2004-03-10T13:47:26-0600</sendTime>
>         <startRecord>1</startRecord>
>         <endRecord>14</endRecord>
>         <recordCount>14</recordCount>
>       </resultsetMetadata>
>       <record number="1"
>               system="http://dev.nceas.ucsb.edu"
>               identifier="obfs2.379.1"
>               namespace="eml://ecoinformatics.org/eml-2.0.0"
>               lastModifiedDate="2003-11-02T11:07:43-0600"
>               creationDate="2003-11-02T11:07:43-0600">
>           <param name="/eml/dataset/keywordSet/keyword">seasonality</param>
>           <param name="/eml/dataset/keywordSet/keyword">macroalgal bloom</param>
>           <param name="/eml/dataset/keywordSet/keyword">green tide</param>
>           <param name="/eml/dataset/keywordSet/keyword">Ulva</param>
>           <param name="/eml/dataset/creator/individualName/surName">Nelson</param>
>           <param name="/eml/dataset/keywordSet/keyword">biomass</param>
>           <param name="/eml/dataset/keywordSet/keyword">algal blooms</param>
>           <param name="/eml/dataset/title">Armitage Bay Ulvoid Algal Biomass and Species Composition</param>
>           <param name="/eml/dataset/keywordSet/keyword">Enteromorpha</param>
>           <param name="/eml/dataset/keywordSet/keyword">Ulvaria</param>
>       </record>
>
>     Here is an example of the same resultset as described by the new
>     approach:
>     <rs:resultset system="http://knb.ecoinformatics.org"
>       resultsetId="eml.001"
>       xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
>       xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
>       <resultsetMetadata>
>         <sendTime>2004-03-10T13:47:26-0600</sendTime>
>         <startRecord>1</startRecord>
>         <endRecord>14</endRecord>
>         <recordCount>14</recordCount>
>         <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
>         <namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
>       </resultsetMetadata>
>       <record number="1"
>               system="http://dev.nceas.ucsb.edu"
>               identifier="obfs2.379.1"
>               lastModifiedDate="2003-11-02T11:07:43-0600"
>               creationDate="2003-11-02T11:07:43-0600">
>           <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">seasonality</returnfield>
>           <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">macroalgal bloom</returnfield>
>           <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">green tide</returnfield>
>           <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">Ulva</returnfield>
>           <returnfield name="/eml/dataset/creator/individualName/surName" type="xsi:string">Nelson</returnfield>
>           <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">biomass</returnfield>
>           <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">algal blooms</returnfield>
>           <returnfield name="/eml/dataset/title" type="xsi:string">Armitage Bay Ulvoid Algal Biomass and Species Composition</returnfield>
>           <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">Enteromorpha</returnfield>
>           <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">Ulvaria</returnfield>
>       </record>
>
>     Note how we now can interpret the resultset in a much more
>     meaningful way. Also, note that there are two new namespace
>     elements, one contains a "prefix" attr the other does not. The one
>     without becomes the default namespace for unqualified values in
>     the name and type attrs.
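To make the prefix rule concrete, here is a minimal Python sketch of how a consumer might read the <namespace> elements and resolve a name or type value against them. It is only an illustration under stated assumptions: SAMPLE is a trimmed-down resultset written for this sketch (the rs: prefix and schemaLocation are left out), and read_namespaces() and resolve() are hypothetical helpers, not existing Ecogrid functions. Values whose prefix is not listed in the metadata simply fall back to the default namespace.

```python
import xml.etree.ElementTree as ET

# Trimmed-down resultset used only for this sketch.
SAMPLE = """
<resultset>
  <resultsetMetadata>
    <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
    <namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
  </resultsetMetadata>
  <record number="1" identifier="obfs2.379.1">
    <returnfield name="/eml/dataset/title" type="xsi:string">Armitage Bay</returnfield>
    <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">Ulva</returnfield>
  </record>
</resultset>
"""

def read_namespaces(root):
    """Map prefix -> URI; the namespace without a prefix is stored under None."""
    table = {}
    for ns in root.findall("resultsetMetadata/namespace"):
        table[ns.get("prefix")] = (ns.text or "").strip()
    return table

def resolve(value, table):
    """Return (namespace URI, local part) for a name or type attr value."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in table:
        return table[prefix], local
    # Unqualified (or unknown-prefix) values use the default namespace.
    return table.get(None), value

root = ET.fromstring(SAMPLE)
namespaces = read_namespaces(root)
for field in root.iter("returnfield"):
    print(resolve(field.get("name"), namespaces),
          resolve(field.get("type"), namespaces),
          field.text)
```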
>
>     Here is the before and after for the DiGIR query:
>     Before:
>     <rs:resultset resultsetId="foo.1.1"
>         system="urn:not://sure/what/to/put/here"
>         xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
>         <resultsetMetadata>
>             <sendTime>2003-05-02T16:45:50-09:00</sendTime>
>             <startRecord>1</startRecord>
>             <endRecord>2</endRecord>
>             <recordCount>2</recordCount>
>         </resultsetMetadata>
>          <record number="1"
>                  system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
>                  identifier="mvz1"
>                  namespace="http://digir.net/schema/conceptual/darwin/2003/1.0"
>                  lastModifiedDate="2003-03-03T10:42:13"
>                  creationDate="2003-03-03T10:42:13">
>             <darwin:ScientificName>PEROMYSCUS LEUCOPUS NOVEBORACENSIS</darwin:ScientificName>
>             <darwin:Longitude>121</darwin:Longitude>
>             <darwin:Latitude>33</darwin:Latitude>
>          </record>
>
>     After:
>     <rs:resultset resultsetId="foo.1.1"
>         system="urn:not://sure/what/to/put/here"
>         xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
>
>         <resultsetMetadata>
>             <sendTime>2003-05-02T16:45:50-09:00</sendTime>
>             <startRecord>1</startRecord>
>             <endRecord>2</endRecord>
>             <recordCount>2</recordCount>
>             <namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
>             <namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
>         </resultsetMetadata>
>
>         <record number="1"
>                 system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
>                 identifier="mvz1"
>                 lastModifiedDate="2003-03-03T10:42:13"
>                 creationDate="2003-03-03T10:42:13">
>             <returnfield path="ScientificName" type="xsi:string">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnfield>
>             <returnfield path="Longitude" type="xsi:int">121</returnfield>
>             <returnfield path="Latitude" type="xsi:int">33</returnfield>
>         </record>
>
>     Here is the SRB's before and after:
>     Before:
>     <rs:resultset system="http://knb.ecoinformatics.org"
>      resultsetId="SeekSRB_001"
>      xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
>      <resultsetMetadata>
>        <sendTime>2004-04-16T11:02:12-0500</sendTime>
>        <startRecord>1</startRecord>
>        <endRecord>2</endRecord>
>        <recordCount>2</recordCount>
>      </resultsetMetadata>
>      <record number="1"
>              system="http://srb.sdsc.edu"
>              identifier="/home/testuser.sdsc/SeekTestArea/Lesli Model::0"
>              namespace="srb://srb.sdsc.edu"
>              lastModifiedDate="2003-11-30T13:04:59-0600"
>              creationDate="2003-11-30T13:04:58-0600">
>      </record>
>
>     After:
>     <rs:resultset system="http://knb.ecoinformatics.org"
>      resultsetId="SeekSRB_001"
>      xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
>      <resultsetMetadata>
>        <sendTime>2004-04-16T11:02:12-0500</sendTime>
>        <startRecord>1</startRecord>
>        <endRecord>2</endRecord>
>        <recordCount>2</recordCount>
>        <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
>      </resultsetMetadata>
>      <record number="1"
>              system="http://srb.sdsc.edu"
>              identifier="/home/testuser.sdsc/SeekTestArea/Lesli Model::0"
>              lastModifiedDate="2003-11-30T13:04:59-0600"
>              creationDate="2003-11-30T13:04:58-0600">
>       <returnfield name="location" type="xsi:string">/home/testuser.sdsc/SeekTestArea/Lesli Model::0</returnfield>
>      </record>
>     ------------------------------------------------------------------------
>     The Query
>     About the only difference between the old query and the new is
>     that if the returnfield values and concept attr values do not have
>     a namespace prefix, then the prefix should be dropped from the
>     namespace element; or they should have a namespace prefix if there
>     is a prefix in the element. For example:
>
>     <?xml version="1.0" encoding="UTF-8"?>
>     <egq:query queryId="test.1.1" system="http://knb.ecoinformatics.org"
>         xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
>         <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
>         <returnfield>/eml/dataset/title</returnfield>
>         <returnfield>/eml/dataset/creator/individualName/surName</returnfield>
>         <returnfield>/eml/dataset/pubDate</returnfield>
>         <returnfield>/eml/dataset/keywordSet/keyword</returnfield>
>         <title>Soils metadata query</title>
>         <AND>
>             <OR>
>                 <condition operator="LIKE" concept="title">%soil%</condition>
>                 <condition operator="NOT LIKE" concept="title">%dirt%</condition>
>             </OR>
>             <OR>
>                 <condition operator="LIKE" concept="surName">%Jones%</condition>
>                 <condition operator="LIKE" concept="surName">%Vieglais%</condition>
>             </OR>
>         </AND>
>     </egq:query>
>     ------------------------------------------------------------------------
>
>     We can either discuss this via email, or think about it and
>     discuss it further during our phone meeting.
>
>     Rod
>
>
>     Chad Berkley wrote:
>
>>     Hi,
>>
>>     Sorry for my late reply...we've been busy with a Morpho release. 
>>     Thanks for getting me in gear, Rod.
>>
>>     In metacat, we only return leaf nodes (i.e. the text node child
>>     of a CDATA element like in response 4 below).  The returnfield
>>     functionality was originally meant as a convenient way to return
>>     enough information for a meaningful resultset to display, say, on
>>     a web page.  It was not meant to return whole document chunks for
>>     further processing.  I can see how this would be useful, but it
>>     would require returning a namespace defined chunk so that a
>>     parser would know what to do with it.  Metacat currently uses the
>>     returnfields to build the resultset table, then a request must be
>>     made for the whole document in order to do further processing.
>>
>>     Looking at the responses 1-3 below, to me, they are all invalid
>>     and potentially problematic.  Without a namespace to parse those
>>     xml chunks off of, the parser is left to just do well-formedness
>>     checking and any query into these document chunks may fail
>>     because we don't know what to expect to get back before doing the
>>     processing (e.g. an xpath query).
>>
>>     So I guess to make a short answer long, I agree with Peter's
>>     assessment of sticking with response 4 (which is basically what
>>     metacat has done all along).
>>
>>     chad
>>
>>
>>     Rod Spears wrote:
>>
>>>     Is anyone better qualified than me going to address Peter's
>>>     questions?
>>>
>>>     Please someone respond, thanks.
>>>
>>>     Rod
>>>
>>>
>>>     Peter McCartney wrote:
>>>
>>>>     it has to be well formed no matter what. so the question is
>>>>     really how can we identify a namespace for the result set when
>>>>     the content we stick in there has no hope of being valid?
>>>>     further, how can we define  a set of rules for how the results
>>>>     are to be evaluated against that namespace yet not be valid?
>>>>     request 1: '*/creator/individualName/surname', '/eml/dataset
>>>>      
>>>>     Rule 1: "content must appear in the minimal xml tree needed to
>>>>     accommodate the information"
>>>>      
>>>>     Rule 2: "content must appear in a potentially valid xml tree
>>>>     that invalidates only due to other required elements missing"
>>>>      
>>>>     Rule 3: "content must appear in a tree that places it in the correct
>>>>     node ancestry for the declared namespace"
>>>>      
>>>>      
>>>>     response 1: meets 1 and 3 and is well formed. Requires just
>>>>     knowledge of parent ancestry to build.
>>>>     <eml>
>>>>         <dataset>
>>>>             <creator>
>>>>                 <individualName>
>>>>                     <surname>mccartney</surname>
>>>>                     <surname>jones</surname>
>>>>                 </individualName>
>>>>             </creator>
>>>>         </dataset>
>>>>     </eml>
>>>>      
>>>>     response 2: meets 1, 2 and 3 and is well formed. Requires
>>>>     knowledge of ancestry and index (ie jones is in creator[2] of
>>>>     dataset[1] )
>>>>     <eml>
>>>>         <dataset>
>>>>             <creator>
>>>>                 <individualName>
>>>>                     <surname>mccartney</surname>
>>>>                 </individualName>
>>>>             </creator>
>>>>             <creator>
>>>>                 <individualName>
>>>>                     <surname>jones</surname>
>>>>                 </individualName>
>>>>             </creator>
>>>>         </dataset>
>>>>     </eml>
>>>>      
>>>>      
>>>>     response 3: meets 3 and is not well formed. Requires knowledge
>>>>     of ancestry.
>>>>      
>>>>     <eml>
>>>>         <dataset>
>>>>         <creator>
>>>>             <individualName>
>>>>                     <surname>mccartney</surname>
>>>>             </individualname>
>>>>         </creator>
>>>>     </dataset>
>>>>     <eml>
>>>>         <dataset>
>>>>         <creator>
>>>>             <individualName>
>>>>                     <surname>jones</surname>
>>>>             </individualname>
>>>>         </creator>
>>>>     </dataset>
>>>>     </eml>
>>>>      
>>>>     and just a reminder of where we originally started from
>>>>     (approximately)  
>>>>     response 4: meets no rule, cannot be validated, but conveys all the
>>>>     information to generate format 1 or 3 above using a string
>>>>     tokenizer and a jDOM, but not option 2.
>>>>     <resultset namespace=eml......>
>>>>         <returnfield xpath="dataset/creator/individualname/surname">mccartney</returnfield>
>>>>         <returnfield xpath="dataset/creator/individualname/surname">jones</returnfield>
>>>>     </resultset>
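The "string tokenizer and a jDOM" step can be sketched roughly as follows. This is a Python stand-in using the standard library rather than jDOM, and build_tree() is a hypothetical helper written for this illustration. It splits each xpath on "/" and reuses existing ancestors, which yields format 1; it also shows why option 2 cannot be recovered, since the flat paths carry no index telling us which creator each surname belongs to.

```python
import xml.etree.ElementTree as ET

def build_tree(root_tag, fields):
    """Build a minimal tree (format 1) from (xpath, value) pairs.

    Each path is assumed to be a simple slash-separated list of element
    names relative to the root, as in the response 4 example.
    """
    root = ET.Element(root_tag)
    for path, value in fields:
        steps = [step for step in path.split("/") if step]
        node = root
        # Reuse an existing ancestor when there is one, else create it.
        for step in steps[:-1]:
            child = node.find(step)
            if child is None:
                child = ET.SubElement(node, step)
            node = child
        # Every value gets its own leaf element.
        leaf = ET.SubElement(node, steps[-1])
        leaf.text = value
    return root

fields = [
    ("dataset/creator/individualname/surname", "mccartney"),
    ("dataset/creator/individualname/surname", "jones"),
]
tree = build_tree("eml", fields)
xml_text = ET.tostring(tree, encoding="unicode")
print(xml_text)
```

Both surnames end up under the same creator element, which is exactly the response 1 shape.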
>>>>      
>>>>     I think we should really ask whether we are making ourselves
>>>>     deal with some very complicated rules for really no gain in
>>>>     functionality. None of the results will be valid according to
>>>>     the namespace. All of them are valid if I make up my own
>>>>     namespace for the result set.  Unless we can hold ourselves to
>>>>     the standard where any code or xsl written for the schema will
>>>>     successfully process the result set (#2 is the closest to that,
>>>>     but depending on how loose the code is, all three could work or
>>>>     none could work), why shouldn't we opt for the easiest rule to
>>>>     comply with?
>>>>      
>>>>      
>>>>     Peter McCartney (peter.mccartney at asu.edu)
>>>>     Center for Environmental-Studies
>>>>     Arizona State University
>>>>      
>>>>
>>>>         -----Original Message-----
>>>>         *From:* Saritha Bhandarkar
>>>>         *Sent:* Friday, April 09, 2004 10:28 AM
>>>>         *To:* 'seek-dev'
>>>>         *Cc:* Jing Tao; Peter McCartney; Saritha Bhandarkar
>>>>         *Subject:* resultset question
>>>>
>>>>         Hi,
>>>>
>>>>         I had a question about the resultset to be returned by Xanthoria.
>>>>
>>>>         The schema of the resultset specifies that a record is of type
>>>>         "AnyRecordType" and optionally it may have some element content
>>>>         from the record. Now, my question here is, if I am to return the
>>>>         elements specified in the <returnfields> of the query, for the
>>>>         matching records (that is from the matching eml file), do I need
>>>>         to send it in eml format, with only relevant values for requested
>>>>         fields and no values for the fields which are not requested? Or is
>>>>         it enough to return only the requested fields with their values,
>>>>         as well-formed xml? Can someone please brief me on the contents of
>>>>         a record in resultsetType?
>>>>
>>>>         Thanks,
>>>>
>>>>         Saritha
>>>>
>>>>         
>>>>         
>>>>         
>>>>         
>>>>         Saritha Bhandarkar
>>>>
>>>>         Research Assistant
>>>>
>>>>         Center for Environmental Studies
>>>>
>>>>         ASU-Tempe AZ
>>>>
>>>>         saritha.bhandarkar at asu.edu
>>>>
>>>>         
>>>>         
>>>
>>>
>>>     -- 
>>>     Rod Spears
>>>     Biodiversity Research Center
>>>     University of Kansas
>>>     1345 Jayhawk Boulevard
>>>     Lawrence, KS 66045, USA
>>>     Tel: 785 864-4082, Fax: 785 864-5335
>>>
>>
>>

-- 
	Rod Spears
Biodiversity Research Center
University of Kansas
1345 Jayhawk Boulevard
Lawrence, KS 66045, USA
Tel: 785 864-4082, Fax: 785 864-5335
