[seek-dev] RE: resultset question

Mon Apr 26 09:08:33 PDT 2004

OK, thought so.

I'm not sure why we need the type attribute. I understand that the user
needs to know how to interpret the data type of each field, since they
all come back as strings in the XML, but don't they already know this?
This schema defines the return from a request that presumably some
agent (person or software) has constructed using knowledge of the
resource's schema, selecting certain fields to be returned. Wouldn't
they thus already have to know the data types of the return fields in
order to have requested them in the first place? That's certainly been
the case with the (limited) apps we've built using Xanthoria so far.

Peter McCartney (peter.mccartney at asu.edu)
Center for Environmental-Studies
Arizona State University

	-----Original Message-----
	From: Rod Spears [mailto:rods at ku.edu] 
	Sent: Sunday, April 25, 2004 6:49 AM
	To: Peter McCartney
	Cc: Chad Berkley; Saritha Bhandarkar; seek-dev; Jing Tao; Matt Jones
	Subject: Re: [seek-dev] RE: resultset question

	Oops, it should have "name" instead of "path" (always "name")

	Rod

	Peter McCartney wrote:

		Is it a typo or a meaningful difference that the metacat and
		SRB examples have a "name" attribute whereas the DiGIR example
		has a "path" attribute in the <returnField> element?

		Peter McCartney (peter.mccartney at asu.edu)
		Center for Environmental-Studies
		Arizona State University

			-----Original Message-----
			From: Rod Spears [mailto:rods at ku.edu] 
			Sent: Friday, April 23, 2004 11:00 AM
			To: Chad Berkley
			Cc: Peter McCartney; Saritha Bhandarkar; seek-dev; Jing Tao; Matt Jones
			Subject: Re: [seek-dev] RE: resultset question

			Dave and I spent some time thinking about this and arrived at
			a similar place as #4, but took it a little further: we
			changed how the resultset is defined and made a minor change
			to the query.

			The main issue has to do with the consumers of
the resultset coming back from an Ecogrid query. 

			How does a consumer interpret the results in a meaningful way?
			What can be done to help generic consumers and SMS?

			The issue at the moment is that the content of the <record>
			element is basically a blob and anything goes. For example:
			1) Metacat returns a bunch of param elements containing the
			data.
			2) DiGIR returns a bunch of namespace-qualified elements
			containing the data.
			3) The SRB doesn't even have any data in the record; only the
			identifier attr is meaningful.

			We need to provide a mechanism for the contents to be
			interpreted. To do this we will add four things to the
			existing resultset schema:
			1) One or more <namespace> elements in the metadata - this
			will be the namespace for the new <returnfield> element
			2) A new element, <returnfield>
			3) A "name" attribute for the returnfield element (basically
			the same as Peter's 'xpath' attr), which is a unique name
			within the record and may be meaningful for wherever the data
			came from.
			4) A "type" attribute for the returnfield element that
			describes the type of data contained in the returnfield

			The most important and powerful part of the new additions is
			the "type" attr. This enables the value to be interpreted.
			Most of the time it can be described by a schema definition
			type, for example "xsi:string" etc. Or it could be a URL that
			points to a schema definition document. This means the value
			of the returnfield element could be anything from a string or
			integer to an entire XML document.

			(Note that the namespace attr has been removed
from the record element)

			The new namespace attrs in the metadata provide
a way for the value of the name attr and the type attr to be
interpreted.

			Here is an example of a metacat resultset that is returned
			today:
			<rs:resultset system="http://knb.ecoinformatics.org"
			    resultsetId="eml.001"
			    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
			    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
			  <resultsetMetadata>
			    <sendTime>2004-03-10T13:47:26-0600</sendTime>
			    <startRecord>1</startRecord>
			    <endRecord>14</endRecord>
			    <recordCount>14</recordCount>
			  </resultsetMetadata>
			  <record number="1"
			          system="http://dev.nceas.ucsb.edu"
			          identifier="obfs2.379.1"
			          namespace="eml://ecoinformatics.org/eml-2.0.0"
			          lastModifiedDate="2003-11-02T11:07:43-0600"
			          creationDate="2003-11-02T11:07:43-0600">
			      <param name="/eml/dataset/keywordSet/keyword">seasonality</param>
			      <param name="/eml/dataset/keywordSet/keyword">macroalgal bloom</param>
			      <param name="/eml/dataset/keywordSet/keyword">green tide</param>
			      <param name="/eml/dataset/keywordSet/keyword">Ulva</param>
			      <param name="/eml/dataset/creator/individualName/surName">Nelson</param>
			      <param name="/eml/dataset/keywordSet/keyword">biomass</param>
			      <param name="/eml/dataset/keywordSet/keyword">algal blooms</param>
			      <param name="/eml/dataset/title">Armitage Bay Ulvoid Algal Biomass and Species Composition</param>
			      <param name="/eml/dataset/keywordSet/keyword">Enteromorpha</param>
			      <param name="/eml/dataset/keywordSet/keyword">Ulvaria</param>
			  </record>

			Here is an example of the same resultset as
described by the new approach:
			<rs:resultset system="http://knb.ecoinformatics.org"
			    resultsetId="eml.001"
			    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
			    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
			  <resultsetMetadata>
			    <sendTime>2004-03-10T13:47:26-0600</sendTime>
			    <startRecord>1</startRecord>
			    <endRecord>14</endRecord>
			    <recordCount>14</recordCount>
			    <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
			    <namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
			  </resultsetMetadata>
			  <record number="1"
			          system="http://dev.nceas.ucsb.edu"
			          identifier="obfs2.379.1"
			          lastModifiedDate="2003-11-02T11:07:43-0600"
			          creationDate="2003-11-02T11:07:43-0600">
			      <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">seasonality</returnfield>
			      <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">macroalgal bloom</returnfield>
			      <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">green tide</returnfield>
			      <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">Ulva</returnfield>
			      <returnfield name="/eml/dataset/creator/individualName/surName" type="xsi:string">Nelson</returnfield>
			      <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">biomass</returnfield>
			      <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">algal blooms</returnfield>
			      <returnfield name="/eml/dataset/title" type="xsi:string">Armitage Bay Ulvoid Algal Biomass and Species Composition</returnfield>
			      <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">Enteromorpha</returnfield>
			      <returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">Ulvaria</returnfield>
			  </record>

			Note how we can now interpret the resultset in a much more
			meaningful way. Also, note that there are two new namespace
			elements: one contains a "prefix" attr, the other does not.
			The one without becomes the default namespace for unqualified
			values in the name and type attrs.

			Here is the before and after for the DiGIR
query:
			Before:
			<rs:resultset resultsetId="foo.1.1"
			    system="urn:not://sure/what/to/put/here"
			    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
			    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
			    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
			    <resultsetMetadata>
			        <sendTime>2003-05-02T16:45:50-09:00</sendTime>
			        <startRecord>1</startRecord>
			        <endRecord>2</endRecord>
			        <recordCount>2</recordCount>
			    </resultsetMetadata>
			    <record number="1"
			            system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
			            identifier="mvz1"
			            namespace="http://digir.net/schema/conceptual/darwin/2003/1.0"
			            lastModifiedDate="2003-03-03T10:42:13"
			            creationDate="2003-03-03T10:42:13">
			        <darwin:ScientificName>PEROMYSCUS LEUCOPUS NOVEBORACENSIS</darwin:ScientificName>
			        <darwin:Longitude>121</darwin:Longitude>
			        <darwin:Latitude>33</darwin:Latitude>
			    </record>

			After:
			<rs:resultset resultsetId="foo.1.1"
			    system="urn:not://sure/what/to/put/here"
			    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
			    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
			    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">

			    <resultsetMetadata>
			        <sendTime>2003-05-02T16:45:50-09:00</sendTime>
			        <startRecord>1</startRecord>
			        <endRecord>2</endRecord>
			        <recordCount>2</recordCount>
			        <namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
			        <namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
			    </resultsetMetadata>

			    <record number="1"
			            system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
			            identifier="mvz1"
			            lastModifiedDate="2003-03-03T10:42:13"
			            creationDate="2003-03-03T10:42:13">
			        <returnfield path="ScientificName" type="xsi:string">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnfield>
			        <returnfield path="Longitude" type="xsi:int">121</returnfield>
			        <returnfield path="Latitude" type="xsi:int">33</returnfield>
			    </record>
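
To make the consumer side concrete, here is a minimal Python sketch of how a generic client might use the <namespace> elements and the "type" attr to interpret a record. It is only an illustration under stated assumptions: the sample document is a cut-down, hypothetical version of the DiGIR "after" example (using "name" rather than "path"), the function names and the CONVERTERS table are invented for this sketch, and type values given as URLs are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified resultset modeled on the DiGIR "after" example.
SAMPLE = """\
<resultset>
  <resultsetMetadata>
    <namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
    <namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
  </resultsetMetadata>
  <record number="1" identifier="mvz1">
    <returnfield name="ScientificName" type="xsi:string">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnfield>
    <returnfield name="Longitude" type="xsi:int">121</returnfield>
    <returnfield name="Latitude" type="xsi:int">33</returnfield>
  </record>
</resultset>
"""

# Assumed mapping from the local part of the type attr to a Python converter.
CONVERTERS = {"string": str, "int": int}


def read_namespaces(root):
    """Map prefix -> URI; the <namespace> without a prefix is stored under ''."""
    table = {}
    for ns in root.findall("./resultsetMetadata/namespace"):
        table[ns.get("prefix", "")] = (ns.text or "").strip()
    return table


def qualify(value, namespaces):
    """Resolve 'prefix:local' against the table; unqualified values use the default."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix], local
    return namespaces.get("", ""), value


def typed_fields(xml_text):
    """Return (name namespace, name, converted value) for every returnfield."""
    root = ET.fromstring(xml_text)
    namespaces = read_namespaces(root)
    rows = []
    for field in root.iter("returnfield"):
        name_ns, name = qualify(field.get("name"), namespaces)
        _type_ns, type_name = qualify(field.get("type"), namespaces)
        convert = CONVERTERS.get(type_name, str)  # unknown types stay strings
        rows.append((name_ns, name, convert(field.text)))
    return rows


for row in typed_fields(SAMPLE):
    print(row)
```

With this shape, the unqualified names resolve to the default (Darwin Core) namespace and the two coordinate fields come back as integers rather than strings.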

			Here is the SRB's before and after:
			Before:
			<rs:resultset system="http://knb.ecoinformatics.org"
			    resultsetId="SeekSRB_001"
			    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
			 <resultsetMetadata>
			   <sendTime>2004-04-16T11:02:12-0500</sendTime>
			   <startRecord>1</startRecord>
			   <endRecord>2</endRecord>
			   <recordCount>2</recordCount>
			 </resultsetMetadata>
			 <record number="1"
			         system="http://srb.sdsc.edu"
			         identifier="/home/testuser.sdsc/SeekTestArea/Lesli Model::0"
			         namespace="srb://srb.sdsc.edu"
			         lastModifiedDate="2003-11-30T13:04:59-0600"
			         creationDate="2003-11-30T13:04:58-0600">
			 </record>

			After:
			<rs:resultset system="http://knb.ecoinformatics.org"
			    resultsetId="SeekSRB_001"
			    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
			 <resultsetMetadata>
			   <sendTime>2004-04-16T11:02:12-0500</sendTime>
			   <startRecord>1</startRecord>
			   <endRecord>2</endRecord>
			   <recordCount>2</recordCount>
			   <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
			 </resultsetMetadata>
			 <record number="1"
			         system="http://srb.sdsc.edu"
			         identifier="/home/testuser.sdsc/SeekTestArea/Lesli Model::0"
			         lastModifiedDate="2003-11-30T13:04:59-0600"
			         creationDate="2003-11-30T13:04:58-0600">
			  <returnfield name="location" type="xsi:string">/home/testuser.sdsc/SeekTestArea/Lesli Model::0</returnfield>
			 </record>

  _____  

			The Query
			About the only difference between the old query and the new is
			that if the returnfield values and concept attr values do not
			have a namespace prefix, then the prefix should be dropped from
			the namespace element; conversely, they should carry a prefix
			if there is a prefix on the element. For example:

			<?xml version="1.0" encoding="UTF-8"?>
			<egq:query queryId="test.1.1"
			    system="http://knb.ecoinformatics.org"
			    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
			    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
			    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
			    <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
			    <returnfield>/eml/dataset/title</returnfield>
			    <returnfield>/eml/dataset/creator/individualName/surName</returnfield>
			    <returnfield>/eml/dataset/pubDate</returnfield>
			    <returnfield>/eml/dataset/keywordSet/keyword</returnfield>
			    <title>Soils metadata query</title>
			    <AND>
			        <OR>
			            <condition operator="LIKE" concept="title">%soil%</condition>
			            <condition operator="NOT LIKE" concept="title">%dirt%</condition>
			        </OR>
			        <OR>
			            <condition operator="LIKE" concept="surName">%Jones%</condition>
			            <condition operator="LIKE" concept="surName">%Vieglais%</condition>
			        </OR>
			    </AND>
			</egq:query>

  _____  

			We can either discuss this via email, or think
about it and discuss it further during our phone meeting.

			Rod

			Chad Berkley wrote:

				Hi, 

				Sorry for my late reply... we've been busy with a Morpho
				release. Thanks for getting me in gear, Rod. 

				In metacat, we only return leaf nodes
(i.e. the text node child of a CDATA element like in response 4 below).
The returnfield functionality was originally meant as a convenient way
to return enough information for a meaningful resultset to display, say,
on a web page.  It was not meant to return whole document chunks for
further processing.  I can see how this would be useful, but it would
require returning a namespace defined chunk so that a parser would know
what to do with it.  Metacat currently uses the returnfields to build
the resultset table, then a request must be made for the whole document
in order to do further processing. 

				Looking at responses 1-3 below, to me they are all invalid
				and potentially problematic. Without a namespace to parse
				those xml chunks against, the parser is left to do only
				well-formedness checking, and any query into these document
				chunks (e.g. an xpath query) may fail because we don't know
				what to expect to get back before doing the processing. 

				So I guess to make a short answer long,
I agree with Peter's assessment of sticking with response 4 (which is
basically what metacat has done all along). 

				chad 

				Rod Spears wrote: 

				Is anyone better qualified than me going to address Peter's
				questions? 

				Please someone respond, thanks. 

				Rod 

				Peter McCartney wrote: 

				It has to be well formed no matter what. So the question is
				really: how can we identify a namespace for the result set
				when the content we stick in there has no hope of being
				valid? Further, how can we define a set of rules for how the
				results are to be evaluated against that namespace yet not
				be valid? 
				request 1:
'*/creator/individualName/surname', '/eml/dataset 

				Rule 1: "content must appear in the minimal xml tree needed
				to accommodate the information" 

				Rule 2: "content must appear in a potentially valid xml tree
				that is invalid only due to other required elements being
				missing" 

				Rule 3: "content must appear in a tree that places it in the
				correct node ancestry for the declared namespace" 

				response 1: meets 1 and 3 and is well formed. Requires just
				knowledge of parent ancestry to build. 
				<eml> 
				    <dataset> 
				        <creator> 
				            <individualName> 
				                <surname>mccartney</surname> 
				                <surname>jones</surname>
				            </individualName> 
				        </creator> 
				    </dataset> 
				</eml> 

				response 2: meets 1, 2 and 3 and is well formed. Requires
				knowledge of ancestry and index (i.e. jones is in creator[2]
				of dataset[1]). 
				<eml> 
				    <dataset> 
				        <creator> 
				            <individualName> 
				                <surname>mccartney</surname> 
				            </individualName> 
				        </creator> 
				        <creator> 
				            <individualName> 
				                <surname>jones</surname>
				            </individualName> 
				        </creator> 
				    </dataset> 
				</eml> 

				response 3: meets 3 and is not well formed. Requires
				knowledge of ancestry. 

				<eml> 
				    <dataset> 
				    <creator> 
				        <individualName> 

<surname>mccartney</surname> 
				        </individualname> 
				    </creator> 
				</dataset> 
				<eml> 
				    <dataset> 
				    <creator> 
				        <individualName> 
				                <surname>jones</surname>

				        </individualname> 
				    </creator> 
				</dataset> 
				</eml> 

				and just a reminder of where we originally started from
				(approximately):
				response 4: meets no rule, cannot be validated, but conveys
				all the information needed to generate format 1 or 3 above
				using a string tokenizer and JDOM, but not option 2. 
				<resultset namespace=eml......> 
				    <returnfield xpath="dataset/creator/individualname/surname">mccartney</returnfield> 
				    <returnfield xpath="dataset/creator/individualname/surname">jones</returnfield> 
				</resultset> 
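
As a rough illustration of that last point, the flat returnfield pairs of response 4 can be folded back into the shape of response 1 by splitting each xpath on "/" and reusing ancestors that already exist. The sketch below uses Python's standard ElementTree rather than a string tokenizer and JDOM, and build_tree is a hypothetical helper name, not part of any existing code.

```python
import xml.etree.ElementTree as ET


def build_tree(root_tag, fields):
    """Merge (path, value) pairs into one minimal tree (response 1 style)."""
    root = ET.Element(root_tag)
    for path, value in fields:
        steps = path.strip("/").split("/")
        node = root
        # Reuse existing ancestors so all leaves share one parent chain.
        for step in steps[:-1]:
            child = node.find(step)
            if child is None:
                child = ET.SubElement(node, step)
            node = child
        # The leaf is always appended, so a repeated path yields repeated leaves.
        ET.SubElement(node, steps[-1]).text = value
    return root


# The two returnfield entries from response 4, as (xpath, value) pairs.
fields = [
    ("dataset/creator/individualname/surname", "mccartney"),
    ("dataset/creator/individualname/surname", "jones"),
]
tree = build_tree("eml", fields)
print(ET.tostring(tree, encoding="unicode"))
```

Both surnames end up under a single creator element. Nothing in the flat pairs says which creator each surname belongs to, which is why option 2 cannot be reconstructed without the index information.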

				I think we should really ask whether we are making ourselves
				deal with some very complicated rules for really no gain in
				functionality. None of the results will be valid according
				to the namespace. All of them are valid if I make up my own
				namespace for the result set. Unless we can hold ourselves
				to the standard where any code or xsl written for the schema
				will successfully process the result set (#2 is the closest
				to that, but depending on how loose the code is, all three
				could work or none could work), why shouldn't we opt for the
				easiest rule to comply with? 

				Peter McCartney (peter.mccartney at asu.edu) 
				Center for Environmental-Studies 
				Arizona State University 

				    -----Original Message----- 
				    *From:* Saritha Bhandarkar 
				    *Sent:* Friday, April 09, 2004 10:28 AM 
				    *To:* 'seek-dev' 
				    *Cc:* Jing Tao; Peter McCartney; Saritha Bhandarkar 
				    *Subject:* resultset question 

				    Hi, 

				    I had a question about the resultset to be returned by
				    Xanthoria. 

				    The schema of the resultset specifies that a record is of
				    type "AnyRecordType" and optionally it may have some
				    element content from the record. Now, my question here
				    is, if I am to return the elements specified in the
				    <returnfields> of the query for the matching records
				    (that is, from the matching eml file), do I need to send
				    it in eml format, with only relevant values for requested
				    fields and no values for the fields which are not
				    requested? Or is it enough to return only the requested
				    fields with their values, as well-formed xml? Can someone
				    please brief me on the contents of a record in
				    resultsetType? 

				    Thanks, 

				    Saritha 

				    Saritha Bhandarkar 

				    Research Assistant 

				    Center for Environmental Studies 

				    ASU-Tempe AZ 

				    saritha.bhandarkar at asu.edu 

				-- 
				Rod Spears 
				Biodiversity Research Center 
				University of Kansas 
				1345 Jayhawk Boulevard 
				Lawrence, KS 66045, USA 
				Tel: 785 864-4082, Fax: 785 864-5335 

	-- 

  	Rod Spears
Biodiversity Research Center
University of Kansas
1345 Jayhawk Boulevard
Lawrence, KS 66045, USA
Tel: 785 864-4082, Fax: 785 864-5335 	
