[seek-dev] RE: resultset question
Peter McCartney
peter.mccartney at asu.edu
Tue May 4 12:25:14 PDT 2004
Here's what I was just saying:
<rs:resultset system="http://knb.ecoinformatics.org" resultsetId="eml.001"
xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
<resultsetMetadata>
<sendTime>2004-03-10T13:47:26-0600</sendTime>
<startRecord>1</startRecord>
<endRecord>14</endRecord>
<recordCount>14</recordCount>
<namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
<namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
<recordStructure namespace="eml://ecoinformatics.org/eml-2.0.0">
<namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
<returnfield id="field1" name="/eml/dataset/keywordSet/keyword" type="xsi:string"/>
<returnfield id="field2" name="/eml/dataset/creator/individualName/surName" type="xsi:string"/>
</recordStructure>
</resultsetMetadata>
<record number="1"
system="http://dev.nceas.ucsb.edu"
identifier="obfs2.379.1"
lastModifiedDate="2003-11-02T11:07:43-0600"
creationDate="2003-11-02T11:07:43-0600">
<returnfield id="field1">arizona</returnfield>
<returnfield id="field1">chandler</returnfield>
<returnfield id="field2">McCartney</returnfield>
</record>
Peter McCartney (peter.mccartney at asu.edu)
Center for Environmental-Studies
Arizona State University
-----Original Message-----
From: seek-dev-admin at ecoinformatics.org
[mailto:seek-dev-admin at ecoinformatics.org] On Behalf Of Peter McCartney
Sent: Friday, April 23, 2004 11:42 AM
To: Rod Spears; Chad Berkley
Cc: Saritha Bhandarkar; seek-dev; Jing Tao; Matt Jones
Subject: RE: [seek-dev] RE: resultset question
Is it a typo or a meaningful difference that the Metacat and SRB
examples have a "name" attribute whereas the DiGIR example has a "path"
attribute in the <returnfield> element?
-----Original Message-----
From: Rod Spears [mailto:rods at ku.edu]
Sent: Friday, April 23, 2004 11:00 AM
To: Chad Berkley
Cc: Peter McCartney; Saritha Bhandarkar; seek-dev; Jing Tao; Matt Jones
Subject: Re: [seek-dev] RE: resultset question
Dave and I spent some time thinking about this and
arrived at much the same place as #4, but took it a little further: we
changed how the resultset is defined and made a minor change to the
query.
The main issue has to do with the consumers of the
resultset coming back from an Ecogrid query.
How does a consumer interpret the results in a meaningful
way?
What can be done to help generic consumers and SMS?
The issue at the moment is that the contents of the
<record> element are basically a blob, and anything goes. For example:
1) Metacat returns a bunch of param elements containing the
data.
2) DiGIR returns a bunch of namespace-qualified
elements containing the data.
3) The SRB doesn't put any data in the record at all; only the
identifier attr is meaningful.
We need to provide a mechanism for the contents to be
interpreted. To do this we will add four things to the existing
resultset schema:
1) One or more <namespace> elements in the metadata; these
will provide the namespaces for the new <returnfield> element.
2) A new element, <returnfield>.
3) A "name" attribute for the returnfield element
(basically the same as Peter's 'xpath' attr), which is a unique name within
the record and may be meaningful to wherever the data came from.
4) A "type" attribute for the returnfield element that
describes the type of data contained in the returnfield.
The most important and powerful part of the new
additions is the "type" attr, because it enables the value to be interpreted.
Most of the time the type can be given as a schema definition type, for
example "xsi:string", or it could be a URL that points to a schema
definition document. This means the value of the returnfield element
could be anything from a string or integer to an entire XML document.
(Note that the namespace attr has been removed from the
record element)
The new namespace elements in the metadata provide a way
for the values of the name attr and the type attr to be interpreted.
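To illustrate how a generic consumer might use the "type" attr, here is a minimal Python sketch. The converter table and the function name `typed_value` are hypothetical, not part of any Ecogrid API. It knows only a handful of XML Schema built-in types and simply drops the prefix without checking which namespace it is bound to; any other type, such as a URL pointing at a schema document, is passed through as raw text. Note that built-in datatypes such as string and int are formally defined in the http://www.w3.org/2001/XMLSchema namespace, so a stricter consumer would resolve the prefix before trusting the local name.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the local part of a "type" attr value
# (e.g. "int" in "xsi:int") to a Python conversion function.
# Only a few XML Schema built-in types are covered in this sketch.
CONVERTERS = {
    "string": str,
    "int": int,
    "integer": int,
    "decimal": float,
    "double": float,
    "boolean": lambda text: text.strip() in ("true", "1"),
}


def typed_value(returnfield: ET.Element):
    """Convert the text of a <returnfield> according to its "type" attr.

    The prefix is simply dropped (no namespace lookup is done here), and
    unknown types, including URLs that point at a schema document, are
    returned unchanged as text.
    """
    text = returnfield.text or ""
    type_attr = returnfield.get("type", "")
    if "://" in type_attr:
        return text  # type given as a schema URL: nothing to convert here
    local = type_attr.partition(":")[2] or type_attr  # "xsi:int" -> "int"
    convert = CONVERTERS.get(local)
    return convert(text) if convert else text


field = ET.fromstring('<returnfield path="Longitude" type="xsi:int">121</returnfield>')
print(typed_value(field))  # prints 121 (an int, not the string "121")
```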
Here is an example of a Metacat resultset as it is
returned today:
<rs:resultset system="http://knb.ecoinformatics.org" resultsetId="eml.001"
xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
<resultsetMetadata>
<sendTime>2004-03-10T13:47:26-0600</sendTime>
<startRecord>1</startRecord>
<endRecord>14</endRecord>
<recordCount>14</recordCount>
</resultsetMetadata>
<record number="1"
system="http://dev.nceas.ucsb.edu"
identifier="obfs2.379.1"
namespace="eml://ecoinformatics.org/eml-2.0.0"
lastModifiedDate="2003-11-02T11:07:43-0600"
creationDate="2003-11-02T11:07:43-0600">
<param name="/eml/dataset/keywordSet/keyword">seasonality</param>
<param name="/eml/dataset/keywordSet/keyword">macroalgal bloom</param>
<param name="/eml/dataset/keywordSet/keyword">green tide</param>
<param name="/eml/dataset/keywordSet/keyword">Ulva</param>
<param name="/eml/dataset/creator/individualName/surName">Nelson</param>
<param name="/eml/dataset/keywordSet/keyword">biomass</param>
<param name="/eml/dataset/keywordSet/keyword">algal blooms</param>
<param name="/eml/dataset/title">Armitage Bay Ulvoid Algal Biomass and Species Composition</param>
<param name="/eml/dataset/keywordSet/keyword">Enteromorpha</param>
<param name="/eml/dataset/keywordSet/keyword">Ulvaria</param>
</record>
Here is an example of the same resultset as described by
the new approach:
<rs:resultset system="http://knb.ecoinformatics.org" resultsetId="eml.001"
xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
<resultsetMetadata>
<sendTime>2004-03-10T13:47:26-0600</sendTime>
<startRecord>1</startRecord>
<endRecord>14</endRecord>
<recordCount>14</recordCount>
<namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
<namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
</resultsetMetadata>
<record number="1"
system="http://dev.nceas.ucsb.edu"
identifier="obfs2.379.1"
lastModifiedDate="2003-11-02T11:07:43-0600"
creationDate="2003-11-02T11:07:43-0600">
<returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">seasonality</returnfield>
<returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">macroalgal bloom</returnfield>
<returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">green tide</returnfield>
<returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">Ulva</returnfield>
<returnfield name="/eml/dataset/creator/individualName/surName" type="xsi:string">Nelson</returnfield>
<returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">biomass</returnfield>
<returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">algal blooms</returnfield>
<returnfield name="/eml/dataset/title" type="xsi:string">Armitage Bay Ulvoid Algal Biomass and Species Composition</returnfield>
<returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">Enteromorpha</returnfield>
<returnfield name="/eml/dataset/keywordSet/keyword" type="xsi:string">Ulvaria</returnfield>
</record>
Note how we can now interpret the resultset in a much
more meaningful way. Also note that there are two new namespace
elements: one contains a "prefix" attr and the other does not. The one
without a prefix becomes the default namespace for unqualified values in the
name and type attrs.
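The default-namespace rule just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `namespace_map` and `resolve` are hypothetical, the <namespace> elements are assumed to be unqualified children of <resultsetMetadata> (as in the examples), and any value containing "://" is treated as an absolute reference (for example a type that points at a schema document) rather than as a prefixed name.

```python
import xml.etree.ElementTree as ET


def namespace_map(metadata: ET.Element) -> dict:
    """Collect {prefix: uri} from the <namespace> children of <resultsetMetadata>.

    A <namespace> without a "prefix" attr is stored under the key None and
    serves as the default namespace for unqualified name and type values.
    """
    return {
        ns.get("prefix"): (ns.text or "").strip()
        for ns in metadata.findall("namespace")
    }


def resolve(value: str, namespaces: dict) -> tuple:
    """Split a name or type attr value into (namespace URI, local part)."""
    if "://" in value:
        return None, value  # absolute reference, e.g. a schema URL
    prefix, sep, local = value.partition(":")
    if sep:
        if prefix not in namespaces:
            raise ValueError(f"undeclared prefix {prefix!r} in {value!r}")
        return namespaces[prefix], local
    return namespaces.get(None), value  # unqualified: use the default namespace


metadata = ET.fromstring(
    "<resultsetMetadata>"
    "<namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>"
    '<namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>'
    "</resultsetMetadata>"
)
ns = namespace_map(metadata)
print(resolve("xsi:string", ns))
print(resolve("/eml/dataset/title", ns))
```

Raising on an undeclared prefix, rather than silently falling back to the default namespace, is a design choice of this sketch; the thread does not say how such values should be handled.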
Here is the before and after for the DiGIR query:
Before:
<rs:resultset resultsetId="foo.1.1"
system="urn:not://sure/what/to/put/here"
xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
<resultsetMetadata>
<sendTime>2003-05-02T16:45:50-09:00</sendTime>
<startRecord>1</startRecord>
<endRecord>2</endRecord>
<recordCount>2</recordCount>
</resultsetMetadata>
<record number="1"
system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
identifier="mvz1"
namespace="http://digir.net/schema/conceptual/darwin/2003/1.0"
lastModifiedDate="2003-03-03T10:42:13"
creationDate="2003-03-03T10:42:13">
<darwin:ScientificName>PEROMYSCUS LEUCOPUS NOVEBORACENSIS</darwin:ScientificName>
<darwin:Longitude>121</darwin:Longitude>
<darwin:Latitude>33</darwin:Latitude>
</record>
After:
<rs:resultset resultsetId="foo.1.1"
system="urn:not://sure/what/to/put/here"
xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
<resultsetMetadata>
<sendTime>2003-05-02T16:45:50-09:00</sendTime>
<startRecord>1</startRecord>
<endRecord>2</endRecord>
<recordCount>2</recordCount>
<namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
<namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
</resultsetMetadata>
<record number="1"
system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
identifier="mvz1"
lastModifiedDate="2003-03-03T10:42:13"
creationDate="2003-03-03T10:42:13">
<returnfield path="ScientificName" type="xsi:string">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnfield>
<returnfield path="Longitude" type="xsi:int">121</returnfield>
<returnfield path="Latitude" type="xsi:int">33</returnfield>
</record>
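Several returnfields in one record can share the same name (the Metacat example above carries many keyword values), so a consumer will probably want to group values by name. The Python sketch below shows one hypothetical way to do that; the function name `record_fields` is illustrative only. It reads the "name" attr and falls back to "path", since the DiGIR example uses "path" where the Metacat and SRB examples use "name".

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def record_fields(record: ET.Element) -> dict:
    """Group the <returnfield> values of one <record> by field name.

    Repeated names (e.g. several keywords) accumulate into a list, in
    document order. The key is read from the "name" attr, falling back to
    "path" as used in the DiGIR example.
    """
    fields = defaultdict(list)
    for rf in record.findall("returnfield"):
        key = rf.get("name") or rf.get("path")
        fields[key].append((rf.text or "").strip())
    return dict(fields)


record = ET.fromstring(
    '<record number="1" identifier="mvz1">'
    '<returnfield path="ScientificName" type="xsi:string">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnfield>'
    '<returnfield path="Longitude" type="xsi:int">121</returnfield>'
    '<returnfield path="Latitude" type="xsi:int">33</returnfield>'
    "</record>"
)
print(record_fields(record))
```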
Here is the SRB's before and after:
Before:
<rs:resultset system="http://knb.ecoinformatics.org" resultsetId="SeekSRB_001"
xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1" >
<resultsetMetadata>
<sendTime>2004-04-16T11:02:12-0500</sendTime>
<startRecord>1</startRecord>
<endRecord>2</endRecord>
<recordCount>2</recordCount>
</resultsetMetadata>
<record number="1"
system="http://srb.sdsc.edu"
identifier="/home/testuser.sdsc/SeekTestArea/Lesli Model::0"
namespace="srb://srb.sdsc.edu"
lastModifiedDate="2003-11-30T13:04:59-0600"
creationDate="2003-11-30T13:04:58-0600">
</record>
After:
<rs:resultset system="http://knb.ecoinformatics.org" resultsetId="SeekSRB_001"
xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1" >
<resultsetMetadata>
<sendTime>2004-04-16T11:02:12-0500</sendTime>
<startRecord>1</startRecord>
<endRecord>2</endRecord>
<recordCount>2</recordCount>
<namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
</resultsetMetadata>
<record number="1"
system="http://srb.sdsc.edu"
identifier="/home/testuser.sdsc/SeekTestArea/Lesli Model::0"
lastModifiedDate="2003-11-30T13:04:59-0600"
creationDate="2003-11-30T13:04:58-0600">
<returnfield name="location" type="xsi:string">/home/testuser.sdsc/SeekTestArea/Lesli Model::0</returnfield>
</record>
_____
The Query
About the only difference between the old query and the
new one is this: if the returnfield values and concept attr values do not
have a namespace prefix, then the prefix should be dropped from the
<namespace> element; conversely, if the <namespace> element has a prefix,
those values should carry it.
For example:
<?xml version="1.0" encoding="UTF-8"?>
<egq:query queryId="test.1.1" system="http://knb.ecoinformatics.org"
xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
<namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
<returnfield>/eml/dataset/title</returnfield>
<returnfield>/eml/dataset/creator/individualName/surName</returnfield>
<returnfield>/eml/dataset/pubDate</returnfield>
<returnfield>/eml/dataset/keywordSet/keyword</returnfield>
<title>Soils metadata query</title>
<AND>
<OR>
<condition operator="LIKE"
concept="title">%soil%</condition>
<condition operator="NOT LIKE"
concept="title">%dirt%</condition>
</OR>
<OR>
<condition operator="LIKE"
concept="surName">%Jones%</condition>
<condition operator="LIKE"
concept="surName">%Vieglais%</condition>
</OR>
</AND>
</egq:query>
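The prefix rule for queries is simple enough to check mechanically. The following Python sketch is one hypothetical way to do it (the function name `prefix_problems` is illustrative). It assumes the query has a single <namespace> element and uses a crude heuristic: when a prefix is declared, a value counts as qualified if it contains "prefix:"; when none is declared, any ":" in a value is treated as a stray prefix. For the soils query above, which declares no prefix and uses only unqualified values, it finds nothing to report.

```python
import xml.etree.ElementTree as ET


def prefix_problems(query: ET.Element) -> list:
    """List returnfield/concept values that disagree with the query's <namespace> prefix."""
    ns_el = query.find("namespace")
    prefix = ns_el.get("prefix") if ns_el is not None else None
    # Gather returnfield text and every condition's concept attr (at any depth).
    values = [(rf.text or "").strip() for rf in query.findall("returnfield")]
    values += [c.get("concept", "") for c in query.iter("condition")]
    problems = []
    for value in values:
        if prefix and f"{prefix}:" not in value:
            problems.append(f"{value!r} should be qualified with prefix {prefix!r}")
        elif not prefix and ":" in value:
            problems.append(f"{value!r} is qualified but <namespace> declares no prefix")
    return problems


query = ET.fromstring(
    "<query>"
    '<namespace prefix="eml">eml://ecoinformatics.org/eml-2.0.0</namespace>'
    "<returnfield>/eml/dataset/title</returnfield>"
    '<AND><condition operator="LIKE" concept="eml:title">%soil%</condition></AND>'
    "</query>"
)
print(prefix_problems(query))  # the unqualified returnfield is reported
```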
_____
We can either discuss this via email, or think about it
and discuss it further during our phone meeting.
Rod
Chad Berkley wrote:
Hi,
Sorry for my late reply; we've been busy with a
Morpho release. Thanks for getting me in gear, Rod.
In Metacat, we only return leaf nodes (i.e. the
text node child of a text-only element, as in response 4 below). The
returnfield functionality was originally meant as a convenient way to
return enough information for a meaningful resultset to display, say, on
a web page. It was not meant to return whole document chunks for
further processing. I can see how that would be useful, but it would
require returning a namespace-defined chunk so that a parser would know
what to do with it. Metacat currently uses the returnfields to build
the resultset table; a request must then be made for the whole document
in order to do further processing.
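To make this concrete, here is a rough Python sketch of building a display table from the current param-style records: one row per record, starting with the record identifier, then one column per requested field. This is not Metacat's actual code, which does not appear in this thread; the function name `resultset_table` and the choice to join repeated values with "; " are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def resultset_table(resultset: ET.Element, columns: list) -> list:
    """Build display rows from Metacat-style <param name="..."> elements.

    Each row is [identifier, value for column 1, value for column 2, ...];
    repeated values for the same column are joined with "; ".
    """
    rows = []
    for record in resultset.iter("record"):
        values = {column: [] for column in columns}
        for param in record.findall("param"):
            name = param.get("name")
            if name in values:  # ignore params that were not requested
                values[name].append((param.text or "").strip())
        rows.append([record.get("identifier")] + ["; ".join(values[c]) for c in columns])
    return rows


resultset = ET.fromstring(
    "<resultset>"
    '<record number="1" identifier="obfs2.379.1">'
    '<param name="/eml/dataset/title">Armitage Bay Ulvoid Algal Biomass and Species Composition</param>'
    '<param name="/eml/dataset/keywordSet/keyword">Ulva</param>'
    '<param name="/eml/dataset/keywordSet/keyword">biomass</param>'
    "</record>"
    "</resultset>"
)
for row in resultset_table(resultset, ["/eml/dataset/title", "/eml/dataset/keywordSet/keyword"]):
    print(row)
```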
Looking at responses 1-3 below, to me they
are all invalid and potentially problematic. Without a namespace against
which to parse those XML chunks, the parser is left to do only
well-formedness checking, and any query into these document chunks
(e.g. an XPath query) may fail because we don't know what to expect back
before doing the processing.
So I guess, to make a short answer long, I agree
with Peter's assessment of sticking with response 4 (which is basically
what Metacat has done all along).
chad
Rod Spears wrote:
Is anyone better qualified than me
going to address Peter's questions?
Could someone please respond? Thanks.
Rod
Peter McCartney wrote:
It has to be well formed no matter what.
So the question is really: how can we identify a namespace for the result
set when the content we stick in there has no hope of being valid?
Further, how can we define a set of rules for how the results are to be
evaluated against that namespace, yet not be valid?
request 1:
'*/creator/individualName/surname', '/eml/dataset
Rule 1: "content must appear in the minimal
xml tree needed to accommodate the information"
Rule 2: "content must appear in a
potentially valid xml tree that is invalid only because other required
elements are missing"
Rule 3: "content must appear in a tree
that places it in the correct node ancestry for the declared namespace"
Response 1: meets rules 1 and 3 and is well
formed. Requires only knowledge of parent ancestry to build.
<eml>
<dataset>
<creator>
<individualName>
<surname>mccartney</surname>
<surname>jones</surname>
</individualName>
</creator>
</dataset>
</eml>
Response 2: meets rules 1, 2 and 3 and is well
formed. Requires knowledge of ancestry and index (i.e. jones is in
creator[2] of dataset[1]).
<eml>
<dataset>
<creator>
<individualName>
<surname>mccartney</surname>
</individualName>
</creator>
<creator>
<individualName>
<surname>jones</surname>
</individualName>
</creator>
</dataset>
</eml>
Response 3: meets rule 3 and is not well
formed. Requires knowledge of ancestry.
<eml>
<dataset>
<creator>
<individualName>
<surname>mccartney</surname>
</individualName>
</creator>
</dataset>
</eml>
<eml>
<dataset>
<creator>
<individualName>
<surname>jones</surname>
</individualName>
</creator>
</dataset>
</eml>
And just a reminder of where we
originally started from (approximately):
Response 4: meets no rule and cannot be
validated, but conveys all the information needed to generate format 1 or 3
above using a string tokenizer and JDOM, though not format 2.
<resultset namespace=eml......>
<returnfield
xpath="dataset/creator/individualname/surname">mccartney</returnfield>
<returnfield
xpath="dataset/creator/individualname/surname">jones</returnfield>
</resultset>
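Peter's point that response 4 carries enough information to build response 1 can be illustrated with a short sketch. The thread mentions a string tokenizer and JDOM (Java); the version below swaps in Python's xml.etree.ElementTree, and the function name `build_minimal_tree` is hypothetical. It splits each xpath on "/" and reuses an existing child with the same tag at each level, which yields the minimal tree of response 1. Because response 4 records no positional index, it cannot tell which creator each surname belongs to, so response 2 cannot be reconstructed this way.

```python
import xml.etree.ElementTree as ET


def build_minimal_tree(root_tag: str, returnfields: list) -> ET.Element:
    """Build a response-1 style tree from (xpath, value) pairs relative to root_tag."""
    root = ET.Element(root_tag)
    for xpath, value in returnfields:
        steps = [step for step in xpath.split("/") if step]  # the "string tokenizer" step
        parent = root
        for tag in steps[:-1]:
            child = parent.find(tag)  # reuse the first existing ancestor with this tag
            if child is None:
                child = ET.SubElement(parent, tag)
            parent = child
        leaf = ET.SubElement(parent, steps[-1])  # each value gets its own leaf element
        leaf.text = value
    return root


fields = [
    ("dataset/creator/individualName/surname", "mccartney"),
    ("dataset/creator/individualName/surname", "jones"),
]
print(ET.tostring(build_minimal_tree("eml", fields), encoding="unicode"))
```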
I think we should really ask whether we
are making ourselves deal with some very complicated rules for no real
gain in functionality. None of the results will be valid according to
the namespace. All of them are valid if I make up my own namespace for
the result set. Unless we can hold ourselves to the standard where any
code or XSL written for the schema will successfully process the result
set (#2 comes closest to that, but depending on how loose the code is,
all three could work or none could), why shouldn't we opt for the
easiest rule to comply with?
-----Original Message-----
From: Saritha Bhandarkar
Sent: Friday, April 09, 2004 10:28 AM
To: 'seek-dev'
Cc: Jing Tao; Peter McCartney; Saritha Bhandarkar
Subject: resultset question
Hi,
I had a question about the resultset
to be returned by Xanthoria.
The schema of the resultset
specifies that a record is of type
"AnyRecordType" and that it may optionally
contain some element content
from the record. My question
is this: if I am to return the
elements specified in the
<returnfields> of the query for the matching records (that is, from the
matching EML file), do I need to send them in
EML format, with values only for
the requested fields and no
values for the fields that were
not requested? Or is it enough to
return only the requested fields
with their values, as well-formed
XML? Can someone please brief me
on the contents of a record in
resultsetType?
Thanks,
Saritha
Saritha Bhandarkar
Research Assistant
Center for Environmental Studies
ASU-Tempe AZ
saritha.bhandarkar at asu.edu
--
Rod Spears
Biodiversity Research Center
University of Kansas
1345 Jayhawk Boulevard
Lawrence, KS 66045, USA
Tel: 785 864-4082, Fax: 785 864-5335