[seek-dev] RE: resultset question
Rod Spears
rods at ku.edu
Sun Apr 25 06:48:55 PDT 2004
Oops, it should have been "name" instead of "path" (it is always "name").
Rod
Peter McCartney wrote:
> Is it a typo, or a meaningful difference, that the Metacat and SRB
> examples have a "name" attribute whereas the DiGIR example has a
> "path" attribute in the <returnfield> element?
>
>
> Peter McCartney (peter.mccartney at asu.edu)
> Center for Environmental Studies
> Arizona State University
>
>
> -----Original Message-----
> From: Rod Spears [mailto:rods at ku.edu]
> Sent: Friday, April 23, 2004 11:00 AM
> To: Chad Berkley
> Cc: Peter McCartney; Saritha Bhandarkar; seek-dev; Jing Tao; Matt Jones
> Subject: Re: [seek-dev] RE: resultset question
>
> Dave and I spent some time thinking about this and arrived at a
> place similar to #4, but took it a little further: we changed how
> the resultset is defined and made a minor change to the query.
>
> The main issue has to do with the consumers of the resultset
> coming back from an Ecogrid query.
>
> How does a consumer interpret the results in a meaningful way?
> What can be done to help generic consumers and SMS?
>
> The issue at the moment is that the contents of the <record>
> element are basically a blob, and anything goes. For example:
> 1) Metacat returns a bunch of param elements containing the data.
> 2) DiGIR returns a bunch of namespace-qualified elements
> containing the data.
> 3) The SRB doesn't put any data in the record at all; only the
> identifier attr is meaningful.
>
> We need to provide a mechanism for the contents to be interpreted.
> To do this we will add four things to the existing resultset schema:
> 1) One or more <namespace> elements in the resultset metadata;
> these supply the namespaces for the new <returnfield> element.
> 2) A new element, <returnfield>.
> 3) A "name" attribute for the returnfield element (basically the
> same as Peter's "xpath" attr), which is a unique name within the
> record and may be meaningful to wherever the data came from.
> 4) A "type" attribute for the returnfield element that describes
> the type of data contained in the returnfield.
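A minimal sketch of what a generic consumer could do with the four additions: walk every <record> and collect (name, type, value) triples without knowing whether the provider was Metacat, DiGIR or the SRB. It uses Python's standard-library xml.etree.ElementTree; the ReturnField class and read_records function are hypothetical names, and the sketch assumes that <record> and <returnfield> are unqualified elements, as they are in the examples in this message.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class ReturnField:
    name: str   # e.g. "/eml/dataset/title"; the same name may repeat in a record
    type: str   # e.g. "xsi:string"; interpreted via the <namespace> elements
    value: str  # raw text content of the <returnfield> element

def read_records(xml_text):
    """Return {record identifier: [ReturnField, ...]} for every <record>."""
    root = ET.fromstring(xml_text)
    records = {}
    # record/returnfield carry no prefix in the examples, so plain tag names match
    for rec in root.iter("record"):
        records[rec.get("identifier")] = [
            ReturnField(f.get("name"), f.get("type"), (f.text or "").strip())
            for f in rec.findall("returnfield")
        ]
    return records

SAMPLE = """<rs:resultset xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
  <record number="1" identifier="mvz1">
    <returnfield name="ScientificName" type="xsi:string">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnfield>
    <returnfield name="Longitude" type="xsi:int">121</returnfield>
  </record>
</rs:resultset>"""

for ident, fields in read_records(SAMPLE).items():
    for f in fields:
        print(ident, f.name, f.type, f.value)
```

The consumer never needs provider-specific code: everything it knows about a value comes from the name and type attrs.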
>
> The most important and powerful part of the new additions is the
> "type" attr, because it enables the value to be interpreted. Most
> of the time it can be described by a schema definition type, for
> example "xsi:string", or it could be a URL that points to a schema
> definition document. This means the value of the returnfield
> element could be anything from a string or integer to an entire
> XML document.
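The "type" attr is what lets a consumer turn the text back into a typed value. A rough sketch, assuming a small hypothetical table of XML Schema built-in types keyed by their local name, and assuming (this is not specified above) that a type given as an http(s) URL means the returnfield carries an embedded XML element rather than plain text:

```python
import xml.etree.ElementTree as ET

# Hypothetical decoder table for a few XML Schema built-in types. Only the
# local part ("string", "int") is used, so the prefix in the attr is ignored.
DECODERS = {
    "string": str,
    "int": int,
    "integer": int,
    "double": float,
    "boolean": lambda s: s in ("true", "1"),
}

def decode(returnfield):
    """Interpret one <returnfield> element according to its type attr."""
    type_attr = returnfield.get("type", "")
    if type_attr.startswith(("http://", "https://")):
        # Assumed convention: the type points at a schema document, so the
        # content is embedded XML; return its first child element (or None).
        children = list(returnfield)
        return children[0] if children else None
    local = type_attr.split(":", 1)[-1]
    text = (returnfield.text or "").strip()
    decoder = DECODERS.get(local)
    return decoder(text) if decoder else text  # unknown type: keep the raw text

field = ET.fromstring('<returnfield name="Longitude" type="xsi:int">121</returnfield>')
print(decode(field) + 1)  # 122
```

Falling back to the raw text for unknown types keeps a consumer usable even when a provider introduces a type it has never seen.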
>
> (Note that the namespace attr has been removed from the record
> element)
>
> The new namespace elements in the metadata provide a way for the
> values of the name and type attrs to be interpreted.
>
> Here is an example of a Metacat resultset as it is returned today:
> <rs:resultset system="http://knb.ecoinformatics.org"
> resultsetId="eml.001"
> xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
> xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1
> ../../src/xsd/resultset.xsd">
> <resultsetMetadata>
> <sendTime>2004-03-10T13:47:26-0600</sendTime>
> <startRecord>1</startRecord>
> <endRecord>14</endRecord>
> <recordCount>14</recordCount>
> </resultsetMetadata>
> <record number="1"
> system="http://dev.nceas.ucsb.edu"
> identifier="obfs2.379.1"
> namespace="eml://ecoinformatics.org/eml-2.0.0"
> lastModifiedDate="2003-11-02T11:07:43-0600"
> creationDate="2003-11-02T11:07:43-0600">
> <param name="/eml/dataset/keywordSet/keyword">seasonality</param>
> <param name="/eml/dataset/keywordSet/keyword">macroalgal bloom</param>
> <param name="/eml/dataset/keywordSet/keyword">green tide</param>
> <param name="/eml/dataset/keywordSet/keyword">Ulva</param>
> <param name="/eml/dataset/creator/individualName/surName">Nelson</param>
> <param name="/eml/dataset/keywordSet/keyword">biomass</param>
> <param name="/eml/dataset/keywordSet/keyword">algal blooms</param>
> <param name="/eml/dataset/title">Armitage Bay Ulvoid Algal Biomass and Species Composition</param>
> <param name="/eml/dataset/keywordSet/keyword">Enteromorpha</param>
> <param name="/eml/dataset/keywordSet/keyword">Ulvaria</param>
> </record>
>
> Here is an example of the same resultset as described by the new
> approach:
> <rs:resultset system="http://knb.ecoinformatics.org"
> resultsetId="eml.001"
> xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
> xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1
> ../../src/xsd/resultset.xsd">
> <resultsetMetadata>
> <sendTime>2004-03-10T13:47:26-0600</sendTime>
> <startRecord>1</startRecord>
> <endRecord>14</endRecord>
> <recordCount>14</recordCount>
> <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
> <namespace
> prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
> </resultsetMetadata>
> <record number="1"
> system="http://dev.nceas.ucsb.edu"
> identifier="obfs2.379.1"
> lastModifiedDate="2003-11-02T11:07:43-0600"
> creationDate="2003-11-02T11:07:43-0600">
> <returnfield name="/eml/dataset/keywordSet/keyword"
> type="xsi:string">seasonality</returnfield>
> <returnfield name="/eml/dataset/keywordSet/keyword"
> type="xsi:string">macroalgal bloom</returnfield>
> <returnfield name="/eml/dataset/keywordSet/keyword"
> type="xsi:string">green tide</returnfield>
> <returnfield name="/eml/dataset/keywordSet/keyword"
> type="xsi:string">Ulva</returnfield>
> <returnfield
> name="/eml/dataset/creator/individualName/surName"
> type="xsi:string">Nelson</returnfield>
> <returnfield name="/eml/dataset/keywordSet/keyword"
> type="xsi:string">biomass</returnfield>
> <returnfield name="/eml/dataset/keywordSet/keyword"
> type="xsi:string">algal blooms</returnfield>
> <returnfield name="/eml/dataset/title"
> type="xsi:string">Armitage Bay Ulvoid Algal Biomass and Species Composition</returnfield>
> <returnfield name="/eml/dataset/keywordSet/keyword"
> type="xsi:string">Enteromorpha</returnfield>
> <returnfield name="/eml/dataset/keywordSet/keyword"
> type="xsi:string">Ulvaria</returnfield>
> </record>
>
> Note how we can now interpret the resultset in a much more
> meaningful way. Also note that there are two new namespace
> elements: one contains a "prefix" attr and the other does not. The
> one without a prefix becomes the default namespace for unqualified
> values in the name and type attrs.
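The default-namespace rule can be sketched as a prefix lookup table built from the <namespace> elements, with the entry that has no prefix acting as the default. The helper names are hypothetical, and the sketch treats only a leading "prefix:" as qualified, which is enough for the values used in these examples (xsi:string, /eml/dataset/title).

```python
import xml.etree.ElementTree as ET

def namespace_map(resultset_root):
    """Map prefix -> namespace URI from <namespace> elements; key None is the default."""
    ns = {}
    meta = resultset_root.find("resultsetMetadata")
    if meta is None:
        return ns
    for el in meta.findall("namespace"):
        ns[el.get("prefix")] = (el.text or "").strip()  # no prefix attr -> key None
    return ns

def resolve(value, ns):
    """Split a name or type attr value into (namespace URI, local part)."""
    if ":" in value:
        prefix, local = value.split(":", 1)
        if prefix in ns:
            return ns[prefix], local
    # Unqualified (or unknown prefix): fall back to the default namespace.
    return ns.get(None), value

DOC = """<rs:resultset xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
  <resultsetMetadata>
    <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
    <namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
  </resultsetMetadata>
</rs:resultset>"""

ns = namespace_map(ET.fromstring(DOC))
print(resolve("/eml/dataset/title", ns))
print(resolve("xsi:string", ns))
```

The sketch resolves whatever URI the resultset declares. For reference, the XML Schema built-in datatypes such as string and int are defined in the http://www.w3.org/2001/XMLSchema namespace; the XMLSchema-instance URI is the one used for attributes like xsi:schemaLocation.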
>
> Here is the before and after for the DiGIR query:
> Before:
> <rs:resultset resultsetId="foo.1.1"
> system="urn:not://sure/what/to/put/here"
> xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1
> ../../src/xsd/resultset.xsd">
> <resultsetMetadata>
> <sendTime>2003-05-02T16:45:50-09:00</sendTime>
> <startRecord>1</startRecord>
> <endRecord>2</endRecord>
> <recordCount>2</recordCount>
> </resultsetMetadata>
> <record number="1"
> system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
> identifier="mvz1"
> namespace="http://digir.net/schema/conceptual/darwin/2003/1.0"
> lastModifiedDate="2003-03-03T10:42:13"
> creationDate="2003-03-03T10:42:13">
> <darwin:ScientificName>PEROMYSCUS LEUCOPUS NOVEBORACENSIS</darwin:ScientificName>
> <darwin:Longitude>121</darwin:Longitude>
> <darwin:Latitude>33</darwin:Latitude>
> </record>
>
> After:
> <rs:resultset resultsetId="foo.1.1"
> system="urn:not://sure/what/to/put/here"
> xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1
> ../../src/xsd/resultset.xsd">
>
> <resultsetMetadata>
> <sendTime>2003-05-02T16:45:50-09:00</sendTime>
> <startRecord>1</startRecord>
> <endRecord>2</endRecord>
> <recordCount>2</recordCount>
> <namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
> <namespace
> prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
> </resultsetMetadata>
>
> <record number="1"
> system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
> identifier="mvz1"
> lastModifiedDate="2003-03-03T10:42:13"
> creationDate="2003-03-03T10:42:13">
> <returnfield path="ScientificName"
> type="xsi:string">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnfield>
> <returnfield path="Longitude" type="xsi:int">121</returnfield>
> <returnfield path="Latitude" type="xsi:int">33</returnfield>
> </record>
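Converting a DiGIR "before" record into the "after" form is mostly mechanical: strip the namespace from each child element's tag, use the local name as the returnfield name, and move the record's namespace attr up into the resultset metadata. A sketch follows; convert_record is a hypothetical name, it writes the "name" attr, and its rule for choosing between xsi:int and xsi:string is a placeholder assumption, since a real provider would take types from the Darwin Core conceptual schema. The sample input declares the darwin prefix so that it parses.

```python
import xml.etree.ElementTree as ET

def convert_record(old_record):
    """Turn a 'before' DiGIR <record> into an 'after' one (sketch).

    Returns (new_record, namespace_uri); the caller is expected to emit the
    URI as a <namespace> element in the resultset metadata.
    """
    attrs = {k: v for k, v in old_record.attrib.items() if k != "namespace"}
    new = ET.Element("record", attrs)
    for child in old_record:
        local = child.tag.split("}", 1)[-1]  # drop the "{uri}" part of the tag
        text = (child.text or "").strip()
        # Placeholder assumption: integer-looking text is typed xsi:int.
        kind = "xsi:int" if text.lstrip("-").isdigit() else "xsi:string"
        ET.SubElement(new, "returnfield", name=local, type=kind).text = text
    return new, old_record.get("namespace")

BEFORE = """<record number="1" identifier="mvz1"
    namespace="http://digir.net/schema/conceptual/darwin/2003/1.0"
    xmlns:darwin="http://digir.net/schema/conceptual/darwin/2003/1.0">
  <darwin:ScientificName>PEROMYSCUS LEUCOPUS NOVEBORACENSIS</darwin:ScientificName>
  <darwin:Longitude>121</darwin:Longitude>
  <darwin:Latitude>33</darwin:Latitude>
</record>"""

record, namespace_uri = convert_record(ET.fromstring(BEFORE))
print(namespace_uri)
print(ET.tostring(record, encoding="unicode"))
```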
>
> Here is the SRB's before and after:
> Before:
> <rs:resultset system="http://knb.ecoinformatics.org"
> resultsetId="SeekSRB_001"
> xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
> >
> <resultsetMetadata>
> <sendTime>2004-04-16T11:02:12-0500</sendTime>
> <startRecord>1</startRecord>
> <endRecord>2</endRecord>
> <recordCount>2</recordCount>
> </resultsetMetadata>
> <record number="1"
> system="http://srb.sdsc.edu"
> identifier="/home/testuser.sdsc/SeekTestArea/Lesli Model::0"
> namespace="srb://srb.sdsc.edu"
> lastModifiedDate="2003-11-30T13:04:59-0600"
> creationDate="2003-11-30T13:04:58-0600">
> </record>
>
> After:
> <rs:resultset system="http://knb.ecoinformatics.org"
> resultsetId="SeekSRB_001"
> xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
> >
> <resultsetMetadata>
> <sendTime>2004-04-16T11:02:12-0500</sendTime>
> <startRecord>1</startRecord>
> <endRecord>2</endRecord>
> <recordCount>2</recordCount>
> <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
> </resultsetMetadata>
> <record number="1"
> system="http://srb.sdsc.edu"
> identifier="/home/testuser.sdsc/SeekTestArea/Lesli Model::0"
> lastModifiedDate="2003-11-30T13:04:59-0600"
> creationDate="2003-11-30T13:04:58-0600">
> <returnfield name="location"
> type="xsi:string">/home/testuser.sdsc/SeekTestArea/Lesli Model::0</returnfield>
> </record>
> ------------------------------------------------------------------------
> The Query
> About the only difference between the old query and the new one is
> this: if the returnfield values and concept attr values do not have
> a namespace prefix, then the prefix should be dropped from the
> namespace element; conversely, if the namespace element has a
> prefix, those values should carry it. For example:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <egq:query queryId="test.1.1" system="http://knb.ecoinformatics.org"
> xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1
> ../../src/xsd/query.xsd">
> <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
> <returnfield>/eml/dataset/title</returnfield>
> <returnfield>/eml/dataset/creator/individualName/surName</returnfield>
> <returnfield>/eml/dataset/pubDate</returnfield>
> <returnfield>/eml/dataset/keywordSet/keyword</returnfield>
> <title>Soils metadata query</title>
> <AND>
> <OR>
> <condition operator="LIKE"
> concept="title">%soil%</condition>
> <condition operator="NOT LIKE"
> concept="title">%dirt%</condition>
> </OR>
> <OR>
> <condition operator="LIKE"
> concept="surName">%Jones%</condition>
> <condition operator="LIKE"
> concept="surName">%Vieglais%</condition>
> </OR>
> </AND>
> </egq:query>
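One reading of the prefix rule for queries can be checked mechanically: with an unprefixed <namespace>, returnfield values and concept attrs must be unprefixed; with a prefixed one, they must all carry that prefix. A hypothetical checker (check_prefixes is not part of any EcoGrid code, and it inspects only the first step of path-like values):

```python
import xml.etree.ElementTree as ET

def check_prefixes(query_xml):
    """Return a list of prefix-rule problems in a query (empty list = OK)."""
    root = ET.fromstring(query_xml)
    ns_el = root.find("namespace")
    prefix = ns_el.get("prefix") if ns_el is not None else None
    values = [(rf.text or "").strip() for rf in root.findall("returnfield")]
    values += [c.get("concept", "") for c in root.iter("condition")]
    problems = []
    for v in values:
        # "/eml/dataset/title" -> "eml"; a bare concept like "title" stays as is.
        first_step = v.lstrip("/").split("/", 1)[0]
        if prefix is None and ":" in first_step:
            problems.append(f"{v!r} is prefixed but <namespace> has no prefix")
        elif prefix is not None and not first_step.startswith(prefix + ":"):
            problems.append(f"{v!r} should use prefix {prefix!r}")
    return problems

QUERY = """<egq:query xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1">
  <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
  <returnfield>/eml/dataset/title</returnfield>
  <AND><condition operator="LIKE" concept="title">%soil%</condition></AND>
</egq:query>"""

print(check_prefixes(QUERY))  # []
print(check_prefixes(QUERY.replace("<namespace>", '<namespace prefix="eml">')))
```

The second call flags both values, because a prefixed <namespace> element requires every returnfield and concept to carry "eml:".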
> ------------------------------------------------------------------------
>
> We can either discuss this via email, or think about it and
> discuss it further during our phone meeting.
>
> Rod
>
>
> Chad Berkley wrote:
>
>> Hi,
>>
>> Sorry for my late reply; we've been busy with a Morpho release.
>> Thanks for getting me in gear, Rod.
>>
>> In Metacat, we only return leaf nodes (i.e. the text node child
>> of a CDATA element, as in response 4 below). The returnfield
>> functionality was originally meant as a convenient way to return
>> enough information for a meaningful resultset to display, say, on
>> a web page. It was not meant to return whole document chunks for
>> further processing. I can see how this would be useful, but it
>> would require returning a namespace-defined chunk so that a
>> parser would know what to do with it. Metacat currently uses the
>> returnfields to build the resultset table; a request must then be
>> made for the whole document in order to do further processing.
>>
>> Looking at responses 1-3 below, they all seem invalid and
>> potentially problematic to me. Without a namespace to parse those
>> XML chunks against, the parser is left to do only well-formedness
>> checking, and any query into these document chunks (e.g. an XPath
>> query) may fail because we don't know what to expect back before
>> doing the processing.
>>
>> So I guess, to make a short answer long, I agree with Peter's
>> assessment of sticking with response 4 (which is basically what
>> Metacat has done all along).
>>
>> chad
>>
>>
>> Rod Spears wrote:
>>
>>> Is anyone better qualified than me going to address Peter's
>>> questions?
>>>
>>> Please, someone respond. Thanks.
>>>
>>> Rod
>>>
>>>
>>> Peter McCartney wrote:
>>>
>>>> It has to be well-formed no matter what. So the question is
>>>> really: how can we identify a namespace for the result set when
>>>> the content we stick in there has no hope of being valid?
>>>> Further, how can we define a set of rules for how the results
>>>> are to be evaluated against that namespace, yet not be valid?
>>>> request 1: '*/creator/individualName/surname', '/eml/dataset
>>>>
>>>> Rule 1: "Content must appear in the minimal XML tree needed to
>>>> accommodate the information."
>>>>
>>>> Rule 2: "Content must appear in a potentially valid XML tree,
>>>> one that is invalid only because other required elements are
>>>> missing."
>>>>
>>>> Rule 3: "Content must appear in a tree that places it in the
>>>> correct node ancestry for the declared namespace."
>>>>
>>>>
>>>> Response 1: meets rules 1 and 3 and is well-formed. Building it
>>>> requires only knowledge of parent ancestry.
>>>> <eml>
>>>>   <dataset>
>>>>     <creator>
>>>>       <individualName>
>>>>         <surname>mccartney</surname>
>>>>         <surname>jones</surname>
>>>>       </individualName>
>>>>     </creator>
>>>>   </dataset>
>>>> </eml>
>>>>
>>>> Response 2: meets rules 1, 2 and 3 and is well-formed. Building it
>>>> requires knowledge of both ancestry and index (i.e. jones is in
>>>> creator[2] of dataset[1]).
>>>> <eml>
>>>>   <dataset>
>>>>     <creator>
>>>>       <individualName>
>>>>         <surname>mccartney</surname>
>>>>       </individualName>
>>>>     </creator>
>>>>     <creator>
>>>>       <individualName>
>>>>         <surname>jones</surname>
>>>>       </individualName>
>>>>     </creator>
>>>>   </dataset>
>>>> </eml>
>>>>
>>>>
>>>> Response 3: meets rule 3 and is not well-formed. Building it
>>>> requires knowledge of ancestry.
>>>>
>>>> <eml>
>>>>   <dataset>
>>>>     <creator>
>>>>       <individualName>
>>>>         <surname>mccartney</surname>
>>>>       </individualName>
>>>>     </creator>
>>>>   </dataset>
>>>> </eml>
>>>> <eml>
>>>>   <dataset>
>>>>     <creator>
>>>>       <individualName>
>>>>         <surname>jones</surname>
>>>>       </individualName>
>>>>     </creator>
>>>>   </dataset>
>>>> </eml>
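The well-formedness claims for responses 1 to 3 can be checked with any non-validating parser, which is also the only kind of checking Chad notes is possible without a usable namespace. A sketch with Python's xml.etree.ElementTree (is_well_formed is a hypothetical helper); the strings use matching tag case and closing </eml> tags, since XML tag names are case-sensitive.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if text parses as a single XML document (no schema validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

NAME = "<individualName><surname>{}</surname></individualName>"
CREATOR = "<creator>" + NAME + "</creator>"

response1 = ("<eml><dataset><creator><individualName>"
             "<surname>mccartney</surname><surname>jones</surname>"
             "</individualName></creator></dataset></eml>")
response2 = ("<eml><dataset>" + CREATOR.format("mccartney")
             + CREATOR.format("jones") + "</dataset></eml>")
# Response 3 repeats the whole <eml> tree, which gives two root elements.
response3 = "".join("<eml><dataset>" + CREATOR.format(n) + "</dataset></eml>"
                    for n in ("mccartney", "jones"))

for label, doc in [("1", response1), ("2", response2), ("3", response3)]:
    print("response", label, "well-formed:", is_well_formed(doc))
```

Run as is, it reports responses 1 and 2 as well-formed and response 3 as not, because an XML document may have only one root element.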
>>>>
>>>> And, just as a reminder, here is (approximately) where we
>>>> originally started from.
>>>> Response 4: meets no rule and cannot be validated, but conveys all
>>>> the information needed to generate format 1 or 3 above using a
>>>> string tokenizer and JDOM, though not format 2.
>>>> <resultset namespace=eml......>
>>>> <returnfield xpath="dataset/creator/individualname/surname">mccartney</returnfield>
>>>> <returnfield xpath="dataset/creator/individualname/surname">jones</returnfield>
>>>> </resultset>
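Peter's point that response 4 carries enough information to rebuild format 1 can be sketched as follows. He mentions a string tokenizer and JDOM (Java); this stand-in does the same with str.split and Python's xml.etree.ElementTree, and build_minimal_tree is a hypothetical name. Values that share an xpath are merged under one ancestor chain, which is format 1.

```python
import xml.etree.ElementTree as ET

def build_minimal_tree(root_tag, returnfields):
    """Rebuild a response-1 style tree from response-4 style (xpath, value) pairs."""
    root = ET.Element(root_tag)
    for xpath, value in returnfields:
        steps = [s for s in xpath.split("/") if s]  # the "string tokenizer" step
        node = root
        for step in steps[:-1]:
            # Reuse the first existing child with this tag, else create it,
            # so values with a common path share their ancestors.
            child = node.find(step)
            node = child if child is not None else ET.SubElement(node, step)
        ET.SubElement(node, steps[-1]).text = value
    return root

fields = [
    ("dataset/creator/individualname/surname", "mccartney"),
    ("dataset/creator/individualname/surname", "jones"),
]
print(ET.tostring(build_minimal_tree("eml", fields), encoding="unicode"))
```

Since the xpath strings carry no position index such as creator[2], the function has no way to tell that jones belongs to a second creator, which is why format 2 cannot be recovered from response 4.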
>>>>
>>>> I think we should really ask whether we are making ourselves
>>>> deal with some very complicated rules for really no gain in
>>>> functionality. None of the results will be valid according to
>>>> the namespace. All of them are valid if I make up my own
>>>> namespace for the result set. Unless we can hold ourselves to
>>>> the standard where any code or XSL written for the schema will
>>>> successfully process the result set (#2 is the closest to that,
>>>> but depending on how loose the code is, all three could work or
>>>> none could work), why shouldn't we opt for the easiest rule to
>>>> comply with?
>>>>
>>>>
>>>> Peter McCartney (peter.mccartney at asu.edu)
>>>> Center for Environmental Studies
>>>> Arizona State University
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Saritha Bhandarkar
>>>> Sent: Friday, April 09, 2004 10:28 AM
>>>> To: 'seek-dev'
>>>> Cc: Jing Tao; Peter McCartney; Saritha Bhandarkar
>>>> Subject: resultset question
>>>>
>>>> Hi,
>>>>
>>>> I had a question about the resultset to be returned by
>>>> Xanthoria.
>>>>
>>>> The schema of the resultset specifies that a record is of type
>>>> "AnyRecordType" and may optionally have some element content
>>>> from the record. Now, my question is: if I am to return the
>>>> elements specified in the <returnfields> of the query for the
>>>> matching records (that is, from the matching EML file), do I
>>>> need to send them in EML format, with values only for the
>>>> requested fields and no values for the fields that were not
>>>> requested? Or is it enough to return only the requested fields
>>>> with their values, as well-formed XML? Can someone please brief
>>>> me on the contents of a record in resultsetType?
>>>>
>>>> Thanks,
>>>>
>>>> Saritha
>>>>
>>>> Saritha Bhandarkar
>>>> Research Assistant
>>>> Center for Environmental Studies
>>>> ASU-Tempe AZ
>>>> saritha.bhandarkar at asu.edu
>>>>
>>>
>>>
>>> --
>>> Rod Spears
>>> Biodiversity Research Center
>>> University of Kansas
>>> 1345 Jayhawk Boulevard
>>> Lawrence, KS 66045, USA
>>> Tel: 785 864-4082, Fax: 785 864-5335
>>>
>>
>>