[seek-dev] RE: resultset question

Mon Apr 26 09:40:29 PDT 2004

Rod Spears wrote:
> See comments below.... and please comment on my comments.

Comments below on your comments... (Thanks for responding to my original 
mail so quickly)

Also, you should comment on my comments on your comments ;-)

> 
> Shawn Bowers wrote:
> 
>>On Fri, 23 Apr 2004, Rod Spears wrote:
>>
>>[snip ...]
>>
>>  
>>
>>>What can be done to help generic consumers and SMS?
>>>    
>>>
>>
>>I have some opinions/observations about what Ecogrid can provide for SMS 
>>(assuming you mean Semantic Mediation System). No one actually asked for 
>>my opinion, but the door is opened by the question, and I thought I'd 
>>barge in :-)
>>
>>Here is what I see SMS needing. (Note that this might be a lot different
>>than what ecogrid actually intends to provide -- these items are more
>>aligned with the architecture of traditional integration systems and
>>systems like those being developed for GEON.)
>>
>>1). Every resource registered in the Ecogrid should have a persistent, 
>>Ecogrid-relative unique identifier. 
>>
> Each does today. It has a unique name.

I thought that it did.  This list of operations and data structures is 
just to say what SMS needs from Ecogrid -- I assumed much of this was 
implemented by Ecogrid already.

>>2). Every resource registered in the Ecogrid should fill-in two Ecogrid
>>metadata tags (dublin-core style). The first is the type of resource
>>registered, e.g., the type could be "dataset", "web service", "shape
>>file", "source code", "PDF document", "ontology", etc. (These should be
>>controlled values, i.e., come from a predefined list.) 
>>
> Dave and I were just talking about this. We hoped we could get by 
> without an extra identifier. Meaning the "type" could be derived from 
> the service's location (or the interfaces it implements). But maybe we 
> will need a simple field for easier identification.

I think the assumption that the location determines the resource type is 
not general enough (and also not extensible).  For example, if we have 
an SRB repository used within Ecogrid for storing datasets as well as 
PDF documents and ontologies, then a namespace would have to capture all 
three of these types.  I believe that with many of these underlying 
systems, like SRB and Metacat, there is no requirement that all 
resources stored must be of the same type.

I stated above there should be a metadata tag for storing the type 
information, but it could just as easily be an operation (or query). 
For example, getResourceType : ResourceID -> ResourceType is a partial 
function, where ResourceID is the set of all possible Ecogrid resource 
identifiers and ResourceType is the set of all resource types known by 
Ecogrid ("dataset", "web service", and so on). So, for a given 
resource-id r, getResourceType(r) returns the associated resource type 
of r.  If Ecogrid calculates this op based on where r is stored, and 
that is really a valid assumption, that seems fine.  Note that the 
operation could also be expressed as a query, as opposed to a function 
or a metadata tag.

>>The other tag
>>states the available (and Ecogrid accessible) standards-based metadata for
>>the resource, e.g., for a dataset this might include "FGDC", "EML", "XML
>>Schema" (for datasets stored in XML), "SQL DDL"; and for a web service,
>>"WSDL"; and so on. (Again, these should be controlled values.) Other tags
>>that might be useful (but not required by SMS) are quality of resource
>>(who registered it, whether it has been deemed "accepted", and so on) and
>>whether it is curated (stored by some Ecogrid db) or stored externally
>>(e.g., in the PNW database).
>>
> Would a namespace be enough to be able to specify "how" the metadata was 
> stored?

In this case, I don't think a namespace is enough.  Any given resource 
may have multiple metadata specifications. For example, if a given 
resource-id r happens to be a dataset, then there very easily could be 
both an FGDC and an EML metadata file for r.  So, what SMS needs is a 
(partial) function getMetadataType : ResourceID -> 2^MetadataType, which 
takes a resource id and returns a set (2^X denotes the powerset of X) of 
metadata types (e.g., "SQL DDL", "EML", and so on).

One question I have about the Ecogrid, and probably a misconception I 
have, is that it seems like what is searched *for* is metadata (like EML 
files), and not the actual resource. This was what prompted my earlier 
post on how to get all resources from the Ecogrid... do I have to first 
query for all the metadata associated with the resource, then look in 
these files to see where each resource is actually being stored? Like I 
said, this might be a misconception I have -- it seems like this 
metadata-centric view represents the only examples I've seen for 
Ecogrid. I would like for SMS to have resource-centric access for 
datasets; the resource is what is of interest (I give an example in my 
next comment below). The same should be true for Kepler -- datasets can 
be processed in a workflow, not the EML files of the datasets (there is 
a caveat to this: both Chad's EML ingestor and, in some ways, Ilkay's web 
service actor, take metadata files, but their purpose is to get from the 
metadata to the actual resource, I believe).   Of course, for web 
services (as an example), SMS doesn't need the actual resource, and only 
needs the WSDL description (which happens to be all that is needed to 
execute the web service).  However, conceptually, it is still the 
web-service that is the resource -- the web-service implementation is 
what is of interest, and the WSDL could be viewed as just a by-product 
of the implementation. In fact, there could be many WSDL descriptions of 
the same implementation.  There may be some disagreement about this 
notion of Ecogrid being resource-centric, but I would argue it is the 
more general semantics.

Does that make sense?

>>3). Ecogrid should support an operation to retrieve the metadata
>>definition for a resource. For example, if a dataset is stored through the
>>Ecogrid, and the resource has an EML description (which we know from 2),
>>then the operation would return the corresponding EML file (of course,
>>although not likely, there is nothing that would prevent a resource from
>>having multiple EML files).
>>
> Seems reasonable.
> 
>>
>>4). Ecogrid should support an operation to retrieve the actual resource
>>(the thing managed by the ecogrid; either a dataset, a web service, a
>>"code", or whatever).  Also, datasets should be returned using a standard
>>representation. For example, the canonical XML representation for
>>relational data or CSV.  I believe EML-tools already provide some support
>>for this for relational data. Thus, at least for datasets, the Ecogrid
>>should serve as a standard wrapper service as used in distributed dbs and
>>in information-integration architectures. This service I see as useful for
>>both SMS and for Kepler in general.
>>
> It's either doing this, or I don't quite understand the question.

Here is an example.  I am a scientist, and I have a dataset (a single 
relation) stored in an Access database. I also have an FGDC file that I 
created to describe my dataset. They are both living on my laptop.  I 
want to store my dataset on the Ecogrid. I create an Ecogrid resource-id 
for the dataset, ecogrid:042604, and I register the resource-id for the 
dataset. That is, I upload the Access database to some Ecogrid 
repository as well as the FGDC file, and I tell Ecogrid that the FGDC 
file should be used as the metadata file for the dataset.

Later, SMS needs to integrate the dataset with some other dataset. SMS 
knows the resource-id for both datasets. To do the integration, SMS 
needs access to both datasets. To get access to the datasets, SMS calls 
the Ecogrid function getResource(ecogrid:042604, "CSV"), which returns 
the dataset as a comma-separated-value text-file representation. 
Alternatively (and preferably), SMS could call 
getResource(ecogrid:042604, "RelationalXML"), which returns exactly the 
same dataset using the standard relational-to-XML mapping.
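
Here is a minimal Python sketch of what I mean by getResource. 
Everything in it is an assumption for illustration: the in-memory 
relation, its column names and values, and the simple row/column 
element layout used for "RelationalXML" (the real mapping would be 
whatever standard Ecogrid adopts).

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stored relation for the example resource-id (made-up sample data).
_datasets = {
    "ecogrid:042604": {
        "columns": ["site", "biomass"],
        "rows": [["A", "12.5"], ["B", "7.0"]],
    },
}


def get_resource(resource_id: str, fmt: str) -> str:
    """Return the dataset itself (not its metadata) in the requested format."""
    relation = _datasets[resource_id]  # KeyError if the id is not registered
    if fmt == "CSV":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(relation["columns"])  # header row
        writer.writerows(relation["rows"])
        return buf.getvalue()
    if fmt == "RelationalXML":
        # Assumed mapping: one <row> per tuple, one child element per column.
        root = ET.Element("relation")
        for row in relation["rows"]:
            row_el = ET.SubElement(root, "row")
            for name, value in zip(relation["columns"], row):
                ET.SubElement(row_el, name).text = value
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt!r}")
```

The caller names only the resource-id and a format; where the Access 
database physically lives is hidden behind the call.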

Does Ecogrid already provide something like getResource? (If so that 
would be awesome!)

Thanks,
Shawn

> 
>>
>>5). Optionally (at least for SMS, these aren't required), Ecogrid can
>>offer a query-routing/execution service and/or web service invocation.  
>>The purpose of offering query or invocation services would be for
>>optimization (in some cases) and to enable such operations for clients
>>that cannot perform these locally. 
>>
> 
> I think this functionality is one of the benefits of using Globus.
> 
>>
>>I believe that items 1-4 are the only things really needed by SMS from the
>>Ecogrid. In particular, for SMS, it doesn't really matter how or where the
>>resource is stored (metacat, srb, digir, etc.), and it doesn't need
>>services to query the catalog entries of those systems.  If people bypass
>>the SMS system, then I guess these types of things are needed.
>>
>>Items 1-3 seem relatively straightforward. Item 4 seems harder, although 
>>EML-tools exist for much of this I guess -- I am not really sure.
>>
>>
>>Shawn
>>
>>
>>  
>>
>>>The issue at the moment is that the contents of the <record> element is 
>>>basically a blob and anything goes. For example:
>>>1) Metacat returns a bunch of param elements containing the data.
>>>2) DiGIR returns a bunch of namespace-qualified elements containing 
>>>the data.
>>>3) The SRB doesn't even have any data in the record; only the identifier 
>>>attr is meaningful.
>>>
>>>We need to provide a mechanism for the contents to be interpreted, to do 
>>>this we will add four things to the existing resultset schema:
>>>1) One or more <namespace> elements in the metadata - this will be the 
>>>namespace for the new <returnfield> element
>>>2) Add a new element <returnfield>
>>>3) A "name" attribute for the returnfield element (basically the same as 
>>>Peter's 'xpath' att), which is a unique name within the record and may be 
>>>meaningful for wherever the data came from.
>>>4) A "type" attribute for the returnfield element that describes the type 
>>>of data contained in the returnfield
>>>
>>>The most important and powerful part of the new additions is the "type" 
>>>attr. This enables the value to be interpreted. Most of the time it can 
>>>be described by a schema definition type, for example "xsi:string" etc. 
>>>Or it could be an url that points to a schema definition document. This 
>>>means the value of the returnfield element could be anything from a 
>>>string or integer to an entire XML document.
>>>
>>>(Note that the namespace attr has been removed from the record element)
>>>
>>>The new namespace attrs in the metadata provide a way for the value of 
>>>the name attr and the type attr to be interpreted.
>>>
>>>Here is an example of a metacat resultset that is returned today:
>>><rs:resultset system="http://knb.ecoinformatics.org" resultsetId="eml.001"
>>>  xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
>>>  
>>>xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 
>>>../../src/xsd/resultset.xsd"> 
>>>  <resultsetMetadata>
>>>    <sendTime>2004-03-10T13:47:26-0600</sendTime>
>>>    <startRecord>1</startRecord>
>>>    <endRecord>14</endRecord>
>>>    <recordCount>14</recordCount>
>>>  </resultsetMetadata>
>>>  <record number="1"
>>>          system="http://dev.nceas.ucsb.edu"
>>>          identifier="obfs2.379.1"
>>>          namespace="eml://ecoinformatics.org/eml-2.0.0"
>>>          lastModifiedDate="2003-11-02T11:07:43-0600"
>>>          creationDate="2003-11-02T11:07:43-0600">
>>>      <param  name="/eml/dataset/keywordSet/keyword">seasonality</param>
>>>      <param  name="/eml/dataset/keywordSet/keyword">macroalgal 
>>>bloom</param>
>>>      <param  name="/eml/dataset/keywordSet/keyword">green tide</param>
>>>      <param  name="/eml/dataset/keywordSet/keyword">Ulva</param>
>>>      <param  
>>>name="/eml/dataset/creator/individualName/surName">Nelson</param>
>>>      <param  name="/eml/dataset/keywordSet/keyword">biomass</param>
>>>      <param  name="/eml/dataset/keywordSet/keyword">algal blooms</param>
>>>      <param  name="/eml/dataset/title">Armitage Bay Ulvoid Algal 
>>>Biomass and Species Composition</param>
>>>      <param  name="/eml/dataset/keywordSet/keyword">Enteromorpha</param>
>>>      <param  name="/eml/dataset/keywordSet/keyword">Ulvaria</param>
>>>  </record>
>>>
>>>Here is an example of the same resultset as described by the new approach:
>>><rs:resultset system="http://knb.ecoinformatics.org" resultsetId="eml.001"
>>>  xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
>>>  
>>>xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 
>>>../../src/xsd/resultset.xsd"> 
>>>  <resultsetMetadata>
>>>    <sendTime>2004-03-10T13:47:26-0600</sendTime>
>>>    <startRecord>1</startRecord>
>>>    <endRecord>14</endRecord>
>>>    <recordCount>14</recordCount>
>>>    <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
>>>    <namespace 
>>>prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
>>>  </resultsetMetadata>
>>>  <record number="1"
>>>          system="http://dev.nceas.ucsb.edu"
>>>          identifier="obfs2.379.1"
>>>          lastModifiedDate="2003-11-02T11:07:43-0600"
>>>          creationDate="2003-11-02T11:07:43-0600">
>>>      <returnfield name="/eml/dataset/keywordSet/keyword" 
>>>type="xsi:string">seasonality</returnfield>
>>>      <returnfield name="/eml/dataset/keywordSet/keyword" 
>>>type="xsi:string">macroalgal bloom</returnfield>
>>>      <returnfield name="/eml/dataset/keywordSet/keyword" 
>>>type="xsi:string">green tide</returnfield>
>>>      <returnfield name="/eml/dataset/keywordSet/keyword" 
>>>type="xsi:string">Ulva</returnfield>
>>>      <returnfield name="/eml/dataset/creator/individualName/surName" 
>>>type="xsi:string">Nelson</returnfield>
>>>      <returnfield name="/eml/dataset/keywordSet/keyword" 
>>>type="xsi:string">biomass</returnfield>
>>>      <returnfield name="/eml/dataset/keywordSet/keyword" 
>>>type="xsi:string">algal blooms</returnfield>
>>>      <returnfield name="/eml/dataset/title" type="xsi:string">Armitage 
>>>Bay Ulvoid Algal Biomass and Species Composition</returnfield>
>>>      <returnfield name="/eml/dataset/keywordSet/keyword" 
>>>type="xsi:string">Enteromorpha</returnfield>
>>>      <returnfield name="/eml/dataset/keywordSet/keyword" 
>>>type="xsi:string">Ulvaria</returnfield>
>>>  </record>
>>>
>>>Note how we now can interpret the resultset in a much more meaningful 
>>>way. Also, note that there are two new namespace elements: one contains 
>>>a "prefix" attr, the other does not. The one without becomes the default 
>>>namespace for unqualified values in the name and type attrs.
>>>
>>>Here is the before and after for the DiGIR query:
>>>Before:
>>><rs:resultset resultsetId="foo.1.1"
>>>    system="urn:not://sure/what/to/put/here"
>>>    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
>>>    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>    
>>>xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 
>>>../../src/xsd/resultset.xsd">
>>>    <resultsetMetadata>
>>>        <sendTime>2003-05-02T16:45:50-09:00</sendTime>
>>>        <startRecord>1</startRecord>
>>>        <endRecord>2</endRecord>
>>>        <recordCount>2</recordCount>
>>>    </resultsetMetadata>
>>>     <record number="1"
>>>             
>>>system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
>>>             identifier="mvz1"
>>>             namespace="http://digir.net/schema/conceptual/darwin/2003/1.0"
>>>             lastModifiedDate="2003-03-03T10:42:13"
>>>             creationDate="2003-03-03T10:42:13">
>>>        <darwin:ScientificName>PEROMYSCUS LEUCOPUS 
>>>NOVEBORACENSIS</darwin:ScientificName>
>>>        <darwin:Longitude>121</darwin:Longitude>
>>>        <darwin:Latitude>33</darwin:Latitude>
>>>     </record>
>>>
>>>After:
>>><rs:resultset resultsetId="foo.1.1"
>>>    system="urn:not://sure/what/to/put/here"
>>>    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
>>>    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>    
>>>xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 
>>>../../src/xsd/resultset.xsd">
>>>
>>>    <resultsetMetadata>
>>>        <sendTime>2003-05-02T16:45:50-09:00</sendTime>
>>>        <startRecord>1</startRecord>
>>>        <endRecord>2</endRecord>
>>>        <recordCount>2</recordCount>
>>>        
>>><namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
>>>        <namespace 
>>>prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
>>>    </resultsetMetadata>
>>>
>>>    <record number="1"
>>>             
>>>system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
>>>             identifier="mvz1"
>>>             lastModifiedDate="2003-03-03T10:42:13"
>>>             creationDate="2003-03-03T10:42:13">
>>>        <returnfield name="ScientificName" type="xsi:string">PEROMYSCUS 
>>>LEUCOPUS NOVEBORACENSIS</returnfield>
>>>        <returnfield name="Longitude" type="xsi:int">121</returnfield>
>>>        <returnfield name="Latitude" type="xsi:int">33</returnfield>
>>>    </record>
>>>
>>>Here is the SRB's before and after:
>>>Before:
>>><rs:resultset system="http://knb.ecoinformatics.org" 
>>>resultsetId="SeekSRB_001"
>>> xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"  >
>>> <resultsetMetadata>
>>>   <sendTime>2004-04-16T11:02:12-0500</sendTime>
>>>   <startRecord>1</startRecord>
>>>   <endRecord>2</endRecord>
>>>   <recordCount>2</recordCount>
>>> </resultsetMetadata>
>>> <record number="1"
>>>         system="http://srb.sdsc.edu"
>>>         identifier="/home/testuser.sdsc/SeekTestArea/Lesli Model::0"
>>>         namespace="srb://srb.sdsc.edu"
>>>         lastModifiedDate="2003-11-30T13:04:59-0600"
>>>         creationDate="2003-11-30T13:04:58-0600">
>>> </record>
>>>
>>>After:
>>><rs:resultset system="http://knb.ecoinformatics.org" 
>>>resultsetId="SeekSRB_001"
>>> xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"  >
>>> <resultsetMetadata>
>>>   <sendTime>2004-04-16T11:02:12-0500</sendTime>
>>>   <startRecord>1</startRecord>
>>>   <endRecord>2</endRecord>
>>>   <recordCount>2</recordCount>
>>>   <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
>>> </resultsetMetadata>
>>> <record number="1"
>>>         system="http://srb.sdsc.edu"
>>>         identifier="/home/testuser.sdsc/SeekTestArea/Lesli Model::0"
>>>         lastModifiedDate="2003-11-30T13:04:59-0600"
>>>         creationDate="2003-11-30T13:04:58-0600">
>>>  <returnfield name="location" 
>>>type="xsi:string">/home/testuser.sdsc/SeekTestArea/Lesli 
>>>Model::0</returnfield>
>>> </record>
>>>------------------------------------------------------------------------
>>>The Query
>>>About the only difference between the old query and the new one is this: 
>>>if the returnfield values and concept attr values do not have a namespace 
>>>prefix, then the prefix should be dropped from the namespace element; if 
>>>the namespace element has a prefix, they should use it. For example:
>>>
>>><?xml version="1.0" encoding="UTF-8"?>
>>><egq:query queryId="test.1.1" system="http://knb.ecoinformatics.org"
>>>    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
>>>    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>    
>>>xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 
>>>../../src/xsd/query.xsd">
>>>    <namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
>>>    <returnfield>/eml/dataset/title</returnfield>
>>>
>>>    <returnfield>/eml/dataset/creator/individualName/surName</returnfield>
>>>    <returnfield>/eml/dataset/pubDate</returnfield>
>>>    <returnfield>/eml/dataset/keywordSet/keyword</returnfield>
>>>    <title>Soils metadata query</title>
>>>    <AND>
>>>        <OR>
>>>            <condition operator="LIKE" concept="title">%soil%</condition>
>>>            <condition operator="NOT LIKE" 
>>>concept="title">%dirt%</condition>
>>>        </OR>
>>>        <OR>
>>>            <condition operator="LIKE" concept="surName">%Jones%</condition>
>>>            <condition operator="LIKE" 
>>>concept="surName">%Vieglais%</condition>
>>>        </OR>
>>>    </AND>
>>></egq:query>
>>>------------------------------------------------------------------------
>>>
>>>We can either discuss this via email, or think about it and discuss it 
>>>further during our phone meeting.
>>>
>>>Rod
>>>
>>>
>>>Chad Berkley wrote:
>>>
>>>    
>>>
>>>>Hi,
>>>>
>>>>Sorry for my late reply...we've been busy with a morpho release.  
>>>>thanks for getting me in gear, Rod.
>>>>
>>>>In metacat, we only return leaf nodes (i.e. the text node child of a 
>>>>CDATA element like in response 4 below).  The returnfield 
>>>>functionality was originally meant as a convenient way to return 
>>>>enough information for a meaningful resultset to display, say, on a 
>>>>web page.  It was not meant to return whole document chunks for 
>>>>further processing.  I can see how this would be useful, but it would 
>>>>require returning a namespace defined chunk so that a parser would 
>>>>know what to do with it.  Metacat currently uses the returnfields to 
>>>>build the resultset table, then a request must be made for the whole 
>>>>document in order to do further processing.
>>>>
>>>>Looking at the responses 1-3 below, to me, they are all invalid and 
>>>>potentially problematic.  without a namespace to parse those xml 
>>>>chunks off of, the parser is left to just do well-formedness checking 
>>>>and any query into these document chunks may fail because we don't 
>>>>know what to expect to get back before doing the processing (e.g. an 
>>>>xpath query).
>>>>
>>>>So I guess to make a short answer long, I agree with Peter's 
>>>>assessment of sticking with response 4 (which is basically what 
>>>>metacat has done all along).
>>>>
>>>>chad
>>>>
>>>>
>>>>Rod Spears wrote:
>>>>
>>>>      
>>>>
>>>>>Is anyone better qualified than me, going to address Peter's questions?
>>>>>
>>>>>Please someone respond, thanks.
>>>>>
>>>>>Rod
>>>>>
>>>>>
>>>>>Peter McCartney wrote:
>>>>>
>>>>>        
>>>>>
>>>>>>it has to be well formed no matter what. so the question is really 
>>>>>>how can we identify a namespace for the result set when the content 
>>>>>>we stick in there has no hope of being valid? further, how can we 
>>>>>>define  a set of rules for how the results are to be evaluated 
>>>>>>against that namespace yet not be valid?
>>>>>>request 1: '*/creator/individualName/surname', '/eml/dataset
>>>>>> 
>>>>>>Rule 1: "content must appear in the minimal xml tree needed to 
>>>>>>accommodate the information"
>>>>>> 
>>>>>>Rule 2: "content must appear in a potentially valid xml tree that 
>>>>>>is invalid only because other required elements are missing"
>>>>>> 
>>>>>>Rule 3: "content must appear in a tree that places it in the correct 
>>>>>>node ancestry for the declared namespace"
>>>>>> 
>>>>>> 
>>>>>>response 1: meets 1 and 3 and is well formed. Requires just 
>>>>>>knowledge of parent ancestry to build.
>>>>>><eml>
>>>>>>    <dataset>
>>>>>>    <creator>
>>>>>>        <individualName>
>>>>>>                <surname>mccartney</surname>
>>>>>>                <surname>jones</surname>
>>>>>>        </individualName>
>>>>>>    </creator>
>>>>>></dataset>
>>>>>></eml>
>>>>>> 
>>>>>>response 2: meets 1, 2 and 3 and is well formed. Requires knowledge 
>>>>>>of ancestry and index (ie jones is in creator[2] of dataset[1] )
>>>>>><eml>
>>>>>>    <dataset>
>>>>>>    <creator>
>>>>>>        <individualName>
>>>>>>                <surname>mccartney</surname>
>>>>>>        </individualName>
>>>>>>    </creator>
>>>>>>    <creator>
>>>>>>        <individualName>
>>>>>>                <surname>jones</surname>
>>>>>>        </individualName>
>>>>>>    </creator>
>>>>>>  </dataset>
>>>>>></eml>
>>>>>> 
>>>>>> 
>>>>>>response 3: meets 3 and is not well formed. Requires knowledge of 
>>>>>>ancestry.
>>>>>> 
>>>>>><eml>
>>>>>>    <dataset>
>>>>>>    <creator>
>>>>>>        <individualName>
>>>>>>                <surname>mccartney</surname>
>>>>>>        </individualname>
>>>>>>    </creator>
>>>>>></dataset>
>>>>>><eml>
>>>>>>    <dataset>
>>>>>>    <creator>
>>>>>>        <individualName>
>>>>>>                <surname>jones</surname>
>>>>>>        </individualname>
>>>>>>    </creator>
>>>>>></dataset>
>>>>>></eml>
>>>>>> 
>>>>>>and just a reminder of where we originally started from 
>>>>>>(approximately)  
>>>>>>response 4: meets no rule, cannot be validated, but conveys all the 
>>>>>>information needed to generate format 1 or 3 above using a string 
>>>>>>tokenizer and JDOM, but not option 2.
>>>>>><resultset namespace=eml......>
>>>>>>    <returnfield 
>>>>>>xpath="dataset/creator/individualname/surname">mccartney</returnfield>
>>>>>>    <returnfield 
>>>>>>xpath="dataset/creator/individualname/surname">jones</returnfield>
>>>>>></resultset>
>>>>>> 
>>>>>>I think we should really ask whether we are making ourselves deal 
>>>>>>with some very complicated rules for really no gain in 
>>>>>>functionality. None of the results will be valid according to the 
>>>>>>namespace. All of them are valid if I make up my own namespace for 
>>>>>>the result set.  Unless we can hold ourselves to the standard where 
>>>>>>any code or xsl written for the schema will successfully process the 
>>>>>>result set (#2 is the closest to that, but depending on how loose 
>>>>>>the code is, all three could work or none could work), why shouldn't 
>>>>>>we opt for the easiest rule to comply with?
>>>>>> 
>>>>>> 
>>>>>>Peter McCartney (peter.mccartney at asu.edu)
>>>>>>Center for Environmental-Studies
>>>>>>Arizona State University
>>>>>> 
>>>>>>
>>>>>>    -----Original Message-----
>>>>>>    *From:* Saritha Bhandarkar
>>>>>>    *Sent:* Friday, April 09, 2004 10:28 AM
>>>>>>    *To:* 'seek-dev'
>>>>>>    *Cc:* Jing Tao; Peter McCartney; Saritha Bhandarkar
>>>>>>    *Subject:* resultset question
>>>>>>
>>>>>>    Hi,
>>>>>>
>>>>>>    I had a question about the resultset to be returned by Xanthoria.
>>>>>>
>>>>>>    The schema of the resultset specifies that a record is of type
>>>>>>    "AnyRecordType" and optionally it may have some element content
>>>>>>    from the record. Now, my question here is, if I am to return the
>>>>>>    elements specified in the <returnfields> of the query, for the 
>>>>>>matching records (that is from the matching
>>>>>>    eml file), do I need to send it in eml format,  with only relevant
>>>>>>    values for requested fields and no values for the fields which are
>>>>>>    not requested? Or is it enough to return only the requested fields
>>>>>>    with their values, as well-formed xml? Can someone please brief me
>>>>>>    on the contents of a record in resultsetType?
>>>>>>
>>>>>>    Thanks,
>>>>>>
>>>>>>    Saritha
>>>>>>
>>>>>>    
>>>>>>    
>>>>>>    
>>>>>>    
>>>>>>    Saritha Bhandarkar
>>>>>>
>>>>>>    Research Assistant
>>>>>>
>>>>>>    Center for Environmental Studies
>>>>>>
>>>>>>    ASU-Tempe AZ
>>>>>>
>>>>>>    saritha.bhandarkar at asu.edu
>>>>>>
>>>>>>    
>>>>>>    
>>>>>>          
>>>>>>
>>>>>-- 
>>>>>Rod Spears
>>>>>Biodiversity Research Center
>>>>>University of Kansas
>>>>>1345 Jayhawk Boulevard
>>>>>Lawrence, KS 66045, USA
>>>>>Tel: 785 864-4082, Fax: 785 864-5335
>>>>>
>>>>>        
>>>>>
>>>>      
>>>>
>>
>>  
>>