No subject


Tue Mar 22 16:40:45 PST 2005


What Manu and I have been working on are some ontologies that will guide
how we choose the right transformation operation based on a
comparison of two EML descriptions. For example, if I have a dataset
described in EML that I want to input into a model that has its input
requirements described in EML, I can compare those two sets of attribute
and spatial representation descriptions and find out that one is a point
layer in UTM covering Arizona, and the other requires a grid, rectified
to state plane, extending to the Maricopa County extent. From this
comparison, I want to recognize the class membership of these two
descriptions in my ontology, and find the appropriate transformation
operations I need to do. Then, I need to work back out the other way,
searching my Ecogrid registry for actual app instances whose metadata
descriptions indicate that they correspond to this operation concept
from my ontology.
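
The comparison step described above can be sketched roughly as follows.
This is only an illustration: the SpatialDescription class, its three
fields, and the operation names are hypothetical, and plain string
comparison stands in for the ontology-based class membership test.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpatialDescription:
    """Hypothetical summary of the spatial part of an EML description."""
    representation: str  # e.g. "point" or "grid"
    projection: str      # e.g. "UTM" or "State Plane"
    extent: str          # e.g. "Arizona" or "Maricopa County"


def required_operations(source: SpatialDescription,
                        target: SpatialDescription) -> List[str]:
    """List the transformation concepts needed to turn source into target."""
    operations = []
    if source.representation != target.representation:
        operations.append(
            f"convert {source.representation} to {target.representation}")
    if source.projection != target.projection:
        operations.append(
            f"reproject {source.projection} to {target.projection}")
    if source.extent != target.extent:
        operations.append(f"clip {source.extent} to {target.extent}")
    return operations


# The dataset and model input from the example in the text.
dataset = SpatialDescription("point", "UTM", "Arizona")
model_input = SpatialDescription("grid", "State Plane", "Maricopa County")
operations = required_operations(dataset, model_input)
```

Each string in the result stands for an operation concept that would then
be looked up in the registry to find matching app instances.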

Now, the question of how central EML should be is a good one. It's true
that the entire world is not going to adopt EML (although I must say,
Henry Gholz has done yeoman's service educating all directorates at NSF
about EML). However, the very reason we created EML was that other
common metadata standards failed to encode the kind of information we
felt necessary to enable some of these kinds of tasks. I could not do
dynamic rendering with data described in FGDC unless I built in some
additional code to actually open the data and try to guess some of those
things. So I guess I did have a mental image that what we build for
Ecogrid would in fact be EML-aware, and that wrappers would do what
translation they could. If we think we can accomplish automated
processing and semantic extension with other kinds of metadata
standards, we should have some discussion of how we will do that. We
also should be prepared with an answer when the LTER network sites ask
if this means that they can publish their spatial metadata in FGDC
rather than EML...


-----Original Message-----
From: seek-dev-admin at ecoinformatics.org
[mailto:seek-dev-admin at ecoinformatics.org] On Behalf Of Shawn Bowers
Sent: Wednesday, April 28, 2004 10:39 AM
To: Bertram Ludaescher
Cc: Peter McCartney; bzhu at sdsc.edu; Rod Spears; Chad Berkley; Saritha
Bhandarkar; seek-dev; Jing Tao; Matt Jones
Subject: Re: [seek-dev] RE: resultset question



Peter (and all),

One of the things that we have been doing, or working towards, is
exploiting EML descriptions for datasets to automatically populate part
of the ontology annotation of the dataset (the "semantic registration").

Rich Williams has written Java code that takes part of the
description, e.g., for location coverage, and outputs the corresponding
OWL instances.

I have looked through the EML documentation online, and like Bertram,
would like to see something that shows the "power" of EML.

In sifting through the XML schema for EML, I understand there are
multiple types of resources EML can handle.  However, in practice, I
don't see these being used. In fact, from the LTER sites that I have
been pointed to, it looks like the majority of the EML files don't
include much metadata at all -- I have only seen EML files for datasets,
and most of these don't even describe the schema of the resource. They
typically have some other format that they use to describe the fields,
e.g.  It would be great to see some examples, whether real or
"cooked-up", that show EML for describing web-services (i.e.,
eml-software), spatial raster, spatial vector, literature, and so on.

In general, I agree with your comment below that we need to align
Ecogrid and SMS (assuming there is a desire to add ontology-based
resource discovery, integration, and transformation to Kepler).

Maybe I am missing something, but your email below seems to suggest that
we should think of EML as the "interface" between Ecogrid and SMS.  My
concern, however, with focusing (and relying) on EML as the *only*
metadata language, and building Ecogrid around EML, is that it severely
limits the ability of data providers to add existing resources to the
Ecogrid, and of clients to access resources.  For example, in the
current BEAM mammals use case, the data they wish to use doesn't have
EML metadata (data from IPCC) -- and it would be a major effort to
create the EML metadata for these files. Note also that SMS doesn't
require or rely on EML, although, if present, we want to exploit it as
well as any other metadata associated with a resource.

My other concern is whether Ecogrid provides the most basic operations
required to access resources -- which are necessary for SMS, and this
concern is what I was trying to describe in my previous emails.  Simply
getting resources by id (in a uniform format), determining what type of
resource it is, and getting all of the associated metadata files for a
resource all seem to be fundamental operations that the Ecogrid should
support.  And, providing these operations would enable "naive" clients,
e.g., those that don't understand EML, to still leverage the Ecogrid and
generically access stored resources.
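
To make the three basic operations concrete, here is a minimal sketch of
what a resource-centric lookup could look like. It is not the Ecogrid
API: the Registry and Resource classes and all method names are
hypothetical, and an in-memory dictionary stands in for the real
repositories.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Controlled list of resource types (values taken from the thread).
RESOURCE_TYPES = {"dataset", "web service", "shape file",
                  "source code", "PDF document", "ontology"}


@dataclass
class Resource:
    resource_type: str
    # Maps a metadata type such as "EML" or "FGDC" to the metadata document.
    metadata: Dict[str, str] = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory stand-in for an Ecogrid registry."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def register(self, resource_id: str, resource: Resource) -> None:
        if resource.resource_type not in RESOURCE_TYPES:
            raise ValueError(f"unknown resource type: {resource.resource_type}")
        self._resources[resource_id] = resource

    def get_resource_type(self, resource_id: str) -> Optional[str]:
        # Partial function: unknown ids yield None.
        resource = self._resources.get(resource_id)
        return resource.resource_type if resource is not None else None

    def get_metadata_types(self, resource_id: str) -> Set[str]:
        # A resource may carry several metadata documents at once.
        resource = self._resources.get(resource_id)
        return set(resource.metadata) if resource is not None else set()

    def get_metadata(self, resource_id: str,
                     metadata_type: str) -> Optional[str]:
        resource = self._resources.get(resource_id)
        if resource is None:
            return None
        return resource.metadata.get(metadata_type)


registry = Registry()
registry.register(
    "ecogrid:042604",
    Resource("dataset", {"FGDC": "<metadata/>", "EML": "<eml/>"}),
)
```

A "naive" client in this sketch never has to parse EML: it asks for the
type, asks which metadata formats exist, and fetches whichever one it
understands.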

What are your thoughts on this?

Shawn



Bertram Ludaescher wrote:

> Peter:
>
> Good points!
>
> Is there a tutorial or reference paper that describes the fundamental
> principles in EML?
>
> Bertram
>
>
>>>>>>"PM" == Peter McCartney <peter.mccartney at asu.edu> writes:
>
> PM>
> PM> I think it is clear that we need a joint Ecogrid SMS/KR meeting.
> PM> On the SMS side I'm still hearing some unfamiliarity with what
> PM> information has been designed into EML (i.e., the low-level class
> PM> distinctions between resource types are built into the extensible
> PM> resource module). On the Ecogrid side, I'm seeing a need to get some
> PM> input on where to couple our existing data with the new semantic
> PM> ontologies. Hopefully you will all get to talk about some of this
> PM> in Scotland, which I unfortunately will miss.
> PM>
> PM> -----Original Message-----
> PM> From: Bing Zhu [mailto:bzhu at sdsc.edu]
> PM> Sent: Tuesday, April 27, 2004 11:52 PM
> PM> To: Shawn Bowers; Rod Spears
> PM> Cc: Chad Berkley; Peter McCartney; Saritha Bhandarkar; seek-dev;
> PM> Jing Tao; Matt Jones; Bertram Ludaescher
> PM> Subject: RE: [seek-dev] RE: resultset question
> PM>
> PM>
> PM> Ecogrid builds low-level tools for SEEK users to export, import,
> PM> and query datasets and metadata. For storing datasets or building
> PM> a registry for "web service", "shape file", "source code", "PDF
> PM> document", "ontology", etc., we need some data modeling work. E.g.,
> PM> in SRB, you can create and organize datasets in different
> PM> collections. With this approach, you can have a web service
> PM> registry collection in which we can create and store datasets
> PM> serving as our web service registry.
> PM>
> PM> It seems to me that a perfect design for our Ecogrid resultset
> PM> needs to use some knowledge from ontologies. I am not sure if it
> PM> is appropriate to mix the ontology layer within the Ecogrid
> PM> software layer.
> PM>
> PM> Bing
> PM>
> PM>
> PM>
> PM> -----Original Message-----
> PM> From: seek-dev-admin at ecoinformatics.org
> PM> [mailto:seek-dev-admin at ecoinformatics.org] On Behalf Of Shawn
> PM> Bowers
> PM> Sent: Monday, April 26, 2004 9:40 AM
> PM> To: Rod Spears
> PM> Cc: Chad Berkley; Peter McCartney; Saritha Bhandarkar; seek-dev;
> PM> Jing Tao; Matt Jones; Bertram Ludaescher
> PM> Subject: Re: [seek-dev] RE: resultset question
> PM>
> PM>
> PM> Rod Spears wrote:
>
>>>See comments below.... and please comment on my comments.
>
> PM>
> PM> Comments below on your comments... (Thanks for responding to my
> PM> original mail so quickly)
> PM>
> PM> Also, you should comment on my comments on your comments ;-)
> PM>
>
>>>Shawn Bowers wrote:
>>>
>>>
>>>>On Fri, 23 Apr 2004, Rod Spears wrote:
>>>>
>>>>[snip ...]
>>>>
>>>>
>>>>
>>>>
>>>>>What can be done to help generic consumers and SMS?
>>>>>
>>>>>
>>>>
>>>>I have some opinions/observations about what Ecogrid can provide for
>>>>SMS (assuming you mean Semantic Mediation System). No one actually
>>>>asked for my opinion, but the door is opened by the question, and I
>>>>thought I'd barge in :-)
>>>>
>>>>Here is what I see SMS needing. (Note that this might be a lot
>>>>different than what Ecogrid actually intends to provide -- these items
>>>>are more aligned with the architecture of traditional integration
>>>>systems and systems being developed like for GEON.)
>>>>
>>>>1). Every resource registered in the Ecogrid should have a persistent,
>>>>Ecogrid-relative unique identifier.
>>>>
>>>
>>>Each does today. It has a unique name.
>
> PM>
> PM> I thought that it did.  This list of operations and data
> PM> structures is just to say what SMS needs from Ecogrid -- I assumed
> PM> much of this was implemented by Ecogrid already.
> PM>
>
>>>>2). Every resource registered in the Ecogrid should fill in two
>>>>Ecogrid metadata tags (Dublin Core style). The first is the type of
>>>>resource registered, e.g., the type could be "dataset", "web service",
>>>>"shape file", "source code", "PDF document", "ontology", etc. (These
>>>>should be controlled values, i.e., come from a predefined list.)
>>>>
>>>
>>>Dave and I were just talking about this. We hoped we could get by
>>>without an extra identifier, meaning the "type" could be derived from
>>>the service's location (or the interfaces it implements). But maybe we
>>>will need a simple field for easier identification.
>
> PM>
> PM> I think the assumption that the location determines the resource
> PM> type is not general enough (and also not extensible).  For
> PM> example, if we have an SRB repository used within Ecogrid for
> PM> storing datasets as well as PDF documents and ontologies, then a
> PM> namespace would have to capture all three of these types.  I
> PM> believe that with many of these underlying systems, like SRB and
> PM> Metacat, there is no requirement that all resources stored must be
> PM> of the same type.
> PM>
> PM> I stated above there should be a metadata tag for storing the type
> PM> information, but it could just as easily be an operation (or
> PM> query). For example, getResourceType : ResourceID -> ResourceType
> PM> is a partial function, where ResourceID is the set of all possible
> PM> Ecogrid resource identifiers and ResourceType is the set of all
> PM> resource types known by Ecogrid ("dataset", "web service", and so
> PM> on). So, for a given resource-id r, getResourceType(r) returns the
> PM> associated resource type of r.  If Ecogrid calculates this op
> PM> based on where r is stored, and that is really a valid assumption,
> PM> that seems fine.  Note that the operation could also be expressed
> PM> as a query, as opposed to a function or a metadata tag.
> PM>
>
>>>>The other tag
>>>>states the available (and Ecogrid accessible) standards-based metadata
>>>>for the resource, e.g., for a dataset this might include "FGDC",
>>>>"EML", "XML Schema" (for datasets stored in XML), "SQL DDL"; and for a
>>>>web service, "WSDL"; and so on. (Again, these should be controlled
>>>>values.) Other tags that might be useful (but not required by SMS) are
>>>>quality of resource (who registered it, whether it has been deemed
>>>>"accepted", and so on) and whether it is curated (stored by some
>>>>Ecogrid db) or stored externally (e.g., in the PNW database).
>>>>
>>>
>>>Would a namespace be enough to be able to specify "how" the metadata
>>>was stored?
>
> PM>
> PM> In this case, I don't think a namespace is enough.  Any given
> PM> resource may have multiple metadata specifications. For example,
> PM> if a given resource-id r happens to be a dataset, then there very
> PM> easily could be both an FGDC and an EML metadata file for r.  So,
> PM> what SMS needs is a (partial) function
> PM> getMetadataType : ResourceID -> 2^MetadataType, which takes a
> PM> resource id and returns a set of metadata types (e.g., "SQL DDL",
> PM> "EML", and so on); 2^MetadataType denotes the powerset of
> PM> MetadataType.
> PM>
> PM> One question I have about the Ecogrid, and probably a
> PM> misconception I have, is that it seems like what is searched *for*
> PM> is metadata (like EML files), and not the actual resource. This
> PM> was what prompted my earlier post on how to get all resources from
> PM> the Ecogrid... do I have to first query for all the metadata
> PM> associated with the resource, then look in these files to see
> PM> where each resource is actually being stored? Like I said, this
> PM> might be a misconception I have -- it seems like this
> PM> metadata-centric view represents the only examples I've seen for
> PM> Ecogrid. I would like for SMS to have resource-centric access for
> PM> datasets; the resource is what is of interest (I give an example
> PM> in my next comment below). The same should be true for Kepler --
> PM> datasets can be processed in a workflow, not the EML files of the
> PM> datasets (there is a caveat to this; both Chad's EML ingestor and,
> PM> in some ways, Ilkay's web service actor take metadata files, but
> PM> their purpose is to get from the metadata to the actual resource,
> PM> I believe).  Of course, for web
> PM> services (as an example), SMS doesn't need the actual resource,
> PM> and only needs the WSDL description (which happens to be all that
> PM> is needed to execute the web service).  However, conceptually, it
> PM> is still the web-service that is the resource -- the web-service
> PM> implementation is what is of interest, and the WSDL could be
> PM> viewed as just a by-product of the implementation. In fact, there
> PM> could be many WSDL descriptions of the same implementation.  There
> PM> may be some disagreement about this notion of Ecogrid being
> PM> resource-centric, but I would argue it is the more general
> PM> semantics.
> PM>
> PM> Does that make sense?
> PM>
> PM>
>
>>>>3). Ecogrid should support an operation to retrieve the metadata
>>>>definition for a resource. For example, if a dataset is stored through
>>>>the Ecogrid, and the resource has an EML description (which we know
>>>>from 2), then the operation would return the corresponding EML file
>>>>(of course, although not likely, there is nothing that would prevent a
>>>>resource from having multiple EML files).
>>>>
>>>
>>>Seems reasonable.
>>>
>>>
>>>>4). Ecogrid should support an operation to retrieve the actual
>>>>resource (the thing managed by the Ecogrid; either a dataset, a web
>>>>service, a "code", or whatever).  Also, datasets should be returned
>>>>using a standard representation. For example, the canonical XML
>>>>representation for relational data or CSV.  I believe EML-tools
>>>>already provide some support for this for relational data. Thus, at
>>>>least for datasets, the Ecogrid should serve as a standard wrapper
>>>>service as used in distributed dbs and in information-integration
>>>>architectures. This service I see as useful for both SMS and for
>>>>Kepler in general.
>>>>
>>>
>>>It's either doing this, or I don't quite understand the question.
>
> PM>
> PM> Here is an example.  I am a scientist, and I have a dataset (a
> PM> single relation) stored in an Access database. I also have an FGDC
> PM> file that I created to describe my dataset. They are both living
> PM> on my laptop. I want to store my dataset on the Ecogrid. I create
> PM> an Ecogrid resource-id for the dataset, ecogrid:042604, and I
> PM> register the resource-id for the dataset. That is, I upload the
> PM> Access database to some Ecogrid repository as well as the FGDC
> PM> file, and I tell Ecogrid that the FGDC file should be used as the
> PM> metadata file for the dataset.
> PM>
> PM> Later, SMS needs to integrate the dataset with some other dataset.
> PM> SMS knows the resource-id for both datasets. To do the
> PM> integration, SMS needs access to both datasets. To get access to
> PM> the datasets, SMS calls the Ecogrid function
> PM> getResource(ecogrid:042604, "CSV"), which returns the dataset as a
> PM> comma-separated-value text-file representation. Alternatively (and
> PM> preferred), SMS could call getResource(ecogrid:042604,
> PM> "RelationalXML"), which returns the same exact dataset using the
> PM> standard relational to XML mapping.
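
As a rough illustration of the getResource call described above, the
sketch below serves one stored relation in two formats. Everything here
is assumed for the example: the in-memory store, the sample columns and
rows, and the table/row XML layout, which is just one possible
relational-to-XML mapping and not necessarily the one meant in the
thread.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical store: resource id -> (column names, rows of string values).
_STORE = {
    "ecogrid:042604": (["site", "count"], [["A", "3"], ["B", "5"]]),
}


def get_resource(resource_id: str, fmt: str) -> str:
    """Return the stored relation serialized in the requested format."""
    columns, rows = _STORE[resource_id]
    if fmt == "CSV":
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(columns)  # header row
        writer.writerows(rows)
        return buffer.getvalue()
    if fmt == "RelationalXML":
        table = ET.Element("table")
        for row in rows:
            row_element = ET.SubElement(table, "row")
            # One child element per column, named after the column.
            for name, value in zip(columns, row):
                ET.SubElement(row_element, name).text = value
        return ET.tostring(table, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")


csv_text = get_resource("ecogrid:042604", "CSV")
xml_text = get_resource("ecogrid:042604", "RelationalXML")
```

The caller only names the resource and the format it wants; where and how
the relation is stored stays hidden behind the function.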
> PM>
> PM> Does Ecogrid already provide something like getResource? (If so
> PM> that would be awesome!)
> PM>
> PM>
> PM> Thanks,
> PM> Shawn
> PM>
>
>>>>5). Optionally (at least for SMS, these aren't required), Ecogrid can
>>>>offer a query-routing/execution service and/or web service invocation.
>>>>The purpose of offering query or invocation services would be for
>>>>optimization (in some cases) and to enable such operations for clients
>>>>that cannot perform these locally.
>>>>
>>>
>>>I think this functionality is one of the benefits of using Globus.
>>>
>>>
>>>>I believe that items 1-4 are the only things really needed by SMS from
>>>>the Ecogrid. In particular, for SMS, it doesn't really matter how or
>>>>where the resource is stored (Metacat, SRB, DiGIR, etc.), and it
>>>>doesn't need services to query the catalog entries of those systems.
>>>>If people bypass the SMS system, then I guess these types of things
>>>>are needed.
>>>>
>>>>Items 1-3 seem relatively straightforward. Item 4 seems harder,
>>>>although EML-tools exist for much of this I guess -- I am not really
>>>>sure.
>>>>
>>>>
>>>>Shawn
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>The issue at the moment is that the contents of the <record> element
>>>>>are basically a blob and anything goes. For example:
>>>>>1) Metacat returns a bunch of param elements containing the data.
>>>>>2) DiGIR returns a bunch of namespace-qualified elements containing
>>>>>the data.
>>>>>3) The SRB doesn't even have any data in the record; only the
>>>>>identifier attr is meaningful.
>>>>>
>>>>>We need to provide a mechanism for the contents to be interpreted. To
>>>>>do this we will add four things to the existing resultset schema:
>>>>>1) One or more <namespace> elements in the metadata - this will be the
>>>>>namespace for the new <returnfield> element
>>>>>2) A new element <returnfield>
>>>>>3) A "name" attribute for the returnfield element (basically the same
>>>>>as Peter's 'xpath' attr), which is a unique name within the record and
>>>>>may be meaningful for wherever the data came from.
>>>>>4) A "type" attribute for the returnfield element that describes the
>>>>>type of data contained in the returnfield
>>>>>
>>>>>The most important and powerful part of the new additions is the
>>>>>"type" attr. This enables the value to be interpreted. Most of the
>>>>>time it can be described by a schema definition type, for example
>>>>>"xsi:string" etc. Or it could be a URL that points to a schema
>>>>>definition document. This means the value of the returnfield element
>>>>>could be anything from a string or integer to an entire XML document.
>>>>>
>>>>>(Note that the namespace attr has been removed from the record
>>>>>element)
>>>>>
>>>>>The new namespace elements in the metadata provide a way for the value
>>>>>of the name attr and the type attr to be interpreted.
>>>>>
>>>>>Here is an example of a Metacat resultset that is returned today:
>>>>>
>>>>><rs:resultset system="http://knb.ecoinformatics.org"
>>>>>resultsetId="eml.001"
>>>>>xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
>>>>>xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1
>>>>>../../src/xsd/resultset.xsd">
>>>>><resultsetMetadata>
>>>>><sendTime>2004-03-10T13:47:26-0600</sendTime>
>>>>><startRecord>1</startRecord>
>>>>><endRecord>14</endRecord>
>>>>><recordCount>14</recordCount>
>>>>></resultsetMetadata>
>>>>><record number="1"
>>>>>system="http://dev.nceas.ucsb.edu"
>>>>>identifier="obfs2.379.1"
>>>>>namespace="eml://ecoinformatics.org/eml-2.0.0"
>>>>>lastModifiedDate="2003-11-02T11:07:43-0600"
>>>>>creationDate="2003-11-02T11:07:43-0600">
>>>>><param name="/eml/dataset/keywordSet/keyword">seasonality</param>
>>>>><param name="/eml/dataset/keywordSet/keyword">macroalgal bloom</param>
>>>>><param name="/eml/dataset/keywordSet/keyword">green tide</param>
>>>>><param name="/eml/dataset/keywordSet/keyword">Ulva</param>
>>>>><param name="/eml/dataset/creator/individualName/surName">Nelson</param>
>>>>><param name="/eml/dataset/keywordSet/keyword">biomass</param>
>>>>><param name="/eml/dataset/keywordSet/keyword">algal blooms</param>
>>>>><param name="/eml/dataset/title">Armitage Bay Ulvoid Algal
>>>>>Biomass and Species Composition</param>
>>>>><param name="/eml/dataset/keywordSet/keyword">Enteromorpha</param>
>>>>><param name="/eml/dataset/keywordSet/keyword">Ulvaria</param>
>>>>></record>
>>>>>
>>>>>Here is an example of the same resultset as described by the new
>>>>>approach:
>>>>>
>>>>><rs:resultset system="http://knb.ecoinformatics.org"
>>>>>resultsetId="eml.001"
>>>>>xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
>>>>>xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1
>>>>>../../src/xsd/resultset.xsd">
>>>>><resultsetMetadata>
>>>>><sendTime>2004-03-10T13:47:26-0600</sendTime>
>>>>><startRecord>1</startRecord>
>>>>><endRecord>14</endRecord>
>>>>><recordCount>14</recordCount>
>>>>><namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
>>>>><namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
>>>>></resultsetMetadata>
>>>>><record number="1"
>>>>>system="http://dev.nceas.ucsb.edu"
>>>>>identifier="obfs2.379.1"
>>>>>lastModifiedDate="2003-11-02T11:07:43-0600"
>>>>>creationDate="2003-11-02T11:07:43-0600">
>>>>><returnfield name="/eml/dataset/keywordSet/keyword"
>>>>>type="xsi:string">seasonality</returnfield>
>>>>><returnfield name="/eml/dataset/keywordSet/keyword"
>>>>>type="xsi:string">macroalgal bloom</returnfield>
>>>>><returnfield name="/eml/dataset/keywordSet/keyword"
>>>>>type="xsi:string">green tide</returnfield>
>>>>><returnfield name="/eml/dataset/keywordSet/keyword"
>>>>>type="xsi:string">Ulva</returnfield>
>>>>><returnfield name="/eml/dataset/creator/individualName/surName"
>>>>>type="xsi:string">Nelson</returnfield>
>>>>><returnfield name="/eml/dataset/keywordSet/keyword"
>>>>>type="xsi:string">biomass</returnfield>
>>>>><returnfield name="/eml/dataset/keywordSet/keyword"
>>>>>type="xsi:string">algal blooms</returnfield>
>>>>><returnfield name="/eml/dataset/title"
>>>>>type="xsi:string">Armitage Bay Ulvoid Algal Biomass and Species
>>>>>Composition</returnfield>
>>>>><returnfield name="/eml/dataset/keywordSet/keyword"
>>>>>type="xsi:string">Enteromorpha</returnfield>
>>>>><returnfield name="/eml/dataset/keywordSet/keyword"
>>>>>type="xsi:string">Ulvaria</returnfield>
>>>>></record>
>>>>>
>>>>>Note how we now can interpret the resultset in a much more meaningful
>>>>>way. Also, note that there are two new namespace elements; one
>>>>>contains a "prefix" attr, the other does not. The one without becomes
>>>>>the default namespace for unqualified values in the name and type
>>>>>attrs.
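
To show how a client might use the "type" attr, here is a small sketch
that reads returnfield elements and converts each value according to its
declared type. The sample record is assembled for illustration from
values that appear in this thread, the converter table is an assumption
covering only "xsi:string" and "xsi:int", and unknown types fall back to
plain strings.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Illustrative record using the proposed "name" and "type" attributes.
RECORD = """<record number="1" identifier="mvz1">
  <returnfield name="ScientificName" type="xsi:string">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnfield>
  <returnfield name="Longitude" type="xsi:int">121</returnfield>
  <returnfield name="Latitude" type="xsi:int">33</returnfield>
</record>"""

# Assumed mapping from declared type to a Python conversion function.
CONVERTERS = {"xsi:string": str, "xsi:int": int}


def read_record(xml_text: str) -> Dict[str, List[object]]:
    """Collect returnfield values by name, converted by their type attr."""
    record = ET.fromstring(xml_text)
    values: Dict[str, List[object]] = {}
    for returnfield in record.findall("returnfield"):
        convert = CONVERTERS.get(returnfield.get("type"), str)
        name = returnfield.get("name")
        # A name may repeat within a record (e.g. several keywords),
        # so values are gathered in a list per name.
        values.setdefault(name, []).append(convert(returnfield.text))
    return values


fields = read_record(RECORD)
```

With the old param elements every value would have come back as an
untyped string; here the coordinates arrive as integers without the
client knowing anything about the source system.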
>>>>>
>>>>>Here is the before and after for the DiGIR query:
>>>>>Before:
>>>>><rs:resultset resultsetId="foo.1.1"
>>>>>system="urn:not://sure/what/to/put/here"
>>>>>xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
>>>>>xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>>xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1
>>>>>../../src/xsd/resultset.xsd">
>>>>><resultsetMetadata>
>>>>><sendTime>2003-05-02T16:45:50-09:00</sendTime>
>>>>><startRecord>1</startRecord>
>>>>><endRecord>2</endRecord>
>>>>><recordCount>2</recordCount>
>>>>></resultsetMetadata>
>>>>><record number="1"
>>>>>system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
>>>>>identifier="mvz1"
>>>>>namespace="http://digir.net/schema/conceptual/darwin/2003/1.0"
>>>>>lastModifiedDate="2003-03-03T10:42:13"
>>>>>creationDate="2003-03-03T10:42:13">
>>>>><darwin:ScientificName>PEROMYSCUS LEUCOPUS
>>>>>NOVEBORACENSIS</darwin:ScientificName>
>>>>><darwin:Longitude>121</darwin:Longitude>
>>>>><darwin:Latitude>33</darwin:Latitude>
>>>>></record>
>>>>>
>>>>>After:
>>>>><rs:resultset resultsetId="foo.1.1"
>>>>>system="urn:not://sure/what/to/put/here"
>>>>>xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
>>>>>xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>>xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1
>>>>>../../src/xsd/resultset.xsd">
>>>>>
>>>>><resultsetMetadata>
>>>>><sendTime>2003-05-02T16:45:50-09:00</sendTime>
>>>>><startRecord>1</startRecord>
>>>>><endRecord>2</endRecord>
>>>>><recordCount>2</recordCount>
>>>>>
>>>>><namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
>>>>><namespace prefix="xsi">http://www.w3.org/2001/XMLSchema-instance</namespace>
>>>>></resultsetMetadata>
>>>>>
>>>>><record number="1"
>>>>>system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
>>>>>identifier="mvz1"
>>>>>lastModifiedDate="2003-03-03T10:42:13"
>>>>>creationDate="2003-03-03T10:42:13">
>>>>><returnfield path="ScientificName"
>>>>>type="xsi:string">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnfield>
>>>>><returnfield path="Longitude" type="xsi:int">121</returnfield>
>>>>><returnfield path="Latitude" type="xsi:int">33</returnfield>
>>>>></record>
>>>>>
>>>>>Here is the SRB's before and after:
>>>>>Before:
>>>>><rs:resultset system="http://knb.ecoinformatics.org"
>>>>>resultsetId="SeekSRB_001"
>>>>>xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
>>>>><resultsetMetadata>
>>>>><sendTime>2004-04-16T11:02:12-0500</sendTime>
>>>>><startRecord>1</startRecord>
>>>>><endRecord>2</endRecord>
>>>>><recordCount>2</recordCount>
>>>>></resultsetMetadata>
>>>>><record number="1"
>>>>>system="http://srb.sdsc.edu"
>>>>>identifier="/home/testuser.sdsc/SeekTestArea/Lesli Model::0"
>>>>>namespace="srb://srb.sdsc.edu"
>>>>>lastModifiedDate="2003-11-30T13:04:59-0600"
>>>>>creationDate="2003-11-30T13:04:58-0600">
>>>>></record>
>>>>>
>>>>>After:
>>>>><rs:resultset system="http://knb.ecoinformatics.org"
>>>>>resultsetId="SeekSRB_001"
>>>>>xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
>>>>><resultsetMetadata>
>>>>><sendTime>2004-04-16T11:02:12-0500</sendTime>
>>>>><startRecord>1</startRecord>
>>>>><endRecord>2</endRecord>
>>>>><recordCount>2</recordCount>
>>>>><namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
>>>>></resultsetMetadata>
>>>>><record number="1"
>>>>>system="http://srb.sdsc.edu"
>>>>>identifier="/home/testuser.sdsc/SeekTestArea/Lesli Model::0"
>>>>>lastModifiedDate="2003-11-30T13:04:59-0600"
>>>>>creationDate="2003-11-30T13:04:58-0600">
>>>>><returnfield name="location"
>>>>>type="xsi:string">/home/testuser.sdsc/SeekTestArea/Lesli Model::0</returnfield>
>>>>></record>
>>>>>----------------------------------------------------------------------
>>>>>The Query
>>>>>About the only difference between the old query and the new is that,
>>>>>if the returnfield values and concept attr values do not have a
>>>>>namespace, then the prefix should be dropped from the namespace
>>>>>element, or they should have a namespace if there is a prefix in the
>>>>>element. For example:
>>>>><?xml version="1.0" encoding="UTF-8"?>
>>>>><egq:query queryId="test.1.1"
>>>>>system="http://knb.ecoinformatics.org"
>>>>>xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
>>>>>xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>>xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1
>>>>>../../src/xsd/query.xsd">
>>>>><namespace>eml://ecoinformatics.org/eml-2.0.0</namespace>
>>>>><returnfield>/eml/dataset/title</returnfield>
>>>>><returnfield>/eml/dataset/creator/individualName/surName</returnfield>
>>>>><returnfield>/eml/dataset/pubDate</returnfield>
>>>>><returnfield>/eml/dataset/keywordSet/keyword</returnfield>
>>>>><title>Soils metadata query</title>
>>>>><AND>
>>>>><OR>
>>>>><condition operator="LIKE" concept="title">%soil%</condition>
>>>>><condition operator="NOT LIKE" concept="title">%dirt%</condition>
>>>>></OR>
>>>>><OR>
>>>>><condition operator="LIKE" concept="surName">%Jones%</condition>
>>>>><condition operator="LIKE" concept="surName">%Vieglais%</condition>
>>>>></OR>
>>>>></AND>
>>>>></egq:query>
>>>>>----------------------------------------------------------------------
>>>>>
>>>>>We can either discuss this via email, or think about it and discuss
>>>>>it further during our phone meeting.
>>>>>
>>>>>Rod
>>>>>
>>>>>
>>>>>Chad Berkley wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>Sorry for my late reply...we've been busy with a Morpho release.
>>>>>>Thanks for getting me in gear, Rod.
>>>>>>
>>>>>>In Metacat, we only return leaf nodes (i.e., the text node child of a
>>>>>>CDATA element like in response 4 below).  The returnfield
>>>>>>functionality was originally meant as a convenient way to return
>>>>>>enough information for a meaningful resultset to display, say, on a
>>>>>>web page.  It was not meant to return whole document chunks for
>>>>>>further processing.  I can see how this would be useful, but it
>>>>>>would require returning a namespace-defined chunk so that a parser
>>>>>>would know what to do with it.  Metacat currently uses the
>>>>>>returnfields to build the resultset table, then a request must be
>>>>>>made for the whole document in order to do further processing.
>>>>>>
>>>>>>Looking at the responses 1-3 below, to me, they are all invalid and
>>>>>>potentially problematic.  Without a namespace to parse those XML
>>>>>>chunks off of, the parser is left to just do well-formedness
>>>>>>checking, and any query into these document chunks may fail because
>>>>>>we don't know what to expect to get back before doing the processing
>>>>>>(e.g. an XPath query).
>>>>>>
>>>>>>So I guess to make a short answer long, I agree with Peter's
>>>>>>assessment of sticking with response 4 (which is basically what
>>>>>>Metacat has done all along).
>>>>>>
>>>>>>chad
>>>>>>
>>>>>>
>>>>>>Rod Spears wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>Is anyone better qualified than me, going to address Peter's
>>>>>>>questions?
>>>>>>>
>>>>>>>Please someone respond, thanks.
>>>>>>>
>>>>>>>Rod
>>>>>>>
>>>>>>>
>>>>>>>Peter McCartney wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>It has to be well formed no matter what. So the question is
>>>>>>>>really: how can we identify a namespace for the result set when the
>>>>>>>>content we stick in there has no hope of being valid? Further, how
>>>>>>>>can we define a set of rules for how the results are to be
>>>>>>>>evaluated against that namespace yet not be valid?
>>>>>>>>
>>>>>>>>request 1: '*/creator/individualName/surname', '/eml/dataset
>>>>>>>>
>>>>>>>>Rule 1: "content must appear in the minimal xml tree needed to
>>>>>>>>accommodate the information"
>>>>>>>>
>>>>>>>>Rule 2: "content must appear in a potentially valid xml tree that
>>>>>>>>invalidates only due to other required elements missing"
>>>>>>>>
>>>>>>>>Rule 3: "content must appear in a tree that places it in the correct
>>>>>>>>node ancestry for the declared namespace"
>>>>>>>>
>>>>>>>>
>>>>>>>>response 1: meets 1 and 3 and is well formed. Requires just
>>>>>>>>knowledge of parent ancestry to build.
>>>>>>>><eml>
>>>>>>>><dataset>
>>>>>>>><creator>
>>>>>>>><individualName>
>>>>>>>><surname>mccartney</surname>
>>>>>>>><surname>jones</surname>
>>>>>>>></individualName>
>>>>>>>></creator>
>>>>>>>></dataset>
>>>>>>>></eml>
>>>>>>>>
>>>>>>>>response 2: meets 1, 2 and 3 and is well formed. Requires
>>>>>>>>knowledge of ancestry and index (i.e. jones is in creator[2] of
>>>>>>>>dataset[1]).
>>>>>>>><eml>
>>>>>>>><dataset>
>>>>>>>><creator>
>>>>>>>><individualName>
>>>>>>>><surname>mccartney</surname>
>>>>>>>></individualName>
>>>>>>>></creator>
>>>>>>>><creator>
>>>>>>>><individualName>
>>>>>>>><surname>jones</surname>
>>>>>>>></individualName>
>>>>>>>></creator>
>>>>>>>></dataset>
>>>>>>>></eml>
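A rough sketch of what "knowledge of ancestry and index" could look like in practice, using Python's standard library. The indexed path syntax (`creator[2]`) and the helper names are assumptions for illustration, not something Xanthoria or metacat is known to emit. Given paths that carry positional indices, the response 2 layout can be rebuilt with each surname under its own creator.

```python
import re
import xml.etree.ElementTree as ET

# A path step is a tag with an optional 1-based index, e.g. "creator[2]".
STEP = re.compile(r"^([^\[\]]+)(?:\[(\d+)\])?$")


def child_at(parent, tag, index):
    """Return the index-th child named tag, creating siblings as needed."""
    children = parent.findall(tag)
    while len(children) < index:
        children.append(ET.SubElement(parent, tag))
    return children[index - 1]


def build_indexed_tree(root_tag, fields):
    """Build one tree from (indexed path, value) pairs."""
    root = ET.Element(root_tag)
    for path, value in fields:
        node = root
        for step in path.split("/"):
            tag, index = STEP.match(step).groups()
            node = child_at(node, tag, int(index or 1))
        node.text = value
    return root


# Hypothetical indexed return fields for the two creators.
fields = [
    ("dataset[1]/creator[1]/individualName/surname", "mccartney"),
    ("dataset[1]/creator[2]/individualName/surname", "jones"),
]
tree = build_indexed_tree("eml", fields)
xml_text = ET.tostring(tree, encoding="unicode")
creators = tree.findall("dataset/creator")
```

The extra cost is on the producing side: whoever builds the result set has to track the position of every ancestor, not just its name.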
>>>>>>>>
>>>>>>>>
>>>>>>>>response 3: meets 3 and is not well formed. Requires knowledge of
>>>>>>>>ancestry.
>>>>>>>>
>>>>>>>><eml>
>>>>>>>><dataset>
>>>>>>>><creator>
>>>>>>>><individualName>
>>>>>>>><surname>mccartney</surname>
>>>>>>>></individualName>
>>>>>>>></creator>
>>>>>>>></dataset>
>>>>>>>></eml>
>>>>>>>><eml>
>>>>>>>><dataset>
>>>>>>>><creator>
>>>>>>>><individualName>
>>>>>>>><surname>jones</surname>
>>>>>>>></individualName>
>>>>>>>></creator>
>>>>>>>></dataset>
>>>>>>>></eml>
>>>>>>>>
>>>>>>>>and just a reminder of where we originally started from
>>>>>>>>(approximately)
>>>>>>>>response 4: meets no rule, cannot be validated, but conveys all the
>>>>>>>>information to generate format 1 or 3 above using a string
>>>>>>>>tokenizer and a jDOM, but not option 2.
>>>>>>>><resultset namespace=eml......>
>>>>>>>><returnfield xpath="dataset/creator/individualname/surname">mccartney</returnfield>
>>>>>>>><returnfield xpath="dataset/creator/individualname/surname">jones</returnfield>
>>>>>>>></resultset>
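As a rough illustration of the "string tokenizer and a jDOM" step, here is a minimal sketch in Python rather than Java, using only the standard library. The function name and the sample pairs are assumptions for illustration. It splits each returnfield path on "/" and merges the values into the minimal tree of response 1.

```python
import xml.etree.ElementTree as ET


def build_minimal_tree(root_tag, fields):
    """Merge (path, value) pairs into one tree, reusing shared ancestors."""
    root = ET.Element(root_tag)
    for path, value in fields:
        *ancestors, leaf = path.split("/")
        node = root
        for tag in ancestors:
            child = node.find(tag)
            if child is None:
                child = ET.SubElement(node, tag)
            node = child
        # Every value becomes a new leaf under the shared ancestor chain.
        ET.SubElement(node, leaf).text = value
    return root


# Hypothetical (xpath, value) pairs as they would be read from returnfields.
fields = [
    ("dataset/creator/individualName/surname", "mccartney"),
    ("dataset/creator/individualName/surname", "jones"),
]
tree = build_minimal_tree("eml", fields)
xml_text = ET.tostring(tree, encoding="unicode")
```

Both surnames land under the same creator because the flat pairs say nothing about which creator each one came from. That is why option 2 cannot be rebuilt from this format.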
>>>>>>>>
>>>>>>>>I think we should really ask whether we are making ourselves
>>>>>>>>deal with some very complicated rules for really no gain in
>>>>>>>>functionality. None of the results will be valid according to
>>>>>>>>the namespace. All of them are valid if I make up my own namespace
>>>>>>>>for the result set.  Unless we can hold ourselves to the standard
>>>>>>>>where any code or xsl written for the schema will successfully
>>>>>>>>process the result set (#2 is the closest to that, but depending
>>>>>>>>on how loose the code is, all three could work or none could
>>>>>>>>work), why shouldn't we opt for the easiest rule to comply with?
>>>>>>>>
>>>>>>>>
>>>>>>>>Peter McCartney (peter.mccartney at asu.edu
>>>>>>>><mailto:peter.mccartney at asu.edu>)
>>>>>>>>Center for Environmental-Studies
>>>>>>>>Arizona State University
>>>>>>>>
>>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>*From:* Saritha Bhandarkar
>>>>>>>>*Sent:* Friday, April 09, 2004 10:28 AM
>>>>>>>>*To:* 'seek-dev'
>>>>>>>>*Cc:* Jing Tao; Peter McCartney; Saritha Bhandarkar
>>>>>>>>*Subject:* resultset question
>>>>>>>>
>>>>>>>>Hi,
>>>>>>>>
>>>>>>>>I had a question about the resultset to be returned by
>>>>>>>>Xanthoria.
>>>>>>>>
>>>>>>>>The schema of the resultset specifies that a record is of type
>>>>>>>>"AnyRecordType" and optionally it may have some element content
>>>>>>>>from the record. Now, my question here is, if I am to return the
>>>>>>>>elements specified in the <returnfields> of the query, for the
>>>>>>>>matching records (that is from the matching eml file), do I need
>>>>>>>>to send it in eml format, with only relevant values for requested
>>>>>>>>fields and no values for the fields which are not requested? Or is
>>>>>>>>it enough to return only the requested fields with their values, as
>>>>>>>>well-formed xml? Can someone please brief me on the contents of a
>>>>>>>>record in resultsetType?
>>>>>>>>
>>>>>>>>Thanks,
>>>>>>>>
>>>>>>>>Saritha
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>Saritha Bhandarkar
>>>>>>>>
>>>>>>>>Research Assistant
>>>>>>>>
>>>>>>>>Center for Environmental Studies
>>>>>>>>
>>>>>>>>ASU-Tempe AZ
>>>>>>>>
>>>>>>>>saritha.bhandarkar at asu.edu <mailto:saritha.bhandarkar at asu.edu>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>--
>>>>>>>Rod Spears
>>>>>>>Biodiversity Research Center
>>>>>>>University of Kansas
>>>>>>>1345 Jayhawk Boulevard
>>>>>>>Lawrence, KS 66045, USA
>>>>>>>Tel: 785 864-4082, Fax: 785 864-5335
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>

_______________________________________________
seek-dev mailing list
seek-dev at ecoinformatics.org
http://www.ecoinformatics.org/mailman/listinfo/seek-dev


