connection question
Matt Jones
jones at nceas.ucsb.edu
Fri Oct 18 14:36:23 PDT 2002
Hi Peter,
Peter McCartney wrote:
> Hi Matt
>
> we had a planning meeting last week on our data access tool kit and have
> decided that there are three kinds of distribution that we most likely
> want to describe and are trying to figure out how to do it.
>
> These are
>
> 1) direct access information via a url or connection def that is open
> only to very restricted individuals (service apps and local lab users
> for example)
>
> 2) WSDL information about an online service that provides public access
> to the data via a web service. used by someone like Don Henshaw who
> wants to build a harvesting application for pulling data automatically
> from a service. over time, we see this as becoming the preferred method
> for publishing data as it gives the administrator more logging and
> accounting control than a direct URL might.
>
>
> 3) a web url that points the user to an interactive site that will walk
> them through communication with the web service described in #2. this is
> the url that would be distributed with public eml documents and in most
> cases would probably be pointing to the very online data catalog the
> person is using to read the EML.
>
>
> The latter would probably appear only in eml-dataset/distribution. My
> only choice for function is download or information. We're not sure
> either is fully explanatory, since it's not a direct download but will
> give you the data with some interaction that cannot be automated.
Well, actually, it can be automated if you understand the web interface
well enough, but that of course is a big if. Any web interaction over
http is just a series of request/responses that, if you understand the
application model, can be incorporated into a script. At this point I
would argue it is a purely "information" URL in the sense that the URL
alone does not provide the details on how to download the data, but
rather requires substantially more elaborate system/application
knowledge to achieve it. The URL is just the gateway into a complex
application, and isn't the whole application.
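
To illustrate the point about scripting, here is a minimal sketch of a
three-step web interaction replayed as a script. The paths (/search,
/agree, /data), the parameter names, and the plain-text replies are all
hypothetical; a real catalog would need its own application model worked
out first, which is exactly the knowledge the URL alone does not carry.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get(url):
    """Default transport: a plain HTTP GET returning the body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def scripted_download(base_url, dataset_id, fetch=http_get):
    """Replay a three-step browser session as a script.

    Every path and parameter name below is an assumption made for
    illustration, not part of any real catalog.
    """
    # Step 1: search the catalog; assume the reply body is a session token.
    token = fetch(base_url + "/search?" + urlencode({"dataset": dataset_id})).strip()
    # Step 2: accept the usage agreement; assume the reply body is a ticket.
    ticket = fetch(base_url + "/agree?" + urlencode({"token": token})).strip()
    # Step 3: redeem the ticket for the data itself.
    return fetch(base_url + "/data?" + urlencode({"ticket": ticket}))
```

The fetch argument is injectable so the sequence can be exercised
against a stub without touching the network.
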
> the first is pretty much covered in the current content model
>
> the second is a bit unclear to us. a simple wsdl might tell you to
> send a soap message to x.x.edu with the parameters datasetID, entityID,
> etc, but it could be a very elaborate one with lots of methods for
> different retrieval functions and processing options. We could stick the
> wsdl in additionalMetadata or are considering publishing it as a uddi
> and then just putting the url to the uddi entry in the URL field. But if
> we did that, shouldn't we have a better function flag than "information"
> since this is information that conforms to a structured standard (WSDL)?
>
I would argue this kind of WSDL information represents a "connection",
not a URL, in our current parlance. I would add it as part of the
connectionDef definition, possibly in a CDATA structure.
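
As a rough sketch of the CDATA idea, the snippet below wraps a WSDL
document as opaque text inside an element. The element name
"connectionDef" is used here only as a placeholder taken from the
discussion above; where the text would actually sit in the EML content
model is an assumption, not something the schema defines.

```python
from xml.dom.minidom import Document, parseString


def wrap_wsdl(wsdl_text):
    """Return an XML fragment carrying the WSDL text in a CDATA section.

    The "connectionDef" element name is a placeholder for illustration.
    Text containing "]]>" cannot be stored in a single CDATA section,
    and minidom raises ValueError in that case when serializing.
    """
    doc = Document()
    conn = doc.createElement("connectionDef")
    doc.appendChild(conn)
    # CDATA keeps the embedded markup from being parsed as EML content.
    conn.appendChild(doc.createCDATASection(wsdl_text))
    return doc.documentElement.toxml()
```

A consumer that knows nothing about WSDL sees only a block of text,
which matches treating the description as information rather than as
something EML formally understands.
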
As we said in our extensive earlier conversations, our current model for
describing complex applications utilizing even common protocols is
completely inadequate. Nevertheless, if you don't expect to universally
machine process anything but the simplest GET http URLs, then the
connectionDef structure works reasonably for conveying some important
information about the application's needs. It might be sufficient for
some software to make automated connections, but probably not for all
applications. We spent so much time talking about WSDL
because we felt it was a potentially more powerful approach to
describing applications, but as a group we did not spend the time to
effectively evaluate it. So, I at least do not feel qualified to design
this structure intelligently at this point in time -- consequently, I
agreed to the less powerful, more limited (but more understandable)
connectionDef.
At this point, I would argue that EML does not, and should not, have
formal support for WSDL descriptions. We need to do a lot more research
and thinking about how to generically support those types of complex
applications before we just hack something into EML. I'm not sure WSDL
is really it. So, in that sense, if you put a WSDL def into EML, or
reference one with a URL, it is just information. Anything else would
presuppose that we've thought this through, which we haven't. It would
probably even be best if we tested an implementation across sites before
releasing such a beast in EML because it would have such important
ramifications for how software is designed.
Of course, I truly think this is an important area for us to focus on,
especially as we get SEEK going and start talking more about application
needs for accessing data directly. But it shouldn't be in this release
of EML, as it's far too complicated to just take a cursory stab at it.
Matt
--
*******************************************************************
Matt Jones jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/ Fax: 425-920-2439 Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
Interested in ecological informatics? http://www.ecoinformatics.org
*******************************************************************