connection question

Fri Oct 18 14:36:23 PDT 2002

Hi Peter,

Peter McCartney wrote:
> Hi Matt
> 
> we had a planning meeting last week on our data access tool kit and have 
> decided that there are three kinds of distribution that we most likely 
> want to describe and are trying to figure out how to do it.
> 
> These are
> 
> 1) direct access information via a url or connection def that is open 
> only to very restricted individuals (service apps and local lab users 
> for example)
> 
> 2) WSDL information about an online service that provides public access 
> to the data via a web service. used by someone like Don Henshaw who 
> wants to build a harvesting application for pulling data automatically 
> from a service. over time, we see this as becoming the preferred method 
> for publishing data as it gives the administrator more logging and 
> accounting control than a direct URL might.
> 
>  
> 3) a web url that points the user to an interactive site that will walk 
> them through communication with the web service described in #2. this is 
> the url that would be distributed with public eml documents and in most 
> cases would probably be pointing to the very online data catalog the 
> person is using to read the EML.
> 
> 
> the latter would probably appear only in eml-dataset/distribution . my 
> only choice for function is download or information. We're not sure 
> either is fully explanatory since it's not direct download, but will 
> give you the data with some interaction that cannot be automated.

Well, actually, it can be automated if you understand the web interface 
well enough, but that of course is a big if.  Any web interaction over 
HTTP is just a series of requests and responses that, if you understand 
the application model, can be incorporated into a script.  At this point 
I would argue it is purely an "information" URL in the sense that the URL 
alone does not provide the details on how to download the data, but 
rather requires substantially more elaborate system/application 
knowledge to achieve it.  The URL is just the gateway into a complex 
application, not the whole application.
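
To make the scripting point concrete, here is a minimal Python sketch of 
replaying a web session as a fixed series of requests.  Everything 
site-specific in it is an assumption for illustration: the host 
catalog.example.edu, the /login, /select and /download paths, and the 
"user" form field are hypothetical, and a real catalog would define its 
own.  Only the datasetID and entityID parameter names come from your 
message.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Hypothetical catalog host; a real site defines its own URLs and fields.
BASE = "http://catalog.example.edu"


def plan_requests(dataset_id, entity_id):
    """Build, in order, the requests a user's clicks would generate."""
    # Supplying a body makes urllib issue a POST (hypothetical login form).
    login = Request(BASE + "/login",
                    data=urlencode({"user": "guest"}).encode("ascii"))
    # No body means a plain GET with the parameters in the query string.
    select = Request(BASE + "/select?" + urlencode({"datasetID": dataset_id}))
    download = Request(BASE + "/download?" + urlencode(
        {"datasetID": dataset_id, "entityID": entity_id}))
    return [login, select, download]


def replay(steps):
    """Send the requests in order, sharing cookies so the session persists."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    body = b""
    for step in steps:
        with opener.open(step) as response:
            body = response.read()
    return body  # body of the last response, which would be the data
```

Note that none of the knowledge encoded in plan_requests() can be 
recovered from the entry URL alone, which is why I call that URL 
"information".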

> the first is pretty much covered in the current content model
> 
> the second is a bit unclear to us. a simple wsdl might tell you to 
> send a soap message to x.x.edu with the parameters datasetID, entityID, 
> etc, but could be a very elaborate one with lots of methods for 
> different retrieval functions and processing options. We could stick the 
> wsdl in additionalMetadata or are considering publishing it as a uddi 
> and then just putting the url to the uddi entry in the URL field. But if 
> we did that, shouldn't we have a better function flag than "information" 
> since this is information that conforms to a structured standard (WSDL)?
> 

I would argue this kind of WSDL information represents a "connection", 
not a URL, in our current parlance.  I would add it as part of the 
connectionDef, possibly in a CDATA section.
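
As a rough sketch of the CDATA idea, the Python below wraps a WSDL 
document, unparsed, inside a description-like element.  The element names 
"connectionDefinition" and "description" are placeholders I am assuming 
for illustration and are not taken from the EML schema, and the one-line 
WSDL in the usage note is made up.

```python
from xml.dom.minidom import Document


def wrap_wsdl(wsdl_text):
    """Embed raw WSDL text in a CDATA section so it is carried verbatim."""
    doc = Document()
    # Placeholder element names, not the actual EML content model.
    conn = doc.createElement("connectionDefinition")
    desc = doc.createElement("description")
    # CDATA keeps the embedded markup from being parsed as part of the
    # enclosing document; the text must not itself contain "]]>".
    desc.appendChild(doc.createCDATASection(wsdl_text))
    conn.appendChild(desc)
    doc.appendChild(conn)
    return conn.toxml()
```

Calling wrap_wsdl('<definitions name="DataService"/>') yields the WSDL 
text inside a CDATA section within the description element.  A parser 
reading the enclosing document sees it only as character data, which 
fits treating it as information rather than as a formally supported 
structure.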

As we said in our extensive earlier conversations, our current model for 
describing complex applications, even those that use common protocols, is 
completely inadequate.  Nevertheless, if you don't expect to universally 
machine process anything but the simplest HTTP GET URLs, then the 
connectionDef structure works reasonably well for conveying some 
important information about the application's needs.  It might be 
sufficient for some software to make automated connections, but probably 
not for all applications.  We spent so much time talking about WSDL 
because we felt it was a potentially more powerful approach to 
describing applications, but as a group we did not spend the time to 
evaluate it effectively.  So I, at least, do not feel qualified to design 
this structure intelligently at this point in time.  Consequently, I 
agreed to the connectionDef, which is less powerful and limited in 
functionality but more understandable.

At this point, I would argue that EML does not, and should not, have 
formal support for WSDL descriptions.  We need to do a lot more research 
and thinking about how to generically support those types of complex 
applications before we just hack something into EML.  I'm not sure WSDL 
is really it.  So, in that sense, if you put a WSDL def into EML, or 
reference one with a URL, it is just information. Anything else would 
presuppose that we've thought this through, which we haven't.  It would 
probably be best to test an implementation across sites before 
releasing such a beast in EML, because it would have such important 
ramifications for how software is designed.

Of course, I truly think this is an important area for us to focus on, 
especially as we get SEEK going and start talking more about application 
needs for accessing data directly.  But it shouldn't be in this release 
of EML, as it's far too complicated to just take a cursory stab at it.

Matt

-- 
*******************************************************************
Matt Jones                                    jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Fax: 425-920-2439   Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)

Interested in ecological informatics? http://www.ecoinformatics.org
*******************************************************************