distribution element issues
Peter McCartney
peter.mccartney at asu.edu
Wed May 22 11:45:41 PDT 2002
I've been looking at ISO 19115.3 in the context of the spatial modules, but
noted that they also use URL as the sole means of online addressing.
However, they do some things that parallel some suggestions I made in my last
email. In addition to URL, they provide a protocol element and an
applicationProfile element. The latter might be a place to put the name of
the driver with which the URL is compatible, since we agreed that this is not
intuitively obvious from the URL. (See B.2.3.4.)
They also provide a function code whose domain includes values like "access",
"additionalInformation", "download", "search", etc., which gives you some idea
of what you can expect to do at that URL. This would replace the directURL |
indirectURL distinction I had suggested for indicating whether a URL would
require interactive user input.
These changes might brighten my dim expectations of URLs for service
connections.
Peter McCartney (peter.mccartney at asu.edu)
Center for Environmental Studies
Arizona State University
480-965-6791
-----Original Message-----
From: Matt Jones [mailto:jones at nceas.ucsb.edu]
Sent: Thursday, May 16, 2002 10:37 AM
To: Peter McCartney
Cc: eml-dev
Subject: Re: distribution element issues
Hi Peter,
Thanks for the well-reasoned response. My comments are inline...
Peter McCartney wrote:
> Well, this was the very reason why I proposed providing a choice of
> parameter models that were specific to schemes. I appreciate the
> ambiguity of your "database" example, but only because I don't recognize
> the scheme. By asking you to pick the scheme (MS SQL Server) from a
> controlled list that is documented in EML, I could then force you to
> enter version=7.0, host=maricopa, port=1433, networkProtocol=named
> pipes, database=arthropods. There would be no ambiguity as to semantics,
> and you would not have to know the exact syntax of how to build the URL
> string for whatever driver I have available.
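One way to read this proposal is as a controlled list of schemes, each naming the parameters an editor must supply. The sketch below is hypothetical: the dictionary and function names are invented, and only the MS SQL Server parameter set comes from the example above.

```python
from typing import Dict, List

# Hypothetical controlled list of documented schemes; each entry names the
# parameters an editor must fill in for that scheme.
SCHEME_PARAMETERS: Dict[str, List[str]] = {
    "MS SQL Server": ["version", "host", "port", "networkProtocol", "database"],
}


def missing_parameters(scheme: str, params: Dict[str, str]) -> List[str]:
    """Return the required parameters of `scheme` that `params` lacks."""
    if scheme not in SCHEME_PARAMETERS:
        raise ValueError(f"scheme not in controlled list: {scheme}")
    return [name for name in SCHEME_PARAMETERS[scheme] if name not in params]


params = {
    "version": "7.0",
    "host": "maricopa",
    "port": "1433",
    "networkProtocol": "named pipes",
    "database": "arthropods",
}
```

The cost Matt raises below is visible here: every scheme needs its own hand-maintained entry in `SCHEME_PARAMETERS`.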
So, you seem to want a finite number of well-defined connection types,
each of which would have its own set of parameters. In theory I think
this is fine, but in practice I think it will only work for you (because
you'll pick the schemes and parameters that are right for your systems).
The diversity of connection types is large and is growing, and I don't
think we can hope to enumerate even a part of them. In addition, for
any given protocol, the details of the connection are complex, and are
far beyond our ability to enumerate the parameters and their semantics.
Take, for example, smb connections. The IETF working draft that
enumerates the URL syntax and semantics for these well-known connections
is 17 pages long
(http://www.ietf.org/internet-drafts/draft-crhertel-smb-url-02.txt).
I have read the detailed mailing list archives on this topic, and
the subtleties in the parameters are deep, especially when
differentiating smb connections from cifs connections (people want a
separate url scheme for cifs), even though most clients like windows
machines handle the two protocols with one user interface. Personally,
I am not up to the task of even providing a comprehensive list of
parameters for the simple protocols like http, https, ftp, sftp, smb,
cifs. I can't fathom the complexity of jdbc and odbc, or the oracle
call interface, or sde.
I would far prefer to not implement something in EML that is a partial,
hacked-together solution when there is a standardized mechanism for
providing well-structured information for a protocol (IETF URL schemes).
If a protocol is sufficiently well known in the community then someone
should have developed a URL scheme for describing connection
information. If they haven't, I don't think we can realistically
develop the spec for that connection type ourselves.
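To illustrate what "well-structured" means here: any URL that follows the generic syntax splits into the same basic connection fields, whatever its scheme. A minimal sketch using Python's standard `urllib.parse.urlsplit`; the `connection_fields` helper and the example host are hypothetical, and scheme-specific details (such as the smb draft's extra parameters) stay opaque to this generic parser.

```python
from urllib.parse import urlsplit


def connection_fields(url: str) -> dict:
    """Split a URL into the generic connection fields all schemes share."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


ftp = connection_fields("ftp://anonymous@ftp.example.org:21/pub/lter/data.txt")
smb = connection_fields("smb://maricopa/proj/lter/data.txt")
```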
> To blow this problem off in favor of just using urls I think puts us
> (almost) back where we were a year ago.
I'm not trying to blow this off at all. I am very concerned with
implementing a partial, ambiguous solution to a very complex problem.
> I do see more utility in URLs
> after our discussion of providing URLs for a specific driver. But it is
> still a problem for users who want to use my data and neither have that
> particular driver nor know how to rewrite the URL string into an
> equivalent one for an alternate driver (although I can mitigate that
> somewhat by trying to provide URLs for as many different connection
> protocols out there as I can anticipate). There is also a problem in
> that I do not see evidence that ALL online connections can indeed be
> described by a structured URL string (I can't find one for an SDE
> connection, although I did find one for modem dialups). I'm also not
> convinced that URLs carry enough information. If I give you
> file://maricopa.asu.edu/proj/lter/filename.txt, it's a crapshoot whether
> it will work for you, because I haven't told you that maricopa.asu.edu is
> an NT server located in the LTER domain.
That's because you used the wrong URL format. You should have used an
"smb" URL if the file is on an SMB server like NT or Samba.
> Similarly, with JDBC there is a
> keyword for the driver in the URL string, but JDBC isn't smart enough
> to parse the URL and figure out what driver to use; you need to
> separately provide the class name of the driver that the URL is for.
That is an interesting issue for JDBC. It's actually a bit of a
chicken-and-egg problem, because the driver class is what determines how
to interpret the driver-specific parameters. I don't think there is even
a registry of JDBC driver names (although I could be wrong on that; I
haven't looked).
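A JDBC URL has the form `jdbc:<subprotocol>:<subname>`, so a parser can recover the subprotocol but not the driver class that understands the rest. A sketch of carrying both together, with a hypothetical `JdbcConnection` record; the PostgreSQL URL and `org.postgresql.Driver` class are only an illustration, not a CAP connection.

```python
from dataclasses import dataclass


@dataclass
class JdbcConnection:
    # The URL alone does not name the class that can parse it, so the
    # driver class travels alongside (hypothetical field names).
    url: str
    driver_class: str

    def subprotocol(self) -> str:
        """Return <subprotocol> from a jdbc:<subprotocol>:<subname> URL."""
        scheme, sep, rest = self.url.partition(":")
        if scheme != "jdbc" or not sep:
            raise ValueError(f"not a JDBC URL: {self.url}")
        sub, sep, _ = rest.partition(":")
        if not sep:
            raise ValueError(f"missing subname: {self.url}")
        return sub


conn = JdbcConnection("jdbc:postgresql://maricopa:5432/arthropods",
                      "org.postgresql.Driver")
```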
> Finally, I really question whether users can be expected to know the
> proper structure for providing a URL string for most service connections;
> we will have to provide wizards to help them with that. Those wizards
> will have to be based on content models of parameters for each known
> scheme, so why the heck don't we make them part of EML in the first place?
Turning that on its head -- the syntax for encoding the parameters
(e.g., a URL) has nothing to do with the user interface presentation.
If a user needs to input the information for, say, an LDAP over TLS
connection, will we have those parameters in place for EML? I think not.
It seems to me that all of this user-input stuff is going to have to be
application-generated independently of the EML schemas -- we just want
EML to be able to encode it in a standard way for transport.
> Part of the problem, I think, is the difference between
> connection info we share with the world versus connection information we
> want to use locally. I need a metadata format that allows me to generate
> a display in our data catalog for local users (or my local web
> application) to know how to find a file while they are sitting in the
> lab (e.g., network protocol: MS Windows networking, domain: LTER,
> server: maricopa, folder: proj\lter\po10\, filename: xxxx). Perhaps one
> solution is to make URL a required connection type but provide some form
> of parameter model as an option. Editors could generate the URL version
> from the parameters, but the parameters would remain in the metadata for
> local applications.
I agree. Public connection info versus private connection info seems to
be the crux of the matter. Maybe making url required but still providing
the other fields would work. But I suspect it'll cause problems for you
for those connections that you say don't have a URL representation.
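As a sketch of "editors could generate the URL version from the parameters", the function below assembles an smb URL from the local fields Peter lists. It roughly follows the smb URL draft's `smb://[[domain;]user@]server/share/path` shape; it assumes the first folder component is the share name and that the domain is only expressible together with a user name. The function name, the `peter` user, and `data.txt` are hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def smb_url(server: str, folder: str, filename: str,
            domain: Optional[str] = None, user: Optional[str] = None) -> str:
    """Build an smb:// URL from discrete parameters (hypothetical helper).

    Assumes the first component of `folder` is the share name and that
    the domain appears only as "domain;user@" in the userinfo part.
    """
    parts = [p for p in folder.replace("\\", "/").split("/") if p]
    path = "/".join(quote(p) for p in parts + [filename])
    userinfo = ""
    if user:
        userinfo = (f"{quote(domain)};" if domain else "") + f"{quote(user)}@"
    return f"smb://{userinfo}{server}/{path}"


anon = smb_url("maricopa", "proj\\lter\\po10\\", "data.txt")
named = smb_url("maricopa", "proj\\lter\\po10\\", "data.txt",
                domain="LTER", user="peter")
```

The discrete parameters could stay in the metadata for local display, while the generated URL serves public exchange.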
> I certainly agree that if there is an unambiguous way of describing a
> connection with a URL, that should be preferred. But I'm pretty sure that
> if this is the only way of defining a connection in EML, many sites
> using server connections or local file system addresses (myself
> included) will wind up extending EML with their own locally defined
> connection description schemas to solve some of the problems I mention
> above. If I'm on my own on this, then I'm likely to just locally use my
> original content models for each kind of connection scheme we use at CAP
> and simply build URLs in XSL when generating valid EML documents. Now,
> maybe this isn't so bad if I am not inclined to show that detailed info
> to the public anyway. I guess it all depends on how much we want EML to
> set standards for managing metadata at the internal site level, but I
> see some advantages to a solution that is itself part of EML so that we
> don't have a bazillion different solutions to the same problem.
I'm open to discussion on the extent to which EML is used for
data/metadata exchange versus internal site management. I think it
makes sense for a public exchange mechanism. I'm not sure it is as
compelling for site-specific details, but I can see your argument for it.
> Before we drop this, has anyone looked at how the SRB MCAT stores
> connection information? It seems like it has a similar problem in having
> to deal with a lot of different kinds of connections. Does it manage to
> do all this with a single URL field?
It is very proprietary, and thus somewhat limited in terms of
extensibility. It does not use URLs. Rather, there is a C driver for
each type of physical resource connection (UNIX filesystems, ftp, http,
Oracle, DB2, etc), and configuration info for each driver is stored
partly in text config files and partly in the database schema. It does
not give generalized access to databases in the way we are discussing --
rather, it gives access to particular hardcoded SQL queries.
> On a totally separate note, I like the idea of token substitutions for
> defining URLs in such a way that they can be used more generically;
> this neatly allows you to define the host and path of an ftp connection
> once, and then substitute the filename for datasets that have several
> files on one ftp site. So I say add that feature, regardless of how we
> resolve the URL/parameter debate.
OK, I'll try to develop this further for the next check-in. I'm not quite
sure how it would work fully. Anybody got any further suggestions/insights?
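One possible shape for the token substitution, sketched with Python's standard `string.Template`: the ftp host and path are written once, and each data file supplies only its filename. The template, host, and `expand` helper are hypothetical; the `${filename}` token syntax is simply Template's, not a proposed EML syntax.

```python
from string import Template

# Hypothetical: the host and path of the ftp site are defined once;
# each data entity fills in the ${filename} token.
FTP_TEMPLATE = Template("ftp://ftp.example.org/pub/lter/${filename}")


def expand(template: Template, **tokens: str) -> str:
    """Substitute tokens; raises KeyError if a token has no value."""
    return template.substitute(**tokens)


urls = [expand(FTP_TEMPLATE, filename=f) for f in ("site1.csv", "site2.csv")]
```

Using `substitute` rather than `safe_substitute` means a missing token fails loudly instead of leaving a half-built URL in the metadata.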
> But this feature raises another question. For web apps that don't expose
> their form parameters in the URL via GET, the token substitution trick
> still won't help us automate running these applications. How do we
> reference an online application for which further interactive user input
> cannot be avoided in order to get the data? Do we enter these under
> "connections", or is an onlineApplicationURL different from an onlineURL?
There seem to be two issues here. First, some applications do not expose
a GET interface over HTTP, but rather only allow POST. They need a
different parameter encoding than the GET request, one that a URL alone
cannot express. It is interesting that an HTTP URL implies a GET, when in
fact GET is only one of several possible HTTP methods. I'll have to think
about this some more.
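The difference can be seen by building, without sending, the same form submission both ways with Python's standard `urllib`. The endpoint and form fields below are hypothetical. With GET the parameters are part of the URL string; with POST they move into the request body, so the URL alone no longer describes the request.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields for a data-query web application.
params = {"site": "CAP", "year": "2001"}
endpoint = "http://data.example.org/query"

# GET: the parameters live in the URL, so one URL string captures them.
get_req = Request(endpoint + "?" + urlencode(params), method="GET")

# POST: the same parameters travel in the request body instead.
post_req = Request(endpoint, data=urlencode(params).encode("ascii"),
                   method="POST")
```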
Second, some applications absolutely require user interaction to get to
the data, so there is no way to provide complete connection information.
I think these are out of scope for us, meaning that someone can
provide an informational URL, but it's not going to get us to the data.
In that case, I do not think it belongs in the distribution element, but
rather in some other, more descriptive metadata section.
Well, that's about it. Hopefully some of the other people on this list
will chime in and help out with these discussions. Thanks again for the
thoughtful comments, Peter.
Matt
> Peter McCartney (peter.mccartney at asu.edu)
> Center for Environmental Studies
> Arizona State University
> 480-965-6791
>
> -----Original Message-----
> From: Chad Berkley [mailto:berkley at nceas.ucsb.edu]
> Sent: Wednesday, May 15, 2002 1:35 PM
> To: Matt Jones
> Cc: eml-dev at ecoinformatics.org
> Subject: Re: distribution element issues
>
>
> I think we should eliminate the parameters altogether. I don't see the
> point of them since all of the information that they can encode can be
> more precisely encoded in a URL.
>
> chad
>
> On Wed, 2002-05-15 at 12:58, Matt Jones wrote:
> > Hey --
> >
> > I pointed out some problems with the "distribution" element that I am
> > trying to resolve in my second comment on bug 480:
> > http://bugzilla.ecoinformatics.org/show_bug.cgi?id=480#c2
> >
> > I could really use some feedback on this to see what others think before
> > I finalize the changes. This is a plea for help! Thanks.
> >
> > Matt
> >
> > --
> > *******************************************************************
> > Matt Jones jones at nceas.ucsb.edu
> > http://www.nceas.ucsb.edu/ Fax: 425-920-2439 Ph: 907-789-0496
> > National Center for Ecological Analysis and Synthesis (NCEAS)
> >
> > Interested in ecological informatics? http://www.ecoinformatics.org
> > *******************************************************************
> >
> > _______________________________________________
> > eml-dev mailing list
> > eml-dev at ecoinformatics.org
> > http://www.ecoinformatics.org/mailman/listinfo/eml-dev
> --
>
>
--
*******************************************************************
Matt Jones jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/ Fax: 425-920-2439 Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
Interested in ecological informatics? http://www.ecoinformatics.org
*******************************************************************