[kepler-dev] [eml-dev] Question about EML-based file access in Kepler

Jing Tao tao at nceas.ucsb.edu
Tue Mar 18 15:33:39 PDT 2008


Yeah, I think we should consider this a bug. I will file one in 
Bugzilla.

I am not exactly sure how hard the query change will be to implement, but I 
don't think it will be too hard. However, I have one concern: the more 
conditions we add to the query, the worse its performance will get.

An alternative might be this: the query keeps returning packages whose URLs 
have either the "download" or the "information" function attribute (which is 
what we do now). However, when a user drags a package from the search results 
panel to the canvas, an alert window would pop up if the data distribution 
URL in that EML package has the function attribute value "information".
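The drag-time check described above could look roughly like the following. This is a minimal sketch in Python, not Kepler's actual (Java) code. The function names, the sample EML fragment and the example.org URL are all hypothetical. The element path follows the dataTable/physical/distribution/online/url structure Wade describes. The sketch treats a URL with no function attribute as a download URL, which is one of the options Matt raises below, and is an assumption here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed EML fragment (not schema-valid; enough for parsing).
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1"
    packageId="example.1.1" system="example">
  <dataset>
    <dataTable>
      <entityName>example</entityName>
      <physical>
        <objectName>example.txt</objectName>
        <distribution>
          <online>
            <url function="information">http://example.org/request_data</url>
          </online>
        </distribution>
      </physical>
    </dataTable>
  </dataset>
</eml:eml>"""


def distribution_urls(eml_text):
    """Return (url, function) pairs for every dataTable distribution URL."""
    root = ET.fromstring(eml_text)
    pairs = []
    # Child elements below the eml:eml root are unqualified in this fragment.
    for url in root.iterfind(".//dataTable/physical/distribution/online/url"):
        # Assumption: a missing function attribute means "download".
        function = url.get("function", "download")
        pairs.append(((url.text or "").strip(), function))
    return pairs


def information_only_urls(eml_text):
    """URLs that point to a web page about the data, not to the data itself."""
    return [u for u, f in distribution_urls(eml_text) if f == "information"]


def on_drop(eml_text):
    """Hypothetical drag-and-drop hook: return an alert message instead of
    downloading when any distribution URL is information-only, else None."""
    info = information_only_urls(eml_text)
    if info:
        return "Data not directly downloadable; see: " + ", ".join(info)
    return None


if __name__ == "__main__":
    print(on_drop(SAMPLE_EML))
    # prints: Data not directly downloadable; see: http://example.org/request_data
```

Because this check runs only on the one package being dropped, the search query itself stays unchanged, which sidesteps the performance concern about adding conditions to the query.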

Any comments would be appreciated.

Thanks,

Jing


Jing Tao
National Center for Ecological
Analysis and Synthesis (NCEAS)
735 State St. Suite 204
Santa Barbara, CA 93101

On Tue, 18 Mar 2008, Matthew Jones wrote:

> I agree, I don't think it does handle this, but this is a bug in my opinion. 
> It should distinguish these URL types.  The intention of the "function" 
> attribute in EML was to handle exactly what Wade is trying to do, so Kepler 
> should look for it and only really try to parse and download data from 
> 'download' URLs.  If a "function" attribute has not been provided on the URL, 
> then maybe it should try to download it as well, but that is open to 
> discussion.  I've been looking for the query specification in Kepler -- but 
> to no avail.  Any idea how hard this would be to implement in our query, 
> Jing?
>
> Matt
>
> Jing Tao wrote:
>> Hi, Wade:
>> 
>> Based on my knowledge, I don't think Kepler distinguishes between the 
>> "information" and "download" attribute values. It just grabs the content 
>> of the given URL.
>> 
>> Hope this is helpful.
>> 
>> Jing
>> 
>> On Tue, 18 Mar 2008, Wade Sheldon wrote:
>> 
>>> Hi Matt,
>>> 
>>> I'm in the process of rolling out a new GCE website so I've been reviewing 
>>> and updating web application code for xml/xhtml compatibility, etc. As 
>>> part of this process I'm also making some minor changes to the GCE EML 
>>> implementation, including how data access urls are encoded for data sets 
>>> that aren't yet publicly downloadable. I just wanted to run these changes 
>>> by you to check for potential impact on Kepler users accessing our docs 
>>> via Metacat.
>>> 
>>> In our original implementation I omitted the 
>>> dataTable/physical/distribution node entirely for unreleased data sets, 
>>> but as a consequence users viewing an outdated metadata document would not 
>>> easily be able to find the data object after it becomes publicly 
>>> accessible. This is particularly an issue for the EcoTrends project, 
>>> because we're providing pre-release data and EML for the static web page 
>>> and book they are producing, and the legacy metadata will be retained and 
>>> potentially accessed in the future (i.e. outside of Metacat).
>>> 
>>> In the new implementation, I will still include direct pass-through links 
>>> to data objects in EML in Metacat for public data sets, but I will now 
>>> include urls for private datasets as well. These private data urls will 
>>> point to a web page that will either allow the user to register and 
>>> download the data after it is public, or will inform them of the private 
>>> status and allow them to fill out a form to request the data in advance of 
>>> the release date. In order to distinguish between these different 
>>> endpoints I am explicitly setting the distribution/online/url function 
>>> attribute to "download" or "information" as appropriate for data or a web 
>>> page.
>>> 
>>> My question for you is how does Kepler handle dataTable distribution urls 
>>> in EML with the function="information" attribute? Because I differentially 
>>> generate EML for Metacat I could revert to the old practice to prevent 
>>> problems, but I'd prefer to use the same approach for both GCE-centric and 
>>> KNB-centric metadata to prevent confusion.
>>> 
>>> Here's a link to an example document with the new implementation for a 
>>> private data set:
>>> http://gce-nas.marsci.uga.edu/public/app/send_eml.asp?detail=full&missing=NaN&delimiter=tab&metacat=yes&accession=INV-GCEM-0705c2 
>>> 
>>> Thanks in advance for any input.
>>> 
>>> -Wade Sheldon
>>> 
>>> 
>>> -- 
>>> ______________________________________________________________________________ 
>>> 
>>> Wade M. Sheldon
>>> GCE-LTER Information Manager/SIMO Database Administrator
>>> School of Marine Programs
>>> University of Georgia
>>> Athens, GA 30602-3636
>>> Email: sheldon at uga.edu
>>> WWW: 
>>> http://gce-lter.marsci.uga.edu/public/app/personnel_bios.asp?id=wsheldon
>>> 
>>> _______________________________________________
>>> Eml-dev mailing list
>>> Eml-dev at ecoinformatics.org
>>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>> 
>>> 
>> 
>
> -- 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Matthew B. Jones
> Director of Informatics Research and Development
> National Center for Ecological Analysis and Synthesis (NCEAS)
> UC Santa Barbara
> jones at nceas.ucsb.edu                       Ph: 1-907-523-1960
> http://www.nceas.ucsb.edu/ecoinfo
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>

