[seek-dev] data

Deana Pennington dpennington at lternet.edu
Mon Mar 29 07:50:57 PST 2004


Matt,

I'll plan on downloading the data from the IPCC website and creating EML 
for them.  I need a place to put them on the SRB.  They are not LTER 
data, and should not go in the LTER SRB directory.  I'll do the first 
set, but the data will be maintained by Town Peterson and the ENM 
community. 

We will have to do some programming to get the data into the first actor 
in Chad's pipeline.  Someone already has a Java program that might do 
part of it, and he has agreed to share his code.  As soon as I get his 
source and figure out what else is needed, I'll let you know. 

Deana


Matt Jones wrote:

> Hi Shawn,
>
> Those requirements have not been formally defined.  To use data 
> generally within the context of SEEK and Kepler, we almost certainly 
> need a good metadata description in order to interpret the data 
> correctly.  I think it makes sense for that to be an EML description, 
> or maybe EML with some SEEK extensions (e.g., for semantic labeling).  
> EML of course is quite loose about which metadata are required.  If 
> someone omits the physical and logical descriptions of the data, it 
> would be hard to build automated ingestion tools.  Our work so far in 
> Kepler on automatically ingesting arbitrary data sources assumes that 
> they have a complete entity/attribute description in EML.  There are 
> certainly other ways to provide this information that I would not want 
> to rule out as options, but I think the EML route is a sensible one 
> for SEEK.
>
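
To illustrate why a complete entity/attribute description matters for automated ingestion, here is a minimal sketch in Python. It is not Kepler or EML code: the `ENTITY` dictionary is a hypothetical, much-simplified stand-in for an EML physical and logical description, and the function name `ingest` and the sample values are assumptions made for the example.

```python
import csv
import io

# Hypothetical, simplified stand-in for an EML entity/attribute description.
# Real EML is XML and carries far more detail; these keys are assumptions.
ENTITY = {
    "field_delimiter": ",",   # physical description: how fields are separated
    "header_lines": 1,        # physical description: lines to skip
    "attributes": [           # logical description: name and type per column
        {"name": "station", "type": str},
        {"name": "year", "type": int},
        {"name": "mean_temp_c", "type": float},
    ],
}


def ingest(text, entity):
    """Parse delimited text into typed records using the entity description."""
    reader = csv.reader(io.StringIO(text), delimiter=entity["field_delimiter"])
    rows = list(reader)[entity["header_lines"]:]
    attrs = entity["attributes"]
    records = []
    for row in rows:
        if len(row) != len(attrs):
            raise ValueError(f"expected {len(attrs)} fields, got {len(row)}")
        # Convert each raw string with the type declared for its attribute.
        records.append({a["name"]: a["type"](v) for a, v in zip(attrs, row)})
    return records


# Toy sample data, invented for the example.
SAMPLE = "station,year,mean_temp_c\nA1,1990,11.5\nA1,1991,12.0\n"
records = ingest(SAMPLE, ENTITY)
```

Without the delimiter, header count, and per-attribute types, `ingest` would have no way to interpret the text, which is the situation described above when the physical and logical descriptions are omitted.
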
> One of the other Kepler developers (Efrat from GEON) created a data 
> ingestion actor based on JDBC.  You provide an endpoint and a SQL 
> query as input and it exposes the records as output.  This is another 
> way of getting data into Kepler.  Its problem is that there is no 
> formal relationship between the SQL query and the datatypes of the 
> output port(s).  I think we could consolidate some of this code with 
> other code (such as Chad's EML ingestion actor) and come up with a 
> more general approach that is extensible.  Here's what I've been 
> thinking...
>
> Using data in Kepler involves 1) transporting the data to the machine, 
> 2) filtering the data to produce a subset (potentially on a remote 
> machine before (1)), and 3) exposing the resulting data as 
> strongly-typed ports in Kepler.  The first (1) is accomplished now 
> through JDBC, file system access, grid access, and (soon) EcoGrid 
> access.  The second (2) is part of the proposal we've made for EcoGrid 
> access (a generic means of expressing filter conditions) and is part 
> of Efrat's JDBC actor (via SQL).  The third (3) is currently handled 
> by Chad's (somewhat incomplete) EML ingestion actor, although I think 
> it could be generalized to support other metadata sources as well.  We 
> (the EcoGrid team) will be continuing to explore these issues in more 
> detail as Jing and Rod continue working on incorporating the EcoGrid 
> client into Kepler.  Comments appreciated, especially on the proposed 
> data access changes to Kepler (see kepler/docs/dev/screenshots and 
> kepler/docs/dev/EcoGrid* in CVS).
>
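
The three steps described above (transport, filter, expose as typed ports) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Kepler, JDBC, or EcoGrid implementation: an in-memory SQLite table stands in for a remote source, the table and column names and all values are invented, and the `DECLARED` mapping is a hypothetical substitute for type information that would come from a metadata description. Declaring the types separately from the query is one possible way to address the gap noted above, where a SQL query alone does not fix the datatypes of the output ports.

```python
import sqlite3


def transport():
    """Step 1: obtain the data. An in-memory SQLite table stands in for a
    remote source; the schema and values are toy data for this sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE climate (cell TEXT, year INTEGER, precip_mm REAL)")
    conn.executemany(
        "INSERT INTO climate VALUES (?, ?, ?)",
        [("c1", 2020, 410.5), ("c1", 2050, 388.0), ("c2", 2050, 512.25)],
    )
    return conn


def filter_rows(conn, year):
    """Step 2: subset the data with a parameterized SQL query."""
    cur = conn.execute(
        "SELECT cell, year, precip_mm FROM climate WHERE year = ? ORDER BY cell",
        (year,),
    )
    names = [d[0] for d in cur.description]  # column names from the cursor
    return names, cur.fetchall()


def expose_ports(names, rows, declared_types):
    """Step 3: expose one typed column ("port") per attribute, checking every
    value against the type declared for that attribute."""
    ports = {}
    for i, name in enumerate(names):
        expected = declared_types[name]
        values = [row[i] for row in rows]
        for v in values:
            if not isinstance(v, expected):
                raise TypeError(f"{name}: expected {expected.__name__}, got {v!r}")
        ports[name] = (expected, values)
    return ports


# Hypothetical declared types, standing in for a metadata description.
DECLARED = {"cell": str, "year": int, "precip_mm": float}

names, rows = filter_rows(transport(), 2050)
ports = expose_ports(names, rows, DECLARED)
```

Here `ports` maps each attribute name to a pair of its declared type and its column of values, so a downstream consumer can rely on the type instead of inspecting the data.
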
> Cheers,
> Matt
>
> Shawn Bowers wrote:
>
>>
>> Out of curiosity, what exactly is the "SEEK requirement" for dataset 
>> use in analytical pipelines?  For example, your email below seems to 
>> suggest detailed EML metadata and placement in a catalog service 
>> (EcoGrid), which involves placement in a SEEK-aware catalog system 
>> (e.g., Metacat or SRB) that I am assuming is (or eventually will be) 
>> curated.
>>
>> Are there other requirements? Are these or additional requirements 
>> captured somewhere?
>>
>> Shawn
>>
>>
>>
>> Matt Jones wrote:
>>
>>> Deana,
>>>
>>> I took a look at the site containing the data.  In order to get it 
>>> into EcoGrid reasonably, we really should develop some EML metadata 
>>> descriptions of the products you are interested in from that site.  
>>> I'm not sure how much work that would be -- depends on how complex 
>>> and variable the different data sources are.  Once we have an EML 
>>> description of each source, we can add them to the EcoGrid 
>>> (currently that means manually adding the EML and the data to one of 
>>> the EcoGrid systems).  My guess is that Metacat and SRB could be 
>>> used for this one, but DiGIR is probably not appropriate for this 
>>> data type.  Jing is working on putting EcoGrid access capabilities 
>>> into Kepler, so once the data sets are accessible in EcoGrid you 
>>> should be able to use them in Kepler in the workflow Chad is 
>>> developing.
>>>
>>> Matt
>>>
>>> Deana Pennington wrote:
>>>
>>>> At the BEAM/AMS/KR meeting in early Feb, we designed a first 
>>>> application for the ecological niche modelling community that 
>>>> involves analyzing the effect of various modeled climate change 
>>>> scenarios on mammal populations.  To do the analysis, we need to 
>>>> use climate data from the following site:
>>>>
>>>> IPCC climate change:    http://ipcc-ddc.cru.uea.ac.uk/
>>>>
>>>> There will be other sites as well; I'll let you know when I find 
>>>> out what they are.  We will need to either set these up as nodes on 
>>>> the EcoGrid, or mirror the sites on one of our nodes.  Could 
>>>> someone please take a look at this site, and let me know if that is 
>>>> possible any time in the near future?  I am currently trying to 
>>>> figure out exactly which data are needed, and what we will have to 
>>>> do to them to get them into the workflow Chad is constructing.
>>>> Thanks,
>>>> Deana
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> seek-dev mailing list
>>> seek-dev at ecoinformatics.org
>>> http://www.ecoinformatics.org/mailman/listinfo/seek-dev
>>
>>
>>
>
>

-- 
********

Deana D. Pennington, PhD
Long-term Ecological Research Network Office

UNM Biology Department
MSC03  2020
1 University of New Mexico
Albuquerque, NM  87131-0001

505-272-7288 (office)
505 272-7080 (fax)




