[seek-dev] Re: enm pipeline

Matt Jones jones at nceas.ucsb.edu
Tue Sep 14 12:39:48 PDT 2004


Hi Deana,

Thanks.  We're going to need some sample data to test with well before 
Oct 15th (the scheduled deadline for Kepler release), but by no means 
all of the data.  Once the Kepler pipeline is able to retrieve the data 
layers from the EcoGrid we should then be able to add arbitrary env data 
layers to the EcoGrid and have them be accessible.   So you and others 
can keep adding layers until the Dec workshop.

Thanks for the reminder about the IPCC conversion to GIS layers.  Could 
you describe what needs to happen there more fully -- I didn't see that 
in the current ENM pipeline, so we probably need to develop a pipeline 
for it.  It may be another thing we could ask Jianting to work on -- he 
already agreed to work on finishing up the GIS actors for Kepler.  Thanks.

Matt

Deana Pennington wrote:
> I can work on the data, but not until Sep 27.  If I can figure out how 
> to do templates in EML, it should go pretty quickly.
> 
> I think you have forgotten the pipeline that converts the IPCC climate 
> data to GIS layers.
> 
> Deana
> 
> 
> Matt Jones wrote:
> 
>> Hi Deana,
>>
>> We had the conference call on the ENM pipeline in Kepler this morning. 
>> It's amazing how much stuff still remains to be done.  Our notes from 
>> the call are on the SEEK web site, including a list of action items to 
>> get the ENM pipeline done:
>>
>> http://seek.ecoinformatics.org/Wiki.jsp?page=ENMPipelineConferenceCall14Sep2004 
>>
>>
>> One of the items was tentatively assigned to you -- if you're willing. 
>> We need someone to coordinate getting the environmental data layers 
>> into an EcoGrid node and documented with EML metadata.  Then they will 
>> be pulled into the pipelines as needed.
>>
>> Interestingly, because the ENM pipeline will have so many runs 
>> (500,000) it will be important to be able to distribute the load -- so 
>> it looks like we might be doing some of this stuff on multiple 
>> machines.  It's gonna be a tough challenge, especially because data 
>> transfer for the environmental data layers will be a significant 
>> bottleneck.  Ricardo estimates that one species (500 GARP runs) will 
>> take between 6 and 24 hours, depending on which env layers are used.  So, 
>> unless we distribute it, we're looking at up to 1000 days to run this 
>> thing, obviously unacceptable. So the tradeoff between distributing the 
>> computation and moving the data is not an easy one to make. But we'll 
>> try to work it out.
>>
>> Could you look over the notes and let me know what you think?  Thanks,
>>
>> Matt
> 
> 
> 
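
The runtime estimate quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: the function names are made up for illustration, the machine count of 20 is an arbitrary assumption, and the distributed figure assumes perfect scaling with no data-transfer cost, which the message itself says is unlikely to hold.

```python
def serial_runtime_days(total_runs, runs_per_species, hours_per_species):
    """Days needed to process every species one after another on one machine."""
    species = total_runs // runs_per_species  # 500,000 / 500 = 1000 species
    return species * hours_per_species / 24


def ideal_distributed_days(serial_days, machines):
    """Best-case days when work is split evenly across machines.

    Ignores the cost of moving environmental data layers, so the real
    figure would be higher.
    """
    return serial_days / machines


# Figures taken from the message: 500,000 runs, 500 GARP runs per species,
# 6 to 24 hours per species depending on the env layers used.
low = serial_runtime_days(500_000, 500, 6)
high = serial_runtime_days(500_000, 500, 24)
print(f"serial: {low:.0f} to {high:.0f} days")

# Hypothetical cluster size, chosen only for illustration.
machines = 20
print(f"{machines} machines, ideal case: "
      f"{ideal_distributed_days(low, machines):.1f} to "
      f"{ideal_distributed_days(high, machines):.1f} days")
```

The 1000-day figure corresponds to the 24-hour end of the range; at 6 hours per species the serial total would be about 250 days.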

-- 
-------------------------------------------------------------------
Matt Jones                                     jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California Santa Barbara
Interested in ecological informatics? http://www.ecoinformatics.org
-------------------------------------------------------------------


