[Kepler-dev] Introduction...

Wed Mar 30 14:16:55 PST 2005

Hi Patrick,

Your projects sound very interesting and in tune with several other 
projects that work around Kepler.  Here are the answers to your questions:

 > Here are some things that I'm looking for from y'all:
 > 	* I'd love trial read-access to the Kepler CVS repository

I'll create an account for you right now and email you the details.

 > 	* I'd love to hear about types of data and data repositories
 > 		that people have on the web

Right now, we have a very wide assortment of data in varying forms.  The 
SPA project is working mainly on bioinformatics (genome) data.  The GEON 
project is all geological data with a lot of GIS.  The SEEK project has 
a repository of ecological species counts and niche modeling data.  As 
you may have noticed, Kepler is a collection of projects that all share 
a common need for workflow execution, so you can't really lump all of 
our work into one area, except that we all work together on the 
functionality of Kepler, specializing it where we need it for our own 
projects.

 > 	* I'd love to hear what sort of criterion you use in your
 > 		work when deciding which datasets to use

As I said above, because the data used in Kepler are so heterogeneous 
in form and function, it's hard to give a general answer to this 
question.  I can speak for the SEEK project (of which I am a member) and 
say that we have been working for about 8 years on solutions to 
incorporate all sorts of heterogeneous ecological data (using a 
metadata-based approach).  We deal with everything from Excel 
spreadsheets to Oracle databases and text files.  Our goal is to let 
scientists choose the data they want to use, so we focus on building the 
tools and making them user-friendly enough for a domain scientist to use. 
I'll let the other project members jump in on what criteria they use.

We welcome contributions to our project.  If you develop actors or 
extensions that you think would be broadly useful to the project, 
someone can sponsor you for a trial write membership (so you can write 
to CVS).  We have the rules for membership spelled out on our website. 
In addition to this list, we also have an IRC server for Kepler 
developers.  You can join with any free IRC client (I think most of us 
use xchat (xchat.org)).  The server is irc.ecoinformatics.org and the 
channel is #kepler.  Most of us are on the west coast, so we are online 
during the day here.

I had a few other comments about your projects.  Please see below.

Patrick Stein wrote:
> Hello all,
> 
> My name is Patrick Stein.  I work for the Laboratory for Imaging
> Algorithms and Systems (LIAS) [1] in the Rochester Institute of
> Technology's [2] Center for Imaging Science (CIS) [3].
> 
> We have done a great deal with scientific workflows for the NASA/USRA
> SOFIA project [4].  We are involved with several other projects, too.
> Of particular note at the moment is our Wildfire Airborne Sensor Program
> (WASP) [5].  We are also closely connected to the Digital Imaging and
> Remote Sensing (DIRS) [6] group in CIS.
> 
> Building upon the work that we did with SOFIA, we are starting design
> work for our WASP data and algorithm architecture.  We will have a
> plethora of multispectral image data.  There are many different things
> that we want to do with this data.  Additionally, much of this data
> will be non-proprietary.
> 
> We want to make our datasets as available as possible.  But, we don't
> want scientists (or even algorithm developers) to have to know the
> nitty-gritty details about the way we have our repository organized if
> they just want to make use of some of our data.  Some of our high-level
> goals are:
> 	* find a way to provide access to data from our repository (and other
> 		online repositories) that is easy and as data-independent
> 		as possible
> 	* find a way to help people find all of the free data available...
> 		a sort of science-data specific Google, if you will...
> 	* help scientists leverage the knowledge that other scientists
> 		bring by helping track what data scientists found useful
> 		for what types of science and what algorithms scientists
> 		found useful for that data/science
> 	* track cross-data correlations... people very often use this
> 		terrain data with that sensor data and this weather data
> 		with that water-quality data....

Before SEEK, several of us were on a project called the Knowledge 
Network for Biocomplexity (KNB).  In that project, we developed tools to 
store, index and make available heterogeneous data.  The project was 
targeted at the ecological community, but our tools are general enough 
to be used outside of that community.  As an extension to the KNB 
project, SEEK is building on those tools with semantically enhanced 
searching and automated data integration via ontological references.  
It's funny you said "science-data specific Google" because I think we 
have used the same analogy in past meetings.  So, I think your goals and 
ours are well aligned and we could probably find ways to collaborate to 
meet them.

Thanks for all the info on your project.

Chad Berkley