[kepler-dev] Kepler/CORE project announcement

Wed Nov 7 19:26:55 PST 2007

All,

For those curious about the new project I mentioned to Ken this  
morning, I include the press release for Kepler/CORE below.  You can  
read a hypertext version and find links to more information at:

http://www.kepler-project.org/Wiki.jsp?page=KeplerCOREAnnouncement

Cheers,

Tim

NSF Award for Kepler/CORE to Accelerate Scientific Workflow Development

The Office of Cyberinfrastructure at the National Science Foundation  
has awarded $1.7M over three years to a team of researchers from UC  
Davis, UC Santa Barbara, and UC San Diego to develop Kepler/CORE, a  
Comprehensive, Open, Reliable, and Extensible Scientific Workflow  
Infrastructure.

In recent years, scientific workflow research and development has  
gained enormous momentum, driven by the needs of many scientific  
communities to more effectively manage and analyze their increasing  
amounts of data.

Whether scientists are piecing together our ancestors' tale through  
Assembling the Tree of Life (AToL, pPod, CIPRes), deciphering the  
workings of our biological machinery by chasing and identifying  
transcription factors (ChIP2), studying the effect of invasive  
species on biodiversity (SEEK), observing and modeling the atmosphere  
and oceans to simulate and understand effects of climate change on  
the environment (COMET, REAP), trying to understand and tame nuclear  
fusion through plasma edge simulations (CPES), or probing the nature  
and origins of the universe through observation of gravitational  
lensing or simulations of supernova explosions (Kepler-Astro),  
science has become increasingly data-driven, requiring considerable  
computational resources, access to diverse data, and integration of
complex software tools. To address these challenges, these and many  
other projects have employed the Kepler scientific workflow system.

"Scientific workflows are the scientists' way to get more eScience  
done by effectively harnessing cyberinfrastructure such as data grids  
and compute clusters from their desktops", says Bertram Ludaescher,  
Associate Professor at the Dept. of Computer Science and the Genome  
Center at UC Davis, and principal investigator of Kepler/CORE.

Scientific workflows start where script-based data-management  
solutions leave off. Like scripts, workflows can automate otherwise  
tedious and error-prone data-management and application-integration  
tasks. However, unlike custom scripts, scientific workflows can be  
more easily shared, reused, and adapted to new domains. Many  
scientific workflow systems also provide 'parallelism for free'. The  
Kepler system natively supports both assembly-line-like
'pipeline-parallelism' and 'task-parallelism', which enables multiple
pipelines of tasks to operate concurrently. And unlike script-writers
who must explicitly fork processes, manage queues, and worry about  
synchronizing multiple operations, Kepler users can let the workflow  
system schedule parallel tasks automatically. Other advantages over  
scripts include built-in support for tracking data lineage or  
'provenance', which allows scientists to better interpret their  
analysis results, re-run workflows with varying parameter settings  
and data bindings, or simply debug or confirm 'strange' results.
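
To make the contrast with hand-written scripts concrete, here is a
minimal sketch of the plumbing a script-writer would otherwise manage
by hand: queues between stages, one thread per stage (pipeline
parallelism), and several independent pipelines running side by side
(task parallelism). This is not Kepler code and does not use any
Kepler API; the names stage, run_pipeline, and run_concurrently are
hypothetical and chosen only for illustration.

```python
import queue
import threading

# Illustrative only: none of these names come from Kepler.
_DONE = object()  # sentinel marking the end of a stream


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and pass the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # tell the next stage the stream has ended
            return
        outbox.put(func(item))


def run_pipeline(funcs, items):
    """Pipeline parallelism: one thread per stage, linked by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


def run_concurrently(pipelines):
    """Task parallelism: run several independent pipelines at the same time."""
    results = [None] * len(pipelines)

    def worker(index, funcs, items):
        results[index] = run_pipeline(funcs, items)

    threads = [
        threading.Thread(target=worker, args=(i, funcs, items))
        for i, (funcs, items) in enumerate(pipelines)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling run_pipeline([f, g], data) lets g start on the first item
while f is still working on the second. The sketch shows only the
structure: in CPython, threads do not speed up CPU-bound stages, and
a workflow system would take over this scheduling so that the user
writes only the stage functions.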

"When we started Kepler a few years back as a grass-roots  
collaboration between the SEEK and SDM/SPA projects, we did not fully  
anticipate the broad interest scientific workflows would create",  
says co-PI Matt Jones, from the National Center for Ecological  
Analysis and Synthesis at UC Santa Barbara, adding, "The different  
groups in the Kepler community are pushing various extensions to the  
base system functionality, so it is now a perfect time to move Kepler  
from a research prototype to a reliable and easily extensible system."

Timothy McPhillips, co-PI at the UC Davis Genome Center and chief
software architect for Kepler/CORE, adds, "To serve the target user
communities, the system must be independently extensible by groups  
not directly collaborating with the team that develops and maintains  
the Kepler/CORE system. Facilitating extension in turn requires that  
the Kepler architecture be open and that the mechanisms and  
interfaces provided for developing extensions be well designed and  
clearly articulated."

Kepler/CORE development is informed and driven by various  
stakeholders, those projects and individuals who employ Kepler and  
wish to extend or otherwise improve the system for their specific  
needs. The inclusion of stakeholders in the steering of the overall  
collaboration aims at a more comprehensive and sustainable approach  
for future Kepler extensions.

"For Kepler to be seen as a viable starting point for developing  
workflow-oriented applications, and as middleware for developing
user-oriented scientific applications, Kepler must be reliable both as
a development platform and as a run-time environment for the user",
says Ilkay Altintas, Kepler/CORE co-PI at the San Diego Supercomputer
Center at UC San Diego.

While Kepler/CORE is primarily a software engineering project, many  
interesting computer science research problems are emerging from the  
application of scientific workflows: "As a computer scientist, it is
fascinating to see how real-world scientific-workflow
problems--workflow design, analysis, and optimization, for
example--lend themselves to exciting research problems in computer
science,
spanning the areas of databases, distributed and parallel computing,  
and programming languages", says Ludaescher.

Shawn Bowers, co-PI and computer scientist at the UC Davis Genome
Center, adds, "Scientific-workflow systems such as Kepler provide an
opportunity to make scientific results more transparent and  
reproducible by capturing their provenance. Enhancing scientific
workflows in this way, we can dramatically improve the usability of
scientific results for scientists and the broader public."