[kepler-dev] Kepler/CORE project announcement
Timothy McPhillips
tmcphillips at mac.com
Wed Nov 7 19:26:55 PST 2007
All,
For those curious about the new project I mentioned to Ken this
morning, I include the press release for Kepler/CORE below. You can
read a hypertext version and find links to more information at:
http://www.kepler-project.org/Wiki.jsp?page=KeplerCOREAnnouncement
Cheers,
Tim
NSF Award for Kepler/CORE to Accelerate Scientific Workflow Development
The Office of Cyberinfrastructure at the National Science Foundation
has awarded $1.7M over three years to a team of researchers from UC
Davis, UC Santa Barbara, and UC San Diego to develop Kepler/CORE, a
Comprehensive, Open, Reliable, and Extensible Scientific Workflow
Infrastructure.
In recent years, scientific workflow research and development has
gained enormous momentum, driven by the needs of many scientific
communities to more effectively manage and analyze their increasing
amounts of data.
Whether scientists are piecing together our ancestors' tale through
Assembling the Tree of Life (AToL, pPod, CIPRes), deciphering the
workings of our biological machinery by chasing and identifying
transcription factors (ChIP2), studying the effect of invasive
species on biodiversity (SEEK), observing and modeling the atmosphere
and oceans to simulate and understand effects of climate change on
the environment (COMET, REAP), trying to understand and tame nuclear
fusion through plasma edge simulations (CPES), or probing the nature
and origins of the universe through observation of gravitational
lensing or simulations of supernova explosions (Kepler-Astro),
science has become increasingly data-driven, requiring considerable
computational resources, access to diverse data, and integration of
complex software tools. To address these challenges, these and many
other projects have employed the Kepler scientific workflow system.
"Scientific workflows are the scientists' way to get more eScience
done by effectively harnessing cyberinfrastructure such as data grids
and compute clusters from their desktops", says Bertram Ludaescher,
Associate Professor in the Department of Computer Science and the Genome
Center at UC Davis, and principal investigator of Kepler/CORE.
Scientific workflows start where script-based data-management
solutions leave off. Like scripts, workflows can automate otherwise
tedious and error-prone data-management and application-integration
tasks. However, unlike custom scripts, scientific workflows can be
more easily shared, reused, and adapted to new domains. Many
scientific workflow systems also provide 'parallelism for free'. The
Kepler system natively supports both assembly-line-style 'pipeline
parallelism' and 'task parallelism', in which multiple pipelines of
tasks operate concurrently. And unlike script writers, who must
explicitly fork processes, manage queues, and worry about
synchronizing multiple operations, Kepler users can let the workflow
system schedule parallel tasks automatically. Other advantages over
scripts include built-in support for tracking data lineage, or
'provenance', which lets scientists better interpret their analysis
results, re-run workflows with different parameter settings and data
bindings, and debug or confirm 'strange' results.
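To make the contrast with scripts concrete, here is a minimal sketch
in plain Java (illustrative only, not Kepler's actual API) of the
fork/queue/synchronization plumbing a script writer would have to
manage by hand to obtain pipeline parallelism. In Kepler, the
workflow system sets up the equivalent concurrent stages and buffered
channels automatically:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class PipelineSketch {
        public static void main(String[] args) throws InterruptedException {
            // Bounded buffer between the two pipeline stages.
            BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(4);

            // Stage 1: produce data tokens, blocking when the buffer is full.
            Thread producer = new Thread(() -> {
                try {
                    for (int i = 1; i <= 5; i++) channel.put(i);
                    channel.put(-1); // sentinel marking end of stream
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // Stage 2: transform tokens concurrently with stage 1.
            Thread consumer = new Thread(() -> {
                try {
                    int token;
                    while ((token = channel.take()) != -1) {
                        System.out.println("processed: " + token * token);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            producer.start();
            consumer.start();
            producer.join();
            consumer.join();
        }
    }

In a Kepler workflow, each stage would instead be an actor, the queue
an implicit channel between actor ports, and the threading and
scheduling the responsibility of the director, so none of this
plumbing appears in the workflow itself.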
"When we started Kepler a few years back as a grass-roots
collaboration between the SEEK and SDM/SPA projects, we did not fully
anticipate the broad interest scientific workflows would attract",
says co-PI Matt Jones, from the National Center for Ecological
Analysis and Synthesis at UC Santa Barbara, adding, "The different
groups in the Kepler community are pushing various extensions to the
base system functionality, so it is now a perfect time to move Kepler
from a research prototype to a reliable and easily extensible system."
Timothy McPhillips, co-PI at the UC Davis Genome Center and chief
software architect for Kepler/CORE, adds, "To serve the target user
communities, the system must be independently extensible by groups
not directly collaborating with the team that develops and maintains
the Kepler/CORE system. Facilitating extension in turn requires that
the Kepler architecture be open and that the mechanisms and
interfaces provided for developing extensions be well designed and
clearly articulated."
Kepler/CORE development is informed and driven by its various
stakeholders: the projects and individuals who employ Kepler and
wish to extend or otherwise improve the system for their specific
needs. Stakeholders are included in the steering of the overall
collaboration to ensure a more comprehensive and sustainable
approach to future Kepler extensions.
"For Kepler to be seen as a viable starting point for developing
workflow-oriented applications, and as middleware for developing user-
oriented scientific applications, Kepler must be reliable both as a
development platform and as a run-time environment for the user",
says Ilkay Altintas, Kepler/CORE co-PI at the San Diego Supercomputer
Center at UC San Diego.
While Kepler/CORE is primarily a software engineering project, many
interesting computer science research problems are emerging from the
application of scientific workflows: "As a computer scientist, it is
fascinating to see how real-world scientific-workflow problems, such
as workflow design, analysis, and optimization, lend themselves to
exciting research questions in computer science, spanning the areas
of databases, distributed and parallel computing, and programming
languages", says Ludaescher.
Shawn Bowers, co-PI and computer scientist at the UC Davis Genome
Center adds, "Scientific-workflow systems such as Kepler provide an
opportunity to make scientific results more transparent and
reproducible by capturing their provenance. By enhancing scientific
workflows in this way, we can dramatically improve the usability of
scientific results for scientists and the broader public."
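As a rough illustration of what 'capturing provenance' means in
practice, the following sketch (again plain Java; the step name,
file names, and parameter are invented, and this is not Kepler's
actual provenance model) shows the kind of record a workflow system
can store for each step it executes:

    import java.time.Instant;
    import java.util.List;

    // Hypothetical per-step provenance record: which step ran, what it
    // consumed and produced, with what settings, and when.
    record ProvenanceRecord(String step, List<String> inputs,
                            List<String> outputs, String parameters,
                            Instant started, Instant finished) {}

    public class ProvenanceSketch {
        public static void main(String[] args) {
            ProvenanceRecord rec = new ProvenanceRecord(
                "AlignSequences",               // invented step name
                List.of("seqs.fasta"),          // data consumed
                List.of("alignment.nex"),       // data produced
                "gapPenalty=10",                // parameter in effect
                Instant.parse("2007-11-07T19:00:00Z"),
                Instant.parse("2007-11-07T19:02:30Z"));
            // Given such a record for every step, any result can be traced
            // back to the exact inputs and parameters that produced it.
            System.out.println(rec);
        }
    }

A chain of such records is what lets a scientist re-run a workflow
with different bindings or verify how a surprising result was derived.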