[kepler-dev] versioned workflows
ludaesch at ucdavis.edu
Tue Apr 15 06:36:42 PDT 2008
You're raising important issues (and ones that have come up repeatedly).
I'd like to mention only a few aspects, and just briefly for now:
First, in Kepler you can use Kepler Archive Files (KAR files) to create
self-contained versions of Kepler workflows. The use of such self-contained
archive files can give you a "snapshot" version of a workflow (and in a
sense "immunizes" you against evolving versions of actors). Alternatively,
you can choose to use the current version of actors.
Overall, as I see it, the problem includes software configuration
management as a special case and thus can be tricky, to say the least.
Also, as far as I recall, LSIDs are used for identifying actors and
workflows, but I'm not sure whether a versioning feature is used as well
(here is some earlier info on KAR files):
Conceptually, it might be helpful to distinguish between a static
snapshot/archival version of a workflow, where the goal might be
reproducibility, and an "evolving workflow" where the user's goal is to
(mostly) use the current versions of actors.
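To make the distinction a bit more concrete, here is a rough sketch in Python (not Kepler code). The registry, the Workflow class, and the resolve/snapshot helpers are all made up for illustration, and integer version numbers are an assumption; the sketch says nothing about how KAR files or LSIDs actually work.

```python
from dataclasses import dataclass, field

# Hypothetical registry: actor name -> available version numbers.
REGISTRY = {
    "BlastQuery": [1, 2, 3],
    "Plotter": [1, 2],
}


@dataclass
class Workflow:
    name: str
    actors: list
    # Actor name -> frozen version; empty for a purely "evolving" workflow.
    pinned: dict = field(default_factory=dict)


def resolve(workflow, registry, mode):
    """Return {actor: version} to use for one run.

    mode="snapshot": use the versions frozen in the workflow.
    mode="evolving": use the newest version found in the registry.
    """
    if mode not in ("snapshot", "evolving"):
        raise ValueError(f"unknown mode: {mode!r}")
    resolved = {}
    for actor in workflow.actors:
        if mode == "snapshot":
            if actor not in workflow.pinned:
                raise KeyError(f"no pinned version for actor {actor!r}")
            resolved[actor] = workflow.pinned[actor]
        else:
            resolved[actor] = max(registry[actor])
    return resolved


def snapshot(workflow, registry):
    """Return a copy of the workflow with today's newest versions frozen in."""
    current = resolve(workflow, registry, "evolving")
    return Workflow(workflow.name, list(workflow.actors), current)
```

A snapshot taken today keeps resolving to the same actor versions even after the registry gains newer ones, whereas the "evolving" mode picks up whatever is newest at run time.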
The problem becomes even more interesting when considering that not only
workflows evolve, but also the data that is associated with particular
workflow runs. Sometimes data is implicitly referenced via remote queries
and services (say via a remote Blast).
In the general case, the functionality of a workflow thus can depend also on
snapshots of external entities. When recording provenance information, such
dependencies can be captured and can, in principle, be made part of an
archive as well.
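As a minimal illustration of capturing such a dependency, one could record a content fingerprint of each external response alongside the run. The sketch below is Python, not part of Kepler; the log format and the function names (record_dependency, changed_since) are assumptions made up for this example, and only the standard hashlib module is used.

```python
import hashlib


def fingerprint(payload: bytes) -> str:
    """SHA-256 hex digest of an external response."""
    return hashlib.sha256(payload).hexdigest()


def record_dependency(log, source, query, response: bytes):
    """Append a provenance entry for one external call and return it."""
    entry = {
        "source": source,          # e.g. a label for the remote service
        "query": query,            # what was asked
        "sha256": fingerprint(response),
        "size": len(response),
    }
    log.append(entry)
    return entry


def changed_since(log, source, query, response: bytes):
    """Compare a fresh response with the most recent recorded one.

    Returns True if it differs, False if identical, and None if the
    log holds no entry for this source and query.
    """
    for entry in reversed(log):
        if entry["source"] == source and entry["query"] == query:
            return entry["sha256"] != fingerprint(response)
    return None
```

Storing only the digest detects that an external entity has changed; reproducing the old run would additionally require archiving the response itself.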
The areas of data provenance (~ data lineage and processing history) and
workflow evolution (aka workflow provenance) are active areas of research
and development, in Kepler as well as in several other projects.
So much for now...
On Tue, Apr 15, 2008 at 5:50 AM, Paul Allen <pea1 at cornell.edu> wrote:
> Hello all,
> I'm wondering if there have been any thoughts about the versioning of
> workflows that reside in a repository. The idea would be to make sure
> that, if a workflow from a repository is referenced externally, it will
> always work in a manner similar (and produce similar output) as when it
> was referenced. I think that this is important if people are sharing
> workflows, yet those workflows continue to be improved or updated.
> I'm not sure if versioning workflows implies that actors are also
> Has anybody thought about this?