[kepler-dev] versioned workflows
jones at nceas.ucsb.edu
Tue Apr 15 10:34:53 PDT 2008
A few addendums to Bertram's response...
Bertram Ludaescher wrote:
> Hi Paul:
> You're raising important issues (and ones that have come up repeatedly).
> I'd like to mention only a few aspects, and just briefly for now:
> First, in Kepler you can use Kepler Archive Files (KAR files) to create
> self-contained versions of Kepler workflows. The use of such
> self-contained archive files can give you a "snapshot" version of a
> workflow (and in a sense "immunizes" you against evolving versions of
> actors). Alternatively, you can choose to use the current version of
KAR files are the mechanism we chose to archive both actors and
workflows. We envisioned two types of KAR files, so-called 'package'
KAR files and 'archive' KAR files.
Package KAR files are designed to encapsulate a workflow component and
all dependencies for execution (such as jar files, native libraries,
etc) and give it an explicit version via an LSID. This allows component
developers to distribute specific versions of actors by multiple
mechanisms, including by uploading to the Kepler Repository. This is
'Archive' KAR files are designed to encapsulate the versioned workflow
and all actors and data needed to reproduce a workflow run. Thus, an
archive file would contain copies of all KAR files for components used
(or references to those components) and copies of all data used (or
references to those data). If references are used, they must be
persistently available, so at least initially we envisioned that Archive
KAR files would represent a 'deep copy' of the workflow, to ensure that
all executable components are available. This functionality is not
yet present in Kepler, although the infrastructure for it is in place -- we
just need to create the 'archive' function to wrap everything together.
That is one of the top work items for the REAP project.
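As a rough illustration of the 'deep copy' idea, the sketch below bundles a workflow, its component files, and its data files into one zip container. It is not the Kepler implementation: the function name `build_archive`, the zip container, and the `workflow/`, `components/`, and `data/` folder layout are all assumptions made for this example.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List


def build_archive(out_path: str, workflow: str,
                  components: Iterable[str], data: Iterable[str]) -> List[str]:
    """Copy a workflow plus every component and data file into one archive.

    Hypothetical layout (an assumption, not the real KAR layout):
    workflow/<name>, components/<name>, data/<name>.
    Returns the sorted entry names written to the archive.
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # The workflow definition itself.
        archive.write(workflow, arcname=f"workflow/{Path(workflow).name}")
        # A full copy of each component, so no external reference is needed.
        for component in components:
            archive.write(component, arcname=f"components/{Path(component).name}")
        # A full copy of each data file used by the run.
        for data_file in data:
            archive.write(data_file, arcname=f"data/{Path(data_file).name}")
        return sorted(archive.namelist())
```

Copying everything trades archive size for independence from external references, which is the point made above about references needing to stay persistently available.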
> Overall, as I see it, the problem contains the problem of software
> configuration management as a special case and thus can be tricky to say
> the least ...
Agreed...tricky to say the least.
> Also, as far as I recall, LSIDs are used for identifying actors and
> workflows, but I'm not sure whether a versioning feature is used as well
> (here is some earlier info on KAR files):
We do support versioning, via LSIDs. The hard part in my mind is making
sure that component authors are religious about versioning their actor
changes. Thus far, in the pre-1.0.0 release situation, we have allowed
actor authors to change their actor signatures and implementations
without revising the LSID. Post 1.0.0 this would not be recommended, as
it would break old workflows that incorporated early versions of that
actor. Thus, 1.0.0 represents a watershed in versioning. Separating
actors from the Kepler CORE will be very positive as it will allow
actors to be versioned independently of the core release.
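To illustrate why revising the LSID matters, here is a minimal sketch that parses the general LSID form `urn:lsid:authority:namespace:object[:revision]` and checks whether two identifiers name the same component at different revisions. The helper names (`Lsid`, `parse_lsid`, `same_component`) and the `example.org` identifiers are invented for this example and are not part of Kepler.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # the revision part is optional in an LSID


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def same_component(a: Lsid, b: Lsid) -> bool:
    """True when two LSIDs differ at most in their revision."""
    return a[:3] == b[:3]


# Hypothetical identifiers: the same actor before and after a signature change.
old = parse_lsid("urn:lsid:example.org:actor:42:3")
new = parse_lsid("urn:lsid:example.org:actor:42:4")
```

With these values, `same_component(old, new)` is true while `old != new`, so a workflow that recorded revision 3 can still be told apart from one using revision 4. If the author had changed the actor without revising the LSID, the two would be indistinguishable.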
Note that the wiki page Bertram points to is a little old but still very
relevant. It says we haven't handled 'dynamic class loading', but
in fact we have partially solved this problem now: jar files included
in the KAR file are indeed loaded at runtime, so a complete actor with
its jar dependencies can be archived in a KAR. The major items left to
finish are to 1) manage versioning conflicts when different actors need
to load conflicting jar files, and 2) support native code dependencies,
such as actors that wrap native simulation models or other C code that
has library dependencies and platform-specific runtime dependencies.
We have developed further requirements and plans for the REAP project in
> Conceptually, it might be helpful to distinguish between a static
> snapshot/archival version of a workflow, where the goal might be
> reproducibility, and an "evolving workflow" where the user's goal is to
> (mostly) use the current versions of actors.
Yep. Thus our different 'types' of KAR files.
> The problem becomes even more interesting when considering that not only
> workflows evolve, but also the data that is associated with particular
> workflow runs. Sometimes data is implicitly referenced via remote
> queries and services (say via a remote Blast).
> In the general case, the functionality of a workflow thus can depend
> also on snapshots of external entities. When recording provenance
> information, such dependencies can be captured and can, in principle, be
> made part of an archive as well.
KAR files can contain data. They can also contain references to data
that is archived, such as data in the KNB archive. Once Kepler
'Archive' files are implemented, we should be able to save and load the
data in a KAR file in order to reproduce a run. Data have an LSID too,
so the data can be versioned as well.
> The areas data provenance (~ data lineage and processing history) and
> workflow evolution (aka workflow provenance) are active areas of
> research and development, in Kepler, as well as in several other projects.
Very active indeed.
Paul -- if you have interest in working with us on this, we'd welcome
your input. Chad's been leading this effort for us on our end, and will
be picking it back up in a couple of months for REAP.
> So much for now...
> On Tue, Apr 15, 2008 at 5:50 AM, Paul Allen <pea1 at cornell.edu
> <mailto:pea1 at cornell.edu>> wrote:
> Hello all,
> I'm wondering if there have been any thoughts about the versioning of
> workflows that reside in a repository. The idea would be to make sure
> that, if a workflow from a repository is referenced externally, it will
> always work in a manner similar (and produce similar output) as when it
> was referenced. I think that this is important if people are sharing
> workflows, yet those workflows continue to be improved or updated.
> I'm not sure if versioning workflows implies that actors are also
> Has anybody thought about this?
> Kepler-dev mailing list
> Kepler-dev at ecoinformatics.org <mailto:Kepler-dev at ecoinformatics.org>
Matthew B. Jones
Director of Informatics Research and Development
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara
jones at nceas.ucsb.edu Ph: 1-907-523-1960