[kepler-dev] versioned workflows

Claudio Silva csilva at cs.utah.edu
Tue Apr 15 09:26:12 PDT 2008

Bertram (and others),


We're doing research on how to reconstruct workflow evolution from  
workflow snapshots, and it would be nice to try it out on large  
workflow collections. If anyone on the list has kept MoML files under  
version control (CVS, SVN, whatever) and would not mind giving us  
their collection, we would really appreciate it. This will help us harden  
our tool and techniques. (And yes, we have already tested our  
algorithms with Kepler workflows. Come to think of it, everything should  
also work on Ptolemy's workflows, although I do not think we've tried.)

Btw, we wrote a short survey of provenance for computational tasks  
with non-experts in mind. It might be useful to Paul and to other  
people just getting interested in this area:

Juliana Freire, David Koop, Emanuele Santos, and Cláudio T. Silva.
"Provenance for Computational Tasks: A Survey."
Computing in Science & Engineering (IEEE), vol. 10, no. 3, pp. 11-21, 2008.

A draft version is available from Juliana Freire's webpage, or you can  
get the official version from IEEE Xplore:




On Apr 15, 2008, at 7:36 AM, Bertram Ludaescher wrote:
> Hi Paul:
> You're raising important issues (and ones that have come up  
> repeatedly).
> I'd like to mention only a few aspects, and just briefly for now:
> First, in Kepler you can use Kepler Archive Files (KAR files) to  
> create self-contained versions of Kepler workflows. The use of such  
> self-contained archive files can give you a "snapshot" version of a  
> workflow (and in a sense "immunizes" you against evolving versions  
> of actors). Alternatively, you can choose to use the current version  
> of actors.
> Overall, as I see it, the problem subsumes software configuration  
> management as a special case, and so it can be tricky, to say the  
> least...
> Also, as far as I recall, LSIDs are used for identifying actors and  
> workflows, but I'm not sure whether a versioning feature is used as  
> well (here is some earlier info on KAR files): http://kepler-project.org/Wiki.jsp?page=KeplerObjectManager
> Conceptually, it might be helpful to distinguish between a static  
> snapshot/archival version of a workflow, where the goal might be  
> reproducibility, and an "evolving workflow" where the user's goal is  
> to (mostly) use the current versions of actors.
> The problem becomes even more interesting when you consider that not  
> only workflows evolve, but also the data associated with  
> particular workflow runs. Sometimes data is referenced only implicitly,  
> via remote queries and services (e.g., a remote BLAST service).
> In the general case, the functionality of a workflow can thus also  
> depend on snapshots of external entities. When recording provenance  
> information, such dependencies can be captured and can, in  
> principle, be made part of an archive as well.
> Data provenance (roughly, data lineage and processing history)  
> and workflow evolution (aka workflow provenance) are active areas of  
> research and development, both in Kepler and in several other  
> projects.
> So much for now...
> Bertram
> On Tue, Apr 15, 2008 at 5:50 AM, Paul Allen <pea1 at cornell.edu> wrote:
> Hello all,
> I'm wondering if there have been any thoughts about the versioning of
> workflows that reside in a repository. The idea would be to make sure
> that, if a workflow from a repository is referenced externally, it will
> always work the way it did (and produce similar output) when it was
> referenced. I think that this is important if people are sharing
> workflows, yet those workflows continue to be improved or updated.
> I'm not sure whether versioning workflows implies that actors are also
> versioned.
> Has anybody thought about this?
> Thanks,
> -Paul
> _______________________________________________
> Kepler-dev mailing list
> Kepler-dev at ecoinformatics.org
> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
