<div dir="ltr">Hi Quentin:<br><br>Interesting question! There are several answers to this. <br><br>First, "knowing which actor was executing" is generally not enough to resume execution:<br>Consider a workflow executing under a PN (process network) director. All actors then execute as independent processes (Java threads, really), so they are all executing simultaneously.<br>
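To make that concrete, here is a minimal thread-per-actor sketch in plain Java (hypothetical code, not the actual Ptolemy/Kepler classes): each actor runs in its own thread and the actors communicate only over a blocking channel, so there is no single "current actor" to record.<br>
<pre>
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy PN-style execution: each "actor" runs in its own thread, and the
// actors communicate only over a bounded, blocking channel.
public class PnSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(4);

        Thread source = new Thread(() -> {          // actor 1: produces tokens
            try {
                for (int i = 0; i < 5; i++) channel.put(i);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        Thread sink = new Thread(() -> {            // actor 2: consumes tokens
            try {
                for (int i = 0; i < 5; i++) System.out.println("consumed " + channel.take());
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        source.start();
        sink.start();                               // both actors execute simultaneously
        source.join();
        sink.join();
    }
}
</pre>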
In contrast, a (sub-)workflow running under an SDF or DDF director executes within a single thread, so at most one actor is executing at any given time in such a workflow.<br>(SDF computes its schedule "statically", i.e., prior to workflow execution, while DDF figures out at runtime which actors are ready to fire, selects one, and repeats.)<br>
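At its core, such a single-threaded director is just a loop that fires one actor at a time. A rough sketch of the SDF case (hypothetical classes, not the real Ptolemy/Kepler directors), with a comment marking where DDF differs:<br>
<pre>
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy single-threaded director loop (hypothetical, not the Ptolemy/Kepler API).
// SDF: the schedule is computed before execution and then fired in order.
// DDF would instead call ready() at runtime, fire one ready actor, and repeat.
// Either way, one thread fires at most one actor at a time.
public class SingleThreadSketch {
    interface ToyActor {
        boolean ready();   // consulted by a DDF-style director at runtime
        void fire();       // one firing of the actor
    }

    public static void main(String[] args) {
        Deque<Integer> channel = new ArrayDeque<>();

        ToyActor source = new ToyActor() {
            int next = 0;
            public boolean ready() { return true; }
            public void fire() { channel.addLast(next++); }
        };
        ToyActor sink = new ToyActor() {
            public boolean ready() { return !channel.isEmpty(); }
            public void fire() { System.out.println("consumed " + channel.removeFirst()); }
        };

        // SDF-style: the schedule [source, sink] is fixed before execution starts.
        List<ToyActor> staticSchedule = List.of(source, sink);
        for (int iteration = 0; iteration < 3; iteration++) {
            for (ToyActor actor : staticSchedule) {
                actor.fire();
            }
        }
    }
}
</pre>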
<br>But what you really need is to maintain the "workflow state" (or some part of it) persistently, so that you can resume a stopped or failed workflow. <br>One general way to do this is checkpointing, i.e., writing the relevant information out to disk at certain times. While checkpointing can be very costly in general applications, it is often easier in scientific workflows: components are usually loosely coupled, all information flow is visible on the channels (unless you perform side effects outside the model), and actors are often (but not always) stateless. <br>
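As a toy illustration (hypothetical code; the file name and format are made up): since much of the interesting state lives on the channels, a checkpoint can simply persist the queued tokens, and a restart can reload them instead of starting over.<br>
<pre>
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Minimal channel-checkpointing sketch (hypothetical, not an existing Kepler feature):
// persist the tokens queued on a channel at quiescent points, and reload them on restart.
public class ChannelCheckpoint {
    static final Path CHECKPOINT = Path.of("channel.ckpt");

    static void save(Deque<String> channel) throws IOException {
        Files.write(CHECKPOINT, channel);           // one token per line
    }

    static Deque<String> load() throws IOException {
        Deque<String> channel = new ArrayDeque<>();
        if (Files.exists(CHECKPOINT)) {
            channel.addAll(Files.readAllLines(CHECKPOINT));
        }
        return channel;
    }

    public static void main(String[] args) throws IOException {
        Deque<String> channel = load();             // resume from a previous run, if any
        if (channel.isEmpty()) {
            channel.addAll(List.of("token-1", "token-2", "token-3"));
        }
        // ... fire actors that consume/produce tokens here ...
        save(channel);                              // checkpoint at a quiescent point
    }
}
</pre>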
<br>I'm aware of several extensions that allow one to resume Kepler workflows (I think Ptolemy might have further ways): <br><br>-- One extension is called "smart rerun" (Ilkay Altintas or Dan Crawl can point you to it). It allows you to rerun a workflow with modified inputs and/or parameter settings while avoiding the re-execution of parts that are "unchanged". I don't recall whether it handles only successful workflow runs (optimizing their re-execution under change) or also partial (aborted) runs.<br>
<br>-- Norbert Podhorszki has developed workflows in which the actors themselves write a small amount of information to disk (in his case: the remote commands that terminated successfully); when the workflow is rerun, this information is used to execute only those commands that have not yet completed. Call this the "custom checkpointing" solution: instead of a general system extension, individual actors or workflows decide what to checkpoint. This is more work, but it can be more efficient because you know exactly what is needed for a rerun (see the sketch below this list).<br>
<br>-- A newer director and workflow programming model called COMAD makes most, if not all, of the execution state visible "on the wire" by streaming nested data collections between actors. As in the other approaches, the information on the wire can be written to disk and the workflow resumed from that information (also sketched below).<br>
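Below is a small sketch of the custom-checkpointing idea (hypothetical code, not Norbert's actual actors): the actor appends each command that completed successfully to a small log, and on a rerun it skips exactly those commands.<br>
<pre>
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// "Custom checkpointing" sketch (hypothetical): the actor logs each command that
// completed successfully; a rerun reads the log and executes only what is missing.
public class CommandRunnerActor {
    static final Path DONE_LOG = Path.of("commands.done");

    public static void main(String[] args) throws IOException {
        List<String> commands = List.of("stage-in", "simulate", "stage-out");

        Set<String> done = new HashSet<>();
        if (Files.exists(DONE_LOG)) {
            done.addAll(Files.readAllLines(DONE_LOG));   // what succeeded in earlier runs
        }

        for (String cmd : commands) {
            if (done.contains(cmd)) {
                System.out.println("skipping (already done): " + cmd);
                continue;
            }
            System.out.println("running: " + cmd);       // launch the (remote) command here
            // Append to the log only after success, so a crash mid-command simply
            // means that command is retried on the next run.
            Files.writeString(DONE_LOG, cmd + System.lineSeparator(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
</pre>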
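Here is a sketch of the COMAD idea of keeping state "on the wire" (hypothetical token encoding, not the real COMAD types): a nested collection is flattened into a stream of open/close delimiters and data tokens, so persisting the not-yet-consumed part of the stream is enough to resume the downstream actors.<br>
<pre>
import java.util.List;

// COMAD-flavored sketch (hypothetical encoding): a nested collection is streamed
// as a flat sequence of delimiter and data tokens. Dumping the not-yet-consumed
// part of this stream to disk captures where in the data the workflow currently is.
public class CollectionStreamSketch {
    public static void main(String[] args) {
        List<String> stream = List.of(
                "open:project",
                "open:sample-1", "data:seq-A", "data:seq-B", "close:sample-1",
                "open:sample-2", "data:seq-C", "close:sample-2",
                "close:project");

        // Each actor touches only the tokens it cares about and forwards the rest.
        stream.forEach(System.out::println);
    }
}
</pre>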
<br>All of these approaches are based on recording information to disk during runtime (sometimes called 'provenance information'), which is then used when resuming the workflow.<br><br>The above options are not the only ones (e.g., Ptolemy probably has additional ways to restart a failed model). Which variant to choose (or which new variant to develop) may depend on, among other things:<br>
-- the size of the data flowing through the channels (or the availability of persistent ids for large chunks of data)<br>-- whether actors are stateful or stateless<br>-- the programming/execution model of the director(s) being used<br><br>
Bertram<br><br><br><div class="gmail_quote">On Tue, Aug 26, 2008 at 2:22 AM, Quentin BEY <span dir="ltr"><<a href="mailto:quentin.bey@onera.fr">quentin.bey@onera.fr</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi all,<br>
<br>
Once again I need help with Kepler's capabilities.<br>
<br>
I wonder if we can stop a workflow, quit Kepler, then reopen Kepler and<br>
resume the workflow. For instance, suppose a workflow that takes a long time to<br>
execute stops because the computer shuts down (for some unknown reason).<br>
If we know which actor was executing, is there a simple way to<br>
resume execution from this actor?<br>
<br>
<br>
Thanks in advance,<br>
<br>
<br>
Quentin BEY -ONERA- France<br>
<br>
</blockquote></div><br></div>