[kepler-dev] KARs and module dependencies

Matt Jones jones at nceas.ucsb.edu
Wed Aug 4 11:50:44 PDT 2010


I'm also in agreement that we need to provide a migration path forward.
Derik and I discussed this, and we proposed modifying the original option
#1, so that we would now save solely in the KAR format, but the
Kepler system would still be able to open XML files and execute XML files
from the command line.  We discussed this option today at the leadership
team meeting, and had support for this direction.  This will allow us to
utilize the KAR format more thoroughly in module development, but to still
allow batch processors that use the command line system to take advantage of
the XML format -- they, of course, will have to work out module dependencies
on their own because the XML files do not contain this information or other
information that is present in KAR files.  This strategy both allows for
backwards compatibility and gives us a way forward with KAR development by
module authors.
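
To illustrate the difference between the two formats, here is a minimal
sketch (not Kepler's actual code). It assumes a KAR is a JAR-style archive
whose manifest sits at META-INF/MANIFEST.MF and that the module-dependencies
attribute holds a semicolon-separated list; the separator, the function
names, and the demo values are all hypothetical. A plain XML workflow yields
None, which is why command-line users would have to resolve modules on
their own.

```python
import io
import zipfile

MANIFEST_PATH = "META-INF/MANIFEST.MF"  # standard JAR location, assumed for KARs
DEPENDENCY_ATTRIBUTE = "module-dependencies"  # attribute name used in this thread


def read_module_dependencies(kar_or_xml, separator=";"):
    """Return the modules a workflow file declares, or None if it cannot say.

    kar_or_xml is a path or a binary file object. The separator between
    module names is an assumption, not taken from the KAR specification.
    """
    if not zipfile.is_zipfile(kar_or_xml):
        return None  # plain XML workflow: no dependency information at all
    with zipfile.ZipFile(kar_or_xml) as kar:
        if MANIFEST_PATH not in kar.namelist():
            return []
        manifest = kar.read(MANIFEST_PATH).decode("utf-8")
    # JAR manifests wrap long values onto lines that start with one space.
    unfolded = manifest.replace("\r\n", "\n").replace("\n ", "")
    for line in unfolded.split("\n"):
        name, found, value = line.partition(":")
        if found and name.strip().lower() == DEPENDENCY_ATTRIBUTE:
            return [m.strip() for m in value.split(separator) if m.strip()]
    return []


def make_demo_kar(dependencies):
    """Build a tiny in-memory archive with only a manifest, for illustration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as kar:
        kar.writestr(
            MANIFEST_PATH,
            "Manifest-Version: 1.0\r\n"
            f"{DEPENDENCY_ATTRIBUTE}: {';'.join(dependencies)}\r\n",
        )
    buffer.seek(0)
    return buffer
```

A caller could prompt for missing modules when a list comes back, and fall
back to opening the workflow as-is when the result is None.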

Matt

On Wed, Aug 4, 2010 at 4:37 AM, Ilkay Altintas <altintas at sdsc.edu> wrote:

> Derik,
>
> I agree with Paul that if we go with option #1, we need to provide a utility
> to migrate the old files into the new KAR format. It is also not the most
> community-friendly approach.
>
> I don't yet have a strong opinion since I don't know the cost of trying to
> go with option #2. Why is it unclear or hard to analyze the amount of work it
> requires?
>
> Thanks for looking into this. It is an important bug!
> -ilkay
>
> --
> Ilkay ALTINTAS
> Deputy Coordinator for Research, San Diego Supercomputer Center (SDSC)
> Lab Director, Scientific Workflow Automation Technologies (SWAT @ SDSC)
>
> University of California, San Diego
> 9500 Gilman Drive, MC: 0505  La Jolla, CA  92093-0505
> Phone: (858) 210-5877                     Fax: (858) 534-8303
> Web: http://users.sdsc.edu/~altintas
> Skype: ilkay.altintas
>
> On Aug 4, 2010, at 5:20 AM, Paul Edward Allen wrote:
>
> I am very much in favor of Kepler continuing to support reading/saving
> workflows as plain XML files. There are thousands of Kepler workflows out
> there that are plain XML, and at the very least you'd have to come up with a
> mechanism/utility to convert those to KAR files if the decision was made for
> Kepler not to support plain XML.
>
> Please remember those of us who are using non-Java GUIs to create
> workflows. Thanks.
>
> -Paul
>
> -----Original Message-----
>
> From: kepler-dev-bounces at kepler-project.org [mailto:kepler-dev-bounces at kepler-project.org] On Behalf Of Derik Barseghian
>
> Sent: Monday, August 02, 2010 8:16 PM
>
> To: Kepler Developers
>
> Subject: Re: [kepler-dev] KARs and module dependencies
>
>
> Hi all,
>
>
> I've implemented most of what was discussed below. However, this
>
> solution doesn't cover the case of workflows saved as just xml -- i.e.
>
> not in a KAR w/ a manifest that lists module-dependencies. A workflow
>
> can be created in a suite that adds entries to the MoML (like yourActor
>
> module, or reporting or provenance), and when you attempt to open this
>
> workflow in vanilla, you get NPEs re: elements missing, instead of a
>
> prompt asking you to download the missing modules.
>
>
> Two solutions come to mind:
>
> 1) No longer allow saving and opening workflows as xml, always save to
>
> a KAR
>
> pro: Simplifies our GUI wrt saving and opening.
>
> con: Negatively affects those currently using plain workflow files.
>
> Would probably require a utility to create a KAR from a workflow. At
>
> least some refactoring required to remove options from command line
>
> that utilize plain workflow files.
>
>
> 2) Move or keep a copy of module-dependencies in the workflow itself,
>
> and refactor to check these before actually attempting to open the
>
> workflow.
>
> pro: things continue to work similarly, just additional messages when
>
> you lack modules.
>
> con: It's not clear to me yet how much additional work this
>
> represents
>
>
> Please let me know your thoughts and additional pros/cons as you see
>
> them.
>
> Thanks,
>
> Derik
>
>
> On Jul 27, 2010, at 12:42 AM, David Welker wrote:
>
>
> Hi Derik,
>
>
> I meant to send you this earlier, but bad weather prevented the
>
> satellite-based internet connection that I am using from working.
>
>
> The way I see it, there are two major use cases here: (1) The user
>
> wants to replicate someone else's work (more rare, but useful,
>
> especially to compare someone else's algorithm with one's own) or (2)
>
> the user is primarily interested in developing a workflow for their
>
> own research (the more common case).
>
>
> There are at least two observations to be made: (1) To be assured of
>
> proper replication, it would be extremely helpful to have the exact
>
> module list under which the workflow was actually run, including
>
> precise version numbers and (2) for development and extension,
>
> users do not really care what module provides the necessary services
>
> for the workflow; they only care that the necessary services are
>
> provided from some module. Of course, the same user can be
>
> interested in both replication and development, but in that case,
>
> they would only be interested in specific module information while
>
> they are concerned with replication but would not otherwise care
>
> about that or desire to be limited by that during development.
>
>
> These observations have a couple of conceptual implications for the
>
> question of versioning workflows. Most fundamentally, a workflow
>
> developer will not be aware of the interests and needs of future
>
> users of their workflow. Therefore, they would not be in a good
>
> position to specify things such as a range of "compatible" module
>
> versions that are necessary. They do not know whether future users
>
> wish to engage in replication with the same and/or different data or
>
> whether they wish to engage in further development of the algorithm
>
> embedded in the workflow. Second, as has been mentioned above, when
>
> extended development of the workflow algorithm is contemplated,
>
> future users do not care which modules provide the services their
>
> workflow needs nor do they want to be limited by a specification
>
> that suggests that a particular module (or range of versions) is
>
> necessary. Not only is a workflow developer not in a good position
>
> to assert that a particular module (or range of versions) is
>
> necessary, it would be an error to even attempt to do so. When it
>
> comes to future development, there is no such thing as a necessary
>
> module.
>
>
> These observations also have a few possible implementation
>
> implications. First, whenever a KAR is saved, a list of precise
>
> module versions should be saved right along with it. There should be
>
> a corresponding menu option called "retrieve modules for
>
> replication" that will be available for that workflow when it is
>
> later loaded. This option should be grayed out for older workflows
>
> that did not save precise version information. Second, the concept
>
> of different levels of "strictness of compliance" should be ditched.
>
> Either the user is concerned about replication, which requires
>
> precise version information to proceed with full confidence, or
>
> specific modules do not matter at all. In other words, there should
>
> be only one level of compliance for replication -- very strict.
>
> Third, in the future, in addition to a precise module list, we
>
> should save a list of services used by the workflow. This would
>
> provide developers extending an existing workflow with guidance
>
> concerning what services the workflow needs to run (but of course
>
> what services are really necessary and what constitutes a "working"
>
> workflow is something that only future developers can say with
>
> authority). Kepler does not currently support services, but I think
>
> we should provide such support in the near future.
>
>
> The bottom-line implication I see for your current plan is that I
>
> do not think a workflow/KAR should ever REQUIRE a level of
>
> strictness. Instead, if precise information is available, there
>
> should be a replication OPTION. That option should be completely
>
> strict about versions retrieved. On the other hand, if the user does
>
> not actively indicate a desire to replicate, the workflow should try
>
> to open without warnings to the user. That is all that I think
>
> should be done for now.
>
>
> Later on, if we provide explicit support for the concept of
>
> services, we will be in a position to warn the user if a service we
>
> think is probably necessary (but who are we to say for sure?) does
>
> not appear to be provided by any of the modules that are currently
>
> active. But for now, without support for services, I do not think
>
> any warnings are in order.
>
>
> I hope this perspective is helpful. Sorry for the delay, but the
>
> weather ruined my Internet connection and I was loath to compose
>
> such an involved email using my iPhone. But I have finally bitten the
>
> bullet, so here it is. Of course, as the future user of the
>
> "service" provided by this email, only you are in a position to
>
> determine whether the ideas expressed herein are useful to you. At
>
> least if the principle regarding who should have authority to
>
> determine what services are necessary and/or useful (future
>
> developers) is correct. =)
>
>
> Sent from my iPhone
>
>
> On Jul 23, 2010, at 8:40 PM, Derik Barseghian <barseghian at nceas.ucsb.edu> wrote:
>
>
> Thanks very much Ben and Dan for the discussions on this topic.
>
>
> What Ben lists is essentially my plan, and I'm moving forward with
>
> implementation. To summarize some additional points:
>
>
> - This plan requires that module repositories maintain all published
>
> versions of a module.
>
> I will:
>
> - Include the complete "currently active" module list (with version
>
> numbers) in the module-dependencies KAR attribute.
>
> - Eliminate the dependsOnModule KAR entry attribute, since it will
>
> no longer be used, updating the KAR version, schema, etc. in the
>
> process.
>
> - Remove any code that was inserting a dependsOnModules value into
>
> the workflow.
>
> - Create a new user preference for the strictness of KAR
>
> compliance. I'm still determining the different modes. But e.g. if
>
> the user is set to 'very strict', in order to open a KAR, they will
>
> be prompted to import and restart with the exact same module set
>
> (with matching versions) as that which created the KAR. Essentially
>
> the more strict this setting, the more warnings a user will get
>
> when trying to open KARs.
>
>
> Derik
>
>
>
> On Jul 22, 2010, at 4:35 PM, ben leinfelder wrote:
>
>
> After bouncing these ideas around with Derik, here's a hybrid
>
> approach to handling module dependencies:
>
>
> -begin including module version number when writing module
>
> dependencies in a KAR file
>
> -when opening [possibly older] KARs that require non-vanilla
>
> modules:
>
>   -if the module version is not specified, then fetch the latest
>
> release of that module
>
>   -if the module version is specified, then fetch that version*
>
> *In practice we'll probably want Strict and Lax modes so that we
>
> aren't constantly swapping out modules each time we open a
>
> different KAR [for minor version changes].
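
A minimal sketch of the resolution rule listed above, under stated
assumptions: version strings are dotted integers, Lax mode is read as
"keep an already installed release with the same major version" (only one
possible reading of "minor version changes"), and the function names and
the way released versions are passed in are hypothetical. The archive name
follows the provenance-2.0.0.zip pattern mentioned elsewhere in this thread.
The trunk special case is not modeled.

```python
from typing import Optional, Sequence, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.1.0' into (2, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def choose_version(requested: Optional[str],
                   released: Sequence[str],
                   installed: Optional[str] = None,
                   strict: bool = True) -> str:
    """Pick which release of a module to use for a KAR dependency.

    requested: version recorded in the KAR, or None for older KARs.
    released:  versions published in the repository (assumed non-empty).
    installed: version already active locally, if any.
    strict:    Strict mode insists on the recorded version; Lax mode keeps
               an installed version that shares the major version number.
    """
    if requested is None:
        # No version recorded: fetch the latest release.
        return max(released, key=parse_version)
    if not strict and installed is not None:
        if parse_version(installed)[:1] == parse_version(requested)[:1]:
            return installed  # avoid swapping modules for a minor change
    if requested not in released:
        raise LookupError(f"version {requested} is no longer published")
    return requested


def archive_name(module: str, version: str) -> str:
    """Versioned archive name, e.g. provenance-2.0.0.zip."""
    return f"{module}-{version}.zip"
```

The LookupError branch is where the requirement that repositories keep all
published versions shows up: Strict mode cannot succeed otherwise.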
>
>
> Additional notes:
>
> -Development on the trunk - where there is no module version -
>
> should also be considered a special case that does not trigger
>
> module download. It'd be nice to resolve the dependency using
>
> modules from the trunk if we are running from it.
>
> -We don't want to inadvertently downgrade someone who is opening
>
> an older KAR but wants to work with a newer version of the module.
>
> We need to be able to save an older KAR with newer module features.
>
> -We do want to allow a downgrade in cases where old features need
>
> to be used, say when reproducing workflow run results from a KAR -
>
> the whole point is that we reproduce them exactly - "archive"
>
> being the operative word.
>
>
> Hope I captured most of our discussion!
>
> -ben
>
>
> On Jul 22, 2010, at 2:34 PM, Derik Barseghian wrote:
>
>
> Hi all,
>
>
> I've found a bug that's blocking the reporting-2.0 release: when
>
> you're in vanilla Kepler, and have a KAR in MyWorkflows that was
>
> created with modules you don't have installed, the KAR 'Import
>
> Dependent Modules' context menu item doesn't work:
>
> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5099
>
> I can take this bug if Kepler/CORE members can clarify what the
>
> desired behavior is -- a significant design decision is made
>
> depending on how this is fixed.
>
>
> The issue is basically that in a KAR manifest there are module-
>
> dependencies and dependsOnModules attributes, but the values
>
> stored are module names without version number. The 'Import
>
> Dependent Modules' action attempts to use these values to
>
> download dependencies, but fails because it tries to e.g. fetch
>
> provenance.zip instead of provenance-2.0.0.zip.
>
>
> Two issues come to mind:
>
>
> 1) When created, should a KAR manifest store exact versions of
>
> module dependencies instead of just module names?
>
>
> 2) Which versions of modules should the 'Import Dependent
>
> Modules' attempt to fetch and install?
>
>
> A) If we start storing module versions in the manifest, exactly
>
> those? This will not work if a module version is no longer
>
> available, so we would likely want to keep all old versions of
>
> modules in the released area of our repository (is this already
>
> the plan?).
>
>
> B) The newest? This means requiring that modules remain backward
>
> compatible with all versions of their KAR artifacts. This would
>
> also require fetching the modules in the proper order. If
>
> aSuite-2.1.0 is out and it depends on bSuite-2.0.0, and
>
> bSuite-2.1.0 is also available, bSuite-2.1.0 should not be
>
> downloaded. If the manifest module-dependencies attribute doesn't
>
> store these in the right order (not sure), given a list of
>
> modules to download, the module manager code would have to be
>
> able to figure this out (maybe it already can?).
>
>
> C) Something else? :)
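
The ordering concern in option B can be sketched as a walk over pinned
dependencies. This is an illustration only: the mapping from each
"name-version" to the exact versions it depends on is a hypothetical
structure, and dependencies-first is assumed to be the "proper order".
It is not a description of what the module manager does today.

```python
def download_order(root, pinned_dependencies):
    """List root and everything it pins, dependencies first, without duplicates.

    pinned_dependencies maps "name-version" to the exact "name-version"
    entries it depends on (a hypothetical structure for this sketch).
    """
    order = []
    visited = set()

    def visit(module):
        if module in visited:
            return  # already handled; also guards against cycles
        visited.add(module)
        for dependency in pinned_dependencies.get(module, []):
            visit(dependency)
        order.append(module)

    visit(root)
    return order
```

With aSuite-2.1.0 pinned to bSuite-2.0.0, the walk never reaches
bSuite-2.1.0, so the newer bSuite is not downloaded even though it exists.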
>
>
> I think B may be the way to go, and also does not require we do
>
> 1), even though we may want to do that in the future.
>
>
> Thanks,
>
> Derik
>
>
>
>
> _______________________________________________
>
> Kepler-dev mailing list
>
> Kepler-dev at kepler-project.org
>
> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>
>