[kepler-dev] KARs and module dependencies

Paul Edward Allen pea1 at cornell.edu
Wed Aug 4 05:20:20 PDT 2010


I am very much in favor of Kepler continuing to support reading/saving workflows as plain XML files. There are thousands of Kepler workflows out there that are plain XML, and at the very least you'd have to come up with a mechanism/utility to convert those to KAR files if the decision was made for Kepler not to support plain XML. 

Please remember those of us who are using non-Java GUIs to create workflows. Thanks.

-Paul

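A conversion utility of the kind mentioned above could be fairly small. The following is only a minimal sketch, assuming a KAR is a ZIP/JAR-style archive whose manifest lives at META-INF/MANIFEST.MF. The function name workflow_xml_to_kar, the semicolon-separated value format, and the entry name for the workflow are hypothetical; a real KAR manifest carries more attributes than the two written here.

```python
import io
import zipfile


def workflow_xml_to_kar(xml_text, workflow_name, module_dependencies):
    """Wrap a plain-XML workflow in a KAR-like archive and return its bytes.

    Assumption: the archive is a ZIP/JAR with a META-INF/MANIFEST.MF entry.
    The 'module-dependencies' attribute name comes from the thread; the
    ';'-separated value format is a guess for illustration only.
    """
    manifest = (
        "Manifest-Version: 1.0\n"
        f"module-dependencies: {';'.join(module_dependencies)}\n"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as kar:
        kar.writestr("META-INF/MANIFEST.MF", manifest)
        kar.writestr(f"{workflow_name}.xml", xml_text)
    return buf.getvalue()
```

Note that this sketch does not wrap long manifest lines, which the JAR manifest format requires for lines over 72 bytes.
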
> -----Original Message-----
> From: kepler-dev-bounces at kepler-project.org [mailto:kepler-dev-
> bounces at kepler-project.org] On Behalf Of Derik Barseghian
> Sent: Monday, August 02, 2010 8:16 PM
> To: Kepler Developers
> Subject: Re: [kepler-dev] KARs and module dependencies
> 
> Hi all,
> 
> I've implemented most of what was discussed below. However, this
> solution doesn't cover the case of workflows saved as just xml -- i.e.
> not in a KAR w/ a manifest that lists module-dependencies. A workflow
> can be created in a suite that adds entries to the moml (like yourActor
> module, or reporting or provenance), and when you attempt to open this
> workflow in vanilla, you get NPEs re: elements missing, instead of a
> prompt asking you to download the missing modules.
> 
> Two solutions come to mind:
> 1) No longer allow saving and opening workflows as xml, always save to
> a KAR
> 	pro: Simplifies our GUI wrt saving and opening.
> 	con: Negatively affects those currently using plain workflow files.
> Would probably require a utility to create a KAR from a workflow. At
> least some refactoring required to remove options from command line
> that utilize plain workflow files.
> 
> 2) Move or keep a copy of module-dependencies in the workflow itself,
> and refactor to check these before actually attempting to open the
> workflow.
> 	pro: things continue to work similarly, just additional messages when
> you lack modules.
> 	con: It's not clear to me yet how much additional work this
> represents
> 
> Please let me know your thoughts and additional pros/cons as you see
> them.
> Thanks,
> Derik
> 
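Solution 2 in the message above (keep module-dependencies in the workflow and check them before opening) might look roughly like the sketch below. It assumes the dependencies are stored as a top-level MoML property named module-dependencies with a semicolon-separated value; that storage format and both function names are hypothetical, not Kepler's actual behavior.

```python
import xml.etree.ElementTree as ET


def declared_modules(moml_text):
    """Read the (hypothetical) module-dependencies property from a workflow."""
    root = ET.fromstring(moml_text)
    for prop in root.findall("property"):
        if prop.get("name") == "module-dependencies":
            return [m for m in prop.get("value", "").split(";") if m]
    return []  # older workflows carry no dependency information


def missing_modules(moml_text, installed):
    """Return declared modules that are not installed, in declared order."""
    installed = set(installed)
    return [m for m in declared_modules(moml_text) if m not in installed]
```

A caller would run missing_modules before handing the file to the real parser, and prompt for a download when the result is non-empty instead of failing later with an NPE.
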
> On Jul 27, 2010, at 12:42 AM, David Welker wrote:
> 
> > Hi Derik,
> >
> > I meant to send you this earlier, but bad weather prevented the
> > satellite-based internet connection that I am using from working.
> >
> > The way I see it, there are two major use cases here: (1) The user
> > wants to replicate someone else's work (more rare, but useful,
> > especially to compare someone else's algorithm with one's own) or (2)
> > the user is primarily interested in developing a workflow for their
> > own research (the more common case).
> >
> > There are at least two observations to be made: (1) To be assured of
> > proper replication, it would be extremely helpful to have the exact
> > module list under which the workflow was actually run, including
> > precise version numbers and (2) for development and extension,
> > users do not really care what module provides the necessary services
> > for the workflow; they only care that the necessary services are
> > provided from some module. Of course, the same user can be
> > interested in both replication and development, but in that case,
> > they would only be interested in specific module information while
> > they are concerned with replication but would not otherwise care
> > about that or desire to be limited by that during development.
> >
> > These observations have a couple of conceptual implications for the
> > question of versioning workflows. Most fundamentally, a workflow
> > developer will not be aware of the interests and needs of future
> > users of their workflow. Therefore, they would not be in a good
> > position to specify things such as a range of "compatible" module
> > versions that are necessary. They do not know whether future users
> > wish to engage in replication with the same and/or different data or
> > whether they wish to engage in further development of the algorithm
> > embedded in the workflow. Second, as has been mentioned above, when
> > extended development of the workflow algorithm is contemplated,
> > future users do not care which modules provide the services their
> > workflow needs nor do they want to be limited by a specification
> > that suggests that a particular module (or range of versions) is
> > necessary. Not only is a workflow developer not in a good position
> > to assert that a particular module (or range of versions) is
> > necessary, it would be an error to even attempt to do so. When it
> > comes to future development, there is no such thing as a necessary
> > module.
> >
> > These observations also have a few possible implementation
> > implications. First, whenever a KAR is saved, a list of precise
> > module versions should be saved right along with it. There should be
> > a corresponding menu option called "retrieve modules for
> > replication" that will be available for that workflow when it is
> > later loaded. This option should be grayed out for older workflows
> > that did not save precise version information. Second, the concept
> > of different levels of "strictness of compliance" should be ditched.
> > Either the user is concerned about replication, which requires
> > precise version information to proceed with full confidence, or
> > specific modules do not matter at all. In other words, there should
> > be only one level of compliance for replication -- very strict.
> > Third, in the future, in addition to a precise module list, we
> > should save a list of services used by the workflow. This would
> > provide developers extending an existing workflow with guidance
> > concerning what services the workflow needs to run (but of course
> > what services are really necessary and what constitutes a "working"
> > workflow is something that only future developers can say with
> > authority). Kepler does not currently support services, but I think
> > we should provide such support in the near future.
> >
> > The bottom-line implication I see for your current plan is that I
> > do not think a workflow/KAR should ever REQUIRE a level of
> > strictness. Instead, if precise information is available, there
> > should be a replication OPTION. That option should be completely
> > strict about versions retrieved. On the other hand, if the user does
> > not actively indicate a desire to replicate, the workflow should try
> > to open without warnings to the user. That is all that I think
> > should be done for now.
> >
> > Later on, if we provide explicit support for the concept of
> > services, we will be in a position to warn the user if a service we
> > think is probably necessary (but who are we to say for sure?) does
> > not appear to be provided by any of the modules that are currently
> > active. But for now, without support for services, I do not think
> > any warnings are in order.
> >
> > I hope this perspective is helpful. Sorry for the delay, but the
> > weather ruined my Internet connection and I was loath to compose
> > such an involved email using my iPhone. But I have finally bitten the
> > bullet, so here it is. Of course, as the future user of the
> > "service" provided by this email, only you are in a position to
> > determine whether the ideas expressed herein are useful to you. At
> > least if the principle regarding who should have authority to
> > determine what services are necessary and/or useful (future
> > developers) is correct. =)
> >
> > Sent from my iPhone
> >
> > On Jul 23, 2010, at 8:40 PM, Derik Barseghian <barseghian at nceas.ucsb.edu> wrote:
> >
> >> Thanks very much Ben and Dan for the discussions on this topic.
> >>
> >> What Ben lists is essentially my plan, and I'm moving forward with
> >> implementation. To summarize some additional points:
> >>
> >> - This plan requires that module repositories maintain all published
> >> versions of a module.
> >> I will:
> >> - Include the complete "currently active" module list (with version
> >> numbers) in the module-dependencies KAR attribute.
> >> - Eliminate the dependsOnModule KAR entry attribute, since it will
> >> no longer be used, updating the KAR version, schema, etc in the
> >> process.
> >> - Remove any code that was inserting a dependsOnModules value into
> >> the workflow.
> >> - Create a new user preference for the strictness of KAR
> >> compliance. I'm still determining the different modes. But e.g. if
> >> the user is set to 'very strict', in order to open a KAR, they will
> >> be prompted to import and restart with the exact same module set
> >> (with matching versions) as that which created the KAR. Essentially
> >> the more strict this setting, the more warnings a user will get
> >> when trying to open KARs.
> >>
> >> Derik
> >>
> >>
> >> On Jul 22, 2010, at 4:35 PM, ben leinfelder wrote:
> >>
> >>> After bouncing these ideas around with Derik, here's a hybrid
> >>> approach to handling module dependencies:
> >>>
> >>> -begin including module version number when writing module
> >>> dependencies in a KAR file
> >>> -when opening [possibly older] KARs that require non-vanilla
> >>> modules:
> >>>    -if the module version is not specified, then fetch the latest
> >>> release of that module
> >>>    -if the module version is specified, then fetch that version*
> >>> *In practice we'll probably want Strict and Lax modes so that we
> >>> aren't constantly swapping out modules each time we open a
> >>> different KAR [for minor version changes].
> >>>
> >>> Additional notes:
> >>> -Development on the trunk - where there is no module version -
> >>> should also be considered a special case that does not trigger
> >>> module download. It'd be nice to resolve the dependency using
> >>> modules from the trunk if we are running from it.
> >>> -We don't want to inadvertently downgrade someone who is opening
> >>> an older KAR but wants to work with a newer version of the module.
> >>> We need to be able to save an older KAR with newer module features.
> >>> -We do want to allow a downgrade in cases where old features need
> >>> to be used, say when reproducing workflow run results from a KAR -
> >>> the whole point is that we reproduce them exactly - "archive"
> >>> being the operative word.
> >>>
> >>> Hope I captured most of our discussion!
> >>> -ben
> >>>
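The hybrid rule in Ben's message (fetch the latest release when no version is recorded, the recorded version otherwise, with a Lax mode to avoid swapping modules for minor changes) can be sketched as follows. The function names, the dotted-integer version format, and the reading of Lax as "keep the installed module when its major version matches" are all assumptions made for illustration; the thread does not define the modes.

```python
def parse_version(text):
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def choose_version(requested, published, installed=None, strict=True):
    """Pick which published version of a module to use for a KAR.

    requested: version recorded in the KAR, or None for older KARs.
    published: version strings available in the repository.
    installed: version already active locally, if any.
    """
    if requested is None:
        # Older KAR without version information: fetch the latest release.
        return max(published, key=parse_version)
    if (not strict and installed is not None
            and parse_version(installed)[0] == parse_version(requested)[0]):
        # Lax (assumed meaning): same major version, so keep what is installed.
        return installed
    if requested not in published:
        raise LookupError(f"version {requested} is no longer published")
    return requested


def archive_name(module, version):
    """Build a name such as provenance-2.0.0.zip."""
    return f"{module}-{version}.zip"
```

The LookupError branch is where the requirement that repositories keep every published version shows up: strict replication cannot proceed once an old release has been removed.
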
> >>> On Jul 22, 2010, at 2:34 PM, Derik Barseghian wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I've found a bug that's blocking the reporting-2.0 release: when
> >>>> you're in vanilla kepler, and have a KAR in MyWorkflows that was
> >>>> created with modules you don't have installed, the KAR 'Import
> >>>> Dependent Modules' context menu item doesn't work:
> >>>> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5099
> >>>> I can take this bug if Kepler/CORE members can clarify what the
> >>>> desired behavior is -- a significant design decision is made
> >>>> depending on how this is fixed.
> >>>>
> >>>> The issue is basically that in a KAR manifest there are module-
> >>>> dependencies and dependsOnModules attributes, but the values
> >>>> stored are module names without version number. The 'Import
> >>>> Dependent Modules' action attempts to use these values to
> >>>> download dependencies, but fails because it tries to e.g. fetch
> >>>> provenance.zip instead of provenance-2.0.0.zip.
> >>>>
> >>>> Two issues come to mind:
> >>>>
> >>>> 1) When created, should a KAR manifest store exact versions of
> >>>> module dependencies instead of just module names?
> >>>>
> >>>> 2) Which versions of modules should the 'Import Dependent
> >>>> Modules' attempt to fetch and install?
> >>>>
> >>>> A) If we start storing module versions in the manifest, exactly
> >>>> those? This will not work if a module version is no longer
> >>>> available, so we would likely want to keep all old versions of
> >>>> modules in the released area of our repository (is this already
> >>>> the plan?).
> >>>>
> >>>> B) The newest? This means requiring modules remain backward
> >>>> compatible with all versions of their KAR artifacts. This would
> >>>> also require fetching the modules in the proper order. If
> >>>> aSuite-2.1.0 is out and it depends on bSuite-2.0.0, and
> >>>> bSuite-2.1.0 is also available, bSuite-2.1.0 should not be
> >>>> downloaded. If the manifest module-dependencies attribute doesn't
> >>>> store these in the right order (not sure), given a list of
> >>>> modules to download, the module manager code would have to be
> >>>> able to figure this out (maybe it already can?).
> >>>>
> >>>> C) Something else? :)
> >>>>
> >>>> I think B may be the way to go, and also does not require we do
> >>>> 1), even though we may want to do that in the future.
> >>>>
> >>>> Thanks,
> >>>> Derik
> >>>>
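The ordering concern under option B (aSuite-2.1.0 depends on bSuite-2.0.0, so bSuite-2.1.0 must not be downloaded) can be illustrated with a small resolver. This is a sketch only: the data structures, the name resolve_newest, and the idea that a suite's release "pins" versions of other modules are assumptions, and it handles a single level of pinning rather than full transitive resolution.

```python
def parse_version(text):
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve_newest(names, published, pins):
    """Choose a version for each module name listed in a manifest.

    names: module names from the manifest, without versions, in any order.
    published: {name: [version, ...]} as available in the repository.
    pins: {(name, version): {dep_name: dep_version}} for releases that
          require a specific version of another module (hypothetical data).
    """
    # Start from the newest release of every listed module.
    newest = {n: max(published[n], key=parse_version) for n in names}
    # Collect versions demanded by those newest releases.
    pinned = {}
    for n in names:
        pinned.update(pins.get((n, newest[n]), {}))
    # A pin from another listed module overrides "newest".
    return {n: pinned.get(n, newest[n]) for n in names}
```

Because the pins are collected before any version is finalized, the result does not depend on the order in which the manifest happens to list the modules.
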
> 
> 
> _______________________________________________
> Kepler-dev mailing list
> Kepler-dev at kepler-project.org
> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
