[kepler-dev] KARs and module dependencies
Derik Barseghian
barseghian at nceas.ucsb.edu
Mon Aug 2 17:16:17 PDT 2010
Hi all,
I've implemented most of what was discussed below. However, this
solution doesn't cover the case of workflows saved as plain XML, i.e.
not in a KAR with a manifest that lists module-dependencies. A workflow
can be created in a suite that adds entries to the MoML (e.g. the
yourActor module, or reporting or provenance), and when you attempt to
open this workflow in vanilla Kepler, you get NullPointerExceptions about
missing elements instead of a prompt asking you to download the missing
modules.
Two solutions come to mind:
1) No longer allow saving and opening workflows as XML; always save to
a KAR.
pro: Simplifies our GUI with respect to saving and opening.
con: Negatively affects those currently using plain workflow files.
Would probably require a utility to create a KAR from a workflow. At
least some refactoring would be required to remove command-line options
that use plain workflow files.
2) Move or keep a copy of module-dependencies in the workflow itself,
and refactor to check these before actually attempting to open the
workflow.
pro: Things continue to work similarly, just with additional messages
when you lack modules.
con: It's not yet clear to me how much additional work this represents.
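Option 2 could be sketched roughly as follows. This is only an illustration, not existing Kepler code: WorkflowDependencyCheck and its methods are hypothetical names, and the sketch assumes the workflow file records its dependencies as a semicolon-separated list of name-version strings (the real storage format would still need to be decided). The idea is to run this check before the MoML is parsed, so a missing module leads to a download prompt rather than a NullPointerException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical pre-open check (option 2): compare the module dependencies
 * recorded in a plain workflow file with the installed modules before the
 * MoML is parsed.
 */
final class WorkflowDependencyCheck {

    private WorkflowDependencyCheck() {}

    /**
     * Splits a recorded dependency list such as "reporting-2.0.0;provenance-2.0.0".
     * The ';' separator is an assumption for this sketch.
     */
    static List<String> parseDependencies(String recorded) {
        List<String> deps = new ArrayList<>();
        if (recorded == null) {
            return deps; // nothing recorded: nothing to check
        }
        for (String entry : recorded.split(";")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                deps.add(trimmed);
            }
        }
        return deps;
    }

    /** Returns recorded dependencies that are not installed, in recorded order. */
    static List<String> findMissing(List<String> recorded, Set<String> installed) {
        List<String> missing = new ArrayList<>();
        for (String dep : recorded) {
            if (!installed.contains(dep)) {
                missing.add(dep);
            }
        }
        return missing;
    }
}
```

If findMissing returns a non-empty list, the GUI would offer to download those modules instead of opening the workflow. Because this uses exact string matching, a different installed version counts as missing; a laxer check could compare module names only.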
Please let me know your thoughts and additional pros/cons as you see
them.
Thanks,
Derik
On Jul 27, 2010, at 12:42 AM, David Welker wrote:
> Hi Derik,
>
> I meant to send you this earlier, but bad weather knocked out the
> satellite-based Internet connection that I am using.
>
> The way I see it, there are two major use cases here: (1) the user
> wants to replicate someone else's work (rarer, but useful,
> especially to compare someone else's algorithm with one's own), or (2)
> the user is primarily interested in developing a workflow for their
> own research (the more common case).
>
> There are at least two observations to be made: (1) to be assured of
> proper replication, it would be extremely helpful to have the exact
> module list under which the workflow was actually run, including
> precise version numbers, and (2) for development and extension,
> users do not really care which module provides the necessary services
> for the workflow; they only care that the necessary services are
> provided by some module. Of course, the same user can be
> interested in both replication and development, but in that case
> they would care about specific module information only while they
> are replicating, and would not want to be limited by it during
> development.
>
> These observations have a couple of conceptual implications for the
> question of versioning workflows. Most fundamentally, a workflow
> developer will not be aware of the interests and needs of future
> users of their workflow. Therefore, they would not be in a good
> position to specify things such as a range of "compatible" module
> versions that are necessary. They do not know whether future users
> wish to engage in replication with the same and/or different data or
> whether they wish to engage in further development of the algorithm
> embedded in the workflow. Second, as has been mentioned above, when
> extended development of the workflow algorithm is contemplated,
> future users do not care which modules provide the services their
> workflow needs, nor do they want to be limited by a specification
> that suggests that a particular module (or range of versions) is
> necessary. Not only is a workflow developer not in a good position
> to assert that a particular module (or range of versions) is
> necessary, it would be an error to even attempt to do so. When it
> comes to future development, there is no such thing as a necessary
> module.
>
> These observations also have a few possible implementation
> implications. First, whenever a KAR is saved, a list of precise
> module versions should be saved right along with it. There should be
> a corresponding menu option called "retrieve modules for
> replication" that will be available for that workflow when it is
> later loaded. This option should be grayed out for older workflows
> that did not save precise version information. Second, the concept
> of different levels of "strictness of compliance" should be ditched.
> Either the user is concerned about replication, which requires
> precise version information to proceed with full confidence, or
> specific modules do not matter at all. In other words, there should
> be only one level of compliance for replication -- very strict.
> Third, in the future, in addition to a precise module list, we
> should save a list of services used by the workflow. This would
> provide developers extending an existing workflow with guidance
> concerning what services the workflow needs to run (but of course
> what services are really necessary and what constitutes a "working"
> workflow is something that only future developers can say with
> authority). Kepler does not currently support services, but I think
> we should provide such support in the near future.
>
> The bottom-line implication I see for your current plan is that I
> do not think a workflow/KAR should ever REQUIRE a level of
> strictness. Instead, if precise information is available, there
> should be a replication OPTION. That option should be completely
> strict about the versions retrieved. On the other hand, if the user
> does not actively indicate a desire to replicate, the workflow should
> try to open without warnings to the user. That is all that I think
> should be done for now.
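A minimal sketch of the strict, opt-in replication check described above. It assumes the saved module list is a list of name-version strings and that module order can be ignored (both assumptions), and ReplicationCheck and its methods are hypothetical names. A null saved list stands for an older KAR without precise version information, for which the replication option would be grayed out and no warning shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical strict replication check: the saved list is the exact module
 * set recorded when the KAR was saved; the active list is what runs now.
 * Comparison is exact and order-insensitive (an assumption).
 */
final class ReplicationCheck {

    /** Modules to retrieve for replication, and active modules not in the saved set. */
    record Result(List<String> toRetrieve, List<String> extraActive) {
        boolean exactMatch() {
            return toRetrieve.isEmpty() && extraActive.isEmpty();
        }
    }

    private ReplicationCheck() {}

    static Result compare(List<String> saved, List<String> active) {
        Set<String> savedSet = new LinkedHashSet<>(saved);
        Set<String> activeSet = new LinkedHashSet<>(active);
        List<String> toRetrieve = new ArrayList<>();
        for (String m : savedSet) {
            if (!activeSet.contains(m)) {
                toRetrieve.add(m);
            }
        }
        List<String> extra = new ArrayList<>();
        for (String m : activeSet) {
            if (!savedSet.contains(m)) {
                extra.add(m);
            }
        }
        return new Result(toRetrieve, extra);
    }

    /** Warn only when the user explicitly asked to replicate and versions differ. */
    static boolean shouldWarn(boolean replicationRequested, List<String> saved, List<String> active) {
        if (!replicationRequested || saved == null) {
            return false; // normal open, or an older KAR without precise versions
        }
        return !compare(saved, active).exactMatch();
    }
}
```

Keeping the comparison all-or-nothing mirrors the single, very strict level of compliance argued for above; there is deliberately no partial-match mode.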
>
> Later on, if we provide explicit support for the concept of
> services, we will be in a position to warn the user if a service we
> think is probably necessary (but who are we to say for sure?) does
> not appear to be provided by any of the modules that are currently
> active. But for now, without support for services, I do not think
> any warnings are in order.
>
> I hope this perspective is helpful. Sorry for the delay, but the
> weather ruined my Internet connection and I was loath to compose
> such an involved email using my iPhone. But I have finally bitten the
> bullet, so here it is. Of course, as the future user of the
> "service" provided by this email, only you are in a position to
> determine whether the ideas expressed herein are useful to you. At
> least if the principle regarding who should have authority to
> determine what services are necessary and/or useful (future
> developers) is correct. =)
>
> Sent from my iPhone
>
> On Jul 23, 2010, at 8:40 PM, Derik Barseghian <barseghian at nceas.ucsb.edu> wrote:
>
>> Thanks very much Ben and Dan for the discussions on this topic.
>>
>> What Ben lists is essentially my plan, and I'm moving forward with
>> implementation. To summarize some additional points:
>>
>> - This plan requires that module repositories maintain all published
>> versions of a module.
>> I will:
>> - Include the complete "currently active" module list (with version
>> numbers) in the module-dependencies KAR attribute.
>> - Eliminate the dependsOnModule KAR entry attribute, since it will
>> no longer be used, updating the KAR version, schema, etc. in the
>> process.
>> - Remove any code that was inserting a dependsOnModules value into
>> the workflow.
>> - Create a new user preference for the strictness of KAR
>> compliance. I'm still determining the different modes. But e.g. if
>> the user is set to 'very strict', in order to open a KAR, they will
>> be prompted to import and restart with the exact same module set
>> (with matching versions) as that which created the KAR. Essentially
>> the more strict this setting, the more warnings a user will get
>> when trying to open KARs.
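As a rough illustration of the first item, the complete active module list could be written into the KAR manifest with the standard java.util.jar.Manifest API. The attribute name module-dependencies comes from this thread, but the semicolon-separated name-version value format and the ModuleDependenciesWriter class are assumptions for the sketch. One pitfall worth knowing: Manifest.write() only emits main attributes when a Manifest-Version header is set.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical helper that records the active module set in a KAR manifest. */
final class ModuleDependenciesWriter {

    static final Attributes.Name MODULE_DEPENDENCIES = new Attributes.Name("module-dependencies");

    private ModuleDependenciesWriter() {}

    /** Formats active modules (name -> version, in priority order) as "name-version;...". */
    static String format(Map<String, String> activeModules) {
        StringJoiner joiner = new StringJoiner(";");
        for (Map.Entry<String, String> e : activeModules.entrySet()) {
            joiner.add(e.getKey() + "-" + e.getValue());
        }
        return joiner.toString();
    }

    static Manifest buildManifest(Map<String, String> activeModules) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Without a version header, Manifest.write() skips all main attributes.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(MODULE_DEPENDENCIES, format(activeModules));
        return manifest;
    }

    /** Serializes and re-reads a manifest, as happens when a KAR is saved and reopened. */
    static Manifest roundTrip(Manifest manifest) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        manifest.write(out);
        return new Manifest(new ByteArrayInputStream(out.toByteArray()));
    }
}
```

A LinkedHashMap keeps the modules in the order supplied, which matters if the module list order carries meaning (e.g. priority) when it is read back.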
>>
>> Derik
>>
>>
>> On Jul 22, 2010, at 4:35 PM, ben leinfelder wrote:
>>
>>> After bouncing these ideas around with Derik, here's a hybrid
>>> approach to handling module dependencies:
>>>
>>> -begin including module version number when writing module
>>> dependencies in a KAR file
>>> -when opening [possibly older] KARs that require non-vanilla
>>> modules:
>>> -if the module version is not specified, then fetch the latest
>>> release of that module
>>> -if the module version is specified, then fetch that version*
>>> *In practice we'll probably want Strict and Lax modes so that we
>>> aren't constantly swapping out modules each time we open a
>>> different KAR [for minor version changes].
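The fetch rules above, including a Strict/Lax switch, might look something like this. It is a sketch under stated assumptions: versions are purely numeric and dot-separated, "Lax" is taken to mean that any installed version with the same major number is acceptable, and VersionResolver is a hypothetical name. Trunk builds and the no-downgrade concern from the additional notes are not handled here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical resolver for which module version to fetch when opening a KAR. */
final class VersionResolver {

    /** Placeholder modes; LAX is assumed to tolerate same-major-version differences. */
    enum Mode { STRICT, LAX }

    /** Orders dotted numeric versions, so "2.10.0" sorts after "2.9.1". */
    static final Comparator<String> VERSION_ORDER = (a, b) -> {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    };

    private VersionResolver() {}

    /**
     * requested: version recorded in the KAR, or null for older KARs
     * installed: currently installed version, or null if the module is absent
     * released:  versions available in the repository
     * Returns the version to fetch, or empty if the installed module is kept.
     */
    static Optional<String> versionToFetch(String requested, String installed,
                                           List<String> released, Mode mode) {
        if (installed == null) {
            // Module absent: fetch the recorded version, else the latest release.
            return requested != null
                    ? Optional.of(requested)
                    : released.stream().max(VERSION_ORDER);
        }
        if (requested == null || installed.equals(requested)) {
            return Optional.empty(); // nothing recorded, or already an exact match
        }
        if (mode == Mode.LAX && major(installed).equals(major(requested))) {
            return Optional.empty(); // Lax: ignore minor/patch differences
        }
        return Optional.of(requested);
    }

    private static String major(String version) {
        return version.split("\\.")[0];
    }
}
```

Comparing versions numerically rather than as strings matters for picking the "latest release": a plain string comparison would rank 2.9.1 above 2.10.0.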
>>>
>>> Additional notes:
>>> -Development on the trunk - where there is no module version -
>>> should also be considered a special case that does not trigger
>>> module download. It'd be nice to resolve the dependency using
>>> modules from the trunk if we are running from it.
>>> -We don't want to inadvertently downgrade someone who is opening
>>> an older KAR but wants to work with a newer version of the module.
>>> We need to be able to save an older KAR with newer module features.
>>> -We do want to allow a downgrade in cases where old features need
>>> to be used, say when reproducing workflow run results from a KAR -
>>> the whole point is that we reproduce them exactly - "archive"
>>> being the operative word.
>>>
>>> Hope I captured most of our discussion!
>>> -ben
>>>
>>> On Jul 22, 2010, at 2:34 PM, Derik Barseghian wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've found a bug that's blocking the reporting-2.0 release: when
>>>> you're in vanilla Kepler and have a KAR in MyWorkflows that was
>>>> created with modules you don't have installed, the KAR 'Import
>>>> Dependent Modules' context menu item doesn't work: http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5099
>>>> I can take this bug if Kepler/CORE members can clarify what the
>>>> desired behavior is; how this is fixed amounts to a significant
>>>> design decision.
>>>>
>>>> The issue is basically that a KAR manifest has module-dependencies
>>>> and dependsOnModules attributes, but the values stored are module
>>>> names without version numbers. The 'Import Dependent Modules'
>>>> action attempts to use these values to download dependencies, but
>>>> fails because it tries to fetch, e.g., provenance.zip instead of
>>>> provenance-2.0.0.zip.
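To make the failure concrete: an entry recorded without a version has no valid repository archive name at all. The sketch below uses a hypothetical ModuleArchiveName class and assumes a module entry has the form name-version with a dotted numeric version; it parses an entry and only produces an archive name such as provenance-2.0.0.zip when a version is known.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for manifest entries like "provenance-2.0.0". */
final class ModuleArchiveName {

    /** A module name followed by '-' and a dotted numeric version. */
    private static final Pattern NAME_VERSION = Pattern.compile("^(.+)-(\\d+(?:\\.\\d+)*)$");

    record ModuleId(String name, Optional<String> version) {}

    private ModuleArchiveName() {}

    static ModuleId parse(String entry) {
        String trimmed = entry.trim();
        Matcher m = NAME_VERSION.matcher(trimmed);
        if (m.matches()) {
            return new ModuleId(m.group(1), Optional.of(m.group(2)));
        }
        return new ModuleId(trimmed, Optional.empty()); // older KARs: name only
    }

    /**
     * Builds the repository archive name, e.g. "provenance-2.0.0.zip".
     * A bare name yields nothing: the caller must choose a version first
     * instead of requesting a nonexistent "provenance.zip".
     */
    static Optional<String> archiveName(ModuleId id) {
        return id.version().map(v -> id.name() + "-" + v + ".zip");
    }
}
```

Returning an empty Optional for a bare name forces the caller to decide which version to use (question 2 below) rather than silently building a file name that does not exist in the repository.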
>>>>
>>>> Two issues come to mind:
>>>>
>>>> 1) When created, should a KAR manifest store exact versions of
>>>> module dependencies instead of just module names?
>>>>
>>>> 2) Which versions of modules should the 'Import Dependent
>>>> Modules' attempt to fetch and install?
>>>>
>>>> A) If we start storing module versions in the manifest, exactly
>>>> those? This will not work if a module version is no longer
>>>> available, so we would likely want to keep all old versions of
>>>> modules in the released area of our repository (is this already
>>>> the plan?).
>>>>
>>>> B) The newest? This means requiring that modules remain backward
>>>> compatible with all versions of their KAR artifacts. This would
>>>> also require fetching the modules in the proper order. If
>>>> aSuite-2.1.0 is out and it depends on bSuite-2.0.0, and
>>>> bSuite-2.1.0 is also available, bSuite-2.1.0 should not be
>>>> downloaded. If the manifest module-dependencies attribute doesn't
>>>> store these in the right order (not sure), given a list of
>>>> modules to download, the module manager code would have to be
>>>> able to figure this out (maybe it already can?).
>>>>
>>>> C) Something else? :)
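Option B's ordering requirement amounts to a dependency-graph traversal: take the newest release of each top-level module, follow that release's pinned dependencies exactly, and download dependencies before their dependents. A minimal sketch, assuming the module manager can supply the newest release per module name and each release's pinned dependencies as name-version strings (both maps and the DownloadPlanner class are hypothetical). A top-level name that another required module already pins is not upgraded, so in the aSuite/bSuite example bSuite-2.1.0 is never chosen.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical planner for option B: newest top-level releases, exact pinned deps. */
final class DownloadPlanner {

    private DownloadPlanner() {}

    /**
     * newest:   module name -> newest released "name-version"
     * pinned:   "name-version" -> exact "name-version"s that release depends on
     * required: top-level module names recorded in the KAR
     * Returns releases in download order, dependencies first.
     */
    static List<String> plan(Map<String, String> newest,
                             Map<String, List<String>> pinned,
                             List<String> required) {
        // Names that another required module already pins to an exact version.
        Set<String> pinnedByOthers = new HashSet<>();
        for (String name : required) {
            Set<String> closure = new LinkedHashSet<>();
            visit(release(newest, name), pinned, closure, new HashSet<>());
            for (String id : closure) {
                if (!nameOf(id).equals(name)) {
                    pinnedByOthers.add(nameOf(id));
                }
            }
        }
        // Plan only from roots, so a pinned module is never upgraded to its newest release.
        Set<String> order = new LinkedHashSet<>();
        for (String name : required) {
            if (!pinnedByOthers.contains(name)) {
                visit(release(newest, name), pinned, order, new HashSet<>());
            }
        }
        return new ArrayList<>(order);
    }

    private static String release(Map<String, String> newest, String name) {
        String id = newest.get(name);
        if (id == null) {
            throw new IllegalArgumentException("No release found for " + name);
        }
        return id;
    }

    /** Assumes ids look like "name-version" with no '-' inside the version. */
    private static String nameOf(String id) {
        return id.substring(0, id.lastIndexOf('-'));
    }

    // Depth-first post-order: each module is added after all of its dependencies.
    private static void visit(String id, Map<String, List<String>> pinned,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(id)) {
            return;
        }
        if (!inProgress.add(id)) {
            throw new IllegalStateException("Dependency cycle at " + id);
        }
        for (String dep : pinned.getOrDefault(id, List.of())) {
            visit(dep, pinned, done, inProgress);
        }
        inProgress.remove(id);
        done.add(id);
    }
}
```

Because the result is computed from the dependency graph, the order in which the manifest lists modules does not matter. Conflicting pins (two suites requiring different versions of the same module) are not detected here; a real implementation would need to report them.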
>>>>
>>>> I think B may be the way to go, and it also does not require that we
>>>> do 1), even though we may want to do that in the future.
>>>>
>>>> Thanks,
>>>> Derik
>>>>