[kepler-dev] KARs and module dependencies
David Welker
david.v.welker at gmail.com
Tue Jul 27 00:42:11 PDT 2010
Hi Derik,
I meant to send you this earlier, but bad weather prevented the
satellite-based internet connection that I am using from working.
The way I see it, there are two major use cases here: (1) The user
wants to replicate someone else's work (more rare, but useful,
especially to compare someone else's algorithm with one's own) or (2)
the user is primarily interested in developing a workflow for their
own research (the more common case).
There are at least two observations to be made: (1) To be assured of
proper replication, it would be extremely helpful to have the exact
module list under which the workflow was actually run, including
precise version numbers and (2) for development and extension, users
do not really care what module provides the necessary services for the
workflow; they only care that the necessary services are provided from
some module. Of course, the same user can be interested in both
replication and development, but in that case, they would only be
interested in specific module information while they are concerned
with replication but would not otherwise care about that or desire to
be limited by that during development.
These observations have a couple of conceptual implications for the
question of versioning workflows. Most fundamentally, a workflow
developer will not be aware of the interests and needs of future users
of their workflow. Therefore, they would not be in a good position to
specify things such as a range of "compatible" module versions that
are necessary. They do not know whether future users wish to engage in
replication with the same and/or different data or whether they wish
to engage in further development of the algorithm embedded in the
workflow. Second, as has been mentioned above, when extended
development of the workflow algorithm is contemplated, future users do
not care which modules provide the services their workflow needs nor
do they want to be limited by a specification that suggests that a
particular module (or range of versions) is necessary. Not only is a
workflow developer not in a good position to assert that a particular
module (or range of versions) is necessary, it would be an error to
even attempt to do so. When it comes to future development, there is
no such thing as a necessary module.
These observations also have a few possible implementation
implications. First, whenever a KAR is saved, a list of precise module
versions should be saved right along with it. There should be a
corresponding menu option called "retrieve modules for replication"
that will be available for that workflow when it is later loaded. This
option should be grayed out for older workflows that did not save
precise version information. Second, the concept of different levels
of "strictness of compliance" should be ditched. Either the user is
concerned about replication, which requires precise version
information to proceed with full confidence, or specific modules do
not matter at all. In other words, there should be only one level of
compliance for replication -- very strict. Third, in the future, in
addition to a precise module list, we should save a list of services
used by the workflow. This would provide developers extending an
existing workflow with guidance concerning what services the workflow
needs to run (but of course what services are really necessary and
what constitutes a "working" workflow is something that only future
developers can say with authority). Kepler does not currently support
services, but I think we should provide such support in the near future.
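
As a rough illustration of the first point (a sketch, not Kepler code): the
module-dependencies attribute name comes from this thread, but the
"name-version" entry format, the ";" separator, and all function names below
are assumptions made up for the example. The idea is that the "retrieve
modules for replication" option is only enabled when every saved module
carries a precise version.

```python
import re

# Assumed entry format: "<name>-<dotted version>", e.g. "provenance-2.0.0".
VERSIONED = re.compile(r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)+)$")


def write_module_dependencies(active_modules):
    """Serialize (name, version) pairs into one manifest attribute value."""
    return ";".join(f"{name}-{version}" for name, version in active_modules)


def parse_module_dependencies(value):
    """Return a list of (name, version); version is None if not recorded."""
    modules = []
    for entry in (part.strip() for part in value.split(";")):
        if not entry:
            continue
        match = VERSIONED.match(entry)
        if match:
            modules.append((match.group("name"), match.group("version")))
        else:
            # Older KARs stored bare module names without versions.
            modules.append((entry, None))
    return modules


def replication_available(value):
    """True only if the list is non-empty and every module has a version."""
    modules = parse_module_dependencies(value)
    return bool(modules) and all(v is not None for _, v in modules)
```

With this sketch, an older manifest value such as "provenance;reporting"
would leave the menu option grayed out, while "provenance-2.0.0" would
enable it.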
The bottom-line implication I see for your current plan is that I do
not think a workflow/KAR should ever REQUIRE a level of strictness.
Instead, if precise information is available, there should be a
replication OPTION. That option should be completely strict about
versions retrieved. On the other hand, if the user does not actively
indicate a desire to replicate, the workflow should try to open
without warnings to the user. That is all that I think should be done
for now.
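
The open-time behavior described above could be sketched roughly as follows.
This is only an illustration under assumptions: the function name, the
dictionaries mapping module name to version, and the returned "fetch" and
"deactivate" lists are all hypothetical and not part of Kepler's module
manager.

```python
def plan_open(saved, active, replicate):
    """Decide which module changes are needed before opening a workflow.

    saved:  dict of module name -> version recorded in the KAR (or None)
    active: dict of module name -> version currently active
    replicate: True only if the user explicitly chose the replication option
    Returns (to_fetch, to_deactivate).
    """
    if not replicate:
        # Default case: just try to open with whatever is active, no warnings.
        return [], []

    if not saved or any(version is None for version in saved.values()):
        # Replication is all-or-nothing; without precise versions it is not offered.
        raise ValueError("precise module versions were not saved with this KAR")

    # Completely strict: every saved module must be present at the exact version...
    to_fetch = sorted(
        (name, version)
        for name, version in saved.items()
        if active.get(name) != version
    )
    # ...and nothing outside the saved list should remain active.
    to_deactivate = sorted(name for name in active if name not in saved)
    return to_fetch, to_deactivate
```

Note that there is no intermediate strictness level in this sketch: the only
switch is whether the user asked for replication.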
Later on, if we provide explicit support for the concept of services,
we will be in a position to warn the user if a service we think is
probably necessary (but who are we to say for sure?) does not appear
to be provided by any of the modules that are currently active. But
for now, without support for services, I do not think any warnings are
in order.
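
If service support were added later, the warning check might look something
like the sketch below. Kepler has no notion of services today, so everything
here is hypothetical: the service names, the mapping from module to provided
services, and the function name are invented purely for illustration.

```python
def missing_services(used_services, active_modules):
    """Return the services a workflow used that no active module provides.

    used_services:  iterable of service names saved with the workflow
    active_modules: dict of module name -> set of service names it provides
    """
    provided = set()
    for services in active_modules.values():
        provided |= set(services)
    # Which module provides a service does not matter, only that one does.
    return sorted(set(used_services) - provided)


def service_warnings(used_services, active_modules):
    """Build advisory messages; these are hints only and never block opening."""
    return [
        f"Service '{name}' appears to be needed but is not provided "
        f"by any active module."
        for name in missing_services(used_services, active_modules)
    ]
```

The messages are deliberately advisory: the future developer, not the tool,
decides whether a missing service actually matters.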
I hope this perspective is helpful. Sorry for the delay, but the
weather ruined my Internet connection and I was loath to compose such
an involved email using my iPhone. But I have finally bitten the bullet,
so here it is. Of course, as the future user of the "service" provided
by this email, only you are in a position to determine whether the
ideas expressed herein are useful to you. At least if the principle
regarding who should have authority to determine what services are
necessary and/or useful (future developers) is correct. =)
Sent from my iPhone
On Jul 23, 2010, at 8:40 PM, Derik Barseghian
<barseghian at nceas.ucsb.edu> wrote:
> Thanks very much Ben and Dan for the discussions on this topic.
>
> What Ben lists is essentially my plan, and I'm moving forward with
> implementation. To summarize some additional points:
>
> - This plan requires module repositories maintain all published
> versions of a module.
> I will:
> - Include the complete "currently active" module list (with version
> numbers) in the module-dependencies KAR attribute.
> - Eliminate the dependsOnModule KAR entry attribute, since it will
> no longer be used, updating the KAR version, schema, etc in the
> process.
> - Remove any code that was inserting a dependsOnModules value into
> the workflow.
> - Create a new user preference for the strictness of KAR compliance.
> I'm still determining the different modes. But e.g. if the user is
> set to 'very strict', in order to open a KAR, they will be prompted
> to import and restart with the exact same module set (with matching
> versions) as that which created the KAR. Essentially the more strict
> this setting, the more warnings a user will get when trying to open
> KARs.
>
> Derik
>
>
> On Jul 22, 2010, at 4:35 PM, ben leinfelder wrote:
>
>> After bouncing these ideas around with Derik, here's a hybrid
>> approach to handling module dependencies:
>>
>> -begin including module version number when writing module
>> dependencies in a KAR file
>> -when opening [possibly older] KARs that require non-vanilla modules:
>> -if the module version is not specified, then fetch the latest
>> release of that module
>> -if the module version is specified, then fetch that version*
>> *In practice we'll probably want Strict and Lax modes so that we
>> aren't constantly swapping out modules each time we open a
>> different KAR [for minor version changes].
>>
>> Additional notes:
>> -Development on the trunk - where there is no module version -
>> should also be considered a special case that does not trigger
>> module download. It'd be nice to resolve the dependency using
>> modules from the trunk if we are running from it.
>> -We don't want to inadvertently downgrade someone who is opening an
>> older KAR but wants to work with a newer version of the module. We
>> need to be able to save an older KAR with newer module features.
>> -We do want to allow a downgrade in cases where old features need
>> to be used, say when reproducing workflow run results from a KAR -
>> the whole point is that we reproduce them exactly - "archive" being
>> the operative word.
>>
>> Hope I captured most of our discussion!
>> -ben
>>
>> On Jul 22, 2010, at 2:34 PM, Derik Barseghian wrote:
>>
>>> Hi all,
>>>
>>> I've found a bug that's blocking the reporting-2.0 release: when
>>> you're in vanilla kepler, and have a KAR in MyWorkflows that was
>>> created with modules you don't have installed, the KAR 'Import
>>> Dependent Modules' context menu item doesn't work: http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5099
>>> I can take this bug if Kepler/CORE members can clarify what the
>>> desired behavior is -- a significant design decision is made
>>> depending on how this is fixed.
>>>
>>> The issue is basically that in a KAR manifest there are module-
>>> dependencies and dependsOnModules attributes, but the values
>>> stored are module names without version number. The 'Import
>>> Dependent Modules' action attempts to use these values to download
>>> dependencies, but fails because it tries to e.g. fetch
>>> provenance.zip instead of provenance-2.0.0.zip.
>>>
>>> Two issues come to mind:
>>>
>>> 1) When created, should a KAR manifest store exact versions of
>>> module dependencies instead of just module names?
>>>
>>> 2) Which versions of modules should the 'Import Dependent Modules'
>>> attempt to fetch and install?
>>>
>>> A) If we start storing module versions in the manifest, exactly
>>> those? This will not work if a module version is no longer
>>> available, so we would likely want to keep all old versions of
>>> modules in the released area of our repository (is this already
>>> the plan?).
>>>
>>> B) The newest? This means requiring modules remain backward
>>> compatible with all versions of their KAR artifacts. This would
>>> also require fetching the modules in the proper order. If
>>> aSuite-2.1.0 is out and it depends on bSuite-2.0.0, and
>>> bSuite-2.1.0 is also available, bSuite-2.1.0 should not be
>>> downloaded. If the manifest module-dependencies attribute doesn't
>>> store these in the right order (not sure), given a list of modules
>>> to download, the module manager code would have to be able to
>>> figure this out (maybe it already can?).
>>>
>>> C) Something else? :)
>>>
>>> I think B may be the way to go, and also does not require we do
>>> 1), even though we may want to do that in the future.
>>>
>>> Thanks,
>>> Derik
>>> _______________________________________________
>>> Kepler-dev mailing list
>>> Kepler-dev at kepler-project.org
>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>
>