[kepler-dev] KARs and module dependencies

David Welker david.v.welker at gmail.com
Tue Jul 27 00:42:11 PDT 2010


Hi Derik,

I meant to send you this earlier, but bad weather prevented the
satellite-based internet connection that I am using from working.

The way I see it, there are two major use cases here: (1) The user  
wants to replicate someone else's work (more rare, but useful,
especially to compare someone else's algorithm with one's own) or (2)
the user is primarily interested in developing a workflow for their  
own research (the more common case).

There are at least two observations to be made: (1) To be assured of  
proper replication, it would be extremely helpful to have the exact  
module list under which the workflow was actually run, including  
precise version numbers and (2) for development and extension,  users  
do not really care what module provides the necessary services for the  
workflow; they only care that the necessary services are provided from  
some module. Of course, the same user can be interested in both  
replication and development, but in that case, they would only be  
interested in specific module information while they are concerned  
with replication but would not otherwise care about that or desire to  
be limited by that during development.

These observations have a couple of conceptual implications for the
question of versioning workflows. Most fundamentally, a workflow  
developer will not be aware of the interests and needs of future users  
of their workflow. Therefore, they would not be in a good position to  
specify things such as a range of "compatible" module versions that  
are necessary. They do not know whether future users wish to engage in  
replication with the same and/or different data or whether they wish  
to engage in further development of the algorithm embedded in the
workflow. Second, as has been mentioned above, when extended  
development of the workflow algorithm is contemplated, future users do  
not care which modules provide the services their workflow needs nor  
do they want to be limited by a specification that suggests that a  
particular module (or range of versions) is necessary. Not only is a
workflow developer not in a good position to assert that a particular
module (or range of versions) is necessary, it would be an error to
even attempt to do so. When it comes to future development, there is
no such thing as a necessary module.

These observations also have a few possible implementation
implications. First, whenever a KAR is saved, a list of precise module  
versions should be saved right along with it. There should be a  
corresponding menu option called "retrieve modules for replication"  
that will be available for that workflow when it is later loaded. This  
option should be grayed out for older workflows that did not save  
precise version information. Second, the concept of different levels  
of "strictness of compliance" should be ditched. Either the user is  
concerned about replication, which requires precise version  
information to proceed with full confidence, or specific modules do  
not matter at all. In other words, there should be only one level of  
compliance for replication -- very strict. Third, in the future, in  
addition to a precise module list, we should save a list of services  
used by the workflow. This would provide developers extending an  
existing workflow with guidance concerning what services the workflow  
needs to run (but of course what services are really necessary and  
what constitutes a "working" workflow is something that only future  
developers can say with authority). Kepler does not currently support  
services, but I think we should provide such support in the near future.
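
To make the first point concrete, here is a minimal sketch of how the "retrieve modules for replication" option could decide whether it is enabled. It assumes the saved list is a semicolon-separated string of name-version entries such as "provenance-2.0.0"; that format, the function names, and the sample values are illustrative assumptions, not Kepler's actual manifest layout or API.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical entry format: "name-MAJOR.MINOR.PATCH", e.g. "provenance-2.0.0".
# Older KARs stored bare module names such as "provenance".
_ENTRY = re.compile(r"^(?P<name>.+?)-(?P<version>\d+\.\d+\.\d+)$")

Module = Tuple[str, Optional[str]]


def parse_module_list(attribute: str) -> List[Module]:
    """Split a saved module list into (name, version) pairs.

    The version is None when an entry carries no precise version.
    """
    modules: List[Module] = []
    for entry in attribute.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        match = _ENTRY.match(entry)
        if match:
            modules.append((match.group("name"), match.group("version")))
        else:
            modules.append((entry, None))
    return modules


def replication_available(modules: List[Module]) -> bool:
    """Enable the menu option only if every module has a precise version."""
    return bool(modules) and all(version is not None for _, version in modules)
```

With these assumptions, a list like "reporting-2.0.0;provenance-2.0.0" enables the option, while an older list like "reporting;provenance" leaves it grayed out.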

The bottom-line implication I see for your current plan is that I do
not think a workflow/KAR should ever REQUIRE a level of strictness.  
Instead, if precise information is available, there should be a  
replication OPTION. That option should be completely strict about  
versions retrieved. On the other hand, if the user does not actively
indicate a desire to replicate, the workflow should try to open  
without warnings to the user. That is all that I think should be done  
for now.
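
A rough sketch of that behavior, under the assumption that the saved list and the currently active modules are both available as simple name and version data (the function and parameter names are hypothetical, not existing Kepler code):

```python
from typing import Dict, List, Tuple


def modules_to_fetch(saved: List[Tuple[str, str]],
                     active: Dict[str, str],
                     replicate: bool) -> List[Tuple[str, str]]:
    """Return the (name, version) pairs to retrieve before opening a KAR.

    Without an explicit replication request, nothing is fetched and nothing
    is flagged. With one, every saved module whose exact version is not
    currently active is returned.
    """
    if not replicate:
        return []
    return [(name, version) for name, version in saved
            if active.get(name) != version]
```

This covers only which versions to retrieve. A fully strict replication would presumably also deactivate any active modules that are not in the saved list, which is left out of the sketch.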

Later on, if we provide explicit support for the concept of services,  
we will be in a position to warn the user if a service we think is  
probably necessary (but who are we to say for sure?) does not appear  
to be provided by any of the modules that are currently active. But  
for now, without support for services, I do not think any warnings are  
in order.
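
If services were supported, the check could be as small as the sketch below. Kepler has no service concept today, so the service names, the mapping from modules to the services they provide, and the function itself are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Set


def unprovided_services(needed: Iterable[str],
                        active_modules: Dict[str, Set[str]]) -> Set[str]:
    """Return the services a workflow lists that no active module provides.

    The result is only a hint for a warning; whether a service is truly
    necessary is for the future developer to decide.
    """
    provided: Set[str] = set()
    for services in active_modules.values():
        provided |= services
    return set(needed) - provided
```

The check never asks which module provides a service, only whether some active module does.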

I hope this perspective is helpful. Sorry for the delay, but the  
weather ruined my Internet connection and I was loath to compose such
an involved email using my iPhone. But I have finally bitten the bullet,
so here it is. Of course, as the future user of the "service" provided  
by this email, only you are in a position to determine whether the  
ideas expressed herein are useful to you. At least if the principle  
regarding who should have authority to determine what services are  
necessary and/or useful (future developers) is correct. =)

Sent from my iPhone

On Jul 23, 2010, at 8:40 PM, Derik Barseghian  
<barseghian at nceas.ucsb.edu> wrote:

> Thanks very much Ben and Dan for the discussions on this topic.
>
> What Ben lists is essentially my plan, and I'm moving forward with  
> implementation. To summarize some additional points:
>
> - This plan requires module repositories maintain all published  
> versions of a module.
> I will:
> - Include the complete "currently active" module list (with version  
> numbers) in the module-dependencies KAR attribute.
> - Eliminate the dependsOnModule KAR entry attribute, since it will  
> no longer be used, updating the KAR version, schema, etc in the  
> process.
> - Remove any code that was inserting a dependsOnModules value into  
> the workflow.
> - Create a new user preference for the strictness of KAR compliance.  
> I'm still determining the different modes. But e.g. if the user is  
> set to 'very strict', in order to open a KAR, they will be prompted  
> to import and restart with the exact same module set (with matching  
> versions) as that which created the KAR. Essentially the more strict  
> this setting, the more warnings a user will get when trying to open  
> KARs.
>
> Derik
>
>
> On Jul 22, 2010, at 4:35 PM, ben leinfelder wrote:
>
>> After bouncing these ideas around with Derik, here's a hybrid  
>> approach to handling module dependencies:
>>
>> -begin including module version number when writing module  
>> dependencies in a KAR file
>> -when opening [possibly older] KARs that require non-vanilla modules:
>>    -if the module version is not specified, then fetch the latest  
>> release of that module
>>    -if the module version is specified, then fetch that version*
>> *In practice we'll probably want Strict and Lax modes so that we  
>> aren't constantly swapping out modules each time we open a  
>> different KAR [for minor version changes].
>>
>> Additional notes:
>> -Development on the trunk - where there is no module version -  
>> should also be considered a special case that does not trigger  
>> module download. It'd be nice to resolve the dependency using  
>> modules from the trunk if we are running from it.
>> -We don't want to inadvertently downgrade someone who is opening an  
>> older KAR but wants to work with a newer version of the module. We  
>> need to be able to save an older KAR with newer module features.
>> -We do want to allow a downgrade in cases where old features need  
>> to be used, say when reproducing workflow run results from a KAR -  
>> the whole point is that we reproduce them exactly - "archive" being  
>> the operative word.
>>
>> Hope I captured most of our discussion!
>> -ben
>>
>> On Jul 22, 2010, at 2:34 PM, Derik Barseghian wrote:
>>
>>> Hi all,
>>>
>>> I've found a bug that's blocking the reporting-2.0 release: when  
>>> you're in vanilla kepler, and have a KAR in MyWorkflows that was  
>>> created with modules you don't have installed, the KAR 'Import  
>>> Dependent Modules' context menu item doesn't work: http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5099
>>> I can take this bug if Kepler/CORE members can clarify what the  
>>> desired behavior is -- a significant design decision is made  
>>> depending on how this is fixed.
>>>
>>> The issue is basically that in a KAR manifest there are module- 
>>> dependencies and dependsOnModules attributes, but the values  
>>> stored are module names without version number. The 'Import  
>>> Dependent Modules' action attempts to use these values to download  
>>> dependencies, but fails because it tries to e.g. fetch  
>>> provenance.zip instead of provenance-2.0.0.zip.
>>>
>>> Two issues come to mind:
>>>
>>> 1) When created, should a KAR manifest store exact versions of  
>>> module dependencies instead of just module names?
>>>
>>> 2) Which versions of modules should the 'Import Dependent Modules'  
>>> attempt to fetch and install?
>>>
>>> A) If we start storing module versions in the manifest, exactly  
>>> those? This will not work if a module version is no longer  
>>> available, so we would likely want to keep all old versions of  
>>> modules in the released area of our repository (is this already  
>>> the plan?).
>>>
>>> B) The newest? This means requiring modules remain backward  
>>> compatible with all versions of their KAR artifacts. This would  
>>> also require fetching the modules in the proper order. If  
>>> aSuite-2.1.0 is out and it depends on bSuite-2.0.0, and  
>>> bSuite-2.1.0 is also available, bSuite-2.1.0 should not be  
>>> downloaded. If the manifest module-dependencies attribute doesn't  
>>> store these in the right order (not sure), given a list of modules  
>>> to download, the module manager code would have to be able to  
>>> figure this out (maybe it already can?).
>>>
>>> C) Something else? :)
>>>
>>> I think B may be the way to go, and also does not require we do  
>>> 1), even though we may want to do that in the future.
>>>
>>> Thanks,
>>> Derik
>>> _______________________________________________
>>> Kepler-dev mailing list
>>> Kepler-dev at kepler-project.org
>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>
>