[kepler-dev] KARs and module dependencies

Derik Barseghian barseghian at nceas.ucsb.edu
Mon Aug 2 17:16:17 PDT 2010


Hi all,

I've implemented most of what was discussed below. However, this
solution doesn't cover workflows saved as plain XML, i.e. not in a
KAR with a manifest that lists module-dependencies. A workflow can be
created in a suite that adds entries to the MoML (e.g. yourActor
module, reporting, or provenance). When you then try to open this
workflow in vanilla Kepler, you get NullPointerExceptions about missing
elements instead of a prompt asking you to download the missing modules.

Two solutions come to mind:
1) No longer allow saving and opening workflows as XML; always save to
a KAR.
	pro: Simplifies our GUI wrt saving and opening.
	con: Negatively affects those currently using plain workflow files.
Would probably require a utility to create a KAR from a workflow. At
least some refactoring would be required to remove command-line options
that use plain workflow files.

2) Move or keep a copy of module-dependencies in the workflow itself,  
and refactor to check these before actually attempting to open the  
workflow.
	pro: things continue to work similarly, just additional messages when  
you lack modules.
	con: It's not clear to me yet how much additional work this represents.
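Option 2 could look roughly like the following Python sketch. It assumes, hypothetically, that the module list is stored as a top-level MoML property named `moduleDependencies` whose value is a semicolon-separated list of versioned module names; neither the property name nor the separator is something Kepler defines today. The check would run before the workflow is instantiated, so missing modules could be reported before any NullPointerException is reached.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Hypothetical property name; not part of Kepler's current MoML.
DEPS_PROPERTY = "moduleDependencies"


def declared_modules(moml_text: str) -> List[str]:
    """Return the module list recorded on the top-level entity, or [] if absent."""
    root = ET.fromstring(moml_text)
    for prop in root.findall("property"):  # direct children of the workflow only
        if prop.get("name") == DEPS_PROPERTY:
            value = prop.get("value", "")
            return [m.strip() for m in value.split(";") if m.strip()]
    return []  # older workflows: nothing recorded, nothing to check


def missing_modules(moml_text: str, installed: Iterable[str]) -> List[str]:
    """Modules the workflow declares that are not currently installed."""
    have = set(installed)
    return [m for m in declared_modules(moml_text) if m not in have]


if __name__ == "__main__":
    moml = (
        '<entity name="wf" class="ptolemy.actor.TypedCompositeActor">'
        '<property name="moduleDependencies" '
        'class="ptolemy.kernel.util.StringAttribute" '
        'value="reporting-2.0.0;provenance-2.0.0"/>'
        "</entity>"
    )
    print(missing_modules(moml, ["provenance-2.0.0"]))  # ['reporting-2.0.0']
```

A workflow with no such property yields an empty list, so existing plain-XML workflows would open exactly as they do now.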

Please let me know your thoughts and additional pros/cons as you see  
them.
Thanks,
Derik

On Jul 27, 2010, at 12:42 AM, David Welker wrote:

> Hi Derik,
>
> I meant to send you this earlier, but bad weather prevented the
> satellite-based internet connection that I am using from working.
>
> The way I see it, there are two major use cases here: (1) The user  
> wants to replicate someone else's work (more rare, but useful,  
> especially to compare someone else's algorithm with one's own) or (2)
> the user is primarily interested in developing a workflow for their  
> own research (the more common case).
>
> There are at least two observations to be made: (1) To be assured of  
> proper replication, it would be extremely helpful to have the exact  
> module list under which the workflow was actually run, including  
> precise version numbers and (2) for development and extension,   
> users do not really care what module provides the necessary services  
> for the workflow; they only care that the necessary services are  
> provided from some module. Of course, the same user can be
> interested in both replication and development, but in that case,
> they would care about specific module information only while they
> are concerned with replication, and would not want to be limited by
> it during development.
>
> These observations have a couple of conceptual implications for the
> question of versioning workflows. Most fundamentally, a workflow  
> developer will not be aware of the interests and needs of future  
> users of their workflow. Therefore, they would not be in a good  
> position to specify things such as a range of "compatible" module  
> versions that are necessary. They do not know whether future users  
> wish to engage in replication with the same and/or different data or  
> whether they wish to engage in further development of the algorithm
> embedded in the workflow. Second, as has been mentioned above, when
> extended development of the workflow algorithm is contemplated,  
> future users do not care which modules provide the services their  
> workflow needs nor do they want to be limited by a specification  
> that suggests that a particular module (or range of versions) is  
> necessary. Not only is a workflow developer not in a good position
> to assert that a particular module (or range of versions) is
> necessary, it would be an error to even attempt to do so. When it
> comes to future development, there is no such thing as a necessary  
> module.
>
> These observations also have a few possible implementation
> implications. First, whenever a KAR is saved, a list of precise  
> module versions should be saved right along with it. There should be  
> a corresponding menu option called "retrieve modules for  
> replication" that will be available for that workflow when it is  
> later loaded. This option should be grayed out for older workflows  
> that did not save precise version information. Second, the concept  
> of different levels of "strictness of compliance" should be ditched.  
> Either the user is concerned about replication, which requires  
> precise version information to proceed with full confidence, or  
> specific modules do not matter at all. In other words, there should  
> be only one level of compliance for replication -- very strict.  
> Third, in the future, in addition to a precise module list, we  
> should save a list of services used by the workflow. This would  
> provide developers extending an existing workflow with guidance  
> concerning what services the workflow needs to run (but of course  
> what services are really necessary and what constitutes a "working"  
> workflow is something that only future developers can say with  
> authority). Kepler does not currently support services, but I think  
> we should provide such support in the near future.
>
> The bottom-line implication I see for your current plan is that I
> do not think a workflow/KAR should ever REQUIRE a level of
> strictness. Instead, if precise information is available, there  
> should be a replication OPTION. That option should be completely  
> strict about versions retrieved. On the other hand, if the user does
> not actively indicate a desire to replicate, the workflow should try  
> to open without warnings to the user. That is all that I think  
> should be done for now.
>
> Later on, if we provide explicit support for the concept of  
> services, we will be in a position to warn the user if a service we  
> think is probably necessary (but who are we to say for sure?) does  
> not appear to be provided by any of the modules that are currently  
> active. But for now, without support for services, I do not think  
> any warnings are in order.
>
> I hope this perspective is helpful. Sorry for the delay, but the  
> weather ruined my Internet connection and I was loath to compose
> such an involved email using my iPhone. But I have finally bitten the
> bullet, so here it is. Of course, as the future user of the
> "service" provided by this email, only you are in a position to  
> determine whether the ideas expressed herein are useful to you. At  
> least if the principle regarding who should have authority to  
> determine what services are necessary and/or useful (future  
> developers) is correct. =)
>
> Sent from my iPhone
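A minimal Python sketch of the replication option proposed above, assuming the KAR stores a hypothetical name-to-version map of the modules that were active when it was saved (the function names and example module names are illustrative). There is a single, very strict level: any difference means modules must be retrieved or deactivated, and when no versions were saved the option is simply unavailable. Opening a workflow without choosing the option is unaffected and produces no warnings.

```python
from typing import Dict, List, Optional, Tuple

Versions = Dict[str, str]  # module name -> exact version, e.g. {"provenance": "2.0.0"}


def replication_available(saved: Optional[Versions]) -> bool:
    """The 'retrieve modules for replication' option is grayed out when the
    KAR predates precise version information (nothing was saved)."""
    return bool(saved)


def replication_plan(saved: Versions, active: Versions) -> Tuple[Versions, List[str]]:
    """Very strict, all-or-nothing comparison.

    Returns the exact versions to retrieve, plus the active modules that were
    not part of the saved set and would have to be deactivated to match it.
    """
    to_fetch = {name: ver for name, ver in saved.items() if active.get(name) != ver}
    to_drop = sorted(name for name in active if name not in saved)
    return to_fetch, to_drop


if __name__ == "__main__":
    saved = {"core": "2.0.0", "provenance": "2.0.0"}
    active = {"core": "2.0.0", "provenance": "2.1.0", "reporting": "2.0.0"}
    print(replication_available(saved))     # True
    print(replication_plan(saved, active))  # ({'provenance': '2.0.0'}, ['reporting'])
```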
>
> On Jul 23, 2010, at 8:40 PM, Derik Barseghian <barseghian at nceas.ucsb.edu> wrote:
>
>> Thanks very much Ben and Dan for the discussions on this topic.
>>
>> What Ben lists is essentially my plan, and I'm moving forward with  
>> implementation. To summarize some additional points:
>>
>> - This plan requires that module repositories maintain all published
>> versions of a module.
>> I will:
>> - Include the complete "currently active" module list (with version  
>> numbers) in the module-dependencies KAR attribute.
>> - Eliminate the dependsOnModule KAR entry attribute, since it will
>> no longer be used, updating the KAR version, schema, etc. in the
>> process.
>> - Remove any code that was inserting a dependsOnModules value into  
>> the workflow.
>> - Create a new user preference for the strictness of KAR
>> compliance. I'm still determining the different modes, but, e.g., if
>> the preference is set to 'very strict', then in order to open a KAR
>> the user will be prompted to import and restart with the exact same
>> module set (with matching versions) as that which created the KAR.
>> Essentially, the stricter this setting, the more warnings a user will
>> get when trying to open KARs.
>>
>> Derik
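The strictness preference described above might be sketched in Python like this. Only 'very strict' is defined in the plan; the `LAX` and `MISSING` modes, and all names here, are hypothetical placeholders chosen to show the intended property that a stricter setting never produces fewer warnings.

```python
from enum import Enum
from typing import Dict, List


class Compliance(Enum):
    # Hypothetical modes; the plan only describes 'very strict'.
    LAX = 0          # never warn
    MISSING = 1      # warn only about modules that are not active at all
    VERY_STRICT = 2  # warn unless the active set matches the KAR exactly


def warnings_for(mode: Compliance, kar: Dict[str, str], active: Dict[str, str]) -> List[str]:
    """kar and active map module name -> version."""
    if mode is Compliance.LAX:
        return []
    warnings = [f"missing module: {n}-{v}" for n, v in sorted(kar.items()) if n not in active]
    if mode is Compliance.VERY_STRICT:
        warnings += [
            f"version mismatch: {n} is {active[n]}, KAR needs {v}"
            for n, v in sorted(kar.items())
            if n in active and active[n] != v
        ]
        warnings += [f"extra module: {n}" for n in sorted(active) if n not in kar]
    return warnings


if __name__ == "__main__":
    kar = {"core": "2.0.0", "reporting": "2.0.0"}
    active = {"core": "2.1.0", "provenance": "2.0.0"}
    for mode in Compliance:
        print(mode.name, len(warnings_for(mode, kar, active)))  # 0, then 1, then 3
```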
>>
>>
>> On Jul 22, 2010, at 4:35 PM, ben leinfelder wrote:
>>
>>> After bouncing these ideas around with Derik, here's a hybrid  
>>> approach to handling module dependencies:
>>>
>>> -begin including module version number when writing module  
>>> dependencies in a KAR file
>>> -when opening [possibly older] KARs that require non-vanilla  
>>> modules:
>>>    -if the module version is not specified, then fetch the latest  
>>> release of that module
>>>    -if the module version is specified, then fetch that version*
>>> *In practice we'll probably want Strict and Lax modes so that we  
>>> aren't constantly swapping out modules each time we open a  
>>> different KAR [for minor version changes].
>>>
>>> Additional notes:
>>> -Development on the trunk - where there is no module version -  
>>> should also be considered a special case that does not trigger  
>>> module download. It'd be nice to resolve the dependency using  
>>> modules from the trunk if we are running from it.
>>> -We don't want to inadvertently downgrade someone who is opening  
>>> an older KAR but wants to work with a newer version of the module.  
>>> We need to be able to save an older KAR with newer module features.
>>> -We do want to allow a downgrade in cases where old features need  
>>> to be used, say when reproducing workflow run results from a KAR -  
>>> the whole point is that we reproduce them exactly - "archive"  
>>> being the operative word.
>>>
>>> Hope I captured most of our discussion!
>>> -ben
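Ben's rules can be sketched in Python as a single decision function. This is a minimal sketch under assumptions that are not in the thread: versions are dotted integers; 'lax' means an installed release with the same major number that is at least as new is kept (one possible reading of "minor version changes"); and a required version missing from the repository is an error. The function and parameter names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """'2.1.0' -> (2, 1, 0), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def version_to_fetch(
    required: Optional[str],
    available: Sequence[str],
    installed: Optional[str] = None,
    strict: bool = False,
    on_trunk: bool = False,
) -> Optional[str]:
    """Return the module version to download, or None to keep what is there."""
    if on_trunk:
        return None  # trunk development: resolve from trunk modules, never download
    if required is None:
        latest = max(available, key=parse)  # older KAR without a version
        return None if installed == latest else latest
    if not strict and installed is not None:
        # Lax (hypothetical rule): same major number and at least as new is
        # accepted, so opening an older KAR never downgrades the module.
        have, need = parse(installed), parse(required)
        if have[0] == need[0] and have >= need:
            return None
    if installed == required:
        return None
    if required not in available:
        raise LookupError(f"version {required} is no longer in the repository")
    return required  # strict mode may downgrade, for exact reproduction
```

For example, with `2.1.0` installed, a KAR requiring `2.0.0` keeps `2.1.0` in lax mode but fetches `2.0.0` in strict mode.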
>>>
>>> On Jul 22, 2010, at 2:34 PM, Derik Barseghian wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've found a bug that's blocking the reporting-2.0 release: when  
>>>> you're in vanilla kepler, and have a KAR in MyWorkflows that was  
>>>> created with modules you don't have installed, the KAR 'Import  
>>>> Dependent Modules' context menu item doesn't work: http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5099
>>>> I can take this bug if Kepler/CORE members can clarify what the
>>>> desired behavior is; how this is fixed amounts to a significant
>>>> design decision.
>>>>
>>>> The issue is basically that in a KAR manifest there are
>>>> module-dependencies and dependsOnModules attributes, but the values
>>>> stored are module names without version number. The 'Import  
>>>> Dependent Modules' action attempts to use these values to  
>>>> download dependencies, but fails because it tries to e.g. fetch  
>>>> provenance.zip instead of provenance-2.0.0.zip.
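A Python sketch of the naming problem described above, assuming manifest entries follow the `name-version` pattern seen in `provenance-2.0.0` (that convention, and the helper names, are assumptions). An unversioned entry is rejected instead of being turned into a request for a nonexistent `provenance.zip`.

```python
import re
from typing import Optional, Tuple

# A trailing "-<digits>.<digits>..." is treated as the version.
_VERSIONED = re.compile(r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)+)$")


def split_entry(entry: str) -> Tuple[str, Optional[str]]:
    """'provenance-2.0.0' -> ('provenance', '2.0.0'); 'provenance' -> ('provenance', None)."""
    m = _VERSIONED.match(entry)
    return (m["name"], m["version"]) if m else (entry, None)


def archive_name(entry: str) -> str:
    """Name of the zip to download for a manifest entry."""
    name, version = split_entry(entry)
    if version is None:
        # Fail loudly instead of requesting 'provenance.zip'.
        raise ValueError(f"no version recorded for module '{name}'")
    return f"{name}-{version}.zip"


if __name__ == "__main__":
    print(archive_name("provenance-2.0.0"))  # provenance-2.0.0.zip
```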
>>>>
>>>> Two issues come to mind:
>>>>
>>>> 1) When created, should a KAR manifest store exact versions of  
>>>> module dependencies instead of just module names?
>>>>
>>>> 2) Which versions of modules should the 'Import Dependent  
>>>> Modules' attempt to fetch and install?
>>>>
>>>> A) If we start storing module versions in the manifest, exactly  
>>>> those? This will not work if a module version is no longer  
>>>> available, so we would likely want to keep all old versions of  
>>>> modules in the released area of our repository (is this already  
>>>> the plan?).
>>>>
>>>> B) The newest? This means requiring that modules remain backward
>>>> compatible with all versions of their KAR artifacts. This would  
>>>> also require fetching the modules in the proper order. If  
>>>> aSuite-2.1.0 is out and it depends on bSuite-2.0.0, and  
>>>> bSuite-2.1.0 is also available, bSuite-2.1.0 should not be  
>>>> downloaded. If the manifest module-dependencies attribute doesn't  
>>>> store these in the right order (not sure), given a list of  
>>>> modules to download, the module manager code would have to be  
>>>> able to figure this out (maybe it already can?).
>>>>
>>>> C) Something else? :)
>>>>
>>>> I think B may be the way to go, and also does not require we do  
>>>> 1), even though we may want to do that in the future.
>>>>
>>>> Thanks,
>>>> Derik
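The ordering problem in option B can be sketched in Python as follows. The sketch assumes hypothetical repository metadata that lists each module version's pinned dependencies; this structure and these function names are illustrative and are not the module manager's actual API. The key step is that a module required by another requested module takes the pinned version, so bSuite-2.0.0 is chosen over bSuite-2.1.0 regardless of the order in which names appear in the manifest.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical metadata: module -> version -> pinned deps written as "<name>-<version>".
Repo = Dict[str, Dict[str, List[str]]]


def parse(version: str) -> Tuple[int, ...]:
    return tuple(int(p) for p in version.split("."))


def newest(repo: Repo, name: str) -> str:
    return max(repo[name], key=parse)


def download_plan(requested: Sequence[str], repo: Repo) -> List[Tuple[str, str]]:
    """Order unversioned module names for download, dependencies first.

    Each module gets its newest release (option B), except that a version
    pinned by another requested module's dependency list wins.
    """
    # 1. Find every module reachable as a dependency of a requested module.
    depended_on = set()
    stack = [(n, newest(repo, n)) for n in requested]
    seen = set()
    while stack:
        name, version = stack.pop()
        if (name, version) in seen:
            continue
        seen.add((name, version))
        for dep in repo[name][version]:
            dep_name, dep_version = dep.rsplit("-", 1)
            depended_on.add(dep_name)
            stack.append((dep_name, dep_version))

    # 2. Resolve depth-first from the top-level modules so pins propagate down.
    pinned: Dict[str, str] = {}
    order: List[Tuple[str, str]] = []

    def visit(name: str, version: str) -> None:
        if name in pinned:
            if pinned[name] != version:
                raise ValueError(f"conflicting versions of {name}: {pinned[name]} and {version}")
            return
        pinned[name] = version
        for dep in repo[name][version]:
            dep_name, dep_version = dep.rsplit("-", 1)
            visit(dep_name, dep_version)
        order.append((name, version))  # post-order: dependencies land first

    for name in requested:
        if name not in depended_on:
            visit(name, newest(repo, name))
    for name in requested:  # anything left over, e.g. part of a dependency cycle
        if name not in pinned:
            visit(name, newest(repo, name))
    return order
```

With aSuite-2.1.0 pinning bSuite-2.0.0, requesting `["bSuite", "aSuite"]` yields core, then bSuite-2.0.0, then aSuite-2.1.0, even though bSuite-2.1.0 exists.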
>>>>



