[kepler-dev] KARs and module dependencies

Ilkay Altintas altintas at sdsc.edu
Wed Aug 4 05:37:31 PDT 2010


Derik,

I agree with Paul that if we go with option #1, we need to provide a
utility to migrate the old files into the new KAR format. It is also not the
most community-friendly approach.

I don't yet have a strong opinion since I don't know the cost of
trying to go with option #2. Why is it unclear or hard to analyze the
amount of work it requires?

Thanks for looking into this. It is an important bug!
-ilkay

-- 
Ilkay ALTINTAS
Deputy Coordinator for Research, San Diego Supercomputer Center (SDSC)
Lab Director, Scientific Workflow Automation Technologies (SWAT @ SDSC)

University of California, San Diego
9500 Gilman Drive, MC: 0505  La Jolla, CA  92093-0505
Phone: (858) 210-5877                     Fax: (858) 534-8303
Web: http://users.sdsc.edu/~altintas
Skype: ilkay.altintas






On Aug 4, 2010, at 5:20 AM, Paul Edward Allen wrote:

> I am very much in favor of Kepler continuing to support reading/saving
> workflows as plain XML files. There are thousands of Kepler
> workflows out there that are plain XML, and at the very least you'd
> have to come up with a mechanism/utility to convert those to KAR
> files if the decision was made for Kepler not to support plain XML.
>
> Please remember those of us who are using non-Java GUIs to create  
> workflows. Thanks.
>
> -Paul
>
>> -----Original Message-----
>> From: kepler-dev-bounces at kepler-project.org
>> [mailto:kepler-dev-bounces at kepler-project.org] On Behalf Of Derik Barseghian
>> Sent: Monday, August 02, 2010 8:16 PM
>> To: Kepler Developers
>> Subject: Re: [kepler-dev] KARs and module dependencies
>>
>> Hi all,
>>
>> I've implemented most of what was discussed below. However, this
>> solution doesn't cover the case of workflows saved as just XML -- i.e.
>> not in a KAR with a manifest that lists module-dependencies. A workflow
>> can be created in a suite that adds entries to the MoML (like the
>> yourActor module, or reporting or provenance), and when you attempt to
>> open this workflow in vanilla, you get NPEs re: elements missing,
>> instead of a prompt asking you to download the missing modules.
>>
>> Two solutions come to mind:
>> 1) No longer allow saving and opening workflows as XML, always save to
>> a KAR
>> 	pro: Simplifies our GUI wrt saving and opening.
>> 	con: Negatively affects those currently using plain workflow files.
>> 	Would probably require a utility to create a KAR from a workflow. At
>> 	least some refactoring required to remove options from command line
>> 	that utilize plain workflow files.
>>
>> 2) Move or keep a copy of module-dependencies in the workflow itself,
>> and refactor to check these before actually attempting to open the
>> workflow.
>> 	pro: Things continue to work similarly, just additional messages when
>> 	you lack modules.
>> 	con: It's not clear to me yet how much additional work this
>> 	represents.
>>
>> Please let me know your thoughts and additional pros/cons as you see
>> them.
>> Thanks,
>> Derik
>>
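
A minimal sketch of the pre-open check that option 2 describes, written in Python for brevity rather than in Kepler's own Java. The function names `missing_modules` and `open_workflow`, and the idea that the workflow's dependencies arrive as a plain list of module names, are assumptions made for illustration; none of this is Kepler code.

```python
from typing import Iterable, List, Set


def missing_modules(required: Iterable[str], installed: Set[str]) -> List[str]:
    """Return the required modules that are not installed, in declaration order."""
    return [name for name in required if name not in installed]


def open_workflow(required: Iterable[str], installed: Set[str]) -> str:
    """Decide what to do before parsing the workflow (hypothetical flow)."""
    missing = missing_modules(required, installed)
    if missing:
        # Stop before parsing, so the user gets a prompt instead of an NPE.
        return "prompt to download: " + ", ".join(missing)
    return "open workflow"
```

The point of the design is only that the dependency list is read and compared before the workflow body is parsed, so a missing module surfaces as a message rather than as a parse failure.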
>> On Jul 27, 2010, at 12:42 AM, David Welker wrote:
>>
>>> Hi Derik,
>>>
>>> I meant to send you this earlier, but bad weather prevented the
>>> satellite-based internet connection that I am using from working.
>>>
>>> The way I see it, there are two major use cases here: (1) The user
>>> wants to replicate someone else's work (more rare, but useful,
>>> especially to compare someone else's algorithm with one's own) or (2)
>>> the user is primarily interested in developing a workflow for their
>>> own research (the more common case).
>>>
>>> There are at least two observations to be made: (1) To be assured of
>>> proper replication, it would be extremely helpful to have the exact
>>> module list under which the workflow was actually run, including
>>> precise version numbers and (2) for development and extension,
>>> users do not really care what module provides the necessary services
>>> for the workflow; they only care that the necessary services are
>>> provided from some module. Of course, the same user can be
>>> interested in both replication and development, but in that case,
>>> they would only be interested in specific module information while
>>> they are concerned with replication but would not otherwise care
>>> about that or desire to be limited by that during development.
>>>
>>> These observations have a couple of conceptual implications for the
>>> question of versioning workflows. Most fundamentally, a workflow
>>> developer will not be aware of the interests and needs of future
>>> users of their workflow. Therefore, they would not be in a good
>>> position to specify things such as a range of "compatible" module
>>> versions that are necessary. They do not know whether future users
>>> wish to engage in replication with the same and/or different data or
>>> whether they wish to engage in further development of the algorithm
>>> embedded in the workflow. Second, as has been mentioned above, when
>>> extended development of the workflow algorithm is contemplated,
>>> future users do not care which modules provide the services their
>>> workflow needs nor do they want to be limited by a specification
>>> that suggests that a particular module (or range of versions) is
>>> necessary. Not only is a workflow developer not in a good position
>>> to assert that a particular module (or range of versions) is
>>> necessary, it would be an error to even attempt to do so. When it
>>> comes to future development, there is no such thing as a necessary
>>> module.
>>>
>>> These observations also have a few possible implementation
>>> implications. First, whenever a KAR is saved, a list of precise
>>> module versions should be saved right along with it. There should be
>>> a corresponding menu option called "retrieve modules for
>>> replication" that will be available for that workflow when it is
>>> later loaded. This option should be grayed out for older workflows
>>> that did not save precise version information. Second, the concept
>>> of different levels of "strictness of compliance" should be ditched.
>>> Either the user is concerned about replication, which requires
>>> precise version information to proceed with full confidence, or
>>> specific modules do not matter at all. In other words, there should
>>> be only one level of compliance for replication -- very strict.
>>> Third, in the future, in addition to a precise module list, we
>>> should save a list of services used by the workflow. This would
>>> provide developers extending an existing workflow with guidance
>>> concerning what services the workflow needs to run (but of course
>>> what services are really necessary and what constitutes a "working"
>>> workflow is something that only future developers can say with
>>> authority). Kepler does not currently support services, but I think
>>> we should provide such support in the near future.
>>>
>>> The bottom-line implication I see for your current plan is that I
>>> do not think a workflow/KAR should ever REQUIRE a level of
>>> strictness. Instead, if precise information is available, there
>>> should be a replication OPTION. That option should be completely
>>> strict about versions retrieved. On the other hand, if the user does
>>> not actively indicate a desire to replicate, the workflow should try
>>> to open without warnings to the user. That is all that I think
>>> should be done for now.
>>>
>>> Later on, if we provide explicit support for the concept of
>>> services, we will be in a position to warn the user if a service we
>>> think is probably necessary (but who are we to say for sure?) does
>>> not appear to be provided by any of the modules that are currently
>>> active. But for now, without support for services, I do not think
>>> any warnings are in order.
>>>
>>> I hope this perspective is helpful. Sorry for the delay, but the
>>> weather ruined my Internet connection and I was loath to compose
>>> such an involved email using my iPhone. But I have finally bitten the
>>> bullet, so here it is. Of course, as the future user of the
>>> "service" provided by this email, only you are in a position to
>>> determine whether the ideas expressed herein are useful to you. At
>>> least if the principle regarding who should have authority to
>>> determine what services are necessary and/or useful (future
>>> developers) is correct. =)
>>>
>>> Sent from my iPhone
>>>
>>> On Jul 23, 2010, at 8:40 PM, Derik Barseghian <barseghian at nceas.ucsb.edu> wrote:
>>>
>>>> Thanks very much Ben and Dan for the discussions on this topic.
>>>>
>>>> What Ben lists is essentially my plan, and I'm moving forward with
>>>> implementation. To summarize some additional points:
>>>>
>>>> - This plan requires that module repositories maintain all published
>>>> versions of a module.
>>>> I will:
>>>> - Include the complete "currently active" module list (with version
>>>> numbers) in the module-dependencies KAR attribute.
>>>> - Eliminate the dependsOnModule KAR entry attribute, since it will
>>>> no longer be used, updating the KAR version, schema, etc in the
>>>> process.
>>>> - Remove any code that was inserting a dependsOnModules value into
>>>> the workflow.
>>>> - Create a new user preference for the strictness of KAR
>>>> compliance. I'm still determining the different modes. But e.g. if
>>>> the user is set to 'very strict', in order to open a KAR, they will
>>>> be prompted to import and restart with the exact same module set
>>>> (with matching versions) as that which created the KAR. Essentially
>>>> the more strict this setting, the more warnings a user will get
>>>> when trying to open KARs.
>>>>
>>>> Derik
>>>>
>>>>
>>>> On Jul 22, 2010, at 4:35 PM, ben leinfelder wrote:
>>>>
>>>>> After bouncing these ideas around with Derik, here's a hybrid
>>>>> approach to handling module dependencies:
>>>>>
>>>>> -begin including module version number when writing module
>>>>> dependencies in a KAR file
>>>>> -when opening [possibly older] KARs that require non-vanilla
>>>>> modules:
>>>>>   -if the module version is not specified, then fetch the latest
>>>>> release of that module
>>>>>   -if the module version is specified, then fetch that version*
>>>>> *In practice we'll probably want Strict and Lax modes so that we
>>>>> aren't constantly swapping out modules each time we open a
>>>>> different KAR [for minor version changes].
>>>>>
>>>>> Additional notes:
>>>>> -Development on the trunk - where there is no module version -
>>>>> should also be considered a special case that does not trigger
>>>>> module download. It'd be nice to resolve the dependency using
>>>>> modules from the trunk if we are running from it.
>>>>> -We don't want to inadvertently downgrade someone who is opening
>>>>> an older KAR but wants to work with a newer version of the module.
>>>>> We need to be able to save an older KAR with newer module features.
>>>>> -We do want to allow a downgrade in cases where old features need
>>>>> to be used, say when reproducing workflow run results from a KAR -
>>>>> the whole point is that we reproduce them exactly - "archive"
>>>>> being the operative word.
>>>>>
>>>>> Hope I captured most of our discussion!
>>>>> -ben
>>>>>
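
A minimal sketch of the hybrid rule listed above (no version recorded: fetch the latest release; version recorded: fetch exactly that one), written in Python for brevity. The function `resolve_version` and the `released` mapping are hypothetical, and the sketch assumes each module's released versions are already listed oldest to newest. It leaves out the Strict/Lax modes and the trunk special case.

```python
from typing import Dict, List, Optional


def resolve_version(module: str,
                    requested: Optional[str],
                    released: Dict[str, List[str]]) -> str:
    """Pick the version to fetch for one module dependency."""
    versions = released[module]  # assumed sorted oldest to newest
    if requested is None:
        # Older KARs carry no version, so fall back to the latest release.
        return versions[-1]
    if requested not in versions:
        # An exact match is only possible if the repository kept that release.
        raise LookupError(f"{module}-{requested} is not in the repository")
    return requested
```

The `LookupError` branch is where the requirement to keep all published versions in the repository shows up: an exact-version request cannot be satisfied any other way.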
>>>>> On Jul 22, 2010, at 2:34 PM, Derik Barseghian wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I've found a bug that's blocking the reporting-2.0 release: when
>>>>>> you're in vanilla Kepler, and have a KAR in MyWorkflows that was
>>>>>> created with modules you don't have installed, the KAR 'Import
>>>>>> Dependent Modules' context menu item doesn't work:
>>>>>> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5099
>>>>>> I can take this bug if Kepler/CORE members can clarify what the
>>>>>> desired behavior is -- a significant design decision is made
>>>>>> depending on how this is fixed.
>>>>>>
>>>>>> The issue is basically that in a KAR manifest there are
>>>>>> module-dependencies and dependsOnModules attributes, but the values
>>>>>> stored are module names without version number. The 'Import
>>>>>> Dependent Modules' action attempts to use these values to
>>>>>> download dependencies, but fails because it tries to e.g. fetch
>>>>>> provenance.zip instead of provenance-2.0.0.zip.
>>>>>>
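
A minimal sketch of the naming mismatch described above, written in Python for brevity. The helper `archive_name` is hypothetical and is not Kepler code; the only detail taken from the message is the `name-version.zip` pattern implied by `provenance-2.0.0.zip`.

```python
from typing import Optional


def archive_name(module: str, version: Optional[str] = None) -> str:
    """Build the archive file name to request for a module."""
    if version:
        # With a recorded version the name matches a released archive.
        return f"{module}-{version}.zip"
    # With only the bare module name, this is all that can be built,
    # and it does not match any released archive.
    return f"{module}.zip"
```

For example, `archive_name("provenance", "2.0.0")` yields `provenance-2.0.0.zip`, while `archive_name("provenance")` yields the `provenance.zip` name that the current action tries and fails to fetch.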
>>>>>> Two issues come to mind:
>>>>>>
>>>>>> 1) When created, should a KAR manifest store exact versions of
>>>>>> module dependencies instead of just module names?
>>>>>>
>>>>>> 2) Which versions of modules should the 'Import Dependent
>>>>>> Modules' attempt to fetch and install?
>>>>>>
>>>>>> A) If we start storing module versions in the manifest, exactly
>>>>>> those? This will not work if a module version is no longer
>>>>>> available, so we would likely want to keep all old versions of
>>>>>> modules in the released area of our repository (is this already
>>>>>> the plan?).
>>>>>>
>>>>>> B) The newest? This means requiring that modules remain backward
>>>>>> compatible with all versions of their KAR artifacts. This would
>>>>>> also require fetching the modules in the proper order. If
>>>>>> aSuite-2.1.0 is out and it depends on bSuite-2.0.0, and
>>>>>> bSuite-2.1.0 is also available, bSuite-2.1.0 should not be
>>>>>> downloaded. If the manifest module-dependencies attribute doesn't
>>>>>> store these in the right order (not sure), given a list of
>>>>>> modules to download, the module manager code would have to be
>>>>>> able to figure this out (maybe it already can?).
>>>>>>
>>>>>> C) Something else? :)
>>>>>>
>>>>>> I think B may be the way to go, and also does not require we do
>>>>>> 1), even though we may want to do that in the future.
>>>>>>
>>>>>> Thanks,
>>>>>> Derik
>>>>>>
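
A minimal sketch of the ordering concern raised under option B, written in Python for brevity. It assumes each released module declares its dependencies with exact versions, which the message does not confirm; `download_order` and the `deps` mapping are hypothetical names used only for illustration.

```python
from typing import Dict, List, Set, Tuple

Module = Tuple[str, str]  # (name, version)


def download_order(root: Module, deps: Dict[Module, List[Module]]) -> List[Module]:
    """List the root module and its pinned dependencies, dependencies first."""
    order: List[Module] = []
    seen: Set[Module] = set()

    def visit(mod: Module) -> None:
        if mod in seen:
            return
        seen.add(mod)
        for dep in deps.get(mod, []):
            visit(dep)
        # A module is appended only after everything it depends on.
        order.append(mod)

    visit(root)
    return order
```

With `deps = {("aSuite", "2.1.0"): [("bSuite", "2.0.0")]}`, calling `download_order(("aSuite", "2.1.0"), deps)` lists `bSuite` 2.0.0 before `aSuite` 2.1.0 and never touches `bSuite` 2.1.0, because only the versions reachable from the root's declared dependencies are visited.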
>>
>>
>> _______________________________________________
>> Kepler-dev mailing list
>> Kepler-dev at kepler-project.org
>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
