[kepler-dev] proposed changes to .kepler

Matt Jones jones at nceas.ucsb.edu
Thu Nov 19 21:45:14 PST 2009


I concur with Bertram.  We need to do the engineering so that users need not
make these decisions, so that upgrades occur seamlessly, and so that upgrades
automatically migrate the work users have done in the older version and keep
it compatible with the newer version.  If we are doing our jobs right, users
should not end up with work (actors, workflows, metadata, user-defined
ontologies, etc.) stranded in an older version, and they should lose no
information, including the Kepler instance information, if they completely
replace their old install with a new install.

Matt

On Thu, Nov 19, 2009 at 8:27 AM, Aaron Schultz <aschultz at nceas.ucsb.edu> wrote:

> Hi Ben,  I would agree that this is the most sensible approach.  Have a
> user defined directory for persistent data that is version specific and have
> a .kepler directory that is transient.
>
> -Aaron
>
>
> On Nov 19, 2009, at 8:05 AM, ben leinfelder <leinfelder at nceas.ucsb.edu>
> wrote:
>
>  In response to "What constitutes a version of Kepler" --
>> What if we offloaded the burden of figuring that out so that the "correct"
>> .kepler is selected by the person actually running an instance of Kepler?
>> Think about how Eclipse uses workspaces: It's up to me to select the
>> appropriate one at startup.
>> 1) When I upgrade Eclipse (i.e. Kepler) I can either create a new
>> workspace (.kepler) or opt to "convert" the existing one (upgrade .kepler)
>> 2) If I've installed plugins (i.e. Kepler modules) and they are referenced
>> in my workspace (.kepler), but I then attempt to run a version of Eclipse
>> (Kepler) that lacks a plugin, that error is my fault - I can either install
>> the plugin (Kepler module) or use a different workspace (.kepler).
>>
>> We'd of course lose the ability for different installations to share
>> across .kepler directories, but that simplifies things for us.
>>
>> At the very least, I think it would be a good feature [that potentially
>> saves us a lot of trouble] to specify an alternative .kepler location on
>> startup.
>>
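A minimal sketch of how an alternative .kepler location could be picked up at startup. The class name and the "kepler.dir" system property are assumptions made for illustration; nothing in this thread says Kepler has either of them.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: neither this class nor the "kepler.dir" property
// is part of Kepler; both are invented for this sketch.
final class KeplerDirResolver {
    // Assumed property name, e.g.  java -Dkepler.dir=/path/to/other/.kepler ...
    static final String OVERRIDE_PROPERTY = "kepler.dir";

    // Returns the override if one was given, otherwise <userHome>/.kepler.
    static Path resolve(String override, String userHome) {
        if (override != null && !override.trim().isEmpty()) {
            return Paths.get(override.trim()).toAbsolutePath().normalize();
        }
        return Paths.get(userHome, ".kepler").toAbsolutePath().normalize();
    }

    // Reads the override and the home directory from JVM system properties.
    static Path resolveFromSystem() {
        return resolve(System.getProperty(OVERRIDE_PROPERTY),
                       System.getProperty("user.home"));
    }

    public static void main(String[] args) {
        System.out.println("Using Kepler directory: " + resolveFromSystem());
    }
}
```

Keeping the default at ~/.kepler means existing installs would behave as before unless the user opts in to a different location.
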
>> -ben (expanded from Oliver's input)
>>
>> On Nov 18, 2009, at 7:23 PM, David Welker wrote:
>>
>>  If we want a more sophisticated solution, as Ben is suggesting, I think a
>>> good place to start thinking about the data that we interact with in Kepler
>>> is the chart linked below.
>>>
>>> Basically, the problem that Ben is getting at is what if we have
>>> persistent data that is inconsistent across versions? There are two ways to
>>> handle this problem:
>>>
>>> 1.) Make sure that there is no persistent data that is incompatible
>>> across versions.
>>> 2.) Store inconsistent persistent data in different folders.
>>>
>>> The issue with solution 2 is that it is probably unwieldy. What constitutes
>>> a different version of Kepler? I would argue that a different version of
>>> Kepler does not constitute merely a release, but rather any unique
>>> combination of modules. That is, if I am running Kepler 2.0 with the WRP
>>> suite, that is a different version than Kepler 2.0, which is a different
>>> version than Kepler 2.0 with the tagging suite, which is different than
>>> Kepler 2.0 with the PPOD suite, which is different than Kepler 2.0 with a
>>> custom ad hoc suite, which is different than Kepler 1.0 with the WRP
>>> suite... You get the idea. The number of different versions is unwieldy. For
>>> that reason, I feel that solution 2.) is unworkable (although at one time I
>>> did lean towards that myself).
>>>
>>> So, we need to implement solution 1. Basically, we should make sure that
>>> no input files from .kepler are capable of causing any version of Kepler to
>>> crash. To do this, we might include meta-data within the data files that
>>> identifies the sort of data that is stored in a particular file or section
>>> of a file. Then, a particular version of Kepler could be made to be smart
>>> enough to know what sorts of data it can handle and which sort of data it
>>> can't. This also has the advantage of avoiding the situation where multiple
>>> versions of Kepler are unable to share persistent data that is in fact
>>> compatible.
>>>
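One way to picture the metadata idea is sketched below, under stated assumptions: each data file starts with a small properties-style header naming its kind and format version, and a given Kepler build loads only the kinds it recognizes and skips the rest instead of failing. The header keys "data.type" and "data.version" and the class name are hypothetical, not an existing Kepler format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch; the header keys and this class are assumptions.
final class PersistentDataCheck {
    enum Decision { LOAD, SKIP }

    // "type:version" pairs that this build knows how to read.
    private final Set<String> supported;

    PersistentDataCheck(String... supportedKinds) {
        this.supported = new HashSet<>(Arrays.asList(supportedKinds));
    }

    // Decides what to do with a file based on its header text.
    // Unknown, missing, or unreadable metadata yields SKIP, never an exception.
    Decision decide(String headerText) {
        Properties header = new Properties();
        try {
            header.load(new StringReader(headerText));
        } catch (IOException | IllegalArgumentException e) {
            return Decision.SKIP;
        }
        String type = header.getProperty("data.type");
        String version = header.getProperty("data.version");
        if (type == null || version == null) {
            return Decision.SKIP;
        }
        return supported.contains(type + ":" + version) ? Decision.LOAD : Decision.SKIP;
    }

    public static void main(String[] args) {
        PersistentDataCheck check = new PersistentDataCheck("provenance:1", "cache:2");
        System.out.println(check.decide("data.type=provenance\ndata.version=1\n"));
    }
}
```

Skipping rather than failing is what would let several versions point at the same .kepler and still share the files they both understand.
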
>>> Back in the day, Tim and I made a chart that explores the sort of data
>>> that we interact with in Kepler. It might be useful to think about. Check
>>> that out here:
>>>
>>>
>>> https://kepler-project.org/developers/teams/framework/classification-of-persistent-system-state
>>>
>>>
>>> On Nov 18, 2009, at 8:41 PM, ben leinfelder wrote:
>>>
>>>  A few questions:
>>>> 1) What other files in .kepler will be considered "persistent"?
>>>> 2) What is the plan for different versions of Kepler running with the
>>>> same .kepler?
>>>>   a) do we have another directory branch for each version:
>>>>   ~/.kepler/kepler-2.0/...
>>>>   ~/.kepler/kepler-2.1/...
>>>> 3) When thinking about multiple instances of Kepler on one machine are
>>>> we aiming to support:
>>>>   a) different versions being run at different times
>>>>   b) the same version being run concurrently
>>>>   c) different versions being run concurrently
>>>>
>>>> Depending on what we support, we'll have to revisit the embedded HSQLDB
>>>> that is used for the cache and provenance in addition to these directory
>>>> structures.
>>>>
>>>> -ben
>>>>
>>>> On Nov 18, 2009, at 2:08 PM, Derik Barseghian wrote:
>>>>
>>>>  Kepler devs,
>>>>>
>>>>> After some discussion with Aaron, Ben, Dan, and Chad, I'm wondering if
>>>>> anyone objects to dividing .kepler into two different areas -- there would
>>>>> be areas for 1) persistent items (e.g. provenance database) and 2) temporary
>>>>> items (e.g. cache). This would make it more apparent which things could be
>>>>> deleted without serious ramifications (temp/), and the idea would be that
>>>>> items in persistent/ should stick around and be dealt with during Kepler
>>>>> upgrades for backwards compatibility.
>>>>>
>>>>> Also, I think we should utilize the InstanceAuthNamespace in these
>>>>> paths, so that items from different Kepler instances are separated.
>>>>>
>>>>> This could look like (imagine multiple namespace dirs):
>>>>>
>>>>> a)
>>>>> .kepler/persistent/gamma.msi.ucsb.edu.OpenAuth.1278/
>>>>> .kepler/temp/gamma.msi.ucsb.edu.OpenAuth.1278/
>>>>>
>>>>> or b)
>>>>> .kepler/gamma.msi.ucsb.edu.OpenAuth.1278/persistent/
>>>>> .kepler/gamma.msi.ucsb.edu.OpenAuth.1278/temp/
>>>>>
>>>>> or c)
>>>>> .kepler_temp/gamma.msi.ucsb.edu.OpenAuth.1278/
>>>>> .kepler_persistent/gamma.msi.ucsb.edu.OpenAuth.1278/
>>>>>
>>>>> I prefer a).
>>>>>
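Layout a) could be computed roughly as follows. This is only a sketch: the class and method names are hypothetical, and the instance namespace is treated as a plain string rather than whatever type InstanceAuthNamespace actually is in Kepler.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of layout a): <root>/<area>/<instance namespace>/
final class KeplerLayout {
    enum Area {
        PERSISTENT("persistent"), TEMP("temp");

        final String dirName;

        Area(String dirName) {
            this.dirName = dirName;
        }
    }

    // Builds the directory for one area of one Kepler instance.
    static Path areaDir(Path keplerRoot, Area area, String instanceNamespace) {
        return keplerRoot.resolve(area.dirName).resolve(instanceNamespace);
    }

    public static void main(String[] args) {
        Path root = Paths.get(System.getProperty("user.home"), ".kepler");
        String namespace = "gamma.msi.ucsb.edu.OpenAuth.1278"; // example from this thread
        System.out.println(areaDir(root, Area.PERSISTENT, namespace));
        System.out.println(areaDir(root, Area.TEMP, namespace));
    }
}
```

With this layout, clearing temporary data for every instance amounts to deleting the single temp/ subtree, which is one practical argument for a) over b).
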
>>>>> This partly came out of a discussion of bug 4514. I think the
>>>>> configuration files could be stored beneath these new paths, probably in
>>>>> persistent.
>>>>> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=4514
>>>>>
>>>>> A better solution might be to just have a .kepler to store temporary
>>>>> things, and to store persistent items in an OS-appropriate location, but I
>>>>> think this might be a larger change than we want to take on at the moment,
>>>>> as we try to get 2.0 out of the door.
>>>>>
>>>>> Let me know what you think,
>>>>> Derik
>>>>> _______________________________________________
>>>>> Kepler-dev mailing list
>>>>> Kepler-dev at kepler-project.org
>>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>>>