[kepler-dev] proposed changes to .kepler

Thu Nov 19 22:13:02 PST 2009

Hi Matt,

Yes definitely, the persistent data directory would need to be  
upgradeable between versions, but the transient directory would not.

-Aaron

On Nov 19, 2009, at 9:45 PM, Matt Jones <jones at nceas.ucsb.edu> wrote:

> I concur with Bertram.  We need to do the engineering so that these  
> decisions need not be made by users, so that upgrades occur  
> seamlessly, and so upgrades automatically migrate work that users  
> have done in the older version so that it is compatible in the newer  
> version.  If we are doing our jobs right, users should not end up  
> with work (actors, workflow, metadata, user-defined ontologies,  
> etc.) stranded in an older version, and they should lose no  
> information, including the Kepler instance information, if they  
> completely replace their old install with a new install.
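
One way to picture the automatic migration described above is a chain of small upgrade steps, each moving the persistent data forward by one schema version. This is only a sketch under assumptions: the `schema_version` field, the step functions, and the data layout are all hypothetical and do not reflect how Kepler actually stores its data.

```python
# Hypothetical migration steps; each upgrades persistent data by one schema version.
def _v1_to_v2(data):
    upgraded = dict(data)                      # copy, so the old data is untouched
    upgraded.setdefault("ontologies", [])      # assumed field added in schema 2
    upgraded["schema_version"] = 2
    return upgraded


def _v2_to_v3(data):
    upgraded = dict(data)
    # assumed change in schema 3: workflows become records instead of bare names
    upgraded["workflows"] = [{"name": n} for n in data.get("workflows", [])]
    upgraded["schema_version"] = 3
    return upgraded


MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}
CURRENT_SCHEMA = 3


def migrate(data):
    """Apply each pending step in order until the data is at CURRENT_SCHEMA."""
    version = data.get("schema_version", 1)
    while version < CURRENT_SCHEMA:
        step = MIGRATIONS.get(version)
        if step is None:
            raise ValueError("no migration from schema version %r" % version)
        data = step(data)
        version = data["schema_version"]
    return data
```

Fields a step does not know about (for example the instance information) are carried through unchanged, which is the "lose no information" property asked for above.
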
>
> Matt
>
> On Thu, Nov 19, 2009 at 8:27 AM, Aaron Schultz <aschultz at nceas.ucsb.edu> wrote:
> Hi Ben,  I would agree that this is the most sensible approach.   
> Have a user defined directory for persistent data that is version  
> specific and have a .kepler directory that is transient.
>
> -Aaron
>
>
> On Nov 19, 2009, at 8:05 AM, ben leinfelder  
> <leinfelder at nceas.ucsb.edu> wrote:
>
> In response to "What constitutes a version of Kepler" --
> What if we offloaded the burden of figuring that out so that the  
> "correct" .kepler is selected by the person actually running an  
> instance of Kepler?
> Think about how Eclipse uses workspaces: It's up to me to select the  
> appropriate one at startup.
> 1) When I upgrade Eclipse (i.e. Kepler) I can either create a new  
> workspace (.kepler) or opt to "convert" the existing one  
> (upgrade .kepler)
> 2) If I've installed plugins (i.e. Kepler modules) and they are  
> referenced in my workspace (.kepler) but then I attempt to run a  
> version of Eclipse (Kepler) that lacks a plugin, that error is my  
> fault - I can either install the plugin (Kepler module) or use a  
> different workspace (.kepler)
>
> We'd of course lose the ability for different installations to  
> share across .kepler directories, but that simplifies things for us.
>
> At the very least, I think it would be a good feature [that  
> potentially saves us a lot of trouble] to specify an  
> alternative .kepler location on startup.
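
A minimal sketch of what choosing the .kepler location at startup could look like. The `--kepler-dir` flag and the function name are hypothetical, made up for illustration; Kepler has no such option as far as this thread says.

```python
import argparse
from pathlib import Path


def resolve_kepler_dir(argv, home=None):
    """Return the .kepler directory to use for this run.

    --kepler-dir is a hypothetical flag, not an existing Kepler option.
    Falls back to <home>/.kepler when the flag is absent.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--kepler-dir", default=None)
    # parse_known_args ignores any other startup arguments
    args, _unknown = parser.parse_known_args(argv)
    if args.kepler_dir is not None:
        return Path(args.kepler_dir).expanduser()
    base = Path(home) if home is not None else Path.home()
    return base / ".kepler"
```

Called with `["--kepler-dir", "/tmp/ws"]` this returns `/tmp/ws`; with no flag it returns the default directory under the user's home, much like picking an Eclipse workspace versus accepting the default.
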
>
> -ben (expanded from Oliver's input)
>
> On Nov 18, 2009, at 7:23 PM, David Welker wrote:
>
> If we want a more sophisticated solution, as Ben is suggesting, I  
> think a good place to start thinking about the data that we interact  
> with in Kepler is this chart.
>
> Basically, the problem that Ben is getting at is what if we have  
> persistent data that is inconsistent across versions? There are two  
> ways to handle this problem:
>
> 1.) Make sure that there is no persistent data that is incompatible  
> across versions.
> 2.) Store inconsistent persistent data in different folders.
>
> The issue with solution 2 is that it is probably unwieldy. What  
> constitutes a different version of Kepler? I would argue that a  
> different version of Kepler does not constitute merely a release,  
> but rather any unique combination of modules. That is, if I am  
> running Kepler 2.0 with the WRP suite, that is a different version  
> than Kepler 2.0, which is a different version than Kepler 2.0 with  
> the tagging suite, which is different than Kepler 2.0 with the PPOD  
> suite, which is different than Kepler 2.0 with a custom ad hoc  
> suite, which is different than Kepler 1.0 with the WRP suite... You  
> get the idea. The number of different versions is unwieldy. For that  
> reason, I feel that solution 2.) is unworkable (although at one time  
> I did lean towards that myself).
>
> So, we need to implement solution 1. Basically, we should make sure  
> that no input files from .kepler are capable of causing any version  
> of Kepler to crash. To do this, we might include meta-data within  
> the data files that identifies the sort of data that is stored in a  
> particular file or section of a file. Then, a particular version of  
> Kepler could be made to be smart enough to know what sorts of data  
> it can handle and which sort of data it can't. This also has the  
> advantage of avoiding the situation where multiple versions of  
> Kepler are unable to share persistent data that is in fact compatible.
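
The metadata idea above could be sketched roughly as follows: each file carries a tag saying what kind of data it holds, and a given build only loads the tags it recognizes, skipping the rest instead of crashing. The JSON layout, the `kind`/`format_version`/`payload` field names, and the supported set are all assumptions for illustration, not Kepler's actual file format.

```python
import json

# Hypothetical (kind, format_version) pairs that this build knows how to read.
SUPPORTED_FORMATS = {("provenance", 1), ("provenance", 2), ("config", 1)}


def load_if_supported(text, supported=SUPPORTED_FORMATS):
    """Return the payload of a tagged file, or None if this build cannot read it."""
    try:
        doc = json.loads(text)
    except ValueError:          # malformed file: skip it rather than crash
        return None
    if not isinstance(doc, dict):
        return None
    kind = doc.get("kind")
    version = doc.get("format_version")
    if not isinstance(kind, str) or type(version) is not int:
        return None
    if (kind, version) not in supported:
        return None             # data from a version or module we do not understand
    return doc.get("payload")
```

Two versions that both list `("provenance", 2)` can share that file, while a file written by a newer module is simply ignored by a build that lacks it.
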
>
> Back in the day, Tim and I made a chart that explores the sort of  
> data that we interact with in Kepler. It might be useful to think  
> about. Check that out here:
>
> https://kepler-project.org/developers/teams/framework/classification-of-persistent-system-state
>
>
> On Nov 18, 2009, at 8:41 PM, ben leinfelder wrote:
>
> A few questions:
> 1) What other files in .kepler will be considered "persistent"?
> 2) What is the plan for different versions of Kepler running with  
> the same .kepler?
>   a) do we have another directory branch for each version:
>   ~/.kepler/kepler-2.0/...
>   ~/.kepler/kepler-2.1/...
> 3) When thinking about multiple instances of Kepler on one machine  
> are we aiming to support:
>   a) different versions being run at different times
>   b) the same version being run concurrently
>   c) different versions being run concurrently
>
> Depending on what we support, we'll have to revisit the embedded  
> HSQLDB that is used for the cache and provenance in addition to  
> these directory structures.
>
> -ben
>
> On Nov 18, 2009, at 2:08 PM, Derik Barseghian wrote:
>
> Kepler devs,
>
> After some discussion with Aaron, Ben, Dan, and Chad, I'm wondering  
> if anyone objects to dividing .kepler into two different areas --  
> there would be areas for 1) persistent items (e.g. provenance  
> database) and 2) temporary items (e.g. cache). This would make it  
> more apparent which things could be deleted without serious  
> ramification (temp/), and the idea would be items in persistent/  
> should stick around and be dealt with during kepler upgrades for  
> backwards compatibility.
>
> Also, I think we should utilize the InstanceAuthNamespace in these  
> paths, so that items from different Kepler instances are separated.
>
> This could look like (imagine multiple namespace dirs):
>
> a)
> .kepler/persistent/gamma.msi.ucsb.edu.OpenAuth.1278/
> .kepler/temp/gamma.msi.ucsb.edu.OpenAuth.1278/
>
> or b)
> .kepler/gamma.msi.ucsb.edu.OpenAuth.1278/persistent/
> .kepler/gamma.msi.ucsb.edu.OpenAuth.1278/temp/
>
> or c)
> .kepler_temp/gamma.msi.ucsb.edu.OpenAuth.1278/
> .kepler_persistent/gamma.msi.ucsb.edu.OpenAuth.1278/
>
> I prefer a).
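
For concreteness, layout a) could be computed along these lines. This is a sketch only; the function name is hypothetical, and the namespace string is taken from the example paths above.

```python
from pathlib import Path


def instance_dirs(kepler_root, instance_namespace):
    """Build the per-instance directories for layout a).

    kepler_root is the .kepler directory; instance_namespace is the
    InstanceAuthNamespace string used to separate Kepler instances.
    """
    root = Path(kepler_root)
    return {
        "persistent": root / "persistent" / instance_namespace,  # kept across upgrades
        "temp": root / "temp" / instance_namespace,               # safe to delete
    }
```

With this layout everything under `temp/` can be removed in one step for all instances, which is the main practical difference from option b).
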
>
> This partly came out of a discussion of bug 4514. I think the  
> configuration files could be stored beneath these new paths,  
> probably in persistent.
> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=4514
>
> A better solution might be to just have a .kepler to store temporary  
> things, and to store persistent items in an OS-appropriate location,  
> but I think this might be a larger change than we want to take on at  
> the moment, as we try to get 2.0 out of the door.
>
> Let me know what you think,
> Derik
> _______________________________________________
> Kepler-dev mailing list
> Kepler-dev at kepler-project.org
> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>