[kepler-dev] Code Freeze Proposal

Thu Dec 17 15:54:05 PST 2009

Hi Chad,

I think we need some common definitions of the different pieces of the 
system here.
I'll try to introduce these definitions as I step through the Save KAR 
process.

The save process for a KAR is:
1) The user initiates the save via a GUI action (preferably a subclass of 
ExportArchiveAction), or some code performs the save programmatically.
Let's call this the "Save Context".  For example:
  a right click on an actor on the canvas,
  a user selecting "File->Save Archive" from the main menu,
  a right click on a set of WorkflowRuns in the Workflow Run Manager,
  or a program on a server saving the results to a KAR file at the end 
of a workflow execution.
2) The save context adds some NamedObjs to the SaveKAR object.
   Let's call the objects in this list the "Save Initiator Objects" and 
the list itself the "Save Initiator List".
   My proposal is that we only allow ComponentEntities to be "Save 
Initiator Objects", to make it more obvious to the developer how to use 
the system.
   (Currently any NamedObj is allowed, which has led to people just 
trying to add everything they want in the KAR at this stage.)
3) The SaveKAR object then calls the save method of all KAREntryHandlers 
that have been registered in the system by different modules, passing 
the "Save Initiator List" to the save method.
   (Actually, at the moment it passes one LSID at a time to the save 
method, looping through all the EntryHandlers for each object in the 
Initiator List.  We should probably just pass the whole list of 
initiators to the save method and call each EntryHandler only once.)
4) Each KAREntryHandler generates a list of KAREntries that should be 
saved in the KAR file, based on the Save Initiator List that it received
  (and on any information generated in the Save Context, e.g. a 
user-specified selection of WorkflowRuns).
5) The SaveKAR object then builds and saves the KAR.  It includes all of 
the KAREntries that were returned by the KAREntryHandlers and nothing more.
   It also adds all the KAREntries to the Cache after the save has 
succeeded (provided the KAR was saved in a local repository).
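
A minimal Java sketch of steps 1-5 above, using the proposed variant in which each handler receives the whole Save Initiator List once. Everything here is a hypothetical stand-in for illustration: Entry, EntryHandler and SaveKar are not the real Kepler classes, initiators are reduced to LSID strings, and the "first entry wins on a name clash" rule is an assumption that the process above does not specify.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the real Kepler classes.
final class SaveKarSketch {

    /** Stand-in for a KAREntry: an entry name plus the LSID it belongs to. */
    static final class Entry {
        final String name;
        final String lsid;
        Entry(String name, String lsid) { this.name = name; this.lsid = lsid; }
    }

    /** Stand-in for a KAREntryHandler; initiators are reduced to LSID strings. */
    interface EntryHandler {
        List<Entry> save(List<String> initiatorLsids);
    }

    /** Stand-in for the SaveKAR object that lives in the core. */
    static final class SaveKar {
        private final List<EntryHandler> handlers = new ArrayList<>();
        private final List<String> initiators = new ArrayList<>();

        void register(EntryHandler handler) { handlers.add(handler); } // done by each module
        void addInitiator(String lsid) { initiators.add(lsid); }       // step 2

        /** Steps 3 and 4: call every handler once and gather what it returns. */
        List<Entry> collectEntries() {
            Map<String, Entry> byName = new LinkedHashMap<>();
            for (EntryHandler handler : handlers) {
                for (Entry e : handler.save(new ArrayList<>(initiators))) {
                    byName.putIfAbsent(e.name, e); // assumption: first entry wins on a name clash
                }
            }
            return new ArrayList<>(byName.values()); // step 5 would write exactly these
        }
    }

    static List<String> demo() {
        SaveKar saveKar = new SaveKar();
        // A "workflow" module contributes the workflow itself.
        saveKar.register(lsids -> {
            List<Entry> out = new ArrayList<>();
            for (String lsid : lsids) out.add(new Entry(lsid + ".xml", lsid));
            return out;
        });
        // A "provenance" module contributes one run record per workflow.
        saveKar.register(lsids -> {
            List<Entry> out = new ArrayList<>();
            for (String lsid : lsids) out.add(new Entry(lsid + "-run.xml", lsid));
            return out;
        });
        saveKar.addInitiator("wf1"); // the Save Context adds its initiator
        List<String> names = new ArrayList<>();
        for (Entry e : saveKar.collectEntries()) names.add(e.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Neither handler knows about the other; the only shared piece is the SaveKar stand-in, which is the property the handler loop is meant to give us.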

You see here that the "Save Context" and the "EntryHandler" are things 
that exist in whatever module they are defined in, and only the
SaveKAR object is in the core.  By going through all the Handlers in 
this way we can have many modules contributing objects to the KAR 
without knowledge of what the other modules are doing.  In some cases, 
however, the modules are tightly coupled: for example, a ROML and a RIO 
are really associated with a WorkflowRun, which is in turn associated 
with a Workflow.  This may call for a more tiered approach that runs 
the EntryHandlers in multiple passes.

Imagine the Save Initiator List starts off with only ComponentEntities 
in it.  On a first pass it runs through all the EntryHandlers, passing 
the Save Initiator ComponentEntities to their save methods.  Let's call 
all of the KAREntries returned by this first pass the "Pass 1" 
KAREntries.  The "Pass 1" KAREntries could then be passed into the 
KAREntryHandler save methods on a second pass, which would return 
another set of KAREntries that we'll call the "Pass 2" KAREntries.  You 
can see that we're now walking down the dependency chain: the first 
pass had ComponentEntities as the input and returned any objects that 
depend on those ComponentEntities (for example the WorkflowRuns), and 
the second pass had the WorkflowRuns as the input, which might return 
the ROMLs and RIOs associated with the WorkflowRuns.  This iterative 
process would go on until the KAREntryHandlers return no more 
KAREntries and all of the dependencies have been added to the KAR.
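
The multi-pass idea can be sketched in Java as a loop that feeds each pass's new results into the next pass until nothing new comes back. This is only a sketch under stated assumptions: entries and initiators are plain strings, EntryHandler and fromTable are hypothetical names, and the "skip anything already collected" check is an added guard so that circular dependencies cannot loop forever.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; entries and initiators are reduced to plain strings.
final class MultiPassSketch {

    /** Stand-in for a KAREntryHandler save method that takes the previous pass as input. */
    interface EntryHandler {
        List<String> save(Collection<String> inputs);
    }

    /** Runs the handlers pass by pass until a pass yields nothing new. */
    static List<String> collect(List<String> initiators, List<EntryHandler> handlers) {
        Set<String> entries = new LinkedHashSet<>();
        Set<String> frontier = new LinkedHashSet<>(initiators); // input to pass 1
        while (!frontier.isEmpty()) {
            Set<String> next = new LinkedHashSet<>();
            for (EntryHandler handler : handlers) {
                for (String entry : handler.save(frontier)) {
                    // Assumption: skip entries already collected, so cycles terminate.
                    if (entries.add(entry)) {
                        next.add(entry);
                    }
                }
            }
            frontier = next; // the new "Pass N" entries feed pass N+1
        }
        return new ArrayList<>(entries);
    }

    /** Builds a handler from a fixed "object -> dependents" table. */
    static EntryHandler fromTable(Map<String, List<String>> dependents) {
        return inputs -> {
            List<String> out = new ArrayList<>();
            for (String input : inputs) {
                out.addAll(dependents.getOrDefault(input, Collections.<String>emptyList()));
            }
            return out;
        };
    }

    static List<String> demo() {
        // One module knows which runs belong to a workflow.
        Map<String, List<String>> runs = new HashMap<>();
        runs.put("workflow", Arrays.asList("run-1", "run-2"));
        // Another module knows which ROMLs and RIOs belong to a run.
        Map<String, List<String>> provenance = new HashMap<>();
        provenance.put("run-1", Arrays.asList("roml-1", "rio-1"));
        provenance.put("run-2", Arrays.asList("roml-2"));
        return collect(Arrays.asList("workflow"),
                Arrays.asList(fromTable(runs), fromTable(provenance)));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this toy run, pass 1 yields the two runs, pass 2 yields their ROML and RIO entries, and pass 3 yields nothing, which ends the loop.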

Aaron

Chad Berkley wrote:
> Hey Aaron,
>
> See my comments below:
>
>
> Aaron Schultz wrote:
>>
>> Hi Chad, here is what I have so far on the KAR specification.
>>
>> What:
>> ----------
>>
>> The Kar system will allow files to be packaged together in a jar file 
>> with a manifest that conforms
>> to the Kar Manifest specification.
>> https://kepler-project.org/developers/teams/framework/kepler-archive-kar/kar-manifest-specification 
>>
>>
>> The Kar system will provide a standard mechanism for module specific 
>> serialization of objects to the Kar file
>> AND to the Kepler Cache system.  In other words every KAR entry has 
>> at least two serialized forms:
>>   a serialized form in the KAR, and a serialized form in the cache.  
>> Currently the serialized form in the
>>   cache must be a Java serialized object.  The non-serialized form of 
>> a KAR entry must provide a mechanism
>>   for storing, at the very least, the LSID information for the object 
>> so that its Kar Manifest information
>>   can be retrieved from the cache based on the LSID.
>>
>> The Kar system will not depend on the gui.
>>
>> The Kar system will give all modules in the system a chance to 
>> contribute entries to the KAR file on save.
>
> What is controlling the save (i.e. what is the trigger for saving a 
> kar?)  It seems like any module could save the kar along the way, but 
> I'm not sure how the kar system would give each module a chance to add 
> stuff before a save.  Are you proposing a listener system for kar 
> changes?
>
>>
>> The Kar system will provide a standardized way for opening the 
>> entries of a kar file.
>>
>>
>> How:
>> ----------
>>
>> Previously:
>>
>> Module developer adds a KAREntryHandler and registers it in the system.
>>  From a standardized save context in the gui (i.e. an extension of 
>> ExportArchiveAction)
>>  NamedObjs were added to a SaveKAR object, the SaveKar (via 
>> KARBuilder) would then
>>  pass the LSIDs of the NamedObjs from the SaveKAR to all of the 
>> registered KarEntryHandler save methods
>>  and the EntryHandler was responsible for returning an appropriate 
>> KAR entry based on the LSID it received.
>>    This is perhaps too general because any NamedObj can be added to a 
>> SaveKar which makes it
>>    confusing for developers what they should do with it.
>>
>> Proposal:
>>
>> Really what we want to save in a KAR is one or more ComponentEntities 
>> (i.e. workflow or actor) along with any
>>  files that depend on that ComponentEntity in a general sense.  In 
>> a specific save context we also want
>>  the ability to pick and choose which dependencies get added.
>>
>> So to narrow the scope of the system we say that the SaveKar ONLY 
>> accepts ComponentEntities
>>  and then those ComponentEntities (not just their LSIDs) are passed 
>> to the KAREntryHandlers.
>
> So, the componentEntity gets to choose the objects in the kar?  How 
> would these objects be associated with the component entity?
>
>>  Those handlers then return an array of KAREntries that should be 
>> included in the KAR.
>>  To pick and choose from dependencies the Save context would need to 
>> record what should be saved
>>    (perhaps based on a user's selection in the gui) and then that 
>> list can be used by the EntryHandler
>>    to determine exactly which KAREntries should be returned for a 
>> given ComponentEntity.
>>
>> We need a KarGenericFile class for saving generic files in KARs.  
>> This will properly handle copying the
>>  files out of the KAR and into the cache.  This type of object does 
>> not have serialized forms and acts
>>  simply as a wrapper while in memory.
>>
>> We need to address LSID issues on non-NamedObjs.  An Interface that 
>> includes methods like,
>>  getLSID and getLSIDReferralList and updateLSIDRevision is needed for 
>> anyone wanting to use
>>  non-NamedObjs in a KAR file.
>
> Perhaps a system-wide lookup table for these types of objects would 
> work.  Maybe we should generalize this to all objects, including 
> component entities.  Then the lsid would only be embedded in the 
> component entity when it is saved or somehow transferred (i.e. uploaded 
> to a repository).
>
> I feel like we've kind of struggled with associating object IDs in 
> kepler for a while and it would be nice to come up with a good generic 
> method for doing this.  It would be nice if it could be the starting 
> point for allowing tighter user control over revisioning.
>
> I'd like to see some use-case statements for this.  I think they 
> would be helpful in better defining what we need to do.
>
> Overall, I like the plan though.  It seems pretty straight-forward.
>
> chad
>
>
>>
>>
>> Aaron
>>
>> Chad Berkley wrote:
>>> Hi,
>>>
>>> A few of us chatted this morning about the current bug list and the 
>>> release.  We would like to propose a code freeze target of *January 
>>> 31st, 2010*.  The big bugs that we're still working on include:
>>>
>>> * KAR subsystem bugs:  May need a slight redesign or 
>>> respecification. Needs unit tests built.  Aaron is going to create a 
>>> text document outlining the use cases of the KAR system and I am 
>>> going to try to write unit tests to those specifications.
>>>
>>> * Documentation: figure out how to allow modules to include 
>>> documentation that shows up with the other kepler help docs
>>>
>>> * Module Manager: figure out the current bugs and talk further about 
>>> how modules are integrated into the runtime environment.
>>>
>>> * other release bugs: There are a lot of smaller bugs, some of which 
>>> may be ironed out in the tasks above, but nonetheless, need to be 
>>> addressed.  Sean, Chad and Debi will work on this until others are 
>>> done with their tasks.
>>>
>>> Let us know if there are other concerns to be put in this list.  
>>> Also, please update any bugs assigned to you and close any that have 
>>> been completed.   Thanks,
>>> chad
>>> _______________________________________________
>>> Kepler-dev mailing list
>>> Kepler-dev at kepler-project.org
>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>
>>