[kepler-dev] Code Freeze Proposal

Fri Dec 18 10:14:08 PST 2009

Aaron Schultz wrote:
> 
> Hi Chad,
> 
> I think we need some common definitions of the different pieces of the 
> system here.
> I'll try to introduce these definitions as I step through the Save KAR 
> process.
> 
> The save process for a KAR is:
> 1)  User initiates the save via a gui action (preferably a subclass of 
> ExportArchiveAction)
> or a save is programmatically done by some code somewhere
> let's call this the "Save Context"
>  perhaps it's a right click on an actor on the canvas
>  or a user selects "File->Save Archive" from the main menu
>  or right clicking on a set of WorkflowRuns in the Workflow Run Manager
>  or at the end of an execution of a workflow on a server a program saves 
> the results to a KAR file
> 2) The save context adds some NamedObjs to the SaveKAR object
>   Let's call the objects in this list the "Save Initiator Objects"
>   My proposal is that we only allow ComponentEntities to be "Save 
> Initiator Objects" to make it more
>     obvious to the developer how to use the system
>   (currently any NamedObj is allowed, which has led to people just 
> trying to add everything they want in the KAR at this stage)

This makes sense to me.

> 3) The SaveKAR object then calls the save method of all KAREntryHandlers 
> that have been
>   registered in the system by different modules, passing to the save 
> method the "Save Initiator List"
>   (actually at the moment it passes one LSID at a time to the save 
> method, looping through all the EntryHandlers for each object in the 
> Initiator List; we should probably just pass the whole list of 
> initiators to the save method and call each EntryHandler only once)

Yeah, I think passing the list of objects to save would be better.  Let 
the saving object handle them all.  Less work for the developer using 
the API and less error prone.
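
To make the "pass the whole list, call each handler once" idea concrete, here is a minimal sketch. Every name in it (BatchSaveSketch, EntryHandler, collectEntries) is made up for illustration and is not the existing KAREntryHandler API; ComponentEntities and KAREntries are modeled as plain String names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// All names here are hypothetical stand-ins, not the real Kepler API.
// ComponentEntities and KAREntries are modeled as plain String names.
class BatchSaveSketch {

    /** Hypothetical handler contract: one call receives the whole initiator list. */
    interface EntryHandler {
        List<String> save(List<String> initiators);
    }

    /** Calls each registered handler exactly once and gathers the entries it returns. */
    static List<String> collectEntries(List<EntryHandler> handlers, List<String> initiators) {
        // Handlers get a read-only view so they cannot alter the initiator list.
        List<String> readOnly = Collections.unmodifiableList(initiators);
        List<String> entries = new ArrayList<>();
        for (EntryHandler handler : handlers) {
            entries.addAll(handler.save(readOnly));
        }
        return entries;
    }

    public static void main(String[] args) {
        // A handler that contributes one XML entry per initiator.
        EntryHandler workflows = initiators -> {
            List<String> out = new ArrayList<>();
            for (String name : initiators) {
                out.add(name + ".xml");
            }
            return out;
        };
        // A handler from another module that has nothing to add.
        EntryHandler reports = initiators -> new ArrayList<>();

        List<String> entries =
                collectEntries(List.of(workflows, reports), List.of("wfA", "actorB"));
        System.out.println(entries);
    }
}
```

With this shape the number of handler calls equals the number of registered handlers, regardless of how many initiators are in the list.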

> 4) Each KAREntryHandler generates a list of KAREntries that should be 
> saved in the KAR file based
>  on this Save Initiator list of objects that it received
>  (and based on any information generated in the Save Context, e.g. a 
> user specified selection of WorkflowRuns)

Seems pretty straightforward.

> 5) The SaveKAR object then builds and saves the KAR.  It includes all of 
> the KAREntries that were returned
>   by the KAREntryHandlers and nothing more.   It also adds all the 
> KAREntries to the Cache after the save has succeeded (provided the kar 
> was saved in a local repository)
> 

So what if it's not saved to the local repository?  Wouldn't we still 
want to cache it?  I guess not if the user is saving to some random 
directory, but if they are uploading it to another repository, I think 
we'd want to cache it.
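
As a rough sketch of that policy, assuming the save context can tell the SaveKAR object where the KAR ended up (the SaveTarget enum and shouldCache method are invented names, not existing Kepler code):

```java
// Hypothetical names throughout; nothing here is an existing Kepler class.
class CachePolicySketch {

    /** Where the KAR ended up after the save. */
    enum SaveTarget { LOCAL_REPOSITORY, REMOTE_REPOSITORY, ARBITRARY_DIRECTORY }

    /** Cache entries only after a successful save into a local or remote repository. */
    static boolean shouldCache(SaveTarget target, boolean saveSucceeded) {
        if (!saveSucceeded) {
            return false; // never cache entries from a failed save
        }
        return target == SaveTarget.LOCAL_REPOSITORY
                || target == SaveTarget.REMOTE_REPOSITORY;
    }

    public static void main(String[] args) {
        for (SaveTarget target : SaveTarget.values()) {
            System.out.println(target + " -> " + shouldCache(target, true));
        }
    }
}
```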

> 
> You see here that the "Save Context" and the "EntryHandler" are things 
> that exist in whatever module they are defined in and only the
> SaveKAR object is in the core.  By going through all the Handlers in 
> this way we can have many modules contributing objects to the KAR 
> without knowledge of what the other modules are doing.  In some cases 
> however the modules are tightly coupled, for example a ROML and a RIO 
> are really associated with a WorkflowRun which is in turn associated 
> with a Workflow.  So this may bring up the need for a more tiered 
> approach that may need to call the EntryHandlers on multiple passes.
> 
> Imagine the Save Initiator List starts off with only ComponentEntities 
> in it.  Then it runs through all the EntryHandlers on a first pass, 
> passing the Save Initiator ComponentEntities to the save methods of the 
> Entry Handlers.  Let's call all of the KarEntries returned by this first 
> pass through the EntryHandlers, the "Pass 1" Kar Entries.  Now the "Pass 
> 1" Kar Entries could be passed into the KAREntryHandler save methods on 
> the second pass, this would return another set of KarEntries that we'll 
> call the "Pass 2" Kar Entries.  You can see here that we're now walking 
> down the dependency chain, the first pass had ComponentEntities as the 
> input, which returned any objects that were dependent on the 
> ComponentEntities, for example the WorkflowRuns, then the second pass had 
> the WorkflowRuns as the input which might return the ROML and RIOs 
> associated with the WorkflowRuns.  This iterative process would go on 
> until the KAREntryHandlers were not returning any more KAREntries and 
> all of the dependencies had been added to the KAR.

Ok, seems like it would work.  What do you think of the efficiency of an 
iterative design?  If there was a large object with lots of 
dependencies, do you think it's efficient enough to do it this way?  I 
just don't want it to start taking a long time to save.  Most of our 
objects are pretty small right now, but I could see larger objects that 
people might want to use, like data or images or whatever.  Or for 
really complex workflows, it seems like there might be a pretty big 
dependency chain.
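
For reference, the multi-pass idea could be sketched roughly like this. It is only an illustration: the names (MultiPassSketch, EntryHandler, resolve, tableHandler) are hypothetical, objects are modeled as String names, and the set of already-collected entries is an addition of this sketch to keep any entry from being fed back into the handlers more than once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; objects are modeled as String names, not Kepler types.
class MultiPassSketch {

    /** Hypothetical handler: given the previous pass's objects, returns their dependents. */
    interface EntryHandler {
        List<String> save(List<String> inputs);
    }

    /** Repeats passes until no handler returns an entry that was not already collected. */
    static Set<String> resolve(List<EntryHandler> handlers, List<String> initiators) {
        Set<String> entries = new LinkedHashSet<>(); // everything destined for the KAR
        List<String> frontier = new ArrayList<>(initiators);
        while (!frontier.isEmpty()) {
            List<String> next = new ArrayList<>();
            for (EntryHandler handler : handlers) {
                for (String entry : handler.save(frontier)) {
                    if (entries.add(entry)) { // true only the first time an entry appears
                        next.add(entry);
                    }
                }
            }
            frontier = next; // the "Pass N" entries become the input to pass N+1
        }
        return entries;
    }

    /** Builds a handler backed by a fixed dependency table, for demonstration only. */
    static EntryHandler tableHandler(Map<String, List<String>> dependents) {
        return inputs -> {
            List<String> out = new ArrayList<>();
            for (String input : inputs) {
                out.addAll(dependents.getOrDefault(input, List.of()));
            }
            return out;
        };
    }

    public static void main(String[] args) {
        EntryHandler runs = tableHandler(Map.of("workflow", List.of("run1", "run2")));
        EntryHandler reports = tableHandler(
                Map.of("run1", List.of("roml1", "rio1"), "run2", List.of("roml2")));
        System.out.println(resolve(List.of(runs, reports), List.of("workflow")));
    }
}
```

In this sketch each entry is handed to the handlers in at most one pass, so the bookkeeping grows with the number of entries times the number of handlers, and circular dependencies cannot loop forever. It says nothing about the cost of actually serializing large data or image entries, which is a separate question.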

chad

> 
> Aaron
> 
> 
> Chad Berkley wrote:
>> Hey Aaron,
>>
>> See my comments below:
>>
>>
>> Aaron Schultz wrote:
>>>
>>> Hi Chad, here is what I have so far on the KAR specification.
>>>
>>> What:
>>> ----------
>>>
>>> The Kar system will allow files to be packaged together in a jar file 
>>> with a manifest that conforms
>>> to the Kar Manifest specification.
>>> https://kepler-project.org/developers/teams/framework/kepler-archive-kar/kar-manifest-specification 
>>>
>>>
>>> The Kar system will provide a standard mechanism for module specific 
>>> serialization of objects to the Kar file
>>> AND to the Kepler Cache system.  In other words every KAR entry has 
>>> at least two serialized forms:
>>>   a serialized form in the KAR, and a serialized form in the cache.  
>>> Currently the serialized form in the
>>>   cache must be a Java serialized object.  The non-serialized form of 
>>> a KAR entry must provide a mechanism
>>>   for storing, at the very least, the LSID information for the object 
>>> so that its Kar Manifest information
>>>   can be retrieved from the cache based on the LSID.
>>>
>>> The Kar system will not depend on the gui.
>>>
>>> The Kar system will give all modules in the system a chance to 
>>> contribute entries to the KAR file on save.
>>
>> What is controlling the save (i.e. what is the trigger for saving a 
>> kar)?  It seems like any module could save the kar along the way, but 
>> I'm not sure how the kar system would give each module a chance to add 
>> stuff before a save.  Are you proposing a listener system for kar 
>> changes?
>>
>>>
>>> The Kar system will provide a standardized way for opening the 
>>> entries of a kar file.
>>>
>>>
>>> How:
>>> ----------
>>>
>>> Previously:
>>>
>>> Module developer adds a KAREntryHandler and registers it in the system.
>>>  From a standardized save context in the gui (i.e. an extension of 
>>> ExportArchiveAction)
>>>  NamedObjs were added to a SaveKAR object, the SaveKar (via 
>>> KARBuilder) would then
>>>  pass the LSIDs of the NamedObjs from the SaveKAR to all of the 
>>> registered KarEntryHandler save methods
>>>  and the EntryHandler was responsible for returning an appropriate 
>>> KAR entry based on the LSID it received.
>>>    This is perhaps too general because any NamedObj can be added to a 
>>> SaveKar which makes it
>>>    confusing for developers what they should do with it.
>>>
>>> Proposal:
>>>
>>> Really what we want to save in a KAR is one or more ComponentEntities 
>>> (i.e. workflow or actor) along with any
>>>  files that depend on that ComponentEntity in a general sense and in 
>>> a specific save context we want
>>>  the ability to pick and choose which dependencies get added.
>>>
>>> So to narrow the scope of the system we say that the SaveKar ONLY 
>>> accepts ComponentEntities
>>>  and then those ComponentEntities (not just their LSIDs) are passed 
>>> to the KAREntryHandlers.
>>
>> So, the componentEntity gets to choose the objects in the kar?  How 
>> would these objects be associated with the component entity?
>>
>>>  Those handlers then return an array of KAREntries that should be 
>>> included in the KAR.
>>>  To pick and choose from dependencies the Save context would need to 
>>> record what should be saved
>>>    (perhaps based on a user's selection in the gui) and then that 
>>> list can be used by the EntryHandler
>>>    to determine exactly which KAREntries should be returned for a 
>>> given ComponentEntity.
>>>
>>> We need a KarGenericFile class for saving generic files in KARs.  
>>> This will properly handle copying the
>>>  files out of the KAR and into the cache.  This type of object does 
>>> not have serialized forms and acts
>>>  simply as a wrapper while in memory.
>>>
>>> We need to address LSID issues on non-NamedObjs.  An Interface that 
>>> includes methods like,
>>>  getLSID and getLSIDReferralList and updateLSIDRevision is needed for 
>>> anyone wanting to use
>>>  non-NamedObjs in a KAR file.
>>
>> Perhaps a system-wide lookup table for these types of objects would 
>> work.    Maybe we should generalize this to all objects, including 
>> component entities.  Then the lsid would only be embedded in the 
>> component entity when it is saved or somehow transferred (i.e. uploaded 
>> to a repository).
>>
>> I feel like we've kind of struggled with associating object IDs in 
>> kepler for a while and it would be nice to come up with a good generic 
>> method for doing this.  It would be nice if it could be the starting 
>> point for allowing tighter user control over revisioning.
>>
>> I'd like to see some use-case statements for this.  I think they 
>> would be helpful in better defining what we need to do.
>>
>> Overall, I like the plan though.  It seems pretty straightforward.
>>
>> chad
>>
>>
>>>
>>>
>>> Aaron
>>>
>>> Chad Berkley wrote:
>>>> Hi,
>>>>
>>>> A few of us chatted this morning about the current bug list and the 
>>>> release.  We would like to propose a code freeze target of *January 
>>>> 31st, 2010*.  The big bugs that we're still working on include:
>>>>
>>>> * KAR subsystem bugs:  May need a slight redesign or 
>>>> respecification. Needs unit tests built.  Aaron is going to create a 
>>>> text document outlining the use cases of the KAR system and I am 
>>>> going to try to write unit tests to those specifications.
>>>>
>>>> * Documentation: figure out how to allow modules to include 
>>>> documentation that shows up with the other kepler help docs
>>>>
>>>> * Module Manager: figure out the current bugs and talk further about 
>>>> how modules are integrated into the runtime environment.
>>>>
>>>> * other release bugs: There are a lot of smaller bugs, some of which 
>>>> may be ironed out in the tasks above, but nonetheless, need to be 
>>>> addressed.  Sean, Chad and Debi will work on this until others are 
>>>> done with their tasks.
>>>>
>>>> Let us know if there are other concerns to be put in this list.  
>>>> Also, please update any bugs assigned to you and close any that have 
>>>> been completed.   Thanks,
>>>> chad
>>>> _______________________________________________
>>>> Kepler-dev mailing list
>>>> Kepler-dev at kepler-project.org
>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>>
>>>
>