[kepler-dev] Code Freeze Proposal
Chad Berkley
berkley at nceas.ucsb.edu
Fri Dec 18 10:14:08 PST 2009
Aaron Schultz wrote:
>
> Hi Chad,
>
> I think we need some common definitions of the different pieces of the
> system here.
> I'll try to introduce these definitions as I step through the Save KAR
> process.
>
> The save process for a KAR is:
> 1) User initiates the save via a gui action (preferably a subclass of
> ExportArchiveAction)
> or a save is programmatically done by some code somewhere
> let's call this the "Save Context"
> perhaps it's a right click on an actor on the canvas
> or a user selects "File->Save Archive" from the main menu
> or right clicking on a set of WorkflowRuns in the Workflow Run Manager
> or at the end of an execution of a workflow on a server a program saves
> the results to a KAR file
> 2) The save context adds some NamedObjs to the SaveKAR object
> Let's call the objects in this list the "Save Initiator Objects"
> My proposal is that we only allow ComponentEntities to be "Save
> Initiator Objects" to make it more
> obvious to the developer how to use the system
> (currently any NamedObj is allowed, which has led to people just
> trying to add everything they want in the KAR at this stage)
This makes sense to me.
> 3) The SaveKAR object then calls the save method of all KAREntryHandlers
> that have been
> registered in the system by different modules, passing to the save
> method the "Save Initiator List"
> (actually at the moment it passes one LSID at a time to the save
> method, looping through all the EntryHandlers for each object in the
> Initiator List, probably we should just be passing in the whole list of
> initiators to the save method and only calling each EntryHandler once)
Yeah, I think passing the list of objects to save would be better. Let
the saving object handle them all. Less work for the developer using
the API and less error prone.
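
A rough sketch of the "pass the whole list, call each handler once" idea follows. It is illustration only: SaveOnceSketch, Entry, EntryHandler and collectEntries are made-up stand-ins, not the existing SaveKAR or KAREntryHandler API, and plain strings stand in for the ComponentEntities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SaveOnceSketch {

    /** Hypothetical stand-in for a KAR entry; only a name is modelled. */
    static final class Entry {
        final String name;
        Entry(String name) { this.name = name; }
    }

    /** Hypothetical handler contract: the whole initiator list arrives in one call. */
    interface EntryHandler {
        List<Entry> save(List<String> initiatorIds);
    }

    /** Calls every registered handler exactly once and gathers what they return. */
    static List<Entry> collectEntries(List<EntryHandler> handlers, List<String> initiatorIds) {
        List<Entry> entries = new ArrayList<>();
        for (EntryHandler handler : handlers) {
            entries.addAll(handler.save(initiatorIds));
        }
        return entries;
    }

    public static void main(String[] args) {
        // One handler that contributes an entry per initiator.
        EntryHandler workflowHandler = ids -> {
            List<Entry> out = new ArrayList<>();
            for (String id : ids) {
                out.add(new Entry(id + ".xml"));
            }
            return out;
        };
        // Another handler that contributes a single entry for the whole save.
        EntryHandler summaryHandler =
                ids -> Arrays.asList(new Entry("summary-of-" + ids.size() + ".txt"));

        List<Entry> entries = collectEntries(
                Arrays.asList(workflowHandler, summaryHandler),
                Arrays.asList("workflowA", "actorB"));
        for (Entry entry : entries) {
            System.out.println(entry.name);
        }
    }
}
```

With this shape the loop over initiators lives inside each handler, so a handler that needs to look at the list as a whole (like the summary handler above) can do so without extra bookkeeping.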
> 4) Each KAREntryHandler generates a list of KAREntries that should be
> saved in the KAR file based
> on this Save Initiator list of objects that it received
> (and based on any information generated in the Save Context, e.g. a
> user specified selection of WorkflowRuns)
Seems pretty straightforward.
> 5) The SaveKAR object then builds and saves the KAR. It includes all of
> the KAREntries that were returned
> by the KAREntryHandlers and nothing more. It also adds all the
> KAREntries to the Cache after the save has succeeded (provided the kar
> was saved in a local repository)
>
So what if it's not saved to the local repository? Wouldn't we still
want to cache it? I guess not if the user is saving to some random
directory, but if they are uploading it to another repository, I think
we'd want to cache it.
>
> You see here that the "Save Context" and the "EntryHandler" are things
> that exist in whatever module they are defined in and only the
> SaveKAR object is in the core. By going through all the Handlers in
> this way we can have many modules contributing objects to the KAR
> without knowledge of what the other modules are doing. In some cases
> however the modules are tightly coupled, for example a ROML and a RIO
> are really associated with a WorkflowRun which is in turn associated
> with a Workflow. So this may bring up the need for a more tiered
> approach that may need to call the EntryHandlers on multiple passes.
>
> Imagine the Save Initiator List starts off with only ComponentEntities
> in it. Then it runs through all the EntryHandlers on a first pass,
> passing the Save Initiator ComponentEntities to the save methods of the
> Entry Handlers. Let's call all of the KarEntries returned by this first
> pass through the EntryHandlers the "Pass 1" Kar Entries. Now the "Pass
> 1" Kar Entries could be passed into the KAREntryHandler save methods on
> the second pass, which would return another set of KarEntries that we'll
> call the "Pass 2" Kar Entries. You can see here that we're now walking
> down the dependency chain: the first pass had ComponentEntities as the
> input, which returned any objects that were dependent on the
> ComponentEntities, for example the WorkflowRuns; then the second pass had
> the WorkflowRuns as the input, which might return the ROML and RIOs
> associated with the WorkflowRuns. This iterative process would go on
> until the KAREntryHandlers were not returning any more KAREntries and
> all of the dependencies had been added to the KAR.
Ok, seems like it would work. What do you think of the efficiency of an
iterative design? If there was a large object with lots of
dependencies, do you think it's efficient enough to do it this way? I
just don't want it to start taking a long time to save. Most of our
objects are pretty small right now, but I could see larger objects that
people might want to use, like data or images or whatever. Or for
really complex workflows, it seems like there might be a pretty big
dependency chain.
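
On the efficiency question, one way to keep the iteration bounded is to feed only the newly returned entries into the next pass, so entries that were already collected are never handed to the handlers again. A rough sketch under stated assumptions: entries are plain string ids, and MultiPassSketch, EntryHandler, fromTable and collect are hypothetical names, not the existing KAREntryHandler API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MultiPassSketch {

    /** Hypothetical handler contract: given ids, return ids of entries that depend on them. */
    interface EntryHandler {
        Set<String> save(Set<String> inputIds);
    }

    /** Builds a toy handler from a fixed "id -> dependent ids" table. */
    static EntryHandler fromTable(Map<String, List<String>> table) {
        return inputIds -> {
            Set<String> out = new LinkedHashSet<>();
            for (String id : inputIds) {
                out.addAll(table.getOrDefault(id, Collections.<String>emptyList()));
            }
            return out;
        };
    }

    /** Repeats passes until no handler returns an entry that has not been collected yet. */
    static Set<String> collect(Set<String> initiatorIds, List<EntryHandler> handlers) {
        Set<String> collected = new LinkedHashSet<>();
        Set<String> frontier = new LinkedHashSet<>(initiatorIds);
        while (!frontier.isEmpty()) {
            Set<String> next = new LinkedHashSet<>();
            for (EntryHandler handler : handlers) {
                for (String id : handler.save(frontier)) {
                    if (collected.add(id)) {
                        next.add(id); // only unseen entries feed the next pass
                    }
                }
            }
            frontier = next;
        }
        return collected;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new HashMap<>();
        table.put("workflow", Arrays.asList("run1", "run2"));
        table.put("run1", Arrays.asList("roml1", "rio1"));
        Set<String> initiators = new LinkedHashSet<>(Arrays.asList("workflow"));
        System.out.println(collect(initiators, Arrays.asList(fromTable(table))));
    }
}
```

Done this way, the number of passes tracks the depth of the dependency chain, and a circular dependency cannot loop forever. It says nothing about the cost of serializing a large entry such as data or images, which would be the same whether the design is iterative or not.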
chad
>
> Aaron
>
>
> Chad Berkley wrote:
>> Hey Aaron,
>>
>> See my comments below:
>>
>>
>> Aaron Schultz wrote:
>>>
>>> Hi Chad, here is what I have so far on the KAR specification.
>>>
>>> What:
>>> ----------
>>>
>>> The Kar system will allow files to be packaged together in a jar file
>>> with a manifest that conforms
>>> to the Kar Manifest specification.
>>> https://kepler-project.org/developers/teams/framework/kepler-archive-kar/kar-manifest-specification
>>>
>>>
>>> The Kar system will provide a standard mechanism for module specific
>>> serialization of objects to the Kar file
>>> AND to the Kepler Cache system. In other words every KAR entry has
>>> at least two serialized forms:
>>> a serialized form in the KAR, and a serialized form in the cache.
>>> Currently the serialized form in the
>>> cache must be a Java serialized object. The non-serialized form of
>>> a KAR entry must provide a mechanism
>>> for storing, at the very least, the LSID information for the object
>>> so that its Kar Manifest information
>>> can be retrieved from the cache based on the LSID.
>>>
>>> The Kar system will not depend on the gui.
>>>
>>> The Kar system will give all modules in the system a chance to
>>> contribute entries to the KAR file on save.
>>
>> What is controlling the save (i.e. what is the trigger for saving a
>> kar?) It seems like any module could save the kar along the way, but
>> I'm not sure how the kar system would give each module a chance to add
>> stuff before a save. Are you proposing a listener system for kar
>> changes?
>>
>>>
>>> The Kar system will provide a standardized way for opening the
>>> entries of a kar file.
>>>
>>>
>>> How:
>>> ----------
>>>
>>> Previously:
>>>
>>> Module developer adds a KAREntryHandler and registers it in the system.
>>> From a standardized save context in the gui (i.e. an extension of
>>> ExportArchiveAction)
>>> NamedObjs were added to a SaveKAR object, the SaveKar (via
>>> KARBuilder) would then
>>> pass the LSIDs of the NamedObjs from the SaveKAR to all of the
>>> registered KarEntryHandler save methods
>>> and the EntryHandler was responsible for returning an appropriate
>>> KAR entry based on the LSID it received.
>>> This is perhaps too general because any NamedObj can be added to a
>>> SaveKar which makes it
>>> confusing for developers what they should do with it.
>>>
>>> Proposal:
>>>
>>> Really what we want to save in a KAR is one or more ComponentEntities
>>> (i.e. workflow or actor) along with any
>>> files that depend on that ComponentEntity in a general sense and in
>>> a specific save context we want
>>> the ability to pick and choose which dependencies get added.
>>>
>>> So to narrow the scope of the system we say that the SaveKar ONLY
>>> accepts ComponentEntities
>>> and then those ComponentEntities (not just their LSIDs) are passed
>>> to the KAREntryHandlers.
>>
>> So, the componentEntity gets to choose the objects in the kar? How
>> would these objects be associated with the component entity?
>>
>>> Those handlers then return an array of KAREntries that should be
>>> included in the KAR.
>>> To pick and choose from dependencies the Save context would need to
>>> record what should be saved
>>> (perhaps based on a user's selection in the gui) and then that
>>> list can be used by the EntryHandler
>>> to determine exactly which KAREntries should be returned for a
>>> given ComponentEntity.
>>>
>>> We need a KarGenericFile class for saving generic files in KARs.
>>> This will properly handle copying the
>>> files out of the KAR and into the cache. This type of object does
>>> not have serialized forms and acts
>>> simply as a wrapper while in memory.
>>>
>>> We need to address LSID issues on non-NamedObjs. An Interface that
>>> includes methods like,
>>> getLSID and getLSIDReferralList and updateLSIDRevision is needed for
>>> anyone wanting to use
>>> non-NamedObjs in a KAR file.
>>
>> Perhaps a system-wide lookup table for these types of objects would
>> work. Maybe we should generalize this to all objects, including
>> component entities. Then the lsid would only be embedded in the
>> component entity when it is saved or somehow transferred (i.e. uploaded
>> to a repository).
>>
>> I feel like we've kind of struggled with associating object IDs in
>> kepler for a while and it would be nice to come up with a good generic
>> method for doing this. It would be nice if it could be the starting
>> point for allowing tighter user control over revisioning.
>>
>> I'd like to see some use-case statements for this. I think they
>> would be helpful in better defining what we need to do.
>>
>> Overall, I like the plan though. It seems pretty straightforward.
>>
>> chad
>>
>>
>>>
>>>
>>> Aaron
>>>
>>> Chad Berkley wrote:
>>>> Hi,
>>>>
>>>> A few of us chatted this morning about the current bug list and the
>>>> release. We would like to propose a code freeze target of *January
>>>> 31st, 2010*. The big bugs that we're still working on include:
>>>>
>>>> * KAR subsystem bugs: May need a slight redesign or
>>>> respecification. Needs unit tests built. Aaron is going to create a
>>>> text document outlining the use cases of the KAR system and I am
>>>> going to try to write unit tests to those specifications.
>>>>
>>>> * Documentation: figure out how to allow modules to include
>>>> documentation that shows up with the other kepler help docs
>>>>
>>>> * Module Manager: figure out the current bugs and talk further about
>>>> how modules are integrated into the runtime environment.
>>>>
>>>> * other release bugs: There are a lot of smaller bugs, some of which
>>>> may be ironed out in the tasks above, but nonetheless, need to be
>>>> addressed. Sean, Chad and Debi will work on this until others are
>>>> done with their tasks.
>>>>
>>>> Let us know if there are other concerns to be put in this list.
>>>> Also, please update any bugs assigned to you and close any that have
>>>> been completed. Thanks,
>>>> chad
>>>> _______________________________________________
>>>> Kepler-dev mailing list
>>>> Kepler-dev at kepler-project.org
>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
>>>>
>>>
>