[kepler-dev] ObjectManager/DataCacheManager analysis

Tue Nov 1 07:43:41 PST 2005

Chad,

I think an internal database to store the metadata is a great idea.  I
also agree about the clob/blob stuff -- I've never been satisfied with
blob handling even in very expensive commercial databases when using the
jdbc 2 api.  Java 1.4 includes v3 of jdbc, which added natural handling
of blob types, but I have no experience using them and I do not know
what atomicity jdbc driver implementations provide for blob inserts &
updates.  To compound that, since many of the actors do native or Java
File I/O, they are going to require real files on the filesystem
anyway.

Can you write up a couple of sequence diagrams demonstrating these
scenarios:

1)  Object attempts to find cached data which doesn't already exist (in
cache)
2)  Object attempts to find cached data which does exist (in cache)
3)  Object creates data and puts in cache -- object does not already
exist in cache
4)  Object creates data and puts in cache -- object already exists in cache
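
As a reference point for those diagrams, the four cases can be exercised
against a toy in-memory cache.  This is only a sketch: ToyCache, its
plain String keys (standing in for KeplerLSID) and the choice to reject
a duplicate insert are assumptions, not part of the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the proposed ObjectCache; keys are plain
// Strings here instead of KeplerLSID objects.
class ToyCache {
    private final Map<String, Object> entries = new HashMap<String, Object>();

    // Scenarios 1 and 2: returns null on a miss, the cached value on a hit.
    Object getObject(String lsid) {
        return entries.get(lsid);
    }

    // Scenarios 3 and 4: returns true if the object was inserted, false if
    // an object with this lsid was already cached (assumed behaviour).
    boolean insertObject(String lsid, Object value) {
        if (entries.containsKey(lsid)) {
            return false;
        }
        entries.put(lsid, value);
        return true;
    }
}
```

Whether scenario 4 should reject, replace or version the existing entry
is exactly what the diagrams should settle.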

In particular highlight the functionality provided by the listener.  I
don't understand how the listener pattern will allow DataCacheObjects
to request not to be purged.  It is not clear to me that
addCacheObjectListener() is on the correct object -- perhaps it should
be on the ObjectCache itself.  It also seems that all
CacheObjectListener objects will receive notification of all events even
if they do not pertain to the object itself.

Instead of using a listener, why not add abstract methods to the
CacheObject to handle its own lifecycle events?  The CacheObject would
then be notified when it's to be purged, inserted, etc.
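
A minimal sketch of that idea follows.  The class and method names
(LifecycleCacheObject, onInserted, onPurge, LifecycleCache) are
hypothetical; the point is only that the cache calls back the affected
object and nobody else.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: lifecycle callbacks live on the cache object itself.
abstract class LifecycleCacheObject {
    // Called by the cache after the object has been inserted.
    abstract void onInserted();

    // Called by the cache just before the object is purged.
    abstract void onPurge();
}

// Example subclass that simply records which callbacks it received.
class RecordingCacheObject extends LifecycleCacheObject {
    final List<String> events = new ArrayList<String>();

    void onInserted() { events.add("inserted"); }

    void onPurge() { events.add("purged"); }
}

// Minimal cache that notifies only the object an event pertains to.
class LifecycleCache {
    private final List<LifecycleCacheObject> objects =
            new ArrayList<LifecycleCacheObject>();

    void insert(LifecycleCacheObject o) {
        objects.add(o);
        o.onInserted();
    }

    void purgeAll() {
        for (LifecycleCacheObject o : objects) {
            o.onPurge();
        }
        objects.clear();
    }
}
```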

Even though the Listener pattern has been around since the Dawn of Time
and it is heavily used in the ptolemy framework, I have never been a big
fan of it.  In particular, it can cause problems with the java gc if
there is no way to unregister listener objects.  If you determine the
listener pattern is the only one which provides the proper
functionality, you should provide a direct way to unregister a specific
listener object and also use WeakReferences in the implementation.
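
If the listener does stay, the registration side could look roughly like
this.  PurgeListener and WeakListenerList are hypothetical names, and
the listener is cut down to a single method for brevity; treat it as a
sketch of the unregister-plus-WeakReference idea, not a finished design.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical single-method listener, simplified from CacheObjectListener.
interface PurgeListener {
    void objectPurged(String lsid);
}

// Holds listeners weakly so registration alone does not keep them alive,
// and offers an explicit way to unregister a specific listener.
class WeakListenerList {
    private final List<WeakReference<PurgeListener>> refs =
            new ArrayList<WeakReference<PurgeListener>>();

    synchronized void add(PurgeListener l) {
        refs.add(new WeakReference<PurgeListener>(l));
    }

    synchronized void remove(PurgeListener l) {
        Iterator<WeakReference<PurgeListener>> it = refs.iterator();
        while (it.hasNext()) {
            PurgeListener current = it.next().get();
            // Drop the requested listener and any already-collected ones.
            if (current == null || current == l) {
                it.remove();
            }
        }
    }

    // Notifies every live listener and returns how many were notified.
    int firePurged(String lsid) {
        List<PurgeListener> live = new ArrayList<PurgeListener>();
        synchronized (this) {
            Iterator<WeakReference<PurgeListener>> it = refs.iterator();
            while (it.hasNext()) {
                PurgeListener current = it.next().get();
                if (current == null) {
                    it.remove(); // listener was garbage collected
                } else {
                    live.add(current);
                }
            }
        }
        // Notify outside the lock so a listener may unregister itself.
        for (PurgeListener l : live) {
            l.objectPurged(lsid);
        }
        return live.size();
    }
}
```

One consequence of weak references: a listener that nothing else holds
on to (an anonymous inner class, say) can be collected at any time, so
callers have to keep their own reference to it.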

Because of the lack of multiple inheritance in Java, it is good practice
to provide an interface in addition to an abstract class.  The
ObjectCache should manipulate objects only through the interface.  The
abstract class could provide reasonable default behavior which a
DataCacheObject can extend if it has no other abstract class it needs to
extend.  This practice saves a lot of refactoring in the future when it
is realized that a particular DataCacheObject needs multiple
inheritance.  (Yes, I've run into exactly this situation in the past.)
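
Sketched with the attribute methods from the proposal, the split might
look like this.  AbstractCacheObject, ExistingBase, SimpleCacheObject
and DerivedCacheObject are hypothetical names used only to show the two
ways of satisfying the interface.

```java
import java.util.HashMap;
import java.util.Map;

// The cache would depend only on this interface.
interface CacheObject {
    void addAttribute(String name, Object value);
    Object getAttribute(String name);
    Object removeAttribute(String name);
}

// Optional convenience base class with default attribute handling.
abstract class AbstractCacheObject implements CacheObject {
    private final Map<String, Object> attributes = new HashMap<String, Object>();

    public void addAttribute(String name, Object value) { attributes.put(name, value); }
    public Object getAttribute(String name) { return attributes.get(name); }
    public Object removeAttribute(String name) { return attributes.remove(name); }
}

// Hypothetical existing base class that a cache object must also extend.
class ExistingBase { }

// Can reuse the defaults because it has no other superclass.
class SimpleCacheObject extends AbstractCacheObject { }

// Already extends ExistingBase, so it implements the interface directly
// and delegates to a helper instead of inheriting the defaults.
class DerivedCacheObject extends ExistingBase implements CacheObject {
    private final CacheObject delegate = new SimpleCacheObject();

    public void addAttribute(String name, Object value) { delegate.addAttribute(name, value); }
    public Object getAttribute(String name) { return delegate.getAttribute(name); }
    public Object removeAttribute(String name) { return delegate.removeAttribute(name); }
}
```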

I think it would be wise to have two different lsids associated with
each object in the cache.  One would refer to the data object type (which
could presumably come from some dynamic loading system....) and would be
used by the ObjectCache to retrieve the proper typed object from the
repository.  The other would refer to the instance of the data itself.
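
A rough illustration of how the two lsids could cooperate, assuming a
registry keyed by the type lsid.  TypeRegistry, TypedObjectFactory and
the lsid strings used with them are placeholders invented for the
example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical factory that builds a typed cache object for one instance.
interface TypedObjectFactory {
    Object create(String instanceLsid);
}

// Maps a data-type lsid to the factory for that type.
class TypeRegistry {
    private final Map<String, TypedObjectFactory> factories =
            new HashMap<String, TypedObjectFactory>();

    void register(String typeLsid, TypedObjectFactory factory) {
        factories.put(typeLsid, factory);
    }

    // The type lsid selects the class; the instance lsid selects the data.
    Object load(String typeLsid, String instanceLsid) {
        TypedObjectFactory factory = factories.get(typeLsid);
        if (factory == null) {
            throw new IllegalArgumentException("unknown type lsid: " + typeLsid);
        }
        return factory.create(instanceLsid);
    }
}
```

A dynamic loading system would then only need to call register() for
each type it discovers.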

What is the NativeLibraryCacheObject for?  I fail to understand how you
can convince the OS to load a specific chunk of native code when the
dynamic linker determines it is necessary.  If it's for the JNI wrapper,
then this library is loaded through the System.loadLibrary() and
ClassLoader.findLibrary() methods.  Even if you require a different API
(perhaps the following:  NativeLibraryCacheObject o =
ObjectCache.getInstance().getObject( "magiclsidforjnidll" ); 
Runtime.getRuntime().load( o.getAbsoluteFilename() );  ), it still will
have a hard time dealing with one dll requiring another.  This is the
case with the implementation of the gdal actors -- there is a simple jni
pidgin which then loads the gdal.dll through the OS itself.  Until a
complete strategy for loading native libraries is found, I suggest
leaving this out.

Instead of the requestPurge and requestPurgeExtension methods, I think
using a policy pattern would be better.   Extend the CacheObject
interface with this:

CachePurgePolicy getPurgePolicy();
void setPurgePolicy( CachePurgePolicy policy );

interface CachePurgePolicy {

  boolean canBePurged();

}

Or something like that.  If an object wants its lifetime extended,
then it only needs to provide a different CachePurgePolicy.  The
ObjectCache would be able to query the object itself to determine if it
can be purged or not.
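
Fleshed out slightly, the policy idea might look like the sketch below.
AlwaysPurgeable, KeepUntil and PolicyCacheObject are hypothetical names,
and the deadline-based policy is just one example of a rule an object
could plug in.

```java
// Policy interface as proposed above, using Java's boolean type.
interface CachePurgePolicy {
    boolean canBePurged();
}

// Default policy: the cache may purge the object at any time.
class AlwaysPurgeable implements CachePurgePolicy {
    public boolean canBePurged() { return true; }
}

// Hypothetical policy: protects an object until a deadline has passed.
class KeepUntil implements CachePurgePolicy {
    private final long deadlineMillis;

    KeepUntil(long deadlineMillis) { this.deadlineMillis = deadlineMillis; }

    public boolean canBePurged() {
        return System.currentTimeMillis() >= deadlineMillis;
    }
}

// Stand-in for a CacheObject carrying a replaceable purge policy.
class PolicyCacheObject {
    private CachePurgePolicy policy = new AlwaysPurgeable();

    CachePurgePolicy getPurgePolicy() { return policy; }

    void setPurgePolicy(CachePurgePolicy policy) { this.policy = policy; }
}
```

The cache would then call getPurgePolicy().canBePurged() before evicting
anything, and skip objects that answer false.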

Can you highlight the differences between the ObjectCache.removeObject()
and ObjectCache.requestPurge() methods?

Shawn will probably ask for some more general query methods in
ObjectCache.  Can you describe how these could work?  I would guess
something like:  Vector ObjectCache.query( statement ) would be what
he'd want.  Does this mean we need to build an ObjectCache statement
hierarchy, or utilize jdbc's or something else?
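
To make the first option concrete, here is about the smallest possible
"statement hierarchy": one statement type that matches a single
attribute.  AttributeEquals and QueryableCache are hypothetical, and the
result is a List of lsid strings rather than a Vector of objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query statement: matches objects whose named attribute
// equals the given value.
class AttributeEquals {
    private final String name;
    private final Object value;

    AttributeEquals(String name, Object value) {
        this.name = name;
        this.value = value;
    }

    boolean matches(Map<String, Object> attributes) {
        return value != null && value.equals(attributes.get(name));
    }
}

// Toy cache index: lsid -> attributes, kept in insertion order.
class QueryableCache {
    private final Map<String, Map<String, Object>> index =
            new LinkedHashMap<String, Map<String, Object>>();

    void insert(String lsid, Map<String, Object> attributes) {
        index.put(lsid, attributes);
    }

    // Returns the lsids of all entries the statement matches.
    List<String> query(AttributeEquals statement) {
        List<String> hits = new ArrayList<String>();
        for (Map.Entry<String, Map<String, Object>> e : index.entrySet()) {
            if (statement.matches(e.getValue())) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```

With the metadata in an embedded database, the same query could instead
be translated to SQL, which is the trade-off the question above is
getting at.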

Since all the ObjectCache methods are by lsid, I suggest you put the
lsid front and center in the CacheObject:

LSID CacheObject.getLsid();
LSID CacheObject.getDataTypeLsid();  // If you think this is useful.

The LSID should not be mutable.  The program should not be able to
retrieve a cached object, then change its lsid and recache.  I don't
really know how you can prevent this use-case and at the same time
provide good initialization and deserialization semantics.
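
One possible direction, offered as a sketch rather than an answer: make
the lsid a final field that must be supplied at construction, and route
deserialization through a factory that supplies it up front.
ImmutableLsid, IdentifiedCacheObject and restore() are hypothetical
names; the real KeplerLSID class may look quite different.

```java
// Hypothetical immutable identifier; the real KeplerLSID may differ.
final class ImmutableLsid {
    private final String value;

    ImmutableLsid(String value) {
        if (value == null) {
            throw new IllegalArgumentException("lsid must not be null");
        }
        this.value = value;
    }

    public boolean equals(Object o) {
        return o instanceof ImmutableLsid && ((ImmutableLsid) o).value.equals(value);
    }

    public int hashCode() { return value.hashCode(); }

    public String toString() { return value; }
}

// The lsid is fixed at construction; there is deliberately no setter.
class IdentifiedCacheObject {
    private final ImmutableLsid lsid;

    IdentifiedCacheObject(ImmutableLsid lsid) { this.lsid = lsid; }

    ImmutableLsid getLsid() { return lsid; }

    // Deserialization path: the stored lsid is passed in before any
    // other state is restored.
    static IdentifiedCacheObject restore(String storedLsid) {
        return new IdentifiedCacheObject(new ImmutableLsid(storedLsid));
    }
}
```

This rules out changing the lsid on a live object, but it does not stop
a program from copying the data into a new object under a different
lsid, so it only covers part of the use-case.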

Kevin

Chad Berkley wrote:

>Hey,
>
>Last week I was tasked with looking into the current ObjectManager and 
>more specifically the DataCacheManager implementation to figure out 
>whether we should stick with the DataCacheManager as the underlying 
>cache system for kepler, or whether we should re-write it so that it 
>acts more in conjunction with the ObjectManager.  After looking at the 
>current code and the recommendations made by Kevin and others on the 
>optimal configuration, I think we should re-write the cache.  Below is 
>an outline of how I think it should be redesigned.  I think the 
>re-design has a much simpler API and a more logical process flow.  In 
>writing this, I've taken into consideration the original OM design on 
>the wiki, comments made by Kevin and others, Shawn and my experiences 
>trying to integrate SMS and my own experience writing the current OM on 
>top of the original cache.
>
>Objects:
>
>ObjectCache
>-----------
>ObjectCache getInstance() //singleton
>void insertObject(CacheObject)
>CacheObject removeObject(KeplerLSID)
>CacheObject getObject(KeplerLSID)
>CacheObject getTempObject() //request a single session temp object
>void requestPurge(KeplerLSID) //request an object be purged
>void requestPurgeExtension(KeplerLSID) //an object being purged can
>                                        //request that it not get purged
>void purgeAll() //clear the cache
>
>
>abstract CacheObject
>-----------
>void addAttribute(String name, Object value)
>Object getAttribute(String name)
>Object removeAttribute(String name)
>void addCacheObjectListener() //listeners for cache events
>abstract void serialize()
>abstract Object getObject()
>
>
>interface CacheObjectListener
>-------------------
>void objectAdded(CacheEvent)
>void objectRemoved(CacheEvent)
>void objectPurged(CacheEvent)
>
>
>CacheEvent
>----------
>CacheObject getSource()
>
>
>The classes that would extend CacheObject are:
>KARCacheObject extends JarCacheObject
>DataCacheObject
>ActorCacheObject
>XMLMetadataCacheObject
>JarCacheObject
>NativeLibraryCacheObject
>WorkflowCacheObject
>FileCacheObject
>
>The listener interface will allow CacheObjects to have automatic actions 
>take place when they are added, removed or purged from the cache.  This 
>will allow, for instance, the KARCacheObject to process a kar file upon 
>being added or an ActorCacheObject to add itself to the tree 
>automatically.  This will keep the cache item specific code inside each 
>cache item instead of locating it in the cache itself.  The listener 
>will also allow items such as DataCacheObjects to request to not be 
>purged if they are large or recently used.  I think (correct me if I'm 
>wrong) this will also allow cache objects that are going to take a while 
>to retrieve (like DataCacheObject) to multi-thread themselves and not 
>stop the user from performing other tasks while the object is downloading.
>
>The current cache uses an xml file to store an index of cache items.  I 
>would, instead, like to use the embedded database for this.  I think it 
>will allow more flexibility in indexing the cache as well as speed up 
>loading of cache items.  Because of the BLOB/CLOB problem, I think the 
>cache objects should still be stored on disk with a pointer from the 
>database.
>
>This is going to require some reworking of existing code.  Basically the 
>current ObjectManager interface will go away and be replaced by this 
>cache.  This shouldn't be too big of a deal because the only place the 
>OM is being used is in the kar support classes.  This code can be 
>re-worked into the KARCacheObject class.  The one place that I'm 
>uncertain of the work required is in the various data actors.  I know 
>Jing has a bunch of code that uses the cache for the EML and other 
>datasource actors.  This will have to be re-written.
>
>Please take a look at this and let me know if I've forgotten anything. 
>Unless there is something hugely wrong with what I've written, I'd 
>rather not have a long, drawn-out discussion about this since it needs 
>to get implemented soon if we are going to make our Dec. 9 deadline. 
>Please reply with any comments within the next day or so.
>
>thanks,
>chad