[kepler-dev] Comments on Kepler Object Manager interface.

Mon Oct 3 10:51:54 PDT 2005

Hey Kevin,

Sorry I didn't respond sooner.  I've been busy trying to get all this 
stuff to work (which it is mostly working now).  See my responses below.

Kevin Ruland wrote:
> Hi all,
> 
> I finally read through the Object Manger wiki page and have some
> comments for it.  My thougths are relatively half baked and this email
> might descrive an api which is incomplete, broken or inconsistent.
> 
> - Kevin
> 
> First a question:  Is the LSID enough to be able to fetch remote
> data/metadata?  What happens when ObjectManager.getCacheItem( LSID lsid
> ) is called for an lsid which has not been cached?  I see there is an
> IDNotFoundException raised by the method, but does that indicate the
> caller is not resposible to fetch the data and insert it into the cache
> with ObjectManager.addCacheItem( CacheItem )?

If the lsid is a "localhost" lsid, the system will try to resolve it 
through the local cache (objectmanager).  If an object with that lsid 
has not been loaded, it will throw the exception.  If the lsid is not a 
localhost lsid, it will first look in the cache to see if the object 
exists there.  If it does not, it will try to resolve the lsid with 
whatever authority is in the lsid.  If that authority cannot resolve the 
id, it will throw the exception.

> 
> 1) The ObjectManager should implement a thread safe singleton using the
> Pimpl idiom in order to be jdk 1.4 safe.  This means there needs to be
> the following:
> 
> package o.k.objectmanager;
> 
> /* Note, package access to class */
> class ObjectManagerImpl {
> 
>   /* package access */
>   ObjectManagerImpl() { }
> 
>  /* methods and more methods */
> 
>  /* data and more data */
> 
> }
> 
> package o.k.objectmanager;
> public class ObjectManager {
> 
>   /* Object Manager can have only one data member for this to work.  It
> cannot have additional class or instance variables */
>   private ObjectMangerImpl pimp = new ObjectMangerImpl();
> 
>   private ObjectManager();
> 
>   /* methods should be public static. This allows two different
> synchronization strategies to the ObjectManager
>   * we can declare all the methods synchronized and easily prevent
> threading issues inside the ObjectMangerImpl
>   * by using the class level mutex on ObjectManager.
>   * Or we can expend the additional work necessary to make
> ObjectManagerImpl itself thread safe
>   * and push the syncronization into that class using more compex techniques
>   */
>   public CacheItem getCacheItem(LSID lsid) throws IDNotFoundException;
> }
> 
> Of course, insead of having ObjectManagerImpl in a seperate translation
> unit, we could have it in the ObjectManager.java file.  This adds a
> little more obscurity, but does bloat the ObjectManager.java file.

Yeah, i'll be working on this soon.

> 
> 2)  The ObjectManger, I believe, should be slightly more functional that
> this specification.  It should also control the cache invalidation
> policy (on a per CacheItem basis) and the CacheItem locking policy
> (again per CacheItem).  I propose addition two additional types to this
> system and some additional methods to ObjectManager in order to
> implement them.
> 
> public interface CacheItemInvalidationPolicy {
>   public bool isInvalid();
> }
> 
> public interface CacheItemLockingPolicy {
>   public bool ReadRequested();  /* returns true if lock is needed */
>   public bool WriteRequeseted(); /* returns true if lock is needed */
> }
> 
> When an object is requested from the cache, an invalidation policy and
> locking policy should be passed into the get CacheItem() method. 
> Default to the arugments should make some sense:  Default
> CacheItemInvalidationPolicy = "never" or "session", and default locking
> policy "read shared", for example.
> 
> If the cache item is already in the cache and has a different locking
> policy or invalidation policy,  the CacheItem should be modified (and
> persisted) with the more restrictive.  Ie, the shorter duration
> invalidation policy, or the more restrictive locking policy.
> 
> Right now, I only see four different locking policies:  Uncontrolled,
> ReadShared, ReadExclusive, and Exclusive.  The "Exclusive" policy means
> that only a single copy of the CacheItem can exist in the application. 
> This would be for those actors/widgets which operate with raw file-names
> and the ObjectManger cannot synchronize the access.  The ReadShared &
> ReadExclusive policies would be used when the CacheItem only has read
> access to the underlying cache storage for a short duration.  In either
> of these, write access would be exclusive and would preclude any current
> readers.  The "Uncontrolled" policy means that the ObjectManager does
> not get involved with any control to the underlying resource -- this is
> a very special policy which is ment to be used by "temporary files" if
> they are to be controlled by the ObjectManager.

These are all good ideas.  I think they can easily be worked into the 
current implementation.  I had always planned to have cache items 
eventually be expunged from the cache but hadn't gotten around to 
working on that.

> 
> 3)  The startup method for ObjectManager should flush from the cache any
> objects which have been invalidated since last running.  This would
> allow for "by date" invalidation so cache items can expire when the
> application is off.

agreed.

> 
> 4)  ObjectManager should provide a session shutdown method which is
> called during application exit.  This method would flush from the cache
> all CacheItems with "session" invalidation, ie should not be persisted
> past the session.

agreed.

> 
> 5) Possible CacheItemInvalidationPolicies:
>   - Session: does not persist past the session
>   - NoCache: does not cache at all.  This would be useful for the
> QuickSearch results which might should always reexecute.
>   - ByTime:  Compute the invalidation date some time in the future, eg
> "1 day" or "1 week"
>   - Never:  Never expires.  This might be painful.

Some "never" items will be needed.  for instance, local kar files.

> 
> 6) Default InvalidationPolicies could be incorporated into the LSID
> Resolver/Factory mechanism.  Then it could be configured using url
> pattern matching from the config.xml file or other such thing.

I don't think the lsid resolver is the place to implement invalidation 
policies.  the lsid spec is pretty narrow in its scope and i was trying 
to adhere to that.  i've stretched it a bit with the idea of a 
'localhost' lsid.

> 
> 7) the base CacheItem object should only encapsulate its interaction
> with the ObjectManager.  Any higher-order information should be left to
> derived types.  I suggest the following:
> 
> public abstract CacheItem {
> 
>   public LSID getIdentifier();
>   public CacheItemInvalidationPolicy getInvalidationPolicy();
>   public void setInvalidationPolicy( CacheItemInvalidationPolicy ); /*
> Exception free?  If the caller requests an extension of the
> invalidation, is this a problem? */
>   public CacheItemLockingPolicy getLockingPolicy();
>   public void setLockingPolicy( CacheItemLockingPolicy ); /* Exceptions
> possible if trying to loosen the locking policy? */
> 
> }
> 
> In particular the object should not provide *any* access to the
> underlying object representation.  I do not believe the methods: 
> getItemAsFile() or getItemAsInputStream() belong in this class.  These
> should be moved into a derived type which provides a Java IO interface
> to a CacheItem.  Such a class could be "FileCacheItem", which would
> provide seperate Input and Output methods, or "RawFileCacheItem" which
> would provide access to the cached object's file name.  Also,
> RawFileCacheItem should always use an Exclusive lock since it's premise
> is Kepler does not control access.
> 
> I do not think the constructor from LSID should be available in
> CacheItem.  Constructing a new CacheItem should be only available from
> ObjectManager who can delegate as appropriate to the LSID resolver.
> 
> The static int cache type enumeration is a problem because it is a
> static registration of the different cache item types.  Instead of using
> such a mechanism, one should rely on "instanceof" to determine the
> underlying representation if necessary.

I was trying to wrap the functionality of the DataItemCache with the 
ObjectManager.  In doing so, i was trying to hide the storage object 
within the cache.  At the time, I felt that having a different class for 
each type of storage item was unneeded complexity.  If I'm proven to be 
wrong, we might have to change some parts of this.

> 
> 8) In order to control the locking policy on FileCacheItems, instead of
> returning Java's InputStream we need to return an object which derives
> from InputStream which can control notification to the ObjectManager
> when the InputStream has been closed.  For example:
> 
> public class FileCacheItem {
> 
>   public InputStreamCacheItem getInputStream() {
>     return new InputStreamCacheItem( this, this.getFileName() )
>   }
> 
>   ....
> 
> }
> 
> public class InputStreamCacheItem extends java.io.FileInputStream {
> 
>   InputStreamCacheItem( FileCacheItem p ) {
>     p.AquireReadLock( ); /* who calls ObjectManager.AcquireReadLock
> depending on policy */
>     super( p.getFileName() );
>     _is_released = false;
>     _parent = p;
>   }
> 
>   /* override close to release the read lock */
>   void close() {
>     super.close();
>     if ( ! _is_released ) {
>       parent.ReleaseReadLock( ); /* who calls
> ObjectManager.ReleaseReadLock */
>       _is _released = true;
>     }
>   }
> 
>   /* finalize to release the resourced
>    *  This particular example may not require finalize method because
> it's already implemented in
>    * java.io.FileInputStream to call close().
>    */
>   protected void finalize() {
>     super.finalize();
>     if ( !_is_released ) {
>      parent.ReleaseReadLock(); /* who calls
> ObjectManager.ReleaseReadLock() */
>      _is_released = true;
>     }
>   }
> }
> 
> This may not be completely correct, but the idea is when the user of the
> FileCacheItem requests an open InputStream, the ObjectManager needs to
> be notified that a ReadLock is requested.  Then when the stream is
> closed, the ReadLock needs to be released.

yeah.

> 
> 
> 9)  Lock policy violations:  I think we should have some exceptions
> related to concurrent access which get raised under certain
> circumstances.  This exception should be a RuntimeException because
> checking this exception will provide no benefit -- At least that's what
> I'm thinking right now.
> 
> /* Should extend RuntimeException because there's nothing the caller can
> do with this.
>  * it represents an underlying programming error.
>  */
> class CacheConcurrentAccessException extends RuntimeException { }
> 
> Either an exception hierarchy should be used, or simple types and
> strings.  Exceptions should be raised in the following conditions:
> 
> ObjectManager.getCacheItem(LSID) for a RawFileCacheItem when the
> underlying object has already been retrieved.
> 
> ObjectManager.AcquireReadLock( CacheItem ) when the corresponding
> CacheItem satisfies:
>  -  ReadExclusive lock policy and Read Lock already acquired, or
>  -  Write Lock already acquired.  (Writes are always exclusive and
> cannot support readers.)
> 
> ObjectManger.AcquireWriteLock( CacheItem ) when the corresponding
> CacheItem satisfies:
>   - Write Lock already acquired.
>   - Any currently opened readers exist.  (This means ObjectManager needs
> to count Readers through the use of AcquireReadLock, even when using
> ReadShared locking policy).

i think we just need to work on the overall thread safety/locking for 
objectmanger.  There's basically nothing in there now that provides 
thread safety.

> 
> 10)  The ObjectManager could also be used to allocate temporary files
> for various reasons.  Such files would always use Uncontrolled Locking
> policy and Session invalidation policy.  Otherwise a central Temporary
> file service should be created which would allow control of the location
> of temporary files in the host system.

The objectman does have a method to create a temp file.  it provides an 
unlocked File object.

Now that I've shifted kepler over to use the new kar/objman system I 
think it will be easier to collaborate and coordinate work on these sub 
systems.  It's working pretty well now, but can definitely be improved.

thanks for you comments,
chad