[kepler-dev] Comments on Kepler Object Manager interface.

Mon Sep 19 09:54:00 PDT 2005

Hi all,

I finally read through the Object Manager wiki page and have some
comments on it.  My thoughts are relatively half-baked, and this email
might describe an API which is incomplete, broken or inconsistent.

- Kevin

First, a question: is the LSID enough to be able to fetch remote
data/metadata?  What happens when ObjectManager.getCacheItem( LSID lsid )
is called for an LSID which has not been cached?  I see that the method
can raise IDNotFoundException, but does that mean the caller is then
responsible for fetching the data and inserting it into the cache
with ObjectManager.addCacheItem( CacheItem )?

1) The ObjectManager should implement a thread-safe singleton using the
Pimpl idiom in order to be JDK 1.4 safe.  This means we need the
following:

package o.k.objectmanager;

/* Note: package access to the class */
class ObjectManagerImpl {

  /* package access */
  ObjectManagerImpl() { }

  /* methods and more methods */

  /* data and more data */

}

package o.k.objectmanager;

public class ObjectManager {

  /* ObjectManager can have only one data member for this to work.  It
   * cannot have additional class or instance variables.  The member is
   * static so the static methods can reach it; a static initializer runs
   * exactly once, under the JVM's class-initialization lock, so no
   * double-checked locking is needed. */
  private static final ObjectManagerImpl impl = new ObjectManagerImpl();

  /* no instances: all access goes through the static methods */
  private ObjectManager() { }

  /* Methods should be public static.  This allows two different
   * synchronization strategies for the ObjectManager:
   *  - declare all the methods synchronized and easily prevent threading
   *    issues inside ObjectManagerImpl by using the class-level mutex on
   *    ObjectManager, or
   *  - expend the additional work necessary to make ObjectManagerImpl
   *    itself thread safe, pushing the synchronization into that class
   *    using more complex techniques.
   * This example uses the first strategy. */
  public static synchronized CacheItem getCacheItem( LSID lsid )
      throws IDNotFoundException {
    return impl.getCacheItem( lsid );
  }
}

Of course, instead of having ObjectManagerImpl in a separate translation
unit, we could put it in the ObjectManager.java file.  This adds a
little more obscurity, but it does bloat the ObjectManager.java file.
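A minimal, self-contained sketch of the eager-initialization singleton, assuming String keys stand in for LSID and a HashMap stands in for the cache storage (both hypothetical simplifications). The only data member is the static `impl`, and every public method uses the first synchronization strategy (a static synchronized method locks on the ObjectManager class).

```java
import java.util.HashMap;
import java.util.Map;

/* Package-private implementation; not thread safe by itself. */
class ObjectManagerImpl {
  // Hypothetical stand-in for the cache: String key instead of LSID.
  private final Map<String, String> cache = new HashMap<String, String>();

  ObjectManagerImpl() { }

  String get(String lsid) { return cache.get(lsid); }

  void put(String lsid, String item) { cache.put(lsid, item); }

  int size() { return cache.size(); }
}

final class ObjectManager {
  // The only data member. The JVM initializes the class once, under a lock,
  // so this field is safely published to every thread.
  private static final ObjectManagerImpl impl = new ObjectManagerImpl();

  private ObjectManager() { }

  // Strategy 1: all access to impl is serialized on ObjectManager.class.
  static synchronized String getCacheItem(String lsid) { return impl.get(lsid); }

  static synchronized void addCacheItem(String lsid, String item) { impl.put(lsid, item); }

  static synchronized int size() { return impl.size(); }
}
```

Switching to the second strategy would mean dropping `synchronized` here and making ObjectManagerImpl thread safe internally.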

2)  The ObjectManager, I believe, should be slightly more functional than
this specification.  It should also control the cache invalidation
policy (on a per-CacheItem basis) and the CacheItem locking policy
(again per CacheItem).  I propose adding two types to this
system and some additional methods to ObjectManager in order to
implement them.

public interface CacheItemInvalidationPolicy {
  public boolean isInvalid();
}

public interface CacheItemLockingPolicy {
  public boolean ReadRequested();  /* returns true if a lock is needed */
  public boolean WriteRequested(); /* returns true if a lock is needed */
}

When an object is requested from the cache, an invalidation policy and a
locking policy should be passed into the getCacheItem() method.  The
defaults for these arguments should be sensible: for example, a default
CacheItemInvalidationPolicy of "never" or "session", and a default locking
policy of "read shared".
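Java has no default arguments, so defaults like these would most naturally be supplied by overloads. A minimal sketch, using String stand-ins for LSID and for the two policy types (hypothetical simplifications) and picking "Session" and "ReadShared" as the defaults from the examples above; `PolicyDefaults` is an illustrative name.

```java
final class PolicyDefaults {
  // Hypothetical defaults, taken from the examples in the text.
  static final String DEFAULT_INVALIDATION = "Session";
  static final String DEFAULT_LOCKING = "ReadShared";

  private PolicyDefaults() { }

  /* Short form: fall back to the default policies. */
  static String getCacheItem(String lsid) {
    return getCacheItem(lsid, DEFAULT_INVALIDATION, DEFAULT_LOCKING);
  }

  /* Full form: the caller states both policies explicitly. This sketch only
   * describes the request instead of touching a real cache. */
  static String getCacheItem(String lsid, String invalidation, String locking) {
    return lsid + " [" + invalidation + ", " + locking + "]";
  }
}
```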

If the item is already in the cache with a different locking policy or
invalidation policy, the CacheItem should be modified (and persisted) to
use the more restrictive of the two, i.e. the shorter-duration
invalidation policy or the more restrictive locking policy.
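A minimal sketch of the "keep the more restrictive" rule. It assumes, hypothetically, that the locking policies can be totally ordered from least to most restrictive, and that an invalidation policy can be reduced to an expiry time in milliseconds, with Long.MAX_VALUE meaning "never". Neither assumption is part of the proposal; they just make "more restrictive" computable.

```java
/* Hypothetical model: declared from least to most restrictive, so a higher
 * ordinal means a more restrictive locking policy. */
enum LockPolicy { UNCONTROLLED, READ_SHARED, READ_EXCLUSIVE, EXCLUSIVE }

final class PolicyMerge {
  static final long NEVER = Long.MAX_VALUE; // "never expires"

  private PolicyMerge() { }

  /* Keep whichever locking policy is more restrictive. */
  static LockPolicy mergeLock(LockPolicy current, LockPolicy requested) {
    return requested.ordinal() > current.ordinal() ? requested : current;
  }

  /* Keep whichever invalidation happens first (the shorter duration). */
  static long mergeExpiry(long currentExpiry, long requestedExpiry) {
    return Math.min(currentExpiry, requestedExpiry);
  }
}
```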

Right now, I only see four different locking policies:  Uncontrolled,
ReadShared, ReadExclusive, and Exclusive.  The "Exclusive" policy means
that only a single copy of the CacheItem can exist in the application.
This would be for those actors/widgets which operate on raw file names,
where the ObjectManager cannot synchronize the access.  The ReadShared and
ReadExclusive policies would be used when the CacheItem only has read
access to the underlying cache storage for a short duration.  Under either
of these, write access would be exclusive and would preclude any current
readers.  The "Uncontrolled" policy means that the ObjectManager does
not control access to the underlying resource at all; this is
a very special policy which is meant to be used for "temporary files" if
they are to be managed by the ObjectManager.

3)  The startup method for ObjectManager should flush from the cache any
objects which have been invalidated since it last ran.  This would
allow for "by date" invalidation, so cache items can expire while the
application is not running.
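A sketch of the startup flush, assuming a hypothetical in-memory map from LSID string to a persisted expiry time in epoch milliseconds. A real ObjectManager would also delete the backing storage of each flushed item; `StartupFlush.flushExpired` is an illustrative name.

```java
import java.util.Iterator;
import java.util.Map;

final class StartupFlush {
  private StartupFlush() { }

  /* Remove every entry whose persisted expiry time is at or before 'now'.
   * Returns the number of entries flushed. */
  static int flushExpired(Map<String, Long> expiryByLsid, long now) {
    int flushed = 0;
    Iterator<Map.Entry<String, Long>> it = expiryByLsid.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (e.getValue().longValue() <= now) {
        it.remove(); // safe removal while iterating
        flushed++;
      }
    }
    return flushed;
  }
}
```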

4)  ObjectManager should provide a session shutdown method which is
called during application exit.  This method would flush from the cache
all CacheItems with "session" invalidation, i.e. those which should not
persist past the session.
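A sketch of the session shutdown step, assuming each cache entry records a hypothetical flag saying whether it has "session" invalidation. The method name `shutdownSession` is illustrative, not part of the proposal.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class SessionShutdown {
  private SessionShutdown() { }

  /* Called during application exit: drop every item flagged as session-only
   * so it is not persisted past this run. Returns the removed LSIDs. */
  static List<String> shutdownSession(Map<String, Boolean> sessionOnlyByLsid) {
    List<String> removed = new ArrayList<String>();
    Iterator<Map.Entry<String, Boolean>> it = sessionOnlyByLsid.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Boolean> e = it.next();
      if (e.getValue().booleanValue()) {
        removed.add(e.getKey()); // record before removing the entry
        it.remove();
      }
    }
    return removed;
  }
}
```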

5) Possible CacheItemInvalidationPolicies:
  - Session: does not persist past the session.
  - NoCache: does not cache at all.  This would be useful for the
    QuickSearch results, which should probably always re-execute.
  - ByTime:  computes the invalidation date some time in the future, e.g.
    "1 day" or "1 week".
  - Never:  never expires.  This might be painful.
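The four policies could implement the isInvalid() interface from point 2 roughly as follows. This is a sketch under stated assumptions: the class names are hypothetical, time comes from a hypothetical `TimeSource` so ByTime is testable, and Session items report valid because the session shutdown step removes them rather than isInvalid().

```java
/* The interface proposed in point 2, using Java's boolean type. */
interface CacheItemInvalidationPolicy {
  boolean isInvalid();
}

/* Hypothetical time source so ByTime can be tested deterministically. */
interface TimeSource {
  long now();
}

/* Session: valid during the session; removed at session shutdown. */
final class SessionPolicy implements CacheItemInvalidationPolicy {
  public boolean isInvalid() { return false; }
}

/* NoCache: never valid, so the item is always re-fetched. */
final class NoCachePolicy implements CacheItemInvalidationPolicy {
  public boolean isInvalid() { return true; }
}

/* Never: never expires. */
final class NeverPolicy implements CacheItemInvalidationPolicy {
  public boolean isInvalid() { return false; }
}

/* ByTime: the expiry date is computed once, some duration in the future. */
final class ByTimePolicy implements CacheItemInvalidationPolicy {
  private final TimeSource clock;
  private final long expiresAt;

  ByTimePolicy(TimeSource clock, long durationMillis) {
    this.clock = clock;
    this.expiresAt = clock.now() + durationMillis;
  }

  long getExpiresAt() { return expiresAt; }

  public boolean isInvalid() { return clock.now() >= expiresAt; }
}
```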

6) Default InvalidationPolicies could be incorporated into the LSID
Resolver/Factory mechanism.  They could then be configured using URL
pattern matching in the config.xml file or a similar mechanism.
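A sketch of choosing a default invalidation policy by matching the LSID (or the URL it resolves to) against configured patterns. The config.xml format is not specified, so the rules are built in code here; the patterns, policy names and first-match-wins ordering are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class DefaultPolicyTable {
  private final List<Pattern> patterns = new ArrayList<Pattern>();
  private final List<String> policies = new ArrayList<String>();
  private final String fallback;

  DefaultPolicyTable(String fallback) { this.fallback = fallback; }

  /* Add a rule; rules are checked in the order they were added. */
  DefaultPolicyTable add(String regex, String policyName) {
    patterns.add(Pattern.compile(regex));
    policies.add(policyName);
    return this;
  }

  /* First rule whose pattern matches the whole string wins. */
  String defaultPolicyFor(String lsidOrUrl) {
    for (int i = 0; i < patterns.size(); i++) {
      if (patterns.get(i).matcher(lsidOrUrl).matches()) {
        return policies.get(i);
      }
    }
    return fallback;
  }
}
```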

7) The base CacheItem object should only encapsulate its interaction
with the ObjectManager.  Any higher-order information should be left to
derived types.  I suggest the following:

public abstract class CacheItem {

  public abstract LSID getIdentifier();
  public abstract CacheItemInvalidationPolicy getInvalidationPolicy();
  /* Exception free?  If the caller requests an extension of the
   * invalidation, is this a problem? */
  public abstract void setInvalidationPolicy( CacheItemInvalidationPolicy policy );
  public abstract CacheItemLockingPolicy getLockingPolicy();
  /* Exceptions possible if trying to loosen the locking policy? */
  public abstract void setLockingPolicy( CacheItemLockingPolicy policy );

}

In particular, the object should not provide *any* access to the
underlying object representation.  I do not believe the methods
getItemAsFile() or getItemAsInputStream() belong in this class.  These
should be moved into a derived type which provides a Java IO interface
to a CacheItem.  Such a class could be "FileCacheItem", which would
provide separate input and output methods, or "RawFileCacheItem", which
would provide access to the cached object's file name.  Also,
RawFileCacheItem should always use an Exclusive lock, since its premise
is that Kepler does not control access.

I do not think the constructor from LSID should be available in
CacheItem.  Constructing a new CacheItem should only be possible through
the ObjectManager, which can delegate to the LSID resolver as appropriate.

The static int cache type enumeration is a problem because it is a
static registration of the different cache item types.  Instead of using
such a mechanism, one should rely on "instanceof" to determine the
underlying representation if necessary.
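A sketch of picking the representation with instanceof rather than a registered integer type code. The class names mirror those proposed above, but the empty bodies and the `describe()` helper are hypothetical.

```java
abstract class CacheItem { }

class FileCacheItem extends CacheItem { }

/* Always Exclusive: Kepler does not control access to the raw file. */
class RawFileCacheItem extends CacheItem { }

final class Representation {
  private Representation() { }

  /* Dispatch on the runtime type instead of an integer code, so the set of
   * types lives in the class hierarchy, not in a static registry. */
  static String describe(CacheItem item) {
    if (item instanceof RawFileCacheItem) {
      return "raw file name";
    } else if (item instanceof FileCacheItem) {
      return "input/output streams";
    }
    return "opaque";
  }
}
```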

8) In order to control the locking policy on FileCacheItems, instead of
returning Java's InputStream we need to return an object which derives
from InputStream and notifies the ObjectManager when the stream has
been closed.  For example:

public class FileCacheItem extends CacheItem {

  public InputStreamCacheItem getInputStream()
      throws java.io.FileNotFoundException {
    return new InputStreamCacheItem( this );
  }

  ....

}

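public class InputStreamCacheItem extends java.io.FileInputStream {

  private final FileCacheItem _parent;
  private boolean _is_released = false;

  InputStreamCacheItem( FileCacheItem p ) throws java.io.FileNotFoundException {
    /* super() must be the first statement in a constructor, so the read
     * lock is acquired inside a static helper evaluated for its argument.
     * (If opening the file then fails, the lock still has to be released.) */
    super( lockAndGetFileName( p ) );
    _parent = p;
  }

  /* FileCacheItem.AcquireReadLock() calls ObjectManager.AcquireReadLock
   * depending on policy */
  private static String lockAndGetFileName( FileCacheItem p ) {
    p.AcquireReadLock( );
    return p.getFileName();
  }

  /* override close to release the read lock */
  public void close() throws java.io.IOException {
    try {
      super.close();
    } finally {
      release();
    }
  }

  /* release at most once; FileCacheItem.ReleaseReadLock() calls
   * ObjectManager.ReleaseReadLock() */
  private synchronized void release() {
    if ( ! _is_released && _parent != null ) {
      _parent.ReleaseReadLock( );
      _is_released = true;
    }
  }

  /* finalize to release the resources.
   * This particular example may not require a finalize method, because
   * java.io.FileInputStream already implements finalize() to call close().
   */
  protected void finalize() throws Throwable {
    super.finalize();
    release();
  }
}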

This may not be completely correct, but the idea is that when the user of
a FileCacheItem requests an open InputStream, the ObjectManager needs to
be notified that a read lock is requested.  Then, when the stream is
closed, the read lock needs to be released.
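The same notify-on-open, release-on-close idea can also be expressed with java.io.FilterInputStream, which wraps any InputStream and lets close() be overridden. This runnable sketch is an alternative to subclassing FileInputStream, not the design above; the ObjectManager calls are replaced by a hypothetical `ReadLockListener` callback, and a flag guards against releasing twice.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/* Hypothetical hook standing in for ObjectManager.AcquireReadLock/ReleaseReadLock. */
interface ReadLockListener {
  void acquired();
  void released();
}

class LockedInputStream extends FilterInputStream {
  private final ReadLockListener listener;
  private boolean released = false;

  LockedInputStream(InputStream in, ReadLockListener listener) {
    super(in);
    this.listener = listener;
    listener.acquired(); // notify before the caller can read
  }

  @Override
  public void close() throws IOException {
    try {
      super.close();
    } finally {
      synchronized (this) {
        if (!released) {
          released = true;
          listener.released(); // exactly once, even if close() is repeated
        }
      }
    }
  }
}
```

With try-with-resources (Java 7 and later), close() and therefore the release run even if reading throws.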

9)  Lock policy violations:  I think we should have some exceptions
related to concurrent access which get raised under certain
circumstances.  This exception should be a RuntimeException, because
making it a checked exception would provide no benefit; at least, that's
what I'm thinking right now.

/* Should extend RuntimeException because there's nothing the caller can
 * do with this: it represents an underlying programming error.
 */
class CacheConcurrentAccessException extends RuntimeException { }

Either an exception hierarchy should be used, or simple types and
strings.  Exceptions should be raised in the following conditions:

ObjectManager.getCacheItem(LSID) for a RawFileCacheItem when the
underlying object has already been retrieved.

ObjectManager.AcquireReadLock( CacheItem ) when the corresponding
CacheItem satisfies:
 -  ReadExclusive lock policy and Read Lock already acquired, or
 -  Write Lock already acquired.  (Writes are always exclusive and
cannot support readers.)

ObjectManager.AcquireWriteLock( CacheItem ) when the corresponding
CacheItem satisfies:
  - Write Lock already acquired.
  - Any currently opened readers exist.  (This means ObjectManager needs
to count Readers through the use of AcquireReadLock, even when using
ReadShared locking policy).
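A sketch of the per-item bookkeeping these rules imply, assuming a hypothetical `LockState` object that counts open readers (even under ReadShared) and records whether a writer is active. It throws a local stand-in for the CacheConcurrentAccessException above. The Exclusive and RawFileCacheItem checks belong in getCacheItem, so here Exclusive follows only the general rules listed.

```java
/* Stand-in for the proposed unchecked exception. */
class CacheConcurrentAccessException extends RuntimeException {
  CacheConcurrentAccessException(String message) { super(message); }
}

enum LockPolicy { UNCONTROLLED, READ_SHARED, READ_EXCLUSIVE, EXCLUSIVE }

/* Hypothetical per-CacheItem lock bookkeeping; every method is synchronized. */
final class LockState {
  private final LockPolicy policy;
  private int readers = 0;         // counted even under READ_SHARED
  private boolean writer = false;

  LockState(LockPolicy policy) { this.policy = policy; }

  synchronized void acquireReadLock() {
    if (policy == LockPolicy.UNCONTROLLED) return; // no control at all
    if (writer) {
      throw new CacheConcurrentAccessException("write lock already acquired");
    }
    if (policy == LockPolicy.READ_EXCLUSIVE && readers > 0) {
      throw new CacheConcurrentAccessException("read lock already acquired");
    }
    readers++;
  }

  synchronized void releaseReadLock() {
    if (policy != LockPolicy.UNCONTROLLED && readers > 0) readers--;
  }

  synchronized void acquireWriteLock() {
    if (policy == LockPolicy.UNCONTROLLED) return;
    if (writer) {
      throw new CacheConcurrentAccessException("write lock already acquired");
    }
    if (readers > 0) {
      throw new CacheConcurrentAccessException(readers + " reader(s) still open");
    }
    writer = true;
  }

  synchronized void releaseWriteLock() { writer = false; }

  synchronized int readerCount() { return readers; }
}
```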

10)  The ObjectManager could also be used to allocate temporary files
for various purposes.  Such files would always use the Uncontrolled
locking policy and the Session invalidation policy.  Otherwise, a central
temporary-file service should be created, which would control the
location of temporary files on the host system.
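A sketch of such a central temporary-file service, built on the standard `File.createTempFile(prefix, suffix, directory)` and `File.deleteOnExit()`. The class name and the single configurable root directory are assumptions. `deleteOnExit()` approximates Session invalidation, since the JVM deletes the file on normal termination; the root directory itself is left in place.

```java
import java.io.File;
import java.io.IOException;

/* Hypothetical central temporary-file service: all files live under one
 * configurable directory (Uncontrolled locking, Session invalidation). */
final class TempFileService {
  private final File root;

  TempFileService(File root) {
    this.root = root;
  }

  /* Creates a uniquely named file under root; prefix must be >= 3 chars. */
  File newTempFile(String prefix, String suffix) throws IOException {
    if (!root.isDirectory() && !root.mkdirs()) {
      throw new IOException("cannot create temp directory " + root);
    }
    File f = File.createTempFile(prefix, suffix, root);
    f.deleteOnExit(); // "session" invalidation: removed on normal JVM exit
    return f;
  }

  File getRoot() { return root; }
}
```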