[kepler-dev] Replacing DataCacheManager with CacheManager.

Kevin Ruland kruland at ku.edu
Fri Dec 9 12:08:06 PST 2005


Hi.

I've found some more information which might prove useful.

hsqldb does support auto-increment IDENTITY columns.  We could utilize 
such a thing for a primary key to the table and allow access to objects 
through that number.  So, when a new object is inserted, this integer 
can be returned for the caller to utilize for future queries.

Of course, this does not provide for persistence beyond the current session.

If we can leverage the NAME column more, perhaps that could be used for 
the persistent key.  Basically that's what the DataCacheManager does 
now.  It uses a name like "EcoGrid Digir Query: <magic query string>" 
for the name of the object.

The current schema for the cachetable is:

name: varchar
lsid: varchar
date: varchar
file: varchar

With no constraints.

I suggest we do this:

id: IDENTITY
name: varchar
lsid: varchar
date: varchar
file: varchar
expiration: varchar (to be completed eventually)

Perhaps also constrain lsid to be nullable and unique, because it seems 
that's what it should be.

Change some signatures:

integer CacheManager.insertObject( CacheObject ) - returns the id of the 
inserted element.

CacheObject CacheManager.getObject( int ) - returns the cache object for 
the given id.  Or null if not found.

vector<CacheObject> CacheManager.getObjectsByName( String ) - returns a 
vector of objects matching the given name string.
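
To make the proposed signatures concrete, here is a rough in-memory sketch 
in Java.  It is only an illustration: CacheManagerSketch and the two-field 
CacheObject below are hypothetical stand-ins, a counter plays the role of 
the IDENTITY column, and a real version would sit on top of the hsqldb 
cachetable rather than a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the real CacheObject; only two schema fields are modelled.
class CacheObject {
    final String name;
    final String lsid; // may be null, matching the "nullable unique" suggestion

    CacheObject(String name, String lsid) {
        this.name = name;
        this.lsid = lsid;
    }
}

// In-memory sketch of the proposed signatures (assumption: not backed by hsqldb).
class CacheManagerSketch {
    private final Map<Integer, CacheObject> rows = new LinkedHashMap<>();
    private int nextId = 0; // plays the role of the IDENTITY column

    /** Stores the object and returns the id assigned to it. */
    int insertObject(CacheObject obj) {
        int id = nextId++;
        rows.put(id, obj);
        return id;
    }

    /** Returns the object with the given id, or null if not found. */
    CacheObject getObject(int id) {
        return rows.get(id);
    }

    /** Returns every object whose name equals the given string, in insertion order. */
    List<CacheObject> getObjectsByName(String name) {
        List<CacheObject> matches = new ArrayList<>();
        for (CacheObject obj : rows.values()) {
            if (Objects.equals(obj.name, name)) {
                matches.add(obj);
            }
        }
        return matches;
    }
}
```

Note that getObjectsByName returns a list because nothing in the schema 
forces names to be unique, while the id always identifies a single row.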

Also, there are a few places where SQL is inlined in the application.  
This includes DDL statements as well as SIUD (select/insert/update/delete) 
SQL.  Perhaps we should consider pulling these things together into 
something which looks more like a data access pattern.  I think at least 
we should have the application initialization code pulled together, which 
would include initialization of the user.dir directory structure and the 
clean database.

Kevin

Kevin Ruland wrote:

>Hi.
>
>One of the tickets assigned to me is to implement the cache expiration
>stuff so, in particular, the ecogrid queries are better behaved.  I was
>expecting to try to migrate the old ecogrid mechanism from the
>DataCacheManager to the new CacheManager before getting this to work. 
>However, I have some questions. 
>
>The resultsets returned from the ecogrid queries do not have anything
>resembling an lsid which is the primary key into the CacheManager.  We
>could hack together an lsid based on something like the search criteria,
>but strictly speaking this is not an lsid.  In addition, there is no
>real guarantee that the resultset returned for the same query will
>always be the same result set.  For example, additional Digir providers
>may have become available, or new data may have been added to metacat.
>
>I'm thinking we need some kind of internal lsid generator which can
>return new lsids for the local application.  Either we'd have to have
>the objects with these internal (localhost?) lsids always have "session"
>lifespan, or we'll have to come up with a mechanism which always returns
>the same lsid for any arbitrary input.  Maybe something like:
>
>class LSIDGenerator {
>
>  static LSID generate( Object o );
>
>}
>
>Some kind of magic checksum is computed on o and used in the lsid.  So
>when somebody does an Ecogrid quick search, the contents of the text box
>combined with the EcogridQueryEndpoint are passed into the generate
>method.  Note:  both these things are strings.
>
>A unique mapping from Object (or maybe the less general case String) ->
>LSID could then be used to lookup the resultset (or other object) from
>previously executed queries.
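
A minimal sketch of that mapping for the String case, in Java.  Everything 
here is an assumption for illustration: the class name, the 
"urn:lsid:localhost:cache:" prefix, and the choice of SHA-256 as the 
"magic checksum" are placeholders, and the result is a plain String rather 
than an LSID object.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class LocalLsidSketch {
    // Hypothetical prefix; the authority and namespace are placeholders.
    private static final String PREFIX = "urn:lsid:localhost:cache:";

    /** Returns the same identifier every time for the same endpoint and query. */
    static String generate(String endpoint, String query) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
            for (String part : new String[] { endpoint, query }) {
                byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
                md.update(Integer.toString(bytes.length).getBytes(StandardCharsets.UTF_8));
                md.update((byte) ':');
                md.update(bytes);
            }
            // Hex-encode the digest so it can be embedded in the identifier.
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return PREFIX + hex;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

An identifier built this way names the query, not the result set, so it 
does nothing about the results changing between runs; that would still 
need the expiration mechanism.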
>
>I think we have the same problem when trying to use the CacheManager as
>a repository for intermediate results generated as part of a workflow.
>
>Kevin
>


