[kepler-dev] Replacing DataCacheManager with CacheManager.

Chad Berkley berkley at nceas.ucsb.edu
Fri Dec 9 13:38:41 PST 2005


I'm not sure why we would need another ID when we already have a 
(supposedly) unique lsid to use.  I've been designing all of the 
objectmanager classes around lsids.  If we introduce yet another id to 
use internally, I foresee major headaches.

On another, yet slightly related note:
I'm also dubious as to how the new cache manager is going to work with 
data coming in from ecogrid.  I was working under the assumption (based 
on decisions made at the June Kepler meeting) that all objects coming 
into Kepler would have an lsid.  Apparently the "reality on the ground" 
(as CNN likes to put it) is much different.  Not only is the object 
cache tied to lsids, but the SMS system is too.  If we hope to use SMS to 
search the data store, the objects in it must have lsids.

We could generate local lsids for these data objects pretty easily, but 
this will cause problems later if you try to transfer the data (via a 
kar) to another machine or if you try to upload it to another repository.

I don't really have a good solution for this.  I kind of think that, 
since we've designed the object manager around lsids, we should force 
external systems to play nice with kepler by providing lsids, either 
natively or through some external filtering system.

chad

Kevin Ruland wrote:
> Hi.
> 
> I've found some more information which might prove useful.
> 
> hsqldb does support auto-increment IDENTITY columns.  We could utilize 
> such a thing for a primary key to the table and allow access to objects 
> through that number.  So, when a new object is inserted, this integer 
> can be returned for the caller to utilize for future queries.
> 
> Of course, this does not provide for persistence beyond the current session.
> 
> If we can leverage the NAME column more, perhaps that could be used for 
> the persistent key.  Basically that's what the DataCacheManager does 
> now.  It uses a name like "EcoGrid Digir Query: <magic query string>" 
> for the name of the object.
> 
> The current schema for the cachetable is:
> 
> name: varchar
> lsid: varchar
> date: varchar
> file: varchar
> 
> With no constraints.
> 
> I suggest we do this:
> 
> id: IDENTITY
> name: varchar
> lsid: varchar
> date: varchar
> file: varchar
> expiration: varchar (to be completed eventually)
> 
> Perhaps force:  lsid nullable unique  - because it seems that's what it 
> should be.
> 
> Change some signatures:
> 
> integer CacheManager.insertObject( CacheObject ) - returns the id of the 
> inserted element.
> 
> CacheObject CacheManager.getObject( int ) - returns the cache object for 
> the given id.  Or null if not found.
> 
> vector<CacheObject> CacheManager.getObjectsByName( String ) - returns a 
> vector of objects matching the given name string.
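To make the proposed signatures concrete, here is a rough in-memory sketch of how they could behave. It is only an illustration, not the actual CacheManager: the class name InMemoryCacheManager, the two-field CacheObject, and the use of a plain counter in place of the hsqldb IDENTITY column are all assumptions, and it returns a List where the proposal says vector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real CacheObject; only the columns
// needed for the example are modelled.
final class CacheObject {
    final String name;
    final String lsid; // may be null, matching "lsid nullable unique"

    CacheObject(String name, String lsid) {
        this.name = name;
        this.lsid = lsid;
    }
}

// Hypothetical in-memory model of the proposed signatures.
final class InMemoryCacheManager {
    private final Map<Integer, CacheObject> rows = new LinkedHashMap<>();
    private int nextId = 1; // plays the role of the IDENTITY column

    // Inserts the object and returns the id generated for it.
    int insertObject(CacheObject o) {
        if (o.lsid != null) {
            // Enforce uniqueness only for non-null lsids.
            for (CacheObject existing : rows.values()) {
                if (o.lsid.equals(existing.lsid)) {
                    throw new IllegalArgumentException("duplicate lsid: " + o.lsid);
                }
            }
        }
        int id = nextId++;
        rows.put(id, o);
        return id;
    }

    // Returns the cache object for the given id, or null if not found.
    CacheObject getObject(int id) {
        return rows.get(id);
    }

    // Returns all objects whose name equals the given string.
    List<CacheObject> getObjectsByName(String name) {
        List<CacheObject> matches = new ArrayList<>();
        for (CacheObject o : rows.values()) {
            if (o.name.equals(name)) {
                matches.add(o);
            }
        }
        return matches;
    }
}
```

Because the name column carries no uniqueness constraint, getObjectsByName has to return a collection, while the generated id and a non-null lsid each identify at most one row.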
> 
> Also, there are a few places where sql is inlined in the application.  
> This includes ddl statements as well as SIUD sql.  Perhaps we should 
> consider pulling these things together into something which looks more 
> like a data access pattern.  I think at least we should have the 
> application initialization code pulled together which would include 
> initialization of the user.dir directory structure and the clean database.
> 
> Kevin
> 
> Kevin Ruland wrote:
> 
> 
>>Hi.
>>
>>One of the tickets assigned to me is to implement the cache expiration
>>stuff so, in particular, the ecogrid queries are better behaved.  I was
>>expecting to try to migrate the old ecogrid mechanism from the
>>DataCacheManager to the new CacheManager before getting this to work. 
>>However, I have some questions. 
>>
>>The resultsets returned from the ecogrid queries do not have anything
>>resembling an lsid which is the primary key into the CacheManager.  We
>>could hack together an lsid based on something like the search criteria,
>>but strictly speaking this is not an lsid.  In addition, there is no
>>real guarantee that the same query will always return the same
>>resultset.  For example, additional Digir providers may have come
>>online, or new data may have been added to metacat.
>>
>>I'm thinking we need some kind of internal lsid generator which can
>>return new lsids for the local application.  Either we'd have to have
>>the objects with these internal (localhost?) lsids always have "session"
>>lifespan, or we'll have to come up with a mechanism which always returns
>>the same lsid for any arbitrary input.  Maybe something like:
>>
>>class LSIDGenerator {
>>
>> static LSID generate( Object o );
>>
>>}
>>
>>Some kind of magic checksum is computed on o and used in the lsid.  So
>>when somebody does an Ecogrid quick search, the contents of the text box
>>combined with the EcogridQueryEndpoint are passed into the generate
>>method.  Note:  both these things are strings.
>>
>>A unique mapping from Object (or maybe the less general case String) ->
>>LSID could then be used to lookup the resultset (or other object) from
>>previously executed queries.
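A minimal sketch of the "same lsid for the same input" variant, restricted to the String case. The class name LocalLsidGenerator, the "localhost" authority, the "cache" namespace, and the choice of SHA-1 as the checksum are all assumptions made for illustration; it returns a plain String rather than an LSID type.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical generator: derives a stable, local lsid-shaped string
// from the query endpoint and the search text.
final class LocalLsidGenerator {
    // Assumed authority ("localhost") and namespace ("cache").
    private static final String PREFIX = "urn:lsid:localhost:cache:";

    static String generate(String endpoint, String queryText) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            digest.update(endpoint.getBytes(StandardCharsets.UTF_8));
            // Separator byte so ("ab", "c") and ("a", "bc") hash differently.
            digest.update((byte) 0);
            digest.update(queryText.getBytes(StandardCharsets.UTF_8));

            // Hex-encode the checksum to use as the object id part.
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return PREFIX + hex;
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

The resulting identifier names the query, not the resultset: repeating the same search yields the same key even if the data behind it has changed, so it would still need an expiration policy on top.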
>>
>>I think we have the same problem when trying to use the CacheManager as
>>a repository for intermediate results generated as part of a workflow.
>>
>>Kevin
>>
>>_______________________________________________
>>Kepler-dev mailing list
>>Kepler-dev at ecoinformatics.org
>>http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
>> 
>>
> 
> 

