[kepler-dev] Replacing DataCacheManager with CacheManager.

Fri Dec 9 14:23:56 PST 2005

This might be blasphemy in terms of the LSID standard,
but I think in many places we have a need for ids that
"look" like LSIDs, but aren't official ones.  For
example, temporary identifiers and local identifiers
would all fit this bill -- and we already have some
support for these within Kepler. Of course, these
"look-alikes" don't actually conform to the LSID
standard in terms of resolution, etc.

-shawn

Chad Berkley wrote:
> I'm not sure why we would need another ID when we have a (supposedly) 
> unique lsid to use.  I've been designing all of the objectmanager 
> classes around using lsids.  If we introduce yet another id to use 
> internally, I foresee major headaches.
> 
> On another, yet slightly related note:
> I'm also dubious as to how the new cache manager is going to work with 
> data coming in from ecogrid.  I was working under the assumption (based 
> on decisions made at the June Kepler meeting) that all objects coming 
> into kepler would have an lsid.  Apparently the "reality on the ground" 
> (as CNN likes to put it) is much different.  Not only is the object 
> cache tied to lsids but the SMS system is too.  If we hope to use SMS to 
> search the data store, they must have lsids.
> 
> We could generate local lsids for these data objects pretty easily, but 
> this will cause problems later if you try to transfer the data (via a 
> kar) to another machine or if you try to upload it to another repository.
> 
> I don't really have a good solution for this.  I kind of think that, 
> since we've designed the object manager around lsids, we should force 
> external systems to play nice with kepler by providing lsids, either 
> natively or through some external filtering system.
> 
> chad
> 
> Kevin Ruland wrote:
>> Hi.
>>
>> I've found some more information which might prove useful.
>>
>> hsqldb does support auto-increment IDENTITY columns.  We could utilize 
>> such a thing for a primary key to the table and allow access to objects 
>> through that number.  So, when a new object is inserted, this integer 
>> can be returned for the caller to utilize for future queries.
>>
>> Of course, this does not provide for persistence beyond the current session.
>>
>> If we can leverage the NAME column more, perhaps that could be used for 
>> the persistent key.  Basically that's what the DataCacheManager does 
>> now.  It uses a name like "EcoGrid Digir Query: <magic query string>" 
>> for the name of the object.
>>
>> The current schema for the cachetable is:
>>
>> name: varchar
>> lsid: varchar
>> date: varchar
>> file: varchar
>>
>> With no constraints.
>>
>> I suggest we do this:
>>
>> id: IDENTITY
>> name: varchar
>> lsid: varchar
>> date: varchar
>> file: varchar
>> expiration: varchar (to be completed eventually)
>>
>> Perhaps also constrain lsid to be nullable but unique, because it seems 
>> that's what it should be.
>>
>> Change some signatures:
>>
>> int CacheManager.insertObject( CacheObject ) - returns the id of the 
>> inserted element.
>>
>> CacheObject CacheManager.getObject( int ) - returns the cache object for 
>> the given id.  Or null if not found.
>>
>> vector<CacheObject> CacheManager.getObjectsByName( String ) - returns a 
>> vector of objects matching the given name string.
>>
>> Also, there are a few places where sql is inlined in the application.  
>> This includes ddl statements as well as SIUD (select, insert, update, delete) sql.  Perhaps we should 
>> consider pulling these things together into something which looks more 
>> like a data access pattern.  I think at least we should have the 
>> application initialization code pulled together which would include 
>> initialization of the user.dir directory structure and the clean database.
>>
>> Kevin
>>
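A minimal sketch of the three proposed signatures, for illustration only. It assumes an in-memory map as a stand-in for the hsqldb cachetable, with a simple counter imitating the IDENTITY column. The names InMemoryCacheManager and the two-field CacheObject here are hypothetical and are not Kepler's actual classes; it uses List where the proposal says vector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed CacheManager; not Kepler's real class.
final class InMemoryCacheManager {

    // Hypothetical, reduced CacheObject: only the name and lsid columns.
    static final class CacheObject {
        final String name;
        final String lsid; // may be null, mirroring "nullable unique"

        CacheObject(String name, String lsid) {
            this.name = name;
            this.lsid = lsid;
        }
    }

    private final Map<Integer, CacheObject> rows = new LinkedHashMap<>();
    private int nextId = 0; // imitates an auto-increment IDENTITY column

    // Inserts the object and returns the generated id.
    int insertObject(CacheObject o) {
        if (o.lsid != null) {
            for (CacheObject existing : rows.values()) {
                if (o.lsid.equals(existing.lsid)) {
                    throw new IllegalArgumentException("duplicate lsid: " + o.lsid);
                }
            }
        }
        int id = nextId++;
        rows.put(id, o);
        return id;
    }

    // Returns the cache object for the given id, or null if not found.
    CacheObject getObject(int id) {
        return rows.get(id);
    }

    // Returns all objects whose name equals the given string.
    List<CacheObject> getObjectsByName(String name) {
        List<CacheObject> matches = new ArrayList<>();
        for (CacheObject o : rows.values()) {
            if (o.name.equals(name)) {
                matches.add(o);
            }
        }
        return matches;
    }
}
```

Like the IDENTITY column it imitates, the returned id is only meaningful for the lifetime of this object, so a persistent key would still have to come from the name or the lsid.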
>> Kevin Ruland wrote:
>>
>>
>>> Hi.
>>>
>>> One of the tickets assigned to me is to implement the cache expiration
>>> stuff so, in particular, the ecogrid queries are better behaved.  I was
>>> expecting to try to migrate the old ecogrid mechanism from the
>>> DataCacheManager to the new CacheManager before getting this to work. 
>>> However, I have some questions. 
>>>
>>> The resultsets returned from the ecogrid queries do not have anything
>>> resembling an lsid which is the primary key into the CacheManager.  We
>>> could hack together an lsid based on something like the search criteria,
>>> but strictly speaking this is not an lsid.  In addition, there is no
>>> real guarantee that the resultset returned for the same query will
>>> always be the same result set.  For example, additional Digir providers
>>> may become available, or new data may be added to metacat, etc.
>>>
>>> I'm thinking we need some kind of internal lsid generator which can
>>> return new lsids for the local application.  Either we'd have to have
>>> the objects with these internal (localhost?) lsids always have "session"
>>> lifespan, or we'll have to come up with a mechanism which always returns
>>> the same lsid for any arbitrary input.  Maybe something like:
>>>
>>> class LSIDGenerator {
>>>
>>> static LSID generate( Object o );
>>>
>>> }
>>>
>>> Some kind of magic checksum is computed on o and used in the lsid.  So
>>> when somebody does an Ecogrid quick search, the contents of the text box
>>> combined with the EcogridQueryEndpoint are passed into the generate
>>> method.  Note:  both these things are strings.
>>>
>>> A unique mapping from Object (or maybe the less general case String) ->
>>> LSID could then be used to look up the resultset (or other object) from
>>> previously executed queries.
>>>
>>> I think we have the same problem when trying to use the CacheManager as
>>> a repository for intermediate results generated as part of a workflow.
>>>
>>> Kevin
>>>
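One way the "magic checksum" idea could look, as a sketch under stated assumptions rather than a definitive implementation. It assumes SHA-1 as the checksum, "localhost" as the authority, and "cache" as the namespace; the class name LocalLsidGenerator is hypothetical, and the method returns a String instead of an LSID type. The result only looks like an LSID and cannot be resolved as one.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical generator: maps (endpoint, query) to a stable LSID look-alike.
final class LocalLsidGenerator {

    // Assumed authority and namespace; neither is defined in the thread.
    static final String PREFIX = "urn:lsid:localhost:cache:";

    static String generate(String endpoint, String query) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(endpoint.getBytes(StandardCharsets.UTF_8));
            // Separator byte so ("ab", "c") and ("a", "bc") hash differently.
            md.update((byte) 0);
            md.update(query.getBytes(StandardCharsets.UTF_8));

            // Hex-encode the digest to form the object part of the identifier.
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return PREFIX + hex;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

The same endpoint and query text always yield the same identifier, which is what allows a lookup of a previously cached resultset. It says nothing about whether that cached resultset is still current, so the expiration question remains separate.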
>>> _______________________________________________
>>> Kepler-dev mailing list
>>> Kepler-dev at ecoinformatics.org
>>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
>>>
>>>
>>