[kepler-dev] Replacing DataCacheManager with CacheManager.
Shawn Bowers
sbowers at ucdavis.edu
Fri Dec 9 14:23:56 PST 2005
This might be blasphemy in terms of the LSID standard,
but I think in many places we have a need for ids that
"look" like LSIDs, but aren't official ones. For
example, temporary identifiers and local identifiers
would all fit this bill -- and we already have some
support for these within Kepler. Of course, these
"look-alikes" don't actually conform to the LSID
standard in terms of resolution, etc.
-shawn
Chad Berkley wrote:
> I'm not sure why we would need another ID when we have a (supposedly)
> unique lsid to use. I've been designing all of the objectmanager
> classes around using lsids. If we introduce yet another id to use
> internally, I foresee major headaches.
>
> On another, yet slightly related note:
> I'm also dubious as to how the new cache manager is going to work with
> data coming in from EcoGrid. I was working under the assumption (based
> on decisions made at the June Kepler meeting) that all objects coming
> into Kepler would have an lsid. Apparently the "reality on the ground"
> (as CNN likes to put it) is much different. Not only is the object
> cache tied to lsids but the SMS system is too. If we hope to use SMS to
> search the data store, the objects in it must have lsids.
>
> We could generate local lsids for these data objects pretty easily, but
> this will cause problems later if you try to transfer the data (via a
> kar) to another machine or if you try to upload it to another repository.
>
> I don't really have a good solution for this. I kind of think that,
> since we've designed the object manager around lsids, we should force
> external systems to play nice with kepler by providing lsids, either
> natively or through some external filtering system.
>
> chad
>
> Kevin Ruland wrote:
>> Hi.
>>
>> I've found some more information which might prove useful.
>>
>> hsqldb does support auto-increment IDENTITY columns. We could utilize
>> such a thing for a primary key to the table and allow access to objects
>> through that number. So, when a new object is inserted, this integer
>> can be returned for the caller to utilize for future queries.
>>
>> Of course, this does not provide for persistence beyond the current session.
>>
>> If we can leverage the NAME column more, perhaps that could be used for
>> the persistent key. Basically that's what the DataCacheManager does
>> now. It uses a name like "EcoGrid Digir Query: <magic query string>"
>> for the name of the object.
>>
>> The current schema for the cachetable is:
>>
>> name: varchar
>> lsid: varchar
>> date: varchar
>> file: varchar
>>
>> With no constraints.
>>
>> I suggest we do this:
>>
>> id: IDENTITY
>> name: varchar
>> lsid: varchar
>> date: varchar
>> file: varchar
>> expiration: varchar (to be completed eventually)
>>
>> Perhaps force: lsid nullable unique - because it seems that's what it
>> should be.
>>
>> Change some signatures:
>>
>> int CacheManager.insertObject( CacheObject ) - returns the id of the
>> inserted element.
>>
>> CacheObject CacheManager.getObject( int ) - returns the cache object for
>> the given id. Or null if not found.
>>
>> Vector<CacheObject> CacheManager.getObjectsByName( String ) - returns a
>> vector of objects matching the given name string.
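
To make the proposed signatures concrete, here is a rough in-memory sketch, not the real CacheManager. Everything in it is an assumption for illustration: the class names SketchCacheObject and SketchCacheManager are hypothetical, a Map stands in for the cachetable, a counter stands in for the IDENTITY column, "matching" is taken to mean exact equality on the name, and a List is returned where the proposal says vector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the real CacheObject; only the fields used here. */
final class SketchCacheObject {
    final String name;
    final String lsid; // null when the object has no lsid (e.g. an EcoGrid resultset)

    SketchCacheObject(String name, String lsid) {
        this.name = name;
        this.lsid = lsid;
    }
}

/** In-memory model of the proposed signatures; a Map stands in for the cachetable. */
final class SketchCacheManager {
    private final Map<Integer, SketchCacheObject> rows = new LinkedHashMap<>();
    private int nextId = 0; // plays the role of the IDENTITY column (start value is arbitrary)

    /** Stores the object and returns its generated id. */
    int insertObject(SketchCacheObject o) {
        if (o.lsid != null) {
            // Mirror "lsid nullable unique": reject a second row with the same lsid.
            for (SketchCacheObject existing : rows.values()) {
                if (o.lsid.equals(existing.lsid)) {
                    throw new IllegalArgumentException("duplicate lsid: " + o.lsid);
                }
            }
        }
        int id = nextId++;
        rows.put(id, o);
        return id;
    }

    /** Returns the object for the given id, or null if not found. */
    SketchCacheObject getObject(int id) {
        return rows.get(id);
    }

    /** Returns every object whose name equals the given string, in insertion order. */
    List<SketchCacheObject> getObjectsByName(String name) {
        List<SketchCacheObject> matches = new ArrayList<>();
        for (SketchCacheObject o : rows.values()) {
            if (o.name.equals(name)) {
                matches.add(o);
            }
        }
        return matches;
    }
}
```

Two inserts under the same name get different ids and both come back from getObjectsByName. As noted above, ids handed out this way are only good for the current session.
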
>>
>> Also, there are a few places where SQL is inlined in the application.
>> This includes DDL statements as well as SIUD (select, insert, update,
>> delete) SQL. Perhaps we should
>> consider pulling these things together into something which looks more
>> like a data access pattern. I think at least we should have the
>> application initialization code pulled together which would include
>> initialization of the user.dir directory structure and the clean database.
>>
>> Kevin
>>
>> Kevin Ruland wrote:
>>
>>
>>> Hi.
>>>
>>> One of the tickets assigned to me is to implement the cache expiration
>>> stuff so, in particular, the ecogrid queries are better behaved. I was
>>> expecting to try to migrate the old ecogrid mechanism from the
>>> DataCacheManager to the new CacheManager before getting this to work.
>>> However, I have some questions.
>>>
>>> The resultsets returned from the ecogrid queries do not have anything
>>> resembling an lsid, which is the primary key into the CacheManager. We
>>> could hack together an lsid based on something like the search criteria,
>>> but strictly speaking this is not an lsid. In addition, there is no
>>> real guarantee that the resultset returned for the same query will
>>> always be the same. For example, additional Digir providers may
>>> become available, or new data may have been added to metacat.
>>>
>>> I'm thinking we need some kind of internal lsid generator which can
>>> return new lsids for the local application. Either we'd have to have
>>> the objects with these internal (localhost?) lsids always have "session"
>>> lifespan, or we'll have to come up with a mechanism which always returns
>>> the same lsid for any arbitrary input. Maybe something like:
>>>
>>> class LSIDGenerator {
>>>
>>> static LSID generate( Object o );
>>>
>>> }
>>>
>>> Some kind of magic checksum is computed on o and used in the lsid. So
>>> when somebody does an Ecogrid quick search, the contents of the text box
>>> combined with the EcogridQueryEndpoint are passed into the generate
>>> method. Note: both these things are strings.
>>>
>>> A unique mapping from Object (or maybe the less general case String) ->
>>> LSID could then be used to lookup the resultset (or other object) from
>>> previously executed queries.
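
A minimal sketch of the String -> id mapping described above, under several assumptions: the class name LocalLsidGenerator is hypothetical, it returns a plain String rather than an LSID object, SHA-1 is just one choice for the "magic checksum", and the "localhost" authority and "ecogrid-query" namespace are made up for illustration. The result only looks like an lsid; nothing will resolve it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: derives a stable, LSID-shaped string from two strings. */
final class LocalLsidGenerator {
    // Authority and namespace are illustrative assumptions, not Kepler conventions.
    private static final String PREFIX = "urn:lsid:localhost:ecogrid-query:";

    private LocalLsidGenerator() {}

    /** The same endpoint and query text always yield the same identifier. */
    static String generate(String endpoint, String queryText) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(endpoint.getBytes(StandardCharsets.UTF_8));
            // A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
            sha1.update((byte) 0);
            sha1.update(queryText.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return PREFIX + hex;
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required of every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling generate with the endpoint and the contents of the search box twice gives the same string, so it can serve as a lookup key for a previously executed query. It identifies the query, not the resultset, so it does nothing about the same query returning different data later.
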
>>>
>>> I think we have the same problem when trying to use the CacheManager as
>>> a repository for intermediate results generated as part of a workflow.
>>>
>>> Kevin
>>>
>>> _______________________________________________
>>> Kepler-dev mailing list
>>> Kepler-dev at ecoinformatics.org
>>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
>>>
>>>