[kepler-dev] Replacing DataCacheManager with CacheManager.

Wed Dec 14 06:58:18 PST 2005

Hi Kevin:

My comments were not geared towards the cache system itself. I haven't
followed that one closely enough to make a specific comment on it
right now. My comments were on the provenance framework -- and for
that one on the general architecture, not necessarily some of the
specific things being implemented (e.g. by Kepler/SPA folks) right
now.

So you guys who are in the cache manager trenches hopefully know best
where it hurts and how to relieve the pain ;-)

My $0.02: result caching is usually a good idea (of queries, data
objects etc). But it's no substitute for solving underlying efficiency
problems that some of the catalog system might have. For that,
maybe indexes are what's needed (but that's a different discussion,
and one that I think was started before :-)

cheers

Bertram

>>> On Tue, 13 Dec 2005 10:21:44 -0600
>>> Kevin Ruland <kruland at ku.edu> wrote: 
KR> 
KR> Bertram,
KR> 
KR> I don't think I was very clear.  In order to utilize the cache you must 
KR> be able to determine if the object you desire is in the cache prior to 
KR> doing the long running process/query to retrieve it.  This means you 
KR> need to be able to determine what the cache key (I'm avoiding the term 
KR> lsid intentionally) of the object is based on the information you have 
KR> at hand prior to issuing the query.    If the only way to get the cache 
KR> key is to actually issue the query, then you might as well not use a 
KR> cache at all because you will always reissue the query.
KR> 
KR> The quick search thing utilizes this information in order to issue the 
KR> query:
KR> 
KR> Service end point.
KR> Query string.
KR> 
KR> So you need to be able to munge this data in order to come up with the 
KR> cache key.
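
[Illustration: a minimal sketch of the point above, assuming the cache key
can be computed from the service end point and the query string alone, before
any query is issued. The class and method names are hypothetical and are not
part of Kepler; hashing with SHA-256 is just one possible way to "munge" the
two values into a key.]

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: derives a cache key from data known before the query runs. */
final class QueryCacheKey {
    private QueryCacheKey() {}

    static String of(String serviceEndpoint, String queryString) {
        // Length-prefix the end point so ("ab", "c") and ("a", "bc") yield different keys.
        String material = serviceEndpoint.length() + ":" + serviceEndpoint + queryString;
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(material.getBytes(StandardCharsets.UTF_8));
            // Hex-encode the digest so the key is a plain string.
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

[The same end point and query string always give the same key, so the cache
can be checked first and the long-running query issued only on a miss.]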
KR> 
KR> If the only key into the CacheManager is the lsid and the lsid is 
KR> generated by the server, then we need another caching mechanism just for 
KR> ecogrid resultsets.  Since the only functional difference between this 
KR> alternative cache system and the CacheManager is the cache key structure 
KR> we will end up with significant code duplication.  One way to eliminate 
KR> the code duplication is to build the CacheManager on top of a more 
KR> general cache management system which utilizes simple strings as the 
KR> cache key and rely on the natural conversion from Lsid object to string 
KR> representation.
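
[Illustration: a rough sketch of the layering proposed above, under the
assumption that an LSID has a natural string form. All types here
(StringKeyedCache, LsidCache, and the stand-in Lsid class) are hypothetical
and do not reflect the actual Kepler CacheManager API.]

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical general cache that uses plain strings as keys. */
class StringKeyedCache<V> {
    private final Map<String, V> entries = new ConcurrentHashMap<>();

    Optional<V> get(String key) {
        return Optional.ofNullable(entries.get(key));
    }

    void put(String key, V value) {
        entries.put(key, value);
    }
}

/** Stand-in for an LSID type; only its string representation matters here. */
final class Lsid {
    private final String urn;

    Lsid(String urn) {
        this.urn = urn;
    }

    @Override
    public String toString() {
        return urn;
    }
}

/** Hypothetical LSID-facing layer that delegates to the general cache. */
class LsidCache<V> {
    private final StringKeyedCache<V> delegate;

    LsidCache(StringKeyedCache<V> delegate) {
        this.delegate = delegate;
    }

    Optional<V> get(Lsid id) {
        // Relies on the conversion from Lsid to its string representation.
        return delegate.get(id.toString());
    }

    void put(Lsid id, V value) {
        delegate.put(id.toString(), value);
    }
}
```

[With this shape, result sets could be stored in the same general cache under
a client-computed string key, while LSID-keyed objects go through the thin
wrapper, so the storage code exists only once.]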
KR> 
KR> I don't really understand the provenance system nor how it works into 
KR> this.  But I wouldn't doubt that scientists will need to be able to 
KR> "import" data from some external system through simple text files or 
KR> graphical images.  Requiring the generation of an LSID for such files 
KR> might seem a little unnatural.  Don't really know where this thought is 
KR> headed right now and probably requires some insight from people who know.
KR> 
KR> Kevin
KR> 
KR> 
KR> Bertram Ludaescher wrote:
KR> 
>> Hi Kevin:
>> 
>> In general, intermediate data results, including a result-set of a
>> query or the products of arbitrary acts might very well have an lsid
>> and you even gave the reason for it ;-)
>> 
>> Because the 2nd time around, when you run the same query and you get a
>> different result, using the different assigned lsids, you might be
>> able to "spot the difference" (provided the lsids can be dereferenced
>> and give you the results).
>> 
>> 
>>