[kepler-dev] Replacing DataCacheManager with CacheManager.

Kevin Ruland kruland at ku.edu
Wed Dec 14 04:37:58 PST 2005


All,

This discussion appears to have run its course and there is no
indication of consensus.  However, this is very much a gating issue
because Jing and I had been postponing fixing some bugs in the
EcogridQuery/DataCacheManagement system with the understanding it would
be incorporated into the CacheManager.  From what I can see these are
the options:

1)  Don't cache Ecogrid Query results at all.  This includes not only
the Quick search results, but also the data behind the data source actors.

2)  Implement a new non-lsid based cache.

3) Implement a new non-lsid based cache and reimplement the CacheManager
on top of it.

4)  Fix the CacheManager to allow non-lsid cache keys.

5)  Fake an lsid for the resultsets based on the resultset data.

6)  Fake an lsid for the resultsets based on the query data.

My opinion is we should pursue #4.  It results in less duplicated code,
is not that difficult, and in the end can lead to a more useful caching
mechanism.
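
A minimal sketch of what #4 might look like, assuming the cache is keyed
by plain strings so that an lsid (via its string form) and a non-lsid key
can share one store.  The names StringKeyedCache and getOrLoad are
hypothetical and are not the existing CacheManager API; the locking
strategy shown is only one possible choice.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch, not the real Kepler CacheManager.
// Keys are plain strings, so callers may pass an lsid's string form
// or any other key they can compute before doing the expensive work.
final class StringKeyedCache<V> {
    private final ConcurrentMap<String, V> entries = new ConcurrentHashMap<>();

    // Returns the cached value for the key. The loader runs only on a miss,
    // and at most once per key even with concurrent callers.
    // The loader must not return null (a null result is not cached).
    V getOrLoad(String key, Function<String, V> loader) {
        return entries.computeIfAbsent(key, loader);
    }

    // Lets a caller check for a hit before starting a long-running query.
    boolean contains(String key) {
        return entries.containsKey(key);
    }

    int size() {
        return entries.size();
    }
}
```

With this shape, existing lsid-based callers would keep working by passing
the lsid's string representation as the key.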

#5 and #6 are pretty much bad ideas.  If we are forced to generate
artificial (and in the case of #5, useless) lsids, then it should
indicate to us that lsids are not the answer.

#3 is not that far from #4.  In fact, I think they are essentially the
same amount of work.  But I don't see that having an extra interface
would be very beneficial and would probably lead to considerable
confusion in a few months.

#2 will just result in much more work for all involved.  Duplicated code
is typically very bad especially when it is a complex, thread-sensitive
system like a cache.

#1 would pretty much violate what I believe are user requirements to be
able to archive workflows and their data.

Kevin

Kevin Ruland wrote:

>Bertram,
>
>I don't think I was very clear.  In order to utilize the cache you must 
>be able to determine if the object you desire is in the cache prior to 
>doing the long running process/query to retrieve it.  This means you 
>need to be able to determine what the cache key (I'm avoiding the term 
>lsid intentionally) of the object is based on the information you have 
>at hand prior to issuing the query.    If the only way to get the cache 
>key is to actually issue the query, then you might as well not use a 
>cache at all because you will always reissue the query.
>
>The quick search thing utilizes this information in order to issue the 
>query:
>
>Service end point.
>Query string.
>
>So you need to be able to munge this data in order to come up with the 
>cache key.
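
One way the "munging" described above could be sketched, assuming the key
is simply a hash of the service end point and the query string.  The class
QueryCacheKey and the choice of SHA-256 are illustrative assumptions, not
something taken from the Kepler code base.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a cache key from data known before the query runs.
final class QueryCacheKey {
    private QueryCacheKey() {}

    // Same end point and query always give the same 64-character hex key.
    static String of(String endpoint, String query) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] ep = endpoint.getBytes(StandardCharsets.UTF_8);
            // Length-prefix the end point so ("ab", "c") and ("a", "bc") differ.
            digest.update(Integer.toString(ep.length).getBytes(StandardCharsets.UTF_8));
            digest.update((byte) ':');
            digest.update(ep);
            digest.update(query.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

Because the key depends only on the inputs to the query, the cache can be
consulted before the query is issued.
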
>
>If the only key into the CacheManager is the lsid and the lsid is 
>generated by the server, then we need another caching mechanism just for 
>ecogrid resultsets.  Since the only functional difference between this 
>alternative cache system and the CacheManager is the cache key structure 
>we will end up with significant code duplication.  One way to eliminate 
>the code duplication is to build the CacheManager on top of a more 
>general cache management system which utilizes simple strings as the 
>cache key and rely on the natural conversion from Lsid object to string 
>representation.
>
>I don't really understand the provenance system nor how it works into 
>this.  But I wouldn't doubt that scientists will need to be able to 
>"import" data from some external system through simple text files or 
>graphical images.  Requiring the generation of an LSID for such files 
>might seem a little unnatural.  Don't really know where this thought is 
>headed right now and probably requires some insight from people who know.
>
>Kevin
>
>
>Bertram Ludaescher wrote:
>
>>Hi Kevin:
>>
>>In general, intermediate data results, including a result-set of a
>>query or the products of arbitrary acts might very well have an lsid
>>and you even gave the reason for it ;-)
>>
>>Because the 2nd time around, when you run the same query and you get a
>>different result, using the different assigned lsids, you might be
>>able to "spot the difference" (provided the lsids can be dereferenced
>>and give you the results).
>>
>
>_______________________________________________
>Kepler-dev mailing list
>Kepler-dev at ecoinformatics.org
>http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
>  
>


