[kepler-dev] Replacing DataCacheManager with CacheManager.

Wed Dec 14 09:58:03 PST 2005

I vote for #4 first, if the CacheManager is not hard to modify, and then
for #5 and #6 as second choices.

Jing

Jing Tao
National Center for Ecological
Analysis and Synthesis (NCEAS)
735 State St. Suite 204
Santa Barbara, CA 93101

On Wed, 14 Dec 2005, Kevin Ruland wrote:

> Date: Wed, 14 Dec 2005 06:37:58 -0600
> From: Kevin Ruland <kruland at ku.edu>
> To: Kepler-Dev <kepler-dev at ecoinformatics.org>
> Subject: Re: [kepler-dev] Replacing DataCacheManager with CacheManager.
> 
>
> All,
>
> This discussion appears to have run its course and there is no
> indication of consensus.  However, this is very much a gating issue
> because Jing and I had been postponing fixing some bugs in the
> EcogridQuery/DataCacheManagement system with the understanding it would
> be incorporated into the CacheManager.  From what I can see these are
> the options:
>
> 1)  Don't cache Ecogrid Query results at all.  This includes not only
> the Quick search results, but also the data behind the data source actors.
>
> 2)  Implement a new non-lsid based cache.
>
> 3) Implement a new non-lsid based cache and reimplement the CacheManager
> on top of it.
>
> 4)  Fix the CacheManager to allow non-lsid cache keys.
>
> 5)  Fake an lsid for the resultsets based on the resultset data.
>
> 6)  Fake an lsid for the resultsets based on the query data.
>
> My opinion is that we should pursue #4.  It results in less duplicated code,
> is not that difficult, and in the end can lead to a more useful caching
> mechanism.
>
> #5 and #6 are pretty much bad ideas.  If we are forced to generate
> artificial (and in the case of #5, useless) lsids, then it should
> indicate to us that lsids are not the answer.
>
> #3 is not that far from #4.  In fact, I think they are essentially the
> same amount of work.  But I don't see that having an extra interface
> would be very beneficial and would probably lead to considerable
> confusion in a few months.
>
> #2 will just result in much more work for all involved.  Duplicated code
> is typically very bad especially when it is a complex, thread-sensitive
> system like a cache.
>
> #1 would pretty much violate what I believe are user requirements to be
> able to archive workflows and their data.
>
> Kevin
>
> Kevin Ruland wrote:
>
>> Bertram,
>>
>> I don't think I was very clear.  In order to utilize the cache you must
>> be able to determine if the object you desire is in the cache prior to
>> doing the long running process/query to retrieve it.  This means you
>> need to be able to determine what the cache key (I'm avoiding the term
>> lsid intentionally) of the object is based on the information you have
>> at hand prior to issuing the query.    If the only way to get the cache
>> key is to actually issue the query, then you might as well not use a
>> cache at all because you will always reissue the query.
>>
>> The quick search thing utilizes this information in order to issue the
>> query:
>>
>> Service end point.
>> Query string.
>>
>> So you need to be able to munge this data in order to come up with the
>> cache key.
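
The key derivation described above might look like the following minimal Java sketch. The class names `QueryCacheKey` and `QueryResultCache`, the SHA-256 hashing, and the use of a plain `String` for the result set are all assumptions made for illustration; none of them come from the Kepler code base. The point is only that the key is computed from the service end point and query string, both known before the query is issued, so the cache can be consulted first.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical helper: builds a cache key from data known before the query runs. */
final class QueryCacheKey {
    private QueryCacheKey() {}

    static String of(String serviceEndpoint, String queryString) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
            update(digest, serviceEndpoint);
            update(digest, queryString);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be provided by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    private static void update(MessageDigest digest, String field) {
        byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
        digest.update(Integer.toString(bytes.length).getBytes(StandardCharsets.UTF_8));
        digest.update((byte) ':');
        digest.update(bytes);
    }
}

/** Hypothetical result cache: looks up the key first, runs the query only on a miss. */
final class QueryResultCache {
    private final Map<String, String> results = new ConcurrentHashMap<>();

    String fetch(String serviceEndpoint, String queryString, Supplier<String> runQuery) {
        String key = QueryCacheKey.of(serviceEndpoint, queryString);
        // The supplier is invoked only when no entry exists for this key.
        return results.computeIfAbsent(key, k -> runQuery.get());
    }
}
```

With this shape, a second `fetch` for the same end point and query string returns the stored result without calling the supplier again. A real implementation would also need expiry and persistence, which this sketch leaves out.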
>>
>> If the only key into the CacheManager is the lsid and the lsid is
>> generated by the server, then we need another caching mechanism just for
>> ecogrid resultsets.  Since the only functional difference between this
>> alternative cache system and the CacheManager is the cache key structure
>> we will end up with significant code duplication.  One way to eliminate
>> the code duplication is to build the CacheManager on top of a more
>> general cache management system which utilizes simple strings as the
>> cache key and rely on the natural conversion from Lsid object to string
>> representation.
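
The layering suggested above could be sketched roughly as follows. Every type here (`StringKeyedCache`, `Lsid`, `LsidCache`) is a hypothetical stand-in written for this illustration; the real Kepler CacheManager and LSID classes may look quite different. The sketch only shows an lsid-keyed facade delegating to a general string-keyed cache through `toString()`, so that non-lsid keys can share the same storage.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical general-purpose cache keyed by plain strings. */
final class StringKeyedCache<V> {
    private final Map<String, V> entries = new ConcurrentHashMap<>();

    void put(String key, V value) {
        entries.put(key, value);
    }

    Optional<V> get(String key) {
        return Optional.ofNullable(entries.get(key));
    }
}

/** Hypothetical stand-in for an LSID; only its string form matters here. */
final class Lsid {
    private final String value;

    Lsid(String value) {
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }
}

/** Hypothetical lsid-keyed facade that stores everything in the string-keyed cache. */
final class LsidCache<V> {
    private final StringKeyedCache<V> delegate;

    LsidCache(StringKeyedCache<V> delegate) {
        this.delegate = delegate;
    }

    void put(Lsid lsid, V value) {
        // Rely on the natural conversion from the Lsid object to its string form.
        delegate.put(lsid.toString(), value);
    }

    Optional<V> get(Lsid lsid) {
        return delegate.get(lsid.toString());
    }
}
```

Callers that have an lsid go through `LsidCache`, while callers such as the ecogrid result-set code could use `StringKeyedCache` directly with whatever key they can compute, and the locking and storage logic exists only once.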
>>
>> I don't really understand the provenance system nor how it works into
>> this.  But I wouldn't doubt that scientists will need to be able to
>> "import" data from some external system through simple text files or
>> graphical images.  Requiring the generation of an LSID for such files
>> might seem a little unnatural.  I don't really know where this thought is
>> headed right now; it probably requires some insight from people who know.
>>
>> Kevin
>>
>>
>> Bertram Ludaescher wrote:
>>
>>
>>
>>> Hi Kevin:
>>>
>>> In general, intermediate data results, including a result-set of a
>>> query or the products of arbitrary acts might very well have an lsid
>>> and you even gave the reason for it ;-)
>>>
>>> Because the 2nd time around, when you run the same query and you get a
>>> different result, using the different assigned lsids, you might be
>>> able to "spot the difference" (provided the lsids can be dereferenced
>>> and give you the results).
>>>
>>
>> _______________________________________________
>> Kepler-dev mailing list
>> Kepler-dev at ecoinformatics.org
>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
>>
>>