[kepler-dev] Caching of Data in Kepler

Tue Sep 14 07:31:54 PDT 2004

Yesterday morning I put together a caching mechanism for the Ecogrid 
DataSources.

It is a hybrid memory cache and file cache with threading. Here is how 
it works:

The CacheManager maintains a list of cached items. The base class for 
cache items is abstract, so the implementing classes decide "how" the 
data is obtained. The base class is responsible for threading, loading 
old data and saving out new data.

When a request is made, the cache item is created on its own thread and 
begins to download the data; in the meantime it marks itself as "busy." 
When it finishes, it notifies any listeners that it is done and marks 
itself "complete."
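
A minimal sketch of that life cycle, for illustration only. The names 
CacheItemSketch, CacheListenerSketch and fetch() are hypothetical and are 
not taken from the Kepler source, and the ERROR state is an assumption 
added so a failed download still notifies listeners.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener; the name is not taken from the Kepler source. */
interface CacheListenerSketch {
    void cacheItemDone(CacheItemSketch item);
}

/** Hypothetical base class: owns threading and state, subclasses supply the data. */
abstract class CacheItemSketch {
    enum State { NEW, BUSY, COMPLETE, ERROR }

    private final List<CacheListenerSketch> listeners = new CopyOnWriteArrayList<>();
    private volatile State state = State.NEW;
    private volatile byte[] data;

    /** Subclasses implement "how" the data is obtained. */
    protected abstract byte[] fetch() throws Exception;

    void addListener(CacheListenerSketch l) { listeners.add(l); }
    State getState() { return state; }
    byte[] getData() { return data; }

    /** Marks the item busy, starts the download on its own thread, returns that thread. */
    Thread start() {
        state = State.BUSY;
        Thread t = new Thread(() -> {
            try {
                data = fetch();
                state = State.COMPLETE;
            } catch (Exception e) {
                state = State.ERROR; // assumption: failures are recorded, not rethrown
            }
            // Notify listeners once the item has finished, successfully or not.
            for (CacheListenerSketch l : listeners) {
                l.cacheItemDone(this);
            }
        });
        t.start();
        return t;
    }
}
```

A concrete item would only override fetch(), for example to pull a data 
file from Ecogrid; callers register a listener before calling start().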

The cache manager serializes itself out as an XML file, and each entry 
in the cache is saved in a separate file, which keeps it simple and 
flexible.
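
One way this layout could look, as a rough sketch. The class name, the 
XML element and attribute names, and the file names (catalog.xml, 
item0.dat, ...) are all made up for illustration; the actual format 
written by the CacheManager is not described here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical catalog writer; element and file names are illustrative only. */
class CacheCatalogSketch {
    private final Path dir;
    // entry name -> name of the file holding that entry's data
    private final Map<String, String> entries = new LinkedHashMap<>();
    private int nextId = 0;

    CacheCatalogSketch(Path dir) { this.dir = dir; }

    /** Stores one entry's bytes in its own file and records it in the catalog. */
    void put(String name, byte[] data) throws IOException {
        String fileName = entries.get(name);
        if (fileName == null) {
            fileName = "item" + (nextId++) + ".dat";
            entries.put(name, fileName);
        }
        Files.write(dir.resolve(fileName), data);
    }

    /** Builds the catalog XML: one item element per cached entry. */
    String toXml() {
        StringBuilder sb = new StringBuilder("<cache>\n");
        for (Map.Entry<String, String> e : entries.entrySet()) {
            sb.append("  <item name=\"").append(escape(e.getKey()))
              .append("\" file=\"").append(escape(e.getValue())).append("\"/>\n");
        }
        return sb.append("</cache>\n").toString();
    }

    /** Writes the catalog next to the per-entry data files. */
    void save() throws IOException {
        Files.write(dir.resolve("catalog.xml"), toXml().getBytes(StandardCharsets.UTF_8));
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Keeping the data out of the catalog means the XML stays small and a 
single entry can be refreshed or deleted without rewriting the others.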

The items keep track of their creation date and I could easily add the 
capability for them to automatically retrieve a newer version of their 
contents. The impl I have now keeps track of the Ecogrid info necessary 
to retrieve the data.
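
The automatic refresh is not implemented yet, but the check behind it 
could be as small as the sketch below. CacheAgeSketch, isStale and the 
maxAge parameter are hypothetical names, and comparing the creation date 
against a fixed maximum age is only one possible policy.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: decides whether a cached item is old enough to re-fetch. */
class CacheAgeSketch {
    /** True when the item was created more than maxAge before now. */
    static boolean isStale(Instant created, Instant now, Duration maxAge) {
        return Duration.between(created, now).compareTo(maxAge) > 0;
    }
}
```

An item found stale could simply be restarted with the Ecogrid info it 
already stores.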

So at the moment, when a DataSource needs its data it just asks for it; 
the cache then gets it and notifies the DataSource when it is there. It 
is all very transparent to the DS. The big difference is that it is more 
asynchronous than before.
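
From the DataSource side, the request could look roughly like this. The 
sketch swaps in java.util.concurrent.CompletableFuture as a stand-in for 
the listener notification, so it is not how the current impl does it; 
CacheManagerSketch, request() and the downloader parameter are 
hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical manager: one in-memory entry per key, filled in the background. */
class CacheManagerSketch {
    private final Map<String, CompletableFuture<byte[]>> items = new ConcurrentHashMap<>();

    /**
     * Returns the entry for key. The first request starts the download on a
     * background thread; later requests share the same pending or finished entry.
     */
    CompletableFuture<byte[]> request(String key, Supplier<byte[]> downloader) {
        return items.computeIfAbsent(key, k -> CompletableFuture.supplyAsync(downloader));
    }
}
```

A DataSource would call request(...) and attach a callback with 
thenAccept(...), so it never blocks while the download is running.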

I also created a quick little "Data Cache Viewer" that displays the 
entries in the cache "catalog."

Under the File menu item you can:
* Refresh a selected cached item
* Refresh all the items
* Delete a single cache item
* Delete all the cache items

I could easily add to the viewer a way to view the actual contents of a 
cached item (if we need it). Some scientists may want that....

After I got this working with the EML200DataSource, Jing informed me that 
Monarch has a "generic" memory cache and file cache. I haven't had the 
time yet to review the impls. Now, we can go with this specific impl 
that is tailored to our DataSource objects, or I could adapt it to use 
Monarch's file cache for serializing the output.

Any thoughts? Or maybe we just use this for now and look at the issue 
again after our Oct. deadline.

Also, it seems that we will want to cache some of the metadata 
(table entity info) so a user could actually run "offline" if they 
wanted or needed to.

Rod

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dcv2.jpg
Type: image/jpeg
Size: 20644 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/kepler-dev/attachments/20040914/e71a2c6a/dcv2.jpg