[kepler-dev] [Bug 5429] New: improve default provenance store performance

bugzilla-daemon at ecoinformatics.org
Fri Jun 24 13:13:24 PDT 2011


http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5429

           Summary: improve default provenance store performance
           Product: Kepler
           Version: 2.1.0
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: provenance
        AssignedTo: barseghian at nceas.ucsb.edu
        ReportedBy: barseghian at nceas.ucsb.edu
         QAContact: kepler-dev at kepler-project.org
   Estimated Hours: 0.0


Currently there can be some big performance penalties when using Kepler with
provenance turned on (which by default uses HSQLDB). It would be great to
improve these.

Unless noted, references to workflow execution times below refer to the REAP
GDD (growing degree days) workflow set to process 200 days of data:
https://code.ecoinformatics.org/code/reap/trunk/usecases/terrestrial/workflows/derivedMETProducts/growingDegreeDays.kar

I see/saw a few issues:

-1) At one point I mentioned that Kepler shutdown was taking a very long time.
This isn't an issue anymore; shutdown now seems near instant.

0) The pre-initialize stage of workflow execution can take a very long time,
and it grows longer with each subsequent execution when running against a
large provenance store, e.g. up to 15 minutes.
Dan has fixed this issue, I believe with r27746. Pre-init is now close to
instant, or takes just a few seconds.

1) Execution of the workflow with provenance off takes a few seconds. With
provenance on, it takes about 4 minutes to run the first time with an empty
provenance store.

2) Subsequent executions of the same workflow take longer to run.
E.g., here are the execution times of 9 runs of the workflow on 2 different
machines:
OS X 10.6 MacBook, 2.2 GHz Intel Core 2 Duo, 4 GB RAM:
4:01, 4:03, 3:57, 7:43, 8:07, 8:01, 8:33, 8:10, 8:33
Ubuntu 10.04, dual 3 GHz, 2 GB RAM:
4:03, 4:13, 4:32, 9:13, 12:32, 8:08, 9:54, 9:06, 11:53

3) Startup time can be very long when the prior Kepler invocation ran
data/token-intensive workflows. I believe what's happening is that HSQLDB is
incorporating the changes in the .log file into the .data file; I think
something's happening with the .backup file too. The .data file slowly grows
very large (by a lot more than 200 MB), the .log file finally drops to near 0,
and then the .data file shrinks back to a size still larger than where it
started. I think that with the default log file max size of 200 MB, startup
can take on the order of 10-20 minutes. I've tested with a variety of log file
sizes. Making the max dramatically smaller, e.g. 5 MB, dramatically improves
startup time, but comes at a huge workflow execution time penalty (~20 minutes
to run the workflow), so this is an unacceptable fix. The execution penalty
starts to appear when the log file max size is set below about 100 MB, and
even with a 100 MB log file, startup is still very slow.
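
For anyone who wants to experiment, the checkpoint threshold is HSQLDB's
hsqldb.log_size property (in megabytes). Below is a minimal sketch of
adjusting it over JDBC, assuming HSQLDB 1.8 (the version Kepler bundles), a
placeholder database path, and HSQLDB's default sa/empty credentials; SET
LOGSIZE is the HSQLDB 1.8 statement that sets this same property from SQL:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class SetProvenanceLogSize {
        public static void main(String[] args) throws Exception {
            Class.forName("org.hsqldb.jdbcDriver");
            // file: URL to the provenance database; the path is a placeholder
            Connection conn = DriverManager.getConnection(
                    "jdbc:hsqldb:file:/path/to/provenance/db", "sa", "");
            Statement st = conn.createStatement();
            st.execute("SET LOGSIZE 200"); // checkpoint once .log reaches 200 MB
            st.execute("SHUTDOWN");        // checkpoint and release the db files
            st.close();
            conn.close();
        }
    }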


One thing I've found that improves execution time is increasing the 'memory
cache exponent' setting (hsqldb.cache_scale) from the default of 14 to the max
of 18. Per the HSQLDB documentation, this setting "Indicates the maximum
number of rows of cached tables that are held in memory, calculated as
3 *(2**value) (three multiplied by (two to the power value)). The default
results in up to 3*16384 rows from all cached tables being held in memory at
any time."
With a 200 MB log file max size and cache_scale=18, the first run of the
workflow takes about 2:17.
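
For scale, the default of 14 caches up to 3*2^14 = 49,152 rows, while 18
allows 3*2^18 = 786,432. Here's a sketch of setting it, under the same
assumptions as the earlier snippet (HSQLDB 1.8, placeholder path, default
credentials); note that HSQLDB generally applies a changed cache_scale at the
next database startup, and the property can also be edited in the database's
.properties file while the database is shut down:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class SetProvenanceCacheScale {
        public static void main(String[] args) throws Exception {
            Class.forName("org.hsqldb.jdbcDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:hsqldb:file:/path/to/provenance/db", "sa", "");
            Statement st = conn.createStatement();
            // raise the cached-table row limit from 3*2^14 to 3*2^18 rows
            st.execute("SET PROPERTY \"hsqldb.cache_scale\" 18");
            st.execute("SHUTDOWN"); // the new value takes effect on restart
            st.close();
            conn.close();
        }
    }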
