[kepler-dev] [Bug 4764] ProvenanceRecorder.changeExecuted slow after workflow run

bugzilla-daemon at ecoinformatics.org
Fri Feb 12 16:28:59 PST 2010


http://bugzilla.ecoinformatics.org/show_bug.cgi?id=4764

Oliver Soong <soong at nceas.ucsb.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Platform|Other                       |All
         OS/Version|Windows XP                  |All

--- Comment #11 from Oliver Soong <soong at nceas.ucsb.edu> 2010-02-12 16:28:58 PST ---
So I wiped my .kepler and KeplerData, started Kepler (to initialize these
folders), and things ran "fast" (1-3 sec per 3000+).  I then copied the
provenance folder from the old KeplerData over the fresh copy and reran, and
things ran slowly (30+ sec per 3000+).  The "fast" provenance DB was about 8MB
and the slow one was about 65MB, so roughly 8x the data cost at least 10x the
time.  Scaling seems to be non-linear, although this isn't the best way to
check timing.  This also explains why the slowness seemed to be getting worse
over time (though not quickly enough for me to be sure of it).
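
For what it's worth, the "non-linear" impression can be put into rough numbers.
The sketch below assumes a power law (time proportional to size^k), which is my
assumption and not something measured, and plugs in the figures above: 8MB vs
65MB, and 1-3 sec vs 30 sec.  The class and method names are made up for
illustration.

```java
public class ScalingEstimate {
    // Exponent k in "time ~ size^k", estimated from two (size, time) points.
    // Assumes a power-law relationship, which is only a rough model here.
    static double exponent(double sizeSmall, double sizeLarge,
                           double timeSmall, double timeLarge) {
        return Math.log(timeLarge / timeSmall) / Math.log(sizeLarge / sizeSmall);
    }

    public static void main(String[] args) {
        // Sizes in MB, times in seconds, taken from the observations above.
        double low = exponent(8, 65, 3, 30);   // fast case at its slowest (3 sec)
        double high = exponent(8, 65, 1, 30);  // fast case at its quickest (1 sec)
        System.out.printf("exponent between %.2f and %.2f%n", low, high);
    }
}
```

With these inputs the exponent comes out at roughly 1.1 to 1.6, i.e. worse than
linear.  Since the slow case was "30+" sec, the real exponent could be higher,
and two data points are not much to go on.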

So it looks to me like the provenance DB bloats pretty quickly, slowing
everything down.  This is a little frustrating as an end user, but it seems
like the code operates correctly.  I'm willing to let this get triaged, as I
personally don't have any qualms about blowing away KeplerData and .kepler
frequently, but I don't believe that is how the design was envisioned.

I'm not entirely certain what provenance gets used for, but maybe certain
entries can be culled?  Alternatively, maybe we can allow the end user to flush
selected content, a la wrm's ability to delete rows?
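
To make the culling idea concrete, here is a minimal sketch of one possible
policy: keep only the entries belonging to the N most recent runs.  Everything
in it is hypothetical.  The Entry class, its runId field, and the
keepRecentRuns method are stand-ins I invented, not Kepler's provenance schema
or API, and it works on an in-memory list rather than the real DB.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ProvenanceCull {
    // Hypothetical stand-in for one stored provenance entry.
    static final class Entry {
        final int runId;
        final String payload;

        Entry(int runId, String payload) {
            this.runId = runId;
            this.payload = payload;
        }
    }

    // Returns the entries whose runId is among the keepRuns highest run ids,
    // preserving their original order. Assumes higher run ids are more recent.
    static List<Entry> keepRecentRuns(List<Entry> entries, int keepRuns) {
        TreeSet<Integer> runIds = new TreeSet<>();
        for (Entry e : entries) {
            runIds.add(e.runId);
        }

        // Walk the distinct run ids from newest to oldest.
        Set<Integer> keep = new HashSet<>();
        Iterator<Integer> newestFirst = runIds.descendingIterator();
        while (newestFirst.hasNext() && keep.size() < keepRuns) {
            keep.add(newestFirst.next());
        }

        List<Entry> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (keep.contains(e.runId)) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

Whether dropping whole runs is acceptable depends on what provenance actually
gets used for, which is the open question above.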

Since it no longer seems to be a Windows problem, I'm changing the platform for
the bug.

-- 
Configure bugmail: http://bugzilla.ecoinformatics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.

