[kepler-dev] [Bug 2895] New: - Distributed Execution Tracking Bug

bugzilla-daemon@ecoinformatics.org bugzilla-daemon at ecoinformatics.org
Mon Jul 23 10:31:03 PDT 2007


http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2895

           Summary: Distributed Execution Tracking Bug
           Product: Kepler
           Version: 1.0.0beta3
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P1
         Component: core
        AssignedTo: berkley at nceas.ucsb.edu
        ReportedBy: berkley at nceas.ucsb.edu
         QAContact: kepler-dev at ecoinformatics.org


Chad and Lucas are developing the distributed execution system for Kepler. The
system currently works only in a very simplified form. This bug consolidates
bug 1891 and bug 1899.

The following items need to be added:

* Make sure that the JNI libraries can be accessed from the slave and that the
ENM actors work on the slave.

* We might have to solve a known Kepler problem: you can't run multiple
instances of the application under the same user account. The cache uses an
embedded database that allows only one connection at a time, and the database
is stored in the .kepler directory. If you run Kepler twice at the same time,
the second instance fails with an error saying that the database is already in
use. If we have a cluster where the slave is deployed from a single shared home
directory, every slave instance will try to use the same database, so this
will be a problem.

* Matt suggested using the EcoGrid registry for node discovery.

* Get this to run on the NCEAS ROCKS cluster.

* We need to handle transferring support files to the slave(s). This includes
indirect transfers between slaves: instead of sending results back to the
master and then on to the next slave, the slaves should be able to transfer
data directly to each other.
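One possible way around the single-connection embedded-database item above, sketched here under assumptions: if Kepler's cache location could be pointed at a per-instance directory, each instance (for example, each slave sharing one home directory) could claim its own cache-N subdirectory of .kepler by holding an OS file lock on it. The class CacheDirSelector, the cache-N naming and the instance.lock file are all hypothetical, not existing Kepler code. The locking uses the standard java.nio FileChannel.tryLock(). OS file locks may not be reliable on some network file systems, so this would need checking on the cluster.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not part of Kepler): gives each running instance its own
// cache-N directory under a shared .kepler directory, so that each instance can
// open its own copy of the single-connection embedded database.
final class CacheDirSelector {
    // Directories already claimed by this JVM. Checking this first means we never
    // open a second channel on a file this JVM has locked; on some platforms,
    // closing such a channel would silently release the existing lock.
    private static final Set<Path> CLAIMED_HERE = ConcurrentHashMap.newKeySet();

    // A claimed cache directory; close() makes it available again.
    static final class Claim implements AutoCloseable {
        final Path dir;
        private final FileChannel channel;

        Claim(Path dir, FileChannel channel) {
            this.dir = dir;
            this.channel = channel;
        }

        @Override
        public void close() throws IOException {
            try {
                channel.close(); // also releases the lock taken through this channel
            } finally {
                CLAIMED_HERE.remove(dir);
            }
        }
    }

    // Tries cache-0, cache-1, ... under keplerDir and returns the first one it can lock.
    static Claim claim(Path keplerDir, int maxInstances) throws IOException {
        for (int i = 0; i < maxInstances; i++) {
            Path dir = keplerDir.resolve("cache-" + i).toAbsolutePath().normalize();
            if (!CLAIMED_HERE.add(dir)) {
                continue; // already in use by this JVM
            }
            FileChannel channel = null;
            FileLock lock = null;
            try {
                Files.createDirectories(dir);
                channel = FileChannel.open(dir.resolve("instance.lock"),
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                // tryLock() returns null if another process already holds the lock.
                lock = channel.tryLock();
            } finally {
                if (lock == null) {
                    CLAIMED_HERE.remove(dir);
                    if (channel != null) {
                        channel.close();
                    }
                }
            }
            if (lock != null) {
                return new Claim(dir, channel);
            }
        }
        throw new IOException("no free cache directory under " + keplerDir);
    }
}
```

A first instance would get cache-0 and a second concurrent one cache-1. Once an instance closes its claim, the directory can be reused, so the cache contents persist across runs instead of being thrown away.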
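To make the indirect-transfer item above concrete, here is a small model of how intermediate results move when successive workflow steps run on different slaves. The class TransferPlan and the slave names are hypothetical illustrations, not part of Kepler. The model counts only the hand-offs between slaves, not the initial input or the final result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not Kepler code): lists the network hops needed to pass
// each intermediate result from one slave to the next in a chain of steps.
final class TransferPlan {
    static final String MASTER = "master";

    // Every intermediate result is sent back to the master, then on to the next slave.
    static List<String> viaMaster(List<String> slaves) {
        List<String> hops = new ArrayList<>();
        for (int i = 0; i + 1 < slaves.size(); i++) {
            hops.add(slaves.get(i) + " -> " + MASTER);
            hops.add(MASTER + " -> " + slaves.get(i + 1));
        }
        return hops;
    }

    // Each slave sends its result straight to the slave running the next step.
    static List<String> direct(List<String> slaves) {
        List<String> hops = new ArrayList<>();
        for (int i = 0; i + 1 < slaves.size(); i++) {
            hops.add(slaves.get(i) + " -> " + slaves.get(i + 1));
        }
        return hops;
    }
}
```

For n slaves in a chain, relaying through the master takes 2(n-1) transfers and routes every intermediate result through the master's network link. Direct slave-to-slave transfer takes n-1 and keeps the master off the data path.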

