[kepler-dev] [Bug 2895] New: - Distributed Execution Tracking Bug
bugzilla-daemon at ecoinformatics.org
Mon Jul 23 10:31:03 PDT 2007
http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2895
Summary: Distributed Execution Tracking Bug
Product: Kepler
Version: 1.0.0beta3
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: P1
Component: core
AssignedTo: berkley at nceas.ucsb.edu
ReportedBy: berkley at nceas.ucsb.edu
QAContact: kepler-dev at ecoinformatics.org
Chad and Lucas are developing the distributed execution system for Kepler. The
system currently works only in a very simplified form. This bug consolidates
bug 1891 and bug 1899.
The following items need to be added:
* Make sure that the JNI libraries can be accessed from the slave and that the
ENM actors work on the slave.
* We might have to solve the problem that Kepler cannot run multiple instances
of the application under the same user account. The cause is that the cache
uses an embedded database which only allows one connection at a time. The db is
stored in the .kepler directory, so if you try to run Kepler twice at the same
time, the second instance fails with an error that the db is already in use.
If we have a cluster where the slave is distributed to the nodes via a single
shared home directory, this will be a problem.
* Matt came up with the idea of using the EcoGrid registry for node discovery.
* Get this to run on the NCEAS ROCKS cluster.
* We need to deal with transferring support files to the slave(s). This
includes indirect transfers between slaves: instead of transferring results
back to the master and then on to the next slave, the slaves should be able to
transfer data among themselves.
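
The single-connection cache problem in the second item could be approached
roughly as sketched below. This is only an illustration, not existing Kepler
code: the class CacheDirGuard, the "cache-<host>" directory layout, and the
"cache.lock" file name are all assumptions made for the example. It gives each
host its own cache directory under the shared .kepler directory and uses a lock
file to detect a directory that is already in use.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Kepler.
final class CacheDirGuard {

    // Assumed layout: one cache directory per host inside the shared .kepler directory.
    static Path cacheDirFor(Path keplerDir, String hostName) {
        return keplerDir.resolve("cache-" + hostName);
    }

    // Returns a held lock on the directory's lock file,
    // or null if the directory is already locked by someone else.
    static FileLock tryAcquire(Path cacheDir) throws IOException {
        Files.createDirectories(cacheDir);
        FileChannel channel = FileChannel.open(
                cacheDir.resolve("cache.lock"),   // hypothetical file name
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = channel.tryLock();    // null: held by another process
            if (lock == null) {
                channel.close();
            }
            return lock;
        } catch (OverlappingFileLockException e) {
            // Thrown when this same JVM already holds the lock.
            channel.close();
            return null;
        }
    }
}
```

A slave would call tryAcquire(cacheDirFor(keplerDir, hostName)) before opening
the embedded database and keep the returned lock until shutdown; closing the
lock's channel releases it. File locking on network file systems is not always
reliable, so on a cluster with a shared home directory the per-host directory
is the part doing most of the work here.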