[kepler-dev] Q: SGE command execution on remote SGE cluster started on local PC fails due to environment problems

Hoeftberger, Johann Johann_Hoeftberger at DFCI.HARVARD.EDU
Fri Feb 13 07:49:22 PST 2015


Hello Jianwu,

thank you for your helpful answer; I had not noticed those settings
myself. Good hint!

I followed your advice, and it got rid of the "qsub: unknown command"
error. However, although I set SGE_ROOT and SGE_CELL in the
'jobSubmitOptions' parameter of the JobSubmitter actor, I still can't run
my workflow; I always get the following error/exception:

ERROR (org.kepler.actor.job.JobSubmitter:fire:226) 
org.kepler.job.JobException: Error at job submission.
Command:cd [...]/SGE_Testscripts/[...]_Feb13_093517EST_0; /opt/sge/bin/lx24-amd64/qsub  SGE_ROOT=/opt/sge,SGE_CELL=default [...]/SGE_Testscripts/[...]_Feb13_093517EST_0/sgeTestscript.sh
Stdout:

Stderr:

Unable to initialize environment because of error: Please set the environment variable SGE_ROOT.
Exiting.

	at org.kepler.job.JobManager.submit(JobManager.java:307)
	at org.kepler.job.Job.submit(Job.java:375)
	at org.kepler.actor.job.JobSubmitter.fire(JobSubmitter.java:217)
	at ptolemy.actor.process.ProcessThread._iterateActor(ProcessThread.java:335)
	at ptolemy.actor.process.ProcessThread.run(ProcessThread.java:212)


I tried different ways of specifying SGE_ROOT: alone, together with
SGE_CELL, in parentheses, separated by commas, and separated by
semicolons. All attempts had the same outcome: Kepler "thinks" SGE_ROOT
isn't set.
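Judging only from the command printed in the log above, the 'jobSubmitOptions' value seems to be inserted verbatim after qsub, so the remote shell hands SGE_ROOT=/opt/sge,SGE_CELL=default to qsub as an ordinary command-line argument instead of putting it into qsub's environment. A POSIX shell treats NAME=value words as environment assignments for a command only when they come before the command name. The following minimal sketch contrasts the two forms as plain strings. The class SgeSubmitCommandSketch and its methods are hypothetical illustrations, not Kepler API; /path/to/workdir is a placeholder for the path elided in the log; and whether any Kepler parameter lets you prepend such assignments is an unverified assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration (not Kepler code): contrasts the remote command
 * as it appears in the log with a variant where SGE_ROOT/SGE_CELL are set
 * as shell environment assignments in front of qsub.
 */
public class SgeSubmitCommandSketch {

    /** Mimics the logged command: the options text follows qsub as a plain argument. */
    static String buildLoggedForm(String workDir, String qsub, String options, String script) {
        return "cd " + workDir + "; " + qsub + " " + options + " " + script;
    }

    /**
     * Places NAME=value pairs before the command name; a POSIX shell exports
     * such assignments into the environment of that single command.
     */
    static String buildEnvPrefixedForm(String workDir, String qsub,
                                       Map<String, String> env, String script) {
        StringBuilder sb = new StringBuilder("cd ").append(workDir).append("; ");
        for (Map.Entry<String, String> e : env.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(' ');
        }
        return sb.append(qsub).append(' ').append(script).toString();
    }

    public static void main(String[] args) {
        String workDir = "/path/to/workdir";           // placeholder: real path is elided in the log
        String qsub = "/opt/sge/bin/lx24-amd64/qsub";  // path taken from the log
        String script = workDir + "/sgeTestscript.sh";

        Map<String, String> env = new LinkedHashMap<>(); // keeps insertion order
        env.put("SGE_ROOT", "/opt/sge");
        env.put("SGE_CELL", "default");

        System.out.println(buildLoggedForm(workDir, qsub, "SGE_ROOT=/opt/sge,SGE_CELL=default", script));
        System.out.println(buildEnvPrefixedForm(workDir, qsub, env, script));
    }
}
```

With these placeholder values, the second printed line is "cd /path/to/workdir; SGE_ROOT=/opt/sge SGE_CELL=default /opt/sge/bin/lx24-amd64/qsub /path/to/workdir/sgeTestscript.sh". If the remote login shell is csh/tcsh, which does not accept NAME=value prefixes, "env SGE_ROOT=/opt/sge SGE_CELL=default qsub ..." would be the portable equivalent.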

Do you have any further ideas on how I could solve this issue?

Best regards,
Johann


On 02/06/2015 05:20 PM, Jianwu Wang wrote:
> Hi Johann,
>
>      Can you try adding the SGE_CELL and SGE_ROOT settings to the 'job
> submit options' parameter of the GenericJobLauncher actor or the
> 'jobSubmitOptions' parameter of the JobSubmitter actor? An example is
> "SGE_CELL=hoffman2,SGE_ROOT=/u/systems/SGE6.1u3". You might also need to
> set the directory of your qsub binary (such as
> "/u/systems/SGE6.2u4/bin/lx26-amd64") in the 'binary path' parameter of
> the GenericJobLauncher actor or the 'binPath' parameter of the JobManager
> actor. I remember running into a similar problem before, and this did
> the trick.
>
> Best wishes
>
> Sincerely yours
>
> Jianwu WANG, Ph.D.
> jianwu at sdsc.edu
> http://users.sdsc.edu/~jianwu/
>
> Assistant Director for Research
> Workflows for Data Science (WorDS) Center of Excellence
> San Diego Supercomputer Center (SDSC)
> University of California, San Diego (UCSD)
>
> On 2/6/15 3:31 PM, Hoeftberger, Johann wrote:
>> Hello,
>>
>> I am trying to create my own simple Kepler test workflow that runs on my
>> local PC and executes jobs via SSH on an SGE cluster.
>> For that, I took the Kepler demo workflow
>> "Job_Submission_Using_JobManager", configured it for my local setup,
>> and tried to run it.
>> I know that the connection to the cluster, the login with the given
>> credentials, and the working-directory settings all work: the
>> directories and files for the cluster jobs to be executed are created.
>>
>> Executing the qsub command on the cluster doesn't work. At first I got
>> the exception "qsub: unknown command". To narrow down the error, I then
>> hardcoded the full path of the qsub command in the implementation (I
>> changed the initialization of the private String _sgeSubmitCmd in
>> JobSupportSGE.java to the full path of qsub on my SGE cluster), and got
>> the next exception: "Unable to initialize environment because of error:
>> Please set the environment variable SGE_ROOT".
>>
>> SGE_ROOT is properly set on the SGE cluster. I additionally tried
>> setting it on my local PC (and afterwards restarted Eclipse, where my
>> Kepler instance runs) and in the Eclipse run configuration (Run - Run
>> Configurations - Java Application - Environment). None of these
>> attempts solved the problem; I still get the same exception about the
>> uninitialized environment.
>>
>> In SshExec.java I found the following documentation for
>> public int executeCmd(String command, OutputStream streamOut,
>> OutputStream streamErr, String thirdPartyTarget):
>>
>> /**
>>  * Execute a command on the remote machine and expect a password/passphrase
>>  * question from the command. The stream <i>streamOut</i> should be provided
>>  * to get the output and errors merged. <i>streamErr</i> is not used in this
>>  * method (it will be empty string finally).
>>  *
>>  * @return exit code of command if execution succeeded,
>>  * @throws ExecTimeoutException
>>  *             if the command failed because of timeout
>>  * @throws SshException
>>  *             if an error occurs for the ssh connection during the command
>>  *             execution Note: in this method, the SSH Channel is forcing a
>>  *             pseudo-terminal allocation {see setPty(true)} to allow remote
>>  *             commands to read something from their stdin (i.e. from us
>>  *             here), thus, (1) remote environment is not set from
>>  *             .bashrc/.cshrc and (2) stdout and stderr come back merged in
>>  *             one stream.
>>  */
>>
>> So I guess my problem is caused by an uninitialized or incorrectly
>> initialized (pseudo-)terminal.
>> It seems to me that the environment variables set on the systems
>> involved (SGE cluster, local PC) are not read, and that the Kepler
>> implementation itself doesn't set proper values for the required
>> variables.
>>
>> I haven't found any documentation or configuration option for this
>> issue yet, so I think it is some kind of implementation flaw.
>>
>> Can somebody give me a hint on how to solve this issue? I would like to
>> run Kepler locally on my PC but execute parts of my workflow remotely on
>> my SGE cluster.
>>
>>
>> Kind regards,
>> Johann Hoeftberger
>>
>>
>
>
>
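The SshExec Javadoc quoted above states that, because a pseudo-terminal is allocated, the remote environment is not set from .bashrc/.cshrc. If that is the cause, another option that may be worth trying is to source the SGE cell's settings script explicitly at the start of the remote command. This is a minimal sketch, assuming the cluster uses the standard SGE layout with a script at $SGE_ROOT/$SGE_CELL/common/settings.sh (here /opt/sge/default/common/settings.sh, derived from the SGE_ROOT and SGE_CELL values in the log) and an sh-compatible login shell. SgeSettingsPrefixSketch, settingsScript and withSgeSettings are hypothetical names, not part of Kepler's SshExec or JobSupportSGE.

```java
/**
 * Hypothetical helper (not part of Kepler): prefixes a remote command with a
 * "." (source) of the SGE settings script, so that the variables it exports
 * are set even when no shell startup files (.bashrc/.cshrc) are read.
 */
public class SgeSettingsPrefixSketch {

    /** Builds "<sgeRoot>/<sgeCell>/common/settings.sh" (assumed standard SGE layout). */
    static String settingsScript(String sgeRoot, String sgeCell) {
        return sgeRoot + "/" + sgeCell + "/common/settings.sh";
    }

    /**
     * Returns ". <settings.sh>; <command>". The "." builtin runs the script in
     * the current shell, so variables it exports stay set for the command after
     * ";". Assumes an sh-compatible remote shell; csh/tcsh would need
     * "source .../settings.csh" instead.
     */
    static String withSgeSettings(String sgeRoot, String sgeCell, String command) {
        return ". " + settingsScript(sgeRoot, sgeCell) + "; " + command;
    }

    public static void main(String[] args) {
        // Placeholder paths; the real working directory is elided in the log.
        String cmd = "cd /path/to/workdir; qsub /path/to/workdir/sgeTestscript.sh";
        System.out.println(withSgeSettings("/opt/sge", "default", cmd));
        // Prints: . /opt/sge/default/common/settings.sh; cd /path/to/workdir; qsub /path/to/workdir/sgeTestscript.sh
    }
}
```

Sourcing settings.sh typically also puts the architecture-specific bin directory on PATH, which could make a hardcoded qsub path unnecessary, but that depends on the local installation.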


