[kepler-dev] Q: SGE command execution on remote SGE cluster started on local PC fails due to environment problems

Jianwu Wang jianwu at sdsc.edu
Fri Feb 13 14:31:41 PST 2015


Hi Johann,

         I checked my old workflow and found I had to make some changes 
to get it working. The jobSubmitOptions setting that works for me right 
now is "-ac SGE_CELL=hoffman2,SGE_ROOT=/u/systems/UGE8.0.1 -l 
h_data=1G". Since different clusters have different SGE installations, 
you need to find the right values for your cluster. But it looks like 
"-ac" is now needed for SGE_CELL and SGE_ROOT.
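
         With those options, the submit command that Kepler assembles on 
the cluster should look roughly like the following (the job directory, 
script name and qsub path are only placeholders, yours will differ):

    cd <remote job dir>; <binPath>/qsub -ac SGE_CELL=hoffman2,SGE_ROOT=/u/systems/UGE8.0.1 -l h_data=1G yourScript.sh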

Best wishes

Sincerely yours

Jianwu WANG, Ph.D.
jianwu at sdsc.edu
http://users.sdsc.edu/~jianwu/

Assistant Director for Research
Workflows for Data Science (WorDS) Center of Excellence
San Diego Supercomputer Center (SDSC)
University of California, San Diego (UCSD)

On 2/13/15 10:49 AM, Hoeftberger, Johann wrote:
> Hello Jianwu,
>
> thank you for your interesting answer, I hadn't noticed those
> settings myself. Good hint!
>
> I followed your advice and it got rid of the "qsub: unknown command" error.
> But although I set SGE_ROOT and SGE_CELL in the JobSubmitter
> jobSubmitOptions parameter, I still can't run my workflow; I
> always get the following error/exception:
>
> ERROR (org.kepler.actor.job.JobSubmitter:fire:226)
> org.kepler.job.JobException: Error at job submission.
> Command:cd [...]/SGE_Testscripts/[...]_Feb13_093517EST_0;
> /opt/sge/bin/lx24-amd64/qsub  SGE_ROOT=/opt/sge,SGE_CELL=default
> [...]/SGE_Testscripts/[...]_Feb13_093517EST_0/sgeTestscript.sh
> Stdout:
>
> Stderr:
>
> Unable to initialize environment because of error: Please set the
> environment variable SGE_ROOT.
> Exiting.
>
> org.kepler.job.JobException: Error at job submission.
> Command:cd [...]/SGE_Testscripts/[...]_Feb13_093517EST_0;
> /opt/sge/bin/lx24-amd64/qsub  SGE_ROOT=/opt/sge,SGE_CELL=default
> [...]/SGE_Testscripts/[...]_Feb13_093517EST_0/sgeTestscript.sh
> Stdout:
>
> Stderr:
>
> Unable to initialize environment because of error: Please set the
> environment variable SGE_ROOT.
> Exiting.
>
> 	at org.kepler.job.JobManager.submit(JobManager.java:307)
> 	at org.kepler.job.Job.submit(Job.java:375)
> 	at org.kepler.actor.job.JobSubmitter.fire(JobSubmitter.java:217)
> 	at
> ptolemy.actor.process.ProcessThread._iterateActor(ProcessThread.java:335)
> 	at ptolemy.actor.process.ProcessThread.run(ProcessThread.java:212)
>
>
> I tried different ways to pass the SGE_ROOT setting: alone, together with
> SGE_CELL, in parentheses, separated by a comma, separated by a semicolon.
> All with the same outcome: Kepler "thinks" SGE_ROOT isn't set.
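>
> In other words, if I understand qsub correctly, the variables have to be in
> qsub's environment rather than passed as a command-line argument, so the
> assembled command would need to look more like this (paths as in the error
> above; the syntax is only a sketch):
>
>     cd [...]/SGE_Testscripts/[...]_Feb13_093517EST_0; \
>     SGE_ROOT=/opt/sge SGE_CELL=default /opt/sge/bin/lx24-amd64/qsub sgeTestscript.sh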
>
> Do you have further ideas on how I could solve my issue?
>
> Best regards,
> Johann
>
>
> On 02/06/2015 05:20 PM, Jianwu Wang wrote:
>> Hi Johann,
>>
>>       Can you try adding the SGE_CELL and SGE_ROOT settings to the 'job submit
>> options' parameter of the GenericJobLauncher actor or the 'jobSubmitOptions'
>> parameter of the JobSubmitter actor? An example is
>> "SGE_CELL=hoffman2,SGE_ROOT=/u/systems/SGE6.1u3". You might also need to
>> set the path for your qsub (such as
>> "/u/systems/SGE6.2u4/bin/lx26-amd64") in the 'binary path' parameter of the
>> GenericJobLauncher actor or the 'binPath' parameter of the JobManager actor. I
>> remember running into a similar problem before, and this did the trick.
>>
>> Best wishes
>>
>> Sincerely yours
>>
>> Jianwu WANG, Ph.D.
>> jianwu at sdsc.edu
>> http://users.sdsc.edu/~jianwu/
>>
>> Assistant Director for Research
>> Workflows for Data Science (WorDS) Center of Excellence
>> San Diego Supercomputer Center (SDSC)
>> University of California, San Diego (UCSD)
>>
>> On 2/6/15 3:31 PM, Hoeftberger, Johann wrote:
>>> Hello,
>>>
>>> I am trying to create my own simple Kepler test workflow on my local PC and
>>> run it via SSH on an SGE cluster.
>>> For that I took the Kepler demo workflow
>>> "Job_Submission_Using_JobManager", configured it for my local situation,
>>> and tried to run it.
>>> I know that the connection to the cluster, the login with the given
>>> credentials, and the settings for the working directory work well. (The
>>> directories and files for the cluster jobs that should be executed do get
>>> created.)
>>>
>>> The execution of the qsub command on the cluster doesn't work. At
>>> first I got the exception "qsub: unknown command". When I hardcoded
>>> the full path to the qsub command in the implementation (only to locate the
>>> error), by changing the initialization of private String
>>> _sgeSubmitCmd in JobSupportSGE.java to the full path of the qsub command
>>> on my SGE cluster, I got the next exception: "Unable to initialize
>>> environment because of error: Please set the environment variable
>>> SGE_ROOT".
>>>
>>> SGE_ROOT is properly set on the SGE cluster. I also tried setting it
>>> on my local PC (and afterwards restarted Eclipse, where my
>>> Kepler instance is running) and in Eclipse under Run - Run Configuration
>>> - Java Application - Environment. None of these attempts solved the
>>> problem; I still get the same exception about the uninitialized
>>> environment.
>>>
>>> In SshExec.java, on public int executeCmd(String command,
>>> OutputStream streamOut, OutputStream streamErr, String thirdPartyTarget),
>>> I found the following documentation:
>>>
>>> /**
>>>  * Execute a command on the remote machine and expect a password/passphrase
>>>  * question from the command. The stream <i>streamOut</i> should be provided
>>>  * to get the output and errors merged. <i>streamErr</i> is not used in this
>>>  * method (it will be empty string finally).
>>>  *
>>>  * @return exit code of command if execution succeeded,
>>>  * @throws ExecTimeoutException
>>>  *             if the command failed because of timeout
>>>  * @throws SshException
>>>  *             if an error occurs for the ssh connection during the command
>>>  *             execution Note: in this method, the SSH Channel is forcing a
>>>  *             pseudo-terminal allocation {see setPty(true)} to allow remote
>>>  *             commands to read something from their stdin (i.e. from us
>>>  *             here), thus, (1) remote environment is not set from
>>>  *             .bashrc/.cshrc and (2) stdout and stderr come back merged in
>>>  *             one stream.
>>>  */
>>>
>>> so I guess my problem is caused by an uninitialized or wrongly
>>> initialized (pseudo) terminal.
>>> It seems to me that the environment variables set on the systems involved
>>> (SGE cluster, local PC) are not read, and the Kepler implementation itself
>>> doesn't set proper values for the needed variables.
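>>>
>>> A quick check along those lines, I suppose, would be to run something like
>>> this from the local PC (user and host names are placeholders):
>>>
>>>     ssh user@sge-cluster 'echo "SGE_ROOT=$SGE_ROOT"'
>>>
>>> If that prints an empty value while an interactive login on the cluster shows
>>> the variable, it would confirm that the exec-channel environment (no
>>> .bashrc/.cshrc) is what qsub is missing.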
>>>
>>> I haven't found a description of, or a configuration option for, my
>>> issue yet. So I think it is some kind of implementation flaw.
>>>
>>> Can somebody give me a hint on how to solve this issue? I would like to run
>>> Kepler locally on my PC but execute parts of my workflow remotely on my
>>> SGE cluster.
>>>
>>>
>>> Kind regards,
>>> Johann Hoeftberger
>>>
>>>
>>
>>
>
>


