[kepler-dev] Q: SGE command execution on remote SGE cluster started on local PC fails due to environment problems
Hoeftberger, Johann
Johann_Hoeftberger at DFCI.HARVARD.EDU
Tue Feb 17 08:55:29 PST 2015
Hello Jianwu,
thank you for your help.
Unfortunately, it doesn't work for me. It seems I cannot set SGE_ROOT
for Kepler with the "-ac SGE_CELL=... SGE_ROOT=..." parameter in my case.
I followed your hint and added the -ac token. I tried setting just
SGE_ROOT ("-ac SGE_ROOT=...") and also SGE_CELL and SGE_ROOT together
("-ac SGE_CELL=...,SGE_ROOT=..."). For both parameters I used the values
of the corresponding environment variables as set directly on the
cluster (checked over remote SSH access, in bash).
The outcome is always the same exception as before:
ERROR (org.kepler.actor.job.JobSubmitter:fire:226)
org.kepler.job.JobException: Error at job submission.
Command:cd [...]/SGE_Testscripts/[...]_Feb17_113738EST_0;
/opt/sge/bin/lx24-amd64/qsub -ac SGE_ROOT=/opt/sge
[...]/SGE_Testscripts/[...]_Feb17_113738EST_0/sgeTestscript.sh
Stdout:
Stderr:
Unable to initialize environment because of error: Please set the
environment variable SGE_ROOT.
Exiting.
org.kepler.job.JobException: Error at job submission.
Command:cd [...]/SGE_Testscripts/[...]_Feb17_113738EST_0;
/opt/sge/bin/lx24-amd64/qsub -ac SGE_ROOT=/opt/sge
[...]/SGE_Testscripts/[...]_Feb17_113738EST_0/sgeTestscript.sh
Stdout:
Stderr:
Unable to initialize environment because of error: Please set the
environment variable SGE_ROOT.
Exiting.
at org.kepler.job.JobManager.submit(JobManager.java:307)
at org.kepler.job.Job.submit(Job.java:375)
at org.kepler.actor.job.JobSubmitter.fire(JobSubmitter.java:217)
at ptolemy.actor.process.ProcessThread._iterateActor(ProcessThread.java:335)
at ptolemy.actor.process.ProcessThread.run(ProcessThread.java:212)
So although I explicitly set SGE_ROOT with the approach you mentioned,
Kepler still behaves as if it isn't set, which means the setting is not
used at all. Are there any other ways to solve this issue?
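
For what it is worth, the qsub man page describes -ac as adding name=value
pairs to the job's context, which is separate from the environment of the
qsub client process itself. That would be consistent with the error
persisting. The sketch below illustrates the difference locally. The stub
script is a hypothetical stand-in for qsub (it is not the real client), and
/opt/sge is the value from the log above.

```shell
#!/bin/sh
# Hypothetical stand-in for qsub: it only inspects its own process
# environment and ignores whatever follows -ac. The message goes to
# stdout here so that it can be captured in a variable.
stub=$(mktemp)
cat > "$stub" <<'EOF'
if [ -z "$SGE_ROOT" ]; then
    echo "Please set the environment variable SGE_ROOT."
    exit 1
fi
echo "submitting with SGE_ROOT=$SGE_ROOT"
EOF

# Start from a clean state, like the non-interactive SSH session.
unset SGE_ROOT

# 1) Same shape as the command in the log: the value is only an argument.
as_arg=$(sh "$stub" -ac SGE_ROOT=/opt/sge sgeTestscript.sh)

# 2) The value is placed in the environment of the child process.
as_env=$(SGE_ROOT=/opt/sge sh "$stub" sgeTestscript.sh)

echo "as argument:    $as_arg"
echo "as environment: $as_env"
rm -f "$stub"
```

Only the second form makes the variable visible to the child process. Whether
Kepler can be configured to emit such a prefix is not established in this
thread.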
I have to use our SGE cluster in my workflows and can't do so
as long as the related SGE actor doesn't work.
I would appreciate any advice.
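
One thing that might be worth testing by hand, outside Kepler, is whether
qsub works over a non-interactive SSH command once the SGE settings file is
sourced first. The sketch below only builds such a command line. It assumes
the common $SGE_ROOT/$SGE_CELL/common/settings.sh layout, which may not match
this cluster. SGE_CELL=default is taken from my earlier attempt, and
user@cluster is a placeholder.

```shell
#!/bin/sh
# Values from the logs in this thread. The settings.sh location is an
# ASSUMPTION based on the usual SGE installation layout.
SGE_ROOT_DIR=/opt/sge
SGE_CELL_NAME=default
QSUB=/opt/sge/bin/lx24-amd64/qsub

# Build a single command line that loads the SGE environment before
# calling qsub. $1 is the job script path on the cluster.
build_remote_cmd() {
    printf '. %s/%s/common/settings.sh && %s %s' \
        "$SGE_ROOT_DIR" "$SGE_CELL_NAME" "$QSUB" "$1"
}

remote_cmd=$(build_remote_cmd sgeTestscript.sh)
echo "$remote_cmd"

# To try it manually (user@cluster is a placeholder, not from this thread):
#   ssh user@cluster "$remote_cmd"
```

If this works from a plain ssh call, the remaining question would be how to
get the same prefix into the command that the JobSubmitter actor sends.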
Best regards,
Johann
On 02/13/2015 05:31 PM, Jianwu Wang wrote:
> Hi Johann,
>
> I checked my old workflow and found I had to make some changes
> to get it working. The jobSubmitOptions setting that works for me right
> now is "-ac SGE_CELL=hoffman2,SGE_ROOT=/u/systems/UGE8.0.1 -l
> h_data=1G". Since different clusters have different SGE installations,
> you need to find the values for your cluster. But it looks like "-ac" is
> now needed for SGE_CELL and SGE_ROOT.
>
> Best wishes
>
> Sincerely yours
>
> Jianwu WANG, Ph.D.
> jianwu at sdsc.edu
> http://users.sdsc.edu/~jianwu/
>
> Assistant Director for Research
> Workflows for Data Science (WorDS) Center of Excellence
> San Diego Supercomputer Center (SDSC)
> University of California, San Diego (UCSD)
>
> On 2/13/15 10:49 AM, Hoeftberger, Johann wrote:
>> Hello Jianwu,
>>
>> thank you for your interesting answer. I hadn't noticed those
>> settings myself. Good hint!
>>
>> I followed your advice and it got rid of the "qsub: unknown command"
>> error. But although I set SGE_ROOT and SGE_CELL in the JobSubmitter
>> jobSubmitOptions parameter, I still can't run my workflow and
>> always get the following error/exception:
>>
>> ERROR (org.kepler.actor.job.JobSubmitter:fire:226)
>> org.kepler.job.JobException: Error at job submission.
>> Command:cd [...]/SGE_Testscripts/[...]_Feb13_093517EST_0;
>> /opt/sge/bin/lx24-amd64/qsub SGE_ROOT=/opt/sge,SGE_CELL=default
>> [...]/SGE_Testscripts/[...]_Feb13_093517EST_0/sgeTestscript.sh
>> Stdout:
>>
>> Stderr:
>>
>> Unable to initialize environment because of error: Please set the
>> environment variable SGE_ROOT.
>> Exiting.
>>
>> org.kepler.job.JobException: Error at job submission.
>> Command:cd [...]/SGE_Testscripts/[...]_Feb13_093517EST_0;
>> /opt/sge/bin/lx24-amd64/qsub SGE_ROOT=/opt/sge,SGE_CELL=default
>> [...]/SGE_Testscripts/[...]_Feb13_093517EST_0/sgeTestscript.sh
>> Stdout:
>>
>> Stderr:
>>
>> Unable to initialize environment because of error: Please set the
>> environment variable SGE_ROOT.
>> Exiting.
>>
>> at org.kepler.job.JobManager.submit(JobManager.java:307)
>> at org.kepler.job.Job.submit(Job.java:375)
>> at org.kepler.actor.job.JobSubmitter.fire(JobSubmitter.java:217)
>> at ptolemy.actor.process.ProcessThread._iterateActor(ProcessThread.java:335)
>> at ptolemy.actor.process.ProcessThread.run(ProcessThread.java:212)
>>
>>
>> I tried different ways to specify SGE_ROOT: alone, together with
>> SGE_CELL, in parentheses, separated by commas, separated by semicolons.
>> All with the same outcome: Kepler "thinks" SGE_ROOT isn't set.
>>
>> Do you have further ideas how I could solve my issue?
>>
>> Best regards,
>> Johann
>>
>>
>> On 02/06/2015 05:20 PM, Jianwu Wang wrote:
>>> Hi Johann,
>>>
>>> Can you try adding SGE_CELL and SGE_ROOT settings to the 'job submit
>>> options' parameter of the GenericJobLauncher actor or the
>>> 'jobSubmitOptions' parameter of the JobSubmitter actor? An example is
>>> "SGE_CELL=hoffman2,SGE_ROOT=/u/systems/SGE6.1u3". You might also need to
>>> set the path for your qsub (such as
>>> "/u/systems/SGE6.2u4/bin/lx26-amd64") in the 'binary path' parameter of
>>> the GenericJobLauncher actor or the 'binPath' parameter of the JobManager
>>> actor. I remember running into a similar problem before, and this did
>>> the trick.
>>>
>>> Best wishes
>>>
>>> Sincerely yours
>>>
>>> Jianwu WANG, Ph.D.
>>>
>>> On 2/6/15 3:31 PM, Hoeftberger, Johann wrote:
>>>> Hello,
>>>>
>>>> I am trying to create my own simple Kepler test workflow on my local
>>>> PC and run it via SSH on an SGE cluster.
>>>> For that I took the Kepler demo workflow
>>>> "Job_Submission_Using_JobManager", configured it for my local setup,
>>>> and tried to run it.
>>>> I know that the connection to the cluster, the login with the given
>>>> credentials, and the settings for the working directory work well. (The
>>>> directories and files for the cluster jobs that should be executed
>>>> are created.)
>>>>
>>>> The execution of the qsub command on the cluster doesn't work. At
>>>> first I got the exception "qsub: unknown command". To locate the
>>>> error, I hardcoded the full path of the qsub command on my SGE cluster
>>>> by changing the initialization of private String _sgeSubmitCmd in
>>>> JobSupportSGE.java. After that I got the next exception: "Unable to
>>>> initialize environment because of error: Please set the environment
>>>> variable SGE_ROOT".
>>>>
>>>> SGE_ROOT is properly set on the SGE cluster. I also tried setting it
>>>> on my local PC (and then restarted Eclipse, where my Kepler instance
>>>> is running) and in Eclipse under Run - Run Configuration - Java
>>>> Application - Environment. None of these attempts solved the
>>>> problem; I still get the same exception about the uninitialized
>>>> environment.
>>>>
>>>> In SshExec.java, I found the following documentation for
>>>> public int executeCmd(String command, OutputStream streamOut,
>>>> OutputStream streamErr, String thirdPartyTarget):
>>>>
>>>> /**
>>>>  * Execute a command on the remote machine and expect a
>>>>  * password/passphrase question from the command. The stream
>>>>  * <i>streamOut</i> should be provided to get the output and errors
>>>>  * merged. <i>streamErr</i> is not used in this method (it will be
>>>>  * empty string finally).
>>>>  *
>>>>  * @return exit code of command if execution succeeded,
>>>>  * @throws ExecTimeoutException
>>>>  *             if the command failed because of timeout
>>>>  * @throws SshException
>>>>  *             if an error occurs for the ssh connection during the
>>>>  *             command execution Note: in this method, the SSH Channel
>>>>  *             is forcing a pseudo-terminal allocation {see
>>>>  *             setPty(true)} to allow remote commands to read something
>>>>  *             from their stdin (i.e. from us here), thus, (1) remote
>>>>  *             environment is not set from .bashrc/.cshrc and (2) stdout
>>>>  *             and stderr come back merged in one stream.
>>>>  */
>>>>
>>>> so I guess my problem is caused by an uninitialized or wrongly
>>>> initialized (pseudo-)terminal.
>>>> It seems to me that the environment variables set on the systems
>>>> involved (SGE cluster, local PC) are not read, and that the Kepler
>>>> implementation itself doesn't set proper values for the needed
>>>> variables.
>>>>
>>>> I haven't found a description or a configuration option for my
>>>> issue yet, so I think it is some kind of implementation flaw.
>>>>
>>>> Can somebody give me a hint on how to solve this issue? I would like
>>>> to run Kepler locally on my PC but execute parts of my workflow
>>>> remotely on my SGE cluster.
>>>>
>>>>
>>>> Kind regards,
>>>> Johann Hoeftberger
>>>>
>>>>