[kepler-dev] Kepler Remote Execution Engine

Tue Jul 1 17:21:50 PDT 2008

I enabled wiki access for Paul -- sorry for the delay while I was on vacation.

Note there is already a section on the wiki on distributed computing and 
execution engines.  Chad and Lucas have already implemented the infrastructure 
to send workflows and subworkflows to remote machines, execute the workflow, and 
return the results.  They have also implemented job monitoring services for 
checking status on those runs.  This work allows a Kepler workflow to distribute 
computations over hundreds of nodes of a cluster without a scientific user 
needing to know how distributed or remote computing works.  This work is 
partially described here:

http://www.kepler-project.org/Wiki.jsp?page=WorkingDistributedFeatures

Other older but more general discussions of distributed features are here:

http://www.kepler-project.org/Wiki.jsp?page=DistributedKepler

Matt

Paul Allen wrote:
> I second the notion that getting Tristan's start into a wiki that we can 
> all edit is critical. I recall Matt saying that we all should be able to 
> edit the Kepler-Project.org pages, but I don't recall the details. I've 
> tried logging in with my CVS username and password, but that doesn't seem 
> to get me very far (logged in but not authorized -- 
> _uid=allen,o=unaffiliated_)
> 
> I'm going to hold back on comments until we can collaborate via wiki.
> 
> -Paul
> 
> Christopher Brooks wrote:
>> Hi Tristan,
>>
>> This looks pretty interesting.  Various Ptolemy sponsors have an
>> interest in an execution engine as well.
>>
>> One issue is that it seems like this sort of thing has come up
>> before.  I have no experience with grid computing, but it
>> seems like an analysis should include comparisons with grid
>> computing and differentiate what a Kepler Remote Execution Engine
>> (KREE) would need.
>>
>> Also, things like BOINC (http://boinc.berkeley.edu/) have
>> always seemed of interest.  I'd love to see an interface to BOINC.
>>
>>
>> A project needs a name and I like:
>>   Kepler Remote Execution Engine (KREE)
>> or
>>   Ptolemy Remote Execution Engine (PREE)
>>
>>
>> As a joke:
>>   Kepler Remote Execution Engine for Ptolemy (KREEP)
>>
>> I'm fine with KREE.
>>
>> For the record, I've included a copy of your page.  Getting this
>> on a more accessible wiki somewhere would be good.
>>
>> _Christopher
>>
>>
>>
>> INTRODUCTION:
>> One of the high priority things I want for Hydrant is to separate out the
>> execution of workflows from hydrant itself. This has a number of advantages:
>> * Web site performance doesn't suffer if any computation-heavy workflows are
>>   executed.
>> * An execution server could be used by other applications.
>> * Each institution could deploy its own execution server, allowing users to run
>>   workflows on their own institution's servers even if starting the job from
>>   Hydrant.
>> Below is a list of requirements for the different concepts I see in such a
>> system, followed by a list of work already done.
>>
>> BACKEND-FRONTEND INTERACTION
>> * Frontend sends a URI for the workflow it wants to run, Backend returns the ID
>>   of the newly created job, or throws an error if there is a problem.
>> * Frontend can poll the server to get various information.
>>   * Job status
>>   * Results
>>   * Messages (any miscellaneous info)
>> * Frontend can implement an interface that will allow it to be sent updates
>>   from the server as opposed to polling for them.
>> * Frontends can tell the server to delete a job. This should be done once the
>>   frontend has retrieved all the results.
>> * Backend only allows trusted frontends to connect.
>>   * Uses certificates.
>> ? How to handle file inputs?
>>   ? dictionary of inputs and URIs to access those files.
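
The frontend-facing operations listed above could be sketched as a small Java interface. This is only an illustration under assumptions: the names ExecutionBackend, InMemoryBackend, submit, status, messages and delete are hypothetical and do not come from Hydrant or Kepler, and the in-memory class exists only to exercise the contract (it never runs a workflow and always reports 'NEW').

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical contract between a frontend and the execution backend. */
interface ExecutionBackend {
    /** Creates a job for the workflow at the given URI and returns the job ID. */
    long submit(URI workflow);

    /** Returns the current status of a job (polled by the frontend). */
    String status(long jobId);

    /** Returns the miscellaneous messages recorded for a job. */
    List<String> messages(long jobId);

    /** Deletes a job; meant to be called once results have been retrieved. */
    void delete(long jobId);
}

/** In-memory stand-in, for illustration only; it does not execute anything. */
class InMemoryBackend implements ExecutionBackend {
    private final AtomicLong nextId = new AtomicLong(1);
    private final Map<Long, List<String>> messagesByJob = new ConcurrentHashMap<>();

    @Override
    public long submit(URI workflow) {
        // "Throws an error if there is a problem": reject unusable URIs up front.
        if (workflow == null || !workflow.isAbsolute()) {
            throw new IllegalArgumentException("Workflow URI must be absolute");
        }
        long id = nextId.getAndIncrement();
        List<String> log = new ArrayList<>();
        log.add("Created from " + workflow);
        messagesByJob.put(id, log);
        return id;
    }

    @Override
    public String status(long jobId) {
        lookup(jobId);
        return "NEW"; // this stand-in never advances the job
    }

    @Override
    public List<String> messages(long jobId) {
        return new ArrayList<>(lookup(jobId)); // defensive copy
    }

    @Override
    public void delete(long jobId) {
        if (messagesByJob.remove(jobId) == null) {
            throw new NoSuchElementException("No job " + jobId);
        }
    }

    private List<String> lookup(long jobId) {
        List<String> log = messagesByJob.get(jobId);
        if (log == null) {
            throw new NoSuchElementException("No job " + jobId);
        }
        return log;
    }
}
```

A push-style callback interface and certificate checks are left out of this sketch; they would sit in front of these calls rather than change them.
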
>>
>> JOBS
>> * Each job is linked to the frontend which started it. 
>> * Only the frontend which started the job can access it.
>> * Job lifecycle:
>>   * Starts off in the 'NEW' state.
>>   * frontend then posts a number of variables to change for the job.
>>     * key:'.workflow.path.to.actor.property', value:'value for property'
>>     * File inputs need to be passed in as URIs to a remote resource that the
>>       backend can access.
>>   * Goes to 'QUEUED' when the frontend tells it to.
>>   * When it can be started, goes into 'RUNNING' state.
>>   * If it requires user input, goes into 'WAITING' state.
>>   * If an error occurs during execution, goes into 'FAILED' state.
>>   * When complete, goes into 'FINISHED' state.
>> * A list of messages is kept for miscellaneous notifications.
>> * Job status and results are stored in a database.
>> * Jobs can be stopped and deleted at the client's request.
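
The job lifecycle above reads like a small state machine, which could be sketched as follows. The state names are taken from the list; everything else is an assumption made for illustration: the class names JobState and TrackedJob are hypothetical, the WAITING to RUNNING transition is assumed (the list does not say what happens after user input arrives), and restricting parameter changes to the 'NEW' state is one reading of the ordering in the list.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Job states as named in the requirements list. */
enum JobState {
    NEW, QUEUED, RUNNING, WAITING, FAILED, FINISHED;

    /** States that can be entered directly from this one. */
    Set<JobState> next() {
        switch (this) {
            case NEW:     return EnumSet.of(QUEUED);
            case QUEUED:  return EnumSet.of(RUNNING);
            case RUNNING: return EnumSet.of(WAITING, FAILED, FINISHED);
            case WAITING: return EnumSet.of(RUNNING); // assumption: resumes after input
            default:      return EnumSet.noneOf(JobState.class); // FAILED and FINISHED are terminal
        }
    }
}

/** Hypothetical job object that enforces the transitions above. */
class TrackedJob {
    private JobState state = JobState.NEW;
    private final Map<String, String> parameters = new LinkedHashMap<>();

    JobState state() {
        return state;
    }

    /** Records a property override, e.g. '.workflow.path.to.actor.property'. */
    void setParameter(String propertyPath, String value) {
        if (state != JobState.NEW) {
            throw new IllegalStateException("Parameters can only change while NEW");
        }
        parameters.put(propertyPath, value);
    }

    /** Moves to the target state, rejecting transitions the lifecycle does not list. */
    void transitionTo(JobState target) {
        if (!state.next().contains(target)) {
            throw new IllegalStateException(state + " -> " + target + " is not allowed");
        }
        state = target;
    }
}
```

Stopping a job at the client's request is not modelled here, since the list does not name a state for it.
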
>>
>> WORKFLOWS
>> * Backend stores a list of restricted Actors, and removes/replaces them from a
>>   workflow before it's run.
>>   * Changes like this should be listed in the Job's messages.
>> * Backend stores a list of the third party software that it supports (i.e. R, 
>>   Matlab).
>> * Workflows are kept on the backend. If the same URI is passed to the backend
>>   again, the backend will use the protocol's equivalent of HTTP's
>>   If-Modified-Since header to detect whether the workflow has changed since it
>>   was last downloaded and, if not, reuse the last downloaded version. If the
>>   protocol doesn't support such a function, it will always download a new copy
>>   of the workflow.
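
For HTTP, the conditional download described in the last bullet could look roughly like the sketch below. It uses the standard java.net calls URLConnection.setIfModifiedSince and HttpURLConnection.HTTP_NOT_MODIFIED; the class name WorkflowCache, its method names, and the choice to keep the server's Last-Modified time as the cache file's timestamp are assumptions for illustration. Any non-HTTP protocol falls through to an unconditional download, as the bullet suggests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

/** Hypothetical helper that keeps one local copy of a workflow file. */
class WorkflowCache {

    /** True when the server's reply means the cached copy is still current. */
    static boolean canReuseCached(int httpStatus, boolean haveCachedCopy) {
        return haveCachedCopy && httpStatus == HttpURLConnection.HTTP_NOT_MODIFIED;
    }

    /** Downloads the workflow into cacheFile unless an HTTP server reports it unchanged. */
    static Path fetch(URL workflow, Path cacheFile) throws IOException {
        boolean cached = Files.exists(cacheFile);
        URLConnection conn = workflow.openConnection();

        if (conn instanceof HttpURLConnection) {
            HttpURLConnection http = (HttpURLConnection) conn;
            if (cached) {
                // Ask the server to send the body only if it changed since our copy.
                http.setIfModifiedSince(Files.getLastModifiedTime(cacheFile).toMillis());
            }
            int status = http.getResponseCode();
            if (canReuseCached(status, cached)) {
                http.disconnect();
                return cacheFile; // 304 Not Modified: keep the existing copy
            }
            if (status != HttpURLConnection.HTTP_OK) {
                http.disconnect();
                throw new IOException("Unexpected HTTP status " + status + " for " + workflow);
            }
        }

        // Either the workflow changed, or the protocol has no conditional fetch.
        long lastModified = conn.getLastModified(); // 0 if the server did not say
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, cacheFile, StandardCopyOption.REPLACE_EXISTING);
        }
        if (lastModified > 0) {
            Files.setLastModifiedTime(cacheFile, FileTime.fromMillis(lastModified));
        }
        return cacheFile;
    }
}
```

Removing or replacing restricted actors would happen after this step, on the downloaded copy, and is not shown.
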
>>
>> KORE/CORE CHANGES REQUIRED:
>> * Easier way to manage output from Actors.
>>
>> WORK ALREADY DONE:
>> Christopher Tuot's group has built the beginnings of a Backend which loads a MoML
>> passed to it in a POST. It supports multiple 'Frontends' which poll the server
>> to get results. Jobs and Workflows are currently only stored in memory and are
>> lost when the server restarts.
>>
>> Jianwu Wang has written a Backend which can load a workflow from a URL.
>>
>> OTHER DISCUSSION TOPICS:
>> * Is it worth exploring an OSGi architecture?
>>
>>
>>
>>
>>
>>
>>
>> ============== WHITEBOARD ==============
>>
>> ** Social Networking **
>> Current:
>>  Hydrant
>>  MyExperiment
>>
>> Features:
>> * Visualisation of Workflows
>>  - Simple view, no flash or other requirements other than a javascript enabled
>>    web browser.
>> * Sharing Workflows
>> * Start Jobs 
>>  - hooks to Execution server
>> * Edit Workflows 
>>  - hooks to building server
>> * Discussion of Workflows/Jobs
>> * Tags/Ratings on Workflow/Jobs
>>
>> Requirements:
>> ...
>>
>> Possible Technologies:
>> * google friend connect
>> * some form of CMS ?
>>
>> * Building
>> Current:
>>  KFlex
>>
>> * Execution
>> Current:
>>  Hydrant
>>
>> Description: A Frontend/Backend modeled server for executing workflows.
>>
>> Features:
>> * Multiple frontends.
>> * Administration page
>>  - Handle loading of jars
>>  - Handle setup of frontends
>>
>>
>> Requirements:
>> * A standardised API for the backend.
>> * Frontends must implement a frontend API
>> * Frontend-Backend Security
>>  - Backend should only accept requests from known frontends.
>>    Possible solutions: IP security (not ideal)
>>                        Certificate based security.
>>                        (SecurityPlugin)
>> * User segregation...
>>  - If a user starts a job only that user should be able to see/control it.
>>    Possible solution: only allow servers with trusted certificates to
>>                       access the execution engine, and let them handle
>>                       user access control.
>> * Execution Plugin
>>  - An interface that handles execution of workflows. This should be built
>>    so that a simple one server execution model can be used to start with
>>    but a distributed execution model could be easily implemented later on.
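
The "Execution Plugin" requirement could be sketched as an interface with a single-server implementation behind it, so that a distributed implementation could be swapped in later without touching callers. The names ExecutionPlugin and SingleServerExecution are hypothetical, and the task body is a placeholder that only returns a string; how Kepler would actually be invoked is not specified in the notes above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical plugin contract: callers never see where the workflow runs. */
interface ExecutionPlugin {
    /** Starts executing the workflow and returns a handle to its eventual outcome. */
    Future<String> execute(String workflowUri);

    /** Releases any resources held by the plugin. */
    void shutdown();
}

/** Simplest model: one worker thread on the local machine, jobs run in order. */
class SingleServerExecution implements ExecutionPlugin {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    @Override
    public Future<String> execute(String workflowUri) {
        // Placeholder for the real work of loading and running the workflow.
        return worker.submit(() -> "FINISHED " + workflowUri);
    }

    @Override
    public void shutdown() {
        worker.shutdown();
    }
}
```

A distributed variant would implement the same two methods and hand the URI to a remote node instead of a local thread.
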
>>
>> Possible Technologies:
>> * Google Web Toolkit for Admin page
>>
>>
>>
>> Backend Communication <--a--> Frontend Communication
>>           ^
>>           |
>>           b
>>           |
>>           v
>>     Backend Engine --c--> Database
>>                    --d--> Kepler
>>
>> == Database ==
>> Job: 
>>      auto id;
>>      string workflow_file;
>>      string status;
>> MonitorRegister:
>>         auto id;
>>         job job;
>>         string host;
>>
>>
>> == Technical Use cases ==
>>
>> Queue new Job:
>>       backend/communication/Backend.java:queueNewJob("http://www.workflowrepo.org/workflow.xml")
>>       backend/communication/Backend.java:queueNewJob("http://www.workflowrepo.org/workflow.xml", false)
>>       backend/engine/Engine.java:queueNewJob("http://www.workflowrepo.org/workflow.xml", false)
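
The call chain in the use case above (a one-argument Backend.queueNewJob delegating to a two-argument overload, which delegates to Engine.queueNewJob) could be sketched as below. Only the class and method names come from the whiteboard notes. The return type, the 'QUEUED' status, and the in-memory map standing in for the Job table are assumptions, and the notes do not say what the boolean means, so it is passed through untouched under the neutral name flag.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for backend/engine/Engine.java; storage is a map, not a database. */
class Engine {
    // Columns follow the whiteboard's Job table: id -> {workflow_file, status}.
    private final Map<Long, String[]> jobs = new LinkedHashMap<>();
    private long nextId = 1;

    /** Assumption: returns the new job's ID. The meaning of 'flag' is not stated in the notes. */
    synchronized long queueNewJob(String workflowUrl, boolean flag) {
        long id = nextId++;
        jobs.put(id, new String[] {workflowUrl, "QUEUED"});
        return id;
    }

    synchronized String statusOf(long id) {
        String[] row = jobs.get(id);
        if (row == null) {
            throw new IllegalArgumentException("No job " + id);
        }
        return row[1];
    }
}

/** Stand-in for backend/communication/Backend.java. */
class Backend {
    private final Engine engine;

    Backend(Engine engine) {
        this.engine = engine;
    }

    /** Convenience overload: the notes show it forwarding with 'false'. */
    long queueNewJob(String workflowUrl) {
        return queueNewJob(workflowUrl, false);
    }

    /** Communication layer hands the request to the engine unchanged. */
    long queueNewJob(String workflowUrl, boolean flag) {
        return engine.queueNewJob(workflowUrl, flag);
    }
}
```
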
>>
>> --end--
>>
>> --------
>>
>>     
>>     Hi all,
>>     
>>     I've been pondering over the requirements for removing the execution side of
>>     things from the rest of hydrant. I was thinking of putting this on the
>>     kepler-project wiki, but I couldn't create any new pages (i logged in with
>>     my username i use for the cvs and unaffiliated institution). Can anyone help
>>     me out here? For now i've just put it in my git repository, which can be
>>     accessed here:
>>     http://www.hpc.jcu.edu.au/git/?p=jc124742/documents.git;a=blob_plain;f=hydrant-requirements.txt;hb=da7c5bd888faa4591e84509c53ade2291df11db5
>>     I know there are a lot of other people wanting similar things to this, so
>>     let's start a discussion and get something that we can put into the Kepler
>>     Core. If you have any ideas, amendments, criticisms or anything that we can
>>     discuss, fire away!
>>     
>>     Cheers,
>>     -Tristan
>>     
>>     -- 
>>     Tristan King
>>     Research Officer,
>>     eResearch Centre
>>     James Cook University, Townsville Qld 4811
>>     Australia
>>     
>>     Phone: +61747816902
>>     E-mail: tristan.king at jcu.edu.au www: http://eresearch.jcu.edu.au
>> _______________________________________________
>> Kepler-dev mailing list
>> Kepler-dev at ecoinformatics.org
>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
>>
>>   
> 

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthew B. Jones
Director of Informatics Research and Development
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara
jones at nceas.ucsb.edu                       Ph: 1-907-523-1960
http://www.nceas.ucsb.edu/ecoinfo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~