[kepler-dev] Kepler Remote Execution Engine

Wed Jun 18 12:07:37 PDT 2008

Hi Tristan,

This looks pretty interesting.  Various Ptolemy sponsors have an
interest in an execution engine as well.

One issue is that this sort of thing seems to have come up
before.  I have no experience with grid computing, but an
analysis should include comparisons with grid computing and
differentiate what a Kepler Remote Execution Engine (KREE)
would need.

Also, things like BOINC (http://boinc.berkeley.edu/) have
always seemed of interest.  I'd love to see an interface to BOINC.

A project needs a name and I like:
  Kepler Remote Execution Engine (KREE)
or
  Ptolemy Remote Execution Engine (PREE)

As a joke:
  Kepler Remote Execution Engine for Ptolemy (KREEP)

I'm fine with KREE.

For the record, I've included a copy of your page.  Getting this
on a more accessible wiki somewhere would be good.

_Christopher

INTRODUCTION:
One of the high priority things I want for Hydrant is to separate out the
execution of workflows from hydrant itself. This has a number of advantages:
* Web site performance doesn't suffer if any computation heavy workflows are
  executed.
* An execution server could be used by other applications.
* Each institution could deploy its own execution server, allowing users to run
  workflows on their own institution's servers even if starting the job from
  Hydrant.
Below is a list of requirements for the different concepts I see in such a
system, followed by a list of work already done.

BACKEND-FRONTEND INTERACTION
* Frontend sends a URI for the workflow it wants to run, Backend returns the ID
  of the newly created job, or throws an error if there is a problem.
* Frontend can poll the server to get various information.
  * Job status
  * Results
  * Messages (any miscellaneous info)
* Frontend can implement an interface that will allow it to be sent updates
  from the server as opposed to polling for them.
* Frontends can tell the server to delete a job. This should be done once the
  frontend has retrieved all the results.
* Backend only allows trusted frontends to connect.
  * Uses certificates.
? How to handle file inputs?
  ? dictionary of inputs and URIs to access those files.
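
The interaction above could be sketched as a pair of Java interfaces. This is
only an illustration: BackendApi, FrontendListener and InMemoryBackend are
hypothetical names, not existing Kepler or Hydrant code, and certificate
checks, results and file inputs are left out.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback a frontend may implement to be sent updates
// instead of polling for them.
interface FrontendListener {
    void jobStatusChanged(long jobId, String status);
}

// Hypothetical backend API mirroring the bullet points above.
interface BackendApi {
    long createJob(URI workflow);          // returns the ID of the new job
    String getStatus(long jobId);          // polled by the frontend
    List<String> getMessages(long jobId);  // miscellaneous info
    void deleteJob(long jobId);            // once results have been retrieved
    void register(FrontendListener listener);
}

// Toy in-memory implementation, for illustration only.
class InMemoryBackend implements BackendApi {
    private final Map<Long, String> status = new HashMap<>();
    private final Map<Long, List<String>> messages = new HashMap<>();
    private final List<FrontendListener> listeners = new ArrayList<>();
    private long nextId = 1;

    @Override
    public long createJob(URI workflow) {
        if (workflow == null || !workflow.isAbsolute()) {
            throw new IllegalArgumentException("workflow URI must be absolute");
        }
        long id = nextId++;
        status.put(id, "NEW");
        List<String> log = new ArrayList<>();
        log.add("Created from " + workflow);
        messages.put(id, log);
        for (FrontendListener l : listeners) {
            l.jobStatusChanged(id, "NEW");
        }
        return id;
    }

    @Override
    public String getStatus(long jobId) {
        String s = status.get(jobId);
        if (s == null) {
            throw new IllegalArgumentException("unknown job " + jobId);
        }
        return s;
    }

    @Override
    public List<String> getMessages(long jobId) {
        getStatus(jobId); // throws if the job does not exist
        return new ArrayList<>(messages.get(jobId));
    }

    @Override
    public void deleteJob(long jobId) {
        getStatus(jobId); // throws if the job does not exist
        status.remove(jobId);
        messages.remove(jobId);
    }

    @Override
    public void register(FrontendListener listener) {
        listeners.add(listener);
    }
}
```

A frontend that prefers polling simply never calls register().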

JOBS
* Each job is linked to the frontend which started it. 
* Only the frontend which started the job can access it.
* Job lifecycle:
  * Starts off in the 'NEW' state.
  * frontend then posts a number of variables to change for the job.
    * key:'.workflow.path.to.actor.property', value:'value for property'
    * File inputs need to be passed in as URIs to a remote resource that the
      backend can access.
  * Goes to 'QUEUED' when the frontend tells it to.
  * When it can be started, goes into 'RUNNING' state.
  * If it requires user input, goes into 'WAITING' state.
  * If an error occurs during execution, goes into 'FAILED' state.
  * When complete, goes into 'FINISHED' state.
* A list of messages is kept for miscellaneous notifications.
* Job status and results are stored in a database.
* Jobs can be stopped and deleted at the client's request.
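
The lifecycle above can be read as a small state machine. The Java sketch
below is one possible reading, not existing code: the names JobState and Job
are hypothetical, and two points are assumptions that the list does not
state, namely that a 'WAITING' job returns to 'RUNNING' once it has its
input, and that properties may only be changed while the job is 'NEW'.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

enum JobState {
    NEW, QUEUED, RUNNING, WAITING, FAILED, FINISHED;

    // States that may legally follow this one.
    Set<JobState> next() {
        switch (this) {
            case NEW:     return EnumSet.of(QUEUED);
            case QUEUED:  return EnumSet.of(RUNNING);
            case RUNNING: return EnumSet.of(WAITING, FAILED, FINISHED);
            case WAITING: return EnumSet.of(RUNNING); // assumption, see above
            default:      return EnumSet.noneOf(JobState.class); // terminal
        }
    }
}

class Job {
    private JobState state = JobState.NEW;
    private final Map<String, String> properties = new LinkedHashMap<>();

    // key is a path such as '.workflow.path.to.actor.property'
    void setProperty(String key, String value) {
        if (state != JobState.NEW) { // assumption: only editable while NEW
            throw new IllegalStateException("job is already " + state);
        }
        properties.put(key, value);
    }

    void transitionTo(JobState target) {
        if (!state.next().contains(target)) {
            throw new IllegalStateException(state + " -> " + target + " is not allowed");
        }
        state = target;
    }

    JobState getState() {
        return state;
    }
}
```

Stopping and deleting a job are not modelled here, since the list does not
name a state for them.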

WORKFLOWS
* Backend stores a list of restricted Actors, and removes/replaces them from a
  workflow before it's run.
  * Changes like this should be listed in the Job's messages.
* Backend stores a list of the third party software that it supports (e.g. R,
  Matlab).
* Workflows are kept on the backend. If the same URI is passed to the backend
  again, the backend uses the protocol's equivalent of HTTP's If-Modified-Since
  header to detect whether the workflow has changed since it was last
  downloaded and, if not, reuses the last downloaded version. If the protocol
  doesn't support such a function, it will always download a new copy of the
  workflow.
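
For the HTTP case, the caching rule above might look like the following Java
sketch. WorkflowCache and its methods are hypothetical names, and using the
cached file's modification time as the "last downloaded" timestamp is an
assumption. setIfModifiedSince() and HTTP_NOT_MODIFIED are standard java.net
calls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class WorkflowCache {
    // Only an HTTP 304 (Not Modified) reply lets us keep the cached copy.
    static boolean canReuseCached(int responseCode) {
        return responseCode == HttpURLConnection.HTTP_NOT_MODIFIED;
    }

    // Brings 'cached' up to date with 'source'.
    // Returns true if a new copy was downloaded, false if the cache was reused.
    static boolean refresh(URL source, Path cached) throws IOException {
        URLConnection conn = source.openConnection();
        boolean conditional = conn instanceof HttpURLConnection && Files.exists(cached);
        if (conditional) {
            // Assumption: the file's mtime records when it was last downloaded.
            conn.setIfModifiedSince(Files.getLastModifiedTime(cached).toMillis());
            if (canReuseCached(((HttpURLConnection) conn).getResponseCode())) {
                return false;
            }
        }
        // Either the workflow changed, nothing is cached yet, or the protocol
        // has no conditional fetch: always download a new copy.
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, cached, StandardCopyOption.REPLACE_EXISTING);
        }
        return true;
    }
}
```

Non-HTTP URLs (file:, for instance) fall through to the unconditional
download, which matches the last sentence of the requirement.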

KORE/CORE CHANGES REQUIRED:
* Easier way to manage output from Actors.

WORK ALREADY DONE:
Christopher Tuot's group has built the beginnings of a Backend which loads a MoML
passed to it in a POST. It supports multiple 'Frontends' which poll the server
to get results. Jobs and Workflows are currently only stored in memory and are
lost when the server restarts.

Jianwu Wang has written a Backend which can load a workflow from a URL.

OTHER DISCUSSION TOPICS:
* Is it worth exploring an OSGi architecture?

============== WHITEBOARD ==============

** Social Networking **
Current:
 Hydrant
 MyExperiment

Features:
* Visualisation of Workflows
 - Simple view, no Flash or other requirements other than a JavaScript-enabled
   web browser.
* Sharing Workflows
* Start Jobs 
 - hooks to Execution server
* Edit Workflows 
 - hooks to building server
* Discussion of Workflows/Jobs
* Tags/Ratings on Workflow/Jobs

Requirements:
...

Possible Technologies:
* Google Friend Connect
* some form of CMS ?

** Building **
Current:
 KFlex

** Execution **
Current:
 Hydrant

Description: A Frontend/Backend modeled server for executing workflows.

Features:
* Multiple frontends.
* Administration page
 - Handle loading of jars
 - Handle setup of frontends

Requirements:
* A standardised API for the backend.
* Frontends must implement a frontend API
* Frontend-Backend Security
 - Backend should only accept requests from known frontends.
   Possible solutions: IP security (not ideal)
                       Certificate based security.
                       (SecurityPlugin)
* User segregation...
 - If a user starts a job only that user should be able to see/control it.
   Possible solution: only allow servers with trusted certificates to
                      access the execution engine, and let them handle
                      user access control.
* Execution Plugin
 - An interface that handles execution of workflows. This should be built
   so that a simple one server execution model can be used to start with
   but a distributed execution model could be easily implemented later on.
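
One way to keep the execution model swappable is to have the engine depend
only on a small interface. The Java sketch below is hypothetical throughout:
ExecutionPlugin, WorkflowRunner and SingleServerExecution are invented names,
and WorkflowRunner merely stands in for whatever actually invokes Kepler.

```java
import java.net.URI;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-in for the code that actually runs a workflow (e.g. via Kepler).
interface WorkflowRunner {
    void run(URI workflow) throws Exception;
}

// Hypothetical plugin interface; the engine would only ever see this type.
interface ExecutionPlugin {
    // Starts the workflow and eventually yields the job's final status.
    Future<String> execute(long jobId, URI workflow);
}

// Simple one-server model: jobs run one at a time on a local worker thread.
class SingleServerExecution implements ExecutionPlugin {
    private final WorkflowRunner runner;
    private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "execution-worker");
        t.setDaemon(true); // do not keep the JVM alive for idle workers
        return t;
    });

    SingleServerExecution(WorkflowRunner runner) {
        this.runner = runner;
    }

    @Override
    public Future<String> execute(long jobId, URI workflow) {
        // jobId is unused here; a distributed plugin could use it to
        // correlate status reports coming back from remote workers.
        return pool.submit(() -> {
            try {
                runner.run(workflow);
                return "FINISHED";
            } catch (Exception e) {
                return "FAILED";
            }
        });
    }
}
```

A distributed model would then be another ExecutionPlugin implementation that
forwards the workflow URI to a remote worker, with no change to the engine.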

Possible Technologies:
* Google Web Toolkit for Admin page

Backend Communication <--a--> Frontend Communication
          ^
          |
          b
          |
          v
    Backend Engine --c--> Database
                   --d--> Kepler

== Database ==
Job: 
     auto id;
     string workflow_file;
     string status;
MonitorRegister:
        auto id;
        job job;
        string host;

== Technical Use cases ==

Queue new Job:
      backend/communication/Backend.java:queueNewJob("http://www.workflowrepo.org/workflow.xml")
      backend/communication/Backend.java:queueNewJob("http://www.workflowrepo.org/workflow.xml", false)
      backend/engine/Engine.java:queueNewJob("http://www.workflowrepo.org/workflow.xml", false)

--end--

--------

    Hi all,

    I've been pondering over the requirements for removing the execution side of
    things from the rest of Hydrant. I was thinking of putting this on the
    kepler-project wiki, but I couldn't create any new pages (I logged in with
    the username I use for the CVS and unaffiliated institution). Can anyone help
    me out here? For now I've just put it in my git repository, which can be
    accessed here:
    http://www.hpc.jcu.edu.au/git/?p=jc124742/documents.git;a=blob_plain;f=hydrant-requirements.txt;hb=da7c5bd888faa4591e84509c53ade2291df11db5
    I know there are a lot of other people wanting similar things to this, so
    let's start a discussion and get something that we can put into the Kepler
    Core. If you have any ideas, amendments, criticisms or anything that we can
    discuss, fire away!

    Cheers,
    -Tristan

    -- 
    Tristan King
    Research Officer,
    eResearch Centre
    James Cook University, Townsville Qld 4811
    Australia

    Phone: +61747816902
    E-mail: tristan.king at jcu.edu.au www: http://eresearch.jcu.edu.au