[kepler-users] dataflow using Kepler on Amazon EC2

Fri Mar 25 14:17:57 PDT 2011

Hi Jianwu,

After playing with Kepler, I have some questions.

1. All my Java classes are in the same project, and there is only a single Main class. To use Kepler, must each class be converted to a .jar file individually? If so, the disadvantage is that only command-line parameters can be passed in. How does data flow between the classes? Must each class read its input from files and write its output to files? That is, what is the nature of your type system for passing data between components? Can data be passed directly through RAM, or must it go through the file system?

2. Provenance is very important for my workflow. The workflow will be run multiple times and a large number of versions will be created. These should be organized somewhere on the file system with timestamps and descriptions of the versions of the workflows that were used. How much support does Kepler have for this? 

3. Have you seen the new Conveyor paper? http://www.ncbi.nlm.nih.gov/pubmed?term=21278189 My requirements are very similar to those addressed in this paper. However, the current version of Conveyor does not seem very stable: I was unable even to get its graphical user interface running from the Java files. What are the capabilities of Kepler for this use case?
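
To make question 1 concrete, the two ways of handing data from one component to the next can be sketched in plain Java. This is only an illustration of the distinction being asked about, not Kepler's API: the class DataPassingSketch, the data type Interaction, and the methods passInMemory and passThroughFile are all hypothetical names invented for this sketch.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

public class DataPassingSketch {

    /** Hypothetical custom data type flowing along one edge of the dataflow. */
    public static final class Interaction implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String proteinA;
        public final String proteinB;
        public final double score;

        public Interaction(String proteinA, String proteinB, double score) {
            this.proteinA = proteinA;
            this.proteinB = proteinB;
            this.score = score;
        }
    }

    /** In-memory hand-off: the downstream stage receives the very same object. */
    public static Interaction passInMemory(Interaction produced) {
        return produced;
    }

    /** File-based hand-off: the upstream stage serializes to disk and the
     *  downstream stage deserializes a copy from that file. */
    public static Interaction passThroughFile(Interaction produced, Path file)
            throws IOException, ClassNotFoundException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(produced);
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Interaction) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Interaction original = new Interaction("proteinA", "proteinB", 0.75);
        Interaction viaMemory = passInMemory(original);

        Path tmp = Files.createTempFile("edge", ".ser");
        try {
            Interaction viaFile = passThroughFile(original, tmp);
            System.out.println("same object in memory: " + (viaMemory == original));
            System.out.println("same object via file:  " + (viaFile == original));
            System.out.println("same content via file: "
                    + viaFile.proteinA.equals(original.proteinA));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The file-based route requires every custom class on an edge to be serializable and pays for a disk round trip, but it leaves an intermediate result on disk that could be cached; the in-memory route avoids both the constraint and the cost. Which of these Kepler supports for arbitrary Java objects is exactly what question 1 asks.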

Sincerely, with best wishes,
Luqman

On Mar 22, 2011, at 3:51 PM, Jianwu Wang wrote:

> Hi Luqman,
> 
>    Your goal is still not clear to me. Please break it into subtasks so that we can help more efficiently. Alternatively, you could try Kepler first and then come back with more specific questions.
> 
>    As for running Kepler workflows on EC2, I have done some experiments and do not think it is hard.
> 
> Best wishes
> 
> Sincerely yours
> 
> Jianwu Wang
> jianwu at sdsc.edu
> http://users.sdsc.edu/~jianwu/
> 
> Assistant Project Scientist
> Scientific Workflow Automation Technologies (SWAT) Laboratory
> San Diego Supercomputer Center
> University of California, San Diego
> San Diego, CA, U.S.A.
> 
> 
> On 3/21/2011 5:10 PM, Luqman Hodgkinson wrote:
>> 
>> 
>> 
>> Dear Kepler developers,
>> I have a collection of Java classes linked by a custom dataflow architecture. All classes are in a single project, but some of them call executables written in languages other than Java. I am investigating the possibility of transitioning to Kepler. Essentially, I want to link these Java classes in a DAG representing the dataflow and to execute the dataflow on Amazon EC2. The data flowing along the edges are instances of arbitrary custom Java classes. Additionally, it is important to cache intermediate results. The data is acquired from a few web services: iRefIndex, IntAct, UniProt, and Gene Ontology. There are complex software dependencies, so after setting up the dataflow I would like to save the entire system as an Amazon Machine Image (AMI). How difficult would this transition be, and would it be worth the effort? I would appreciate your comments and advice.
>> 		Sincerely, with best wishes,
>> 		Luqman Hodgkinson,
>> 		Ph.D. student, UC-Berkeley
>> _______________________________________________
>> Kepler-users mailing list
>> Kepler-users at kepler-project.org
>> http://lists.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-users