[kepler-users] NGS Specific

Matt Jones jones at nceas.ucsb.edu
Tue Feb 1 10:15:02 PST 2011


Madhavi,

Some Kepler users have definitely used Kepler at scales like this --
Norbert's work on plasma fusion simulations, which generate 800 GB of data in
a 30-hour simulation run under Kepler control, comes to mind.  However,
performance at this scale depends entirely on the specific workflows and
actors used and on how they handle the data.  Any actors that try to pass
data objects that large in memory, or that process them without considering
their scale, will fail.  Few of the standard actors have been built to deal
with that, so you'll need to be careful.  As long as you stream small chunks
of data through the engine at a time, use control tokens to direct
processing of data objects stored outside of memory, or use actors specially
designed to work with large data, you should be good to go.  The devil is of
course in the details of your particular case.
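
To make the chunking and pass-by-reference ideas concrete, here is a minimal
sketch in plain Python.  It is not Kepler actor code and uses no Kepler API;
the function names and the 1 MiB chunk size are assumptions chosen only for
illustration.  The point is that a step receives a small token (a file path)
and reads the data it refers to in bounded pieces, so memory use stays flat
regardless of file size.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk; an arbitrary choice for this sketch


def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield the file at `path` as successive byte chunks of at most chunk_size."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            yield chunk


def checksum_by_reference(path, chunk_size=CHUNK_SIZE):
    """Hypothetical processing step: accept a path rather than the data itself,
    stream the file it points to, and return only a small result (a digest)."""
    digest = hashlib.sha256()
    for chunk in iter_chunks(path, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()
```

Only the path and the resulting digest would travel between workflow steps;
the bulk data stays on disk and is never held in memory all at once.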

Matt

On Mon, Jan 31, 2011 at 9:02 PM, Madhavi Tikhe <
madhavi_tikhe at persistent.co.in> wrote:

>  Hi Jianwu,
>
>
>
> Right, I was referring to Next Gen Sequencing. One of the problems we face
> is the data size, which is huge: file sizes can range from 50 MB to 500 GB.
> Would you please let me know if you have any performance metrics for
> Kepler when some or all actors perform CPU/memory-intensive tasks?
>
> I want to know if Kepler can gracefully handle such large data. Please let
> me know what your thoughts are on this.
>
>
>
> Thanks,
>
> Madhavi
>
>
>
> *From:* Jianwu Wang [mailto:jianwu at sdsc.edu]
> *Sent:* Monday, January 31, 2011 11:50 PM
> *To:* Madhavi Tikhe
> *Cc:* kepler-users at kepler-project.org Users
> *Subject:* Re: [kepler-users] NGS Specific
>
>
>
> Hi Madhavi,
>
>     I guess NGS means 'Next Generation Sequencing'. One related effort I
> know of in Kepler uses Hadoop to support it. You can find our paper, titled
> "Kepler + Hadoop : A General Architecture Facilitating Data-Intensive
> Applications in Scientific Workflow Systems", at my website
> http://users.sdsc.edu/~jianwu/. Note that this module is available in
> Kepler trunk, but not in production yet. We have been updating this module
> recently.
>
>  Best wishes
>
>
>
> Sincerely yours
>
>
>
> Jianwu Wang
>
> jianwu at sdsc.edu
>
> http://users.sdsc.edu/~jianwu/
>
>
>
> Assistant Project Scientist
>
> Scientific Workflow Automation Technologies (SWAT) Laboratory
>
> San Diego Supercomputer Center
>
> University of California, San Diego
>
> San Diego, CA, U.S.A.
>
>
> On 1/31/2011 5:45 AM, Madhavi Tikhe wrote:
>
> Hi
>
> What NGS-specific features does Kepler provide?
>
> Support for large data sets (1 TB)?
>
> Support for CPU-intensive modules?
>
> Any other features?
>
>
>
> Thanks,
>
> Madhavi
>
>
>
>
>
> DISCLAIMER ========== This e-mail may contain privileged and confidential
> information which is the property of Persistent Systems Ltd. It is intended
> only for the use of the individual or entity to which it is addressed. If
> you are not the intended recipient, you are not authorized to read, retain,
> copy, print, distribute or use this message. If you have received this
> communication in error, please notify the sender and delete all copies of
> this message. Persistent Systems Ltd. does not accept any liability for
> virus infected mails.
>
>
>
>
>
> _______________________________________________
>
> Kepler-users mailing list
>
> Kepler-users at kepler-project.org
>
> http://lists.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-users
>