[kepler-users] Greetings/newbie questions about Kepler

Tom Parris parris at isciences.com
Fri Oct 16 15:13:33 PDT 2009


Dear Keplerites:

I have been evaluating Kepler for use in a major project.  The project will
perform spatial hydrological analysis for a 30+ year time period using
monthly time steps. I should say that, at first glance, I'm very impressed.

By way of background, in the 1980s I developed a large-grained dataflow
language and director to process time-series satellite imagery through
complex workflows on an array processor.  The goal was a flexible system
that would process the imagery in one pass through the array processor
(VAXen were notorious for their slow I/O bus).  The language used functional
expressions to define output ports, input ports, and actors (e.g., (out1,
out2, ...) = actor(in1, in2, ...)).  In addition, one could specify how many
tokens needed to be present on each port before firing, and how many tokens
were consumed per firing.  This allowed time-series image operations such as
sliding-window means and block means to be computed with a single
general-purpose actor.  Somewhere there's a conference paper, but I've lost
track of the citation.
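
If I'm reading the Ptolemy II documentation correctly, the same token-rate
idea seems to carry over to Kepler's SDF director, where rates are declared
as parameters on ports.  Here is a rough, untested sketch of a block-mean
actor to illustrate what I mean; the class and parameter names are my own
invention, not a stock Kepler actor:

import ptolemy.actor.TypedAtomicActor;
import ptolemy.actor.TypedIOPort;
import ptolemy.data.DoubleToken;
import ptolemy.data.IntToken;
import ptolemy.data.Token;
import ptolemy.data.expr.Parameter;
import ptolemy.data.type.BaseType;
import ptolemy.kernel.CompositeEntity;
import ptolemy.kernel.util.IllegalActionException;
import ptolemy.kernel.util.NameDuplicationException;

// Hypothetical sketch, not a stock Kepler actor: a block mean that
// consumes blockSize tokens per firing and produces one.
public class BlockMean extends TypedAtomicActor {
    public TypedIOPort input;
    public TypedIOPort output;
    public Parameter blockSize;

    public BlockMean(CompositeEntity container, String name)
            throws IllegalActionException, NameDuplicationException {
        super(container, name);
        input = new TypedIOPort(this, "input", true, false);
        output = new TypedIOPort(this, "output", false, true);
        input.setTypeEquals(BaseType.DOUBLE);
        output.setTypeEquals(BaseType.DOUBLE);
        blockSize = new Parameter(this, "blockSize", new IntToken(12));
        // Declare the consumption rate so the SDF scheduler knows this
        // actor needs blockSize tokens present before it can fire.
        Parameter rate = new Parameter(input, "tokenConsumptionRate");
        rate.setExpression("blockSize");
    }

    public void fire() throws IllegalActionException {
        super.fire();
        int n = ((IntToken) blockSize.getToken()).intValue();
        double sum = 0.0;
        for (Token t : input.get(0, n)) { // read n tokens in one call
            sum += ((DoubleToken) t).doubleValue();
        }
        output.send(0, new DoubleToken(sum / n));
    }
}

My understanding is that the SDF scheduler reads the tokenConsumptionRate
parameter when building its static schedule; please correct me if I've
misread the docs.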

So now, some 25+ years later, I'm evaluating Kepler.  In some respects it
looks like a dream come true.  But I have some questions before I can "bet
the project" on it.  Some are institutional, some are technical.  I've
plowed through much of the documentation, but quickly.  If my questions are
naive and are clearly answered in the documentation (but somehow overlooked
by me), I apologize.

1. Are funding and staffing for Kepler development and maintenance secure
for the foreseeable future?  I don't want to build a major project on a
system that may be orphaned before I'm finished.

2. In my experimentation, the system appears stable and reasonably
efficient, but my testing has been on the trivial side.  Does the system
scale well to large, sophisticated image-processing workflows with many
levels of composite actors and high-volume data streams?

3. In my experimentation with the RExpression actor, it has become clear
that data is transmitted from output port to input port in character-string
format.  R statements are then used to translate the character-string data
back into binary data objects for use in R.  If the amount of data is large,
the system automatically switches to passing data as .Rdata (.sav) files or
text files.  Is this true for all channels?  The application I envision
would process long time series of global geographic grids, each at 0.5
degree resolution or finer, so repeated saves to and reads from disk would
create serious performance issues.  I was hoping that Kepler tokens could be
used to pass large binary objects, such as 720x360 floating-point matrices,
in memory via some form of IPC (as opposed to disk files).  Is that
possible?  If so, could you point me to the relevant documentation?  I've
looked, but cannot find it.  If this capability does not exist, is it
something you would consider adding in a future release?  We also routinely
process gigapixel imagery.  In such cases, it is often desirable to process
sequences of scan lines in a pipeline, reading from disk only at the
beginning of the pipeline and writing to disk only at the end.  Everything
between the endpoints stays in memory.
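
My impression from the Ptolemy II documentation (please correct me if I'm
wrong) is that between native Java actors in the same JVM, tokens are
passed by reference, so a matrix token never touches the disk.  If so, a
source actor for our grids might look something like this untested sketch
(names are mine, not a stock actor):

import ptolemy.actor.TypedAtomicActor;
import ptolemy.actor.TypedIOPort;
import ptolemy.data.DoubleMatrixToken;
import ptolemy.data.type.BaseType;
import ptolemy.kernel.CompositeEntity;
import ptolemy.kernel.util.IllegalActionException;
import ptolemy.kernel.util.NameDuplicationException;

// Hypothetical sketch: emits one 360x720 global grid per firing as a
// DoubleMatrixToken.  Between Java actors in one JVM the token should
// travel by reference -- no serialization, no temporary files.
public class GridSource extends TypedAtomicActor {
    public TypedIOPort output;

    public GridSource(CompositeEntity container, String name)
            throws IllegalActionException, NameDuplicationException {
        super(container, name);
        output = new TypedIOPort(this, "output", false, true);
        output.setTypeEquals(BaseType.DOUBLE_MATRIX);
    }

    public void fire() throws IllegalActionException {
        super.fire();
        // Placeholder contents; a real actor would read or compute the
        // 0.5-degree grid here (360 rows x 720 columns).
        double[][] grid = new double[360][720];
        output.send(0, new DoubleMatrixToken(grid));
    }
}

The question, then, is whether the R-style actors can participate in that
in-memory path, or whether they are necessarily file-based.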

3a. It also appears that every firing of the RExpression actor requires a
re-instantiation of R, with all of its initialization overhead.  Is there a
way to avoid this?  Is this true of all actors?

4. In order to perform time-series operations such as first differencing, an
actor either needs to be able to read multiple tokens from its input port
(in the case of first differencing, one would want to read 2 tokens but
remove only 1 per firing) or needs memory that persists from firing to
firing.  At the moment, I don't see the capability to read more than one
token at a time off the input port while leaving some of them on the port to
be re-read on the next firing.  But it looks like one can simulate
persistent memory by routing an output port back to an input port via the
SampleDelay actor.  Is this the recommended approach?  If so, it exacerbates
the I/O problem with large tokens described in question 3 above.
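
Alternatively, if writing a native Java actor is the expected route, I
assume the persistent memory could simply be a field on the actor that
survives from firing to firing.  An untested sketch of a first-difference
actor along those lines (again, my own naming, not a stock actor):

import ptolemy.actor.TypedAtomicActor;
import ptolemy.actor.TypedIOPort;
import ptolemy.data.Token;
import ptolemy.kernel.CompositeEntity;
import ptolemy.kernel.util.IllegalActionException;
import ptolemy.kernel.util.NameDuplicationException;

// Hypothetical sketch: first differencing (x[t] - x[t-1]) with the
// previous token held as actor state instead of a SampleDelay loop.
public class FirstDifference extends TypedAtomicActor {
    public TypedIOPort input;
    public TypedIOPort output;
    private Token _previous; // committed state from the last iteration
    private Token _current;  // read in fire(), committed in postfire()

    public FirstDifference(CompositeEntity container, String name)
            throws IllegalActionException, NameDuplicationException {
        super(container, name);
        input = new TypedIOPort(this, "input", true, false);
        output = new TypedIOPort(this, "output", false, true);
    }

    public void initialize() throws IllegalActionException {
        super.initialize();
        _previous = null;
    }

    public void fire() throws IllegalActionException {
        super.fire();
        _current = input.get(0);
        // Emit a zero token on the first firing so the production rate
        // stays constant, which (I assume) SDF scheduling requires.
        Token diff = (_previous == null)
                ? _current.zero()
                : _current.subtract(_previous);
        output.send(0, diff);
    }

    public boolean postfire() throws IllegalActionException {
        _previous = _current; // commit state once the iteration completes
        return super.postfire();
    }
}

Since subtract() appears to work element-wise on matrix tokens, the same
actor would presumably handle our grid tokens as well as scalars.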

5.  Does anyone have experience using Python to drive ArcGIS geoprocessor
tools from within Kepler?

6. I note you have an actor that provides access to MATLAB.  Can the same
actor be used with Octave (see http://www.gnu.org/software/octave/)?

I appreciate any response I can get to these questions.  I'll probably have
more questions as I dig into Kepler in more detail.  I thank you for your
patience as I come up to speed.

With best regards,
Tom Parris
Vice President, ISciences, LLC.