[kepler-users] Greetings/newbie questions about Kepler
Edward A. Lee
eal at eecs.berkeley.edu
Fri Oct 16 21:23:27 PDT 2009
Tom,
I can answer some of these questions...
Tom Parris wrote:
> Dear Keplerites:
>
> I have been evaluating Kepler for use in a major project. The project
> will perform spatial hydrological analysis for a 30+ year time period
> using monthly time steps. I should say that, at first glance, I'm very
> impressed.
>
> By way of background, I developed a large grained data flow language and
> director to process time series satellite imagery through complex
> workflows in an array processor in the 1980s. The goal was to have a
> flexible system that would process the imagery in one pass through the
> array processor (VAXen were notorious for their slow i/o bus). The
> language used functional expressions to define output ports, input
> ports, and actors (e.g., (out1, out2, ...) = actor(in1, in2, ...)). In
> addition, one could specify how many tokens needed to be present on each
> port before firing, and how many tokens were consumed per firing. This
> allowed time series image operations such as sliding window means and
> block means to be computed with a single general purpose actor.
> Somewhere, there's a conference paper. But I've lost track of the citation.
>
> So now, some 25+ years later, I'm evaluating Kepler. In some respects
> it looks like a dream come true. But I have some questions before I can
> "bet the project" on it. Some are institutional, some are technical.
> I've plowed through much of the documentation, but quickly. If my
> questions are naive and clearly answered (but somehow overlooked) in the
> documentation, I apologize.
>
> 1. Is funding and staffing for Kepler development and maintenance secure
> for the foreseeable future? I don't want to build a major project on a
> system that may be orphaned before I'm finished.
There is no such thing as perfectly stable funding, but that said...
Funding is stable for the Ptolemy II system, on which Kepler is built.
>
> 2. In my experimentation, the system appears stable and reasonably
> efficient. But my testing has been on the trivial side. Does the
> system scale well to large sophisticated image processing workflows with
> many levels of composite actors and high volume data streams?
There are applications with tens of thousands of actors.
They require machines with quite a bit of memory, however.
>
> 3. In my experimentation with the RExpression actor, it has become clear
> that data is transmitted from output port to input port in character
> string format. R statements are then used to translate the character
> string data back into binary data objects for use in R. If the amount
> of data is large, the system automatically switches to passing data as
> .Rdata (.sav) files or text files. Is this true for all channels? The
> application I envision would be processing long time series of global
> geographic grids each with 0.5 degree resolution or finer. As a result,
> repeated saves to disk followed by reads from disk will create serious
> performance issues. I was hoping that Kepler tokens could be used to
> pass large binary objects such as 720x360 floating point matrices in
> memory via some form of IPC (as opposed to disk files). Is that
> possible? If so, could you point me to the relevant documentation?
> I've looked, but cannot find it. If this capability does not exist, is
> it something you would consider adding in future releases? We also
> routinely process gigapixel imagery. In such cases, it is often
> desirable to process sequences of scan lines in a pipeline, only reading
> from disk at the beginning of the pipeline and only writing to disk at
> the end of the pipeline. Everything in between the end-points stays in
> memory.
>
> 3a. It also appears that every firing of the RExpression actor requires
> a re-instantiation of R with all of its initialization overhead. Is
> there a way to avoid this? Is this true with all actors?
I don't know much about the R actor... I'll leave this for someone else.
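That said, one general comment independent of the R actor: within a single
JVM, Ptolemy II tokens are handed from actor to actor in memory (the
receivers hold references to the tokens), and there are matrix token types
such as DoubleMatrixToken that can carry a 720x360 grid without touching
disk. Below is a rough, untested sketch of an actor that emits such a grid
as one token per firing; the class name is just illustrative, and only the
base classes and token types are from Ptolemy II.

// Illustrative sketch only (not a stock Kepler actor): an actor that emits
// a global 0.5-degree grid as a single in-memory DoubleMatrixToken per firing.
import ptolemy.actor.TypedAtomicActor;
import ptolemy.actor.TypedIOPort;
import ptolemy.data.DoubleMatrixToken;
import ptolemy.data.type.BaseType;
import ptolemy.kernel.CompositeEntity;
import ptolemy.kernel.util.IllegalActionException;
import ptolemy.kernel.util.NameDuplicationException;

public class GridSource extends TypedAtomicActor {
    public TypedIOPort output;

    public GridSource(CompositeEntity container, String name)
            throws IllegalActionException, NameDuplicationException {
        super(container, name);
        output = new TypedIOPort(this, "output", false, true);
        output.setTypeEquals(BaseType.DOUBLE_MATRIX);
    }

    public void fire() throws IllegalActionException {
        super.fire();
        // 0.5-degree global grid: 360 latitude rows by 720 longitude columns.
        double[][] grid = new double[360][720];
        // ... fill the grid here (e.g., read one time step from a file) ...
        // The token wraps the array; downstream actors in the same JVM
        // receive it through the receiver with no serialization to disk.
        output.send(0, new DoubleMatrixToken(grid));
    }
}

Whether the RExpression actor takes advantage of in-memory tokens is a
separate question that I will leave to the R folks.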
>
> 4. In order to perform time series operations such as first
> differencing, an actor either needs to be able to read multiple items from
> its input port (in the case of first differencing, one would want to
> read 2 tokens, but only remove 1 per firing) or have memory that
> persists from firing to firing. At the moment, I don't see the
> capability to read more than one token at a time off the input port
> (while leaving some of them on the port to be re-read on the next
> firing). But it looks like one can simulate persistent memory by
> routing an output port to an input port via the SampleDelay actor. Is
> this the recommended approach? If so, it exacerbates the i/o problem
> with large tokens described in question #3 above.
There is a fairly extensive signal processing library. See the FIR
actor for an example of one that operates on sliding windows...
The SDF director is fairly sophisticated about handling such models.
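If you prefer to keep the state inside the actor rather than routing it
around through a SampleDelay, that is straightforward too. Here is a rough,
untested sketch of a first-difference actor for SDF (the class name is just
illustrative; only the base classes are from Ptolemy II). It consumes one
token and produces one token per firing, remembering the previous sample as
internal state, so nothing large has to circulate through a feedback loop.

// Rough sketch (not a stock actor): first difference with internal state.
import ptolemy.actor.TypedAtomicActor;
import ptolemy.actor.TypedIOPort;
import ptolemy.data.DoubleToken;
import ptolemy.data.type.BaseType;
import ptolemy.kernel.CompositeEntity;
import ptolemy.kernel.util.IllegalActionException;
import ptolemy.kernel.util.NameDuplicationException;

public class FirstDifference extends TypedAtomicActor {
    public TypedIOPort input;
    public TypedIOPort output;

    private double _previous;  // state committed in postfire()
    private double _current;

    public FirstDifference(CompositeEntity container, String name)
            throws IllegalActionException, NameDuplicationException {
        super(container, name);
        input = new TypedIOPort(this, "input", true, false);
        output = new TypedIOPort(this, "output", false, true);
        input.setTypeEquals(BaseType.DOUBLE);
        output.setTypeEquals(BaseType.DOUBLE);
    }

    public void initialize() throws IllegalActionException {
        super.initialize();
        _previous = 0.0;  // initial value, analogous to SampleDelay's default
    }

    public void fire() throws IllegalActionException {
        super.fire();
        _current = ((DoubleToken) input.get(0)).doubleValue();
        output.send(0, new DoubleToken(_current - _previous));
    }

    public boolean postfire() throws IllegalActionException {
        // Commit the state change only after the iteration completes.
        _previous = _current;
        return super.postfire();
    }
}

The same pattern extends to a sliding window of any length by keeping a
small buffer instead of a single value.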
>
> 5. Does anyone have experience using Python to drive ArcGIS
> geoprocessor tools from within Kepler?
>
> 6. I note you have an actor that provides access to MATLAB. Can the
> same actor be used for Octave? (see http://www.gnu.org/software/octave/).
I don't think there is anything standard about the API to MATLAB, so
I doubt it... But I'm sure one could create an actor for Octave.
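For instance, a bare-bones Octave actor could simply shell out to the
octave executable. The sketch below is untested; it assumes octave is on
the path and accepts the --eval option, and the class name is only
illustrative. A real actor would want typed data rather than raw strings,
but this shows the basic shape.

// Illustrative sketch only: evaluate an Octave expression in a subprocess
// and emit whatever Octave prints as a string token.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import ptolemy.actor.TypedAtomicActor;
import ptolemy.actor.TypedIOPort;
import ptolemy.data.StringToken;
import ptolemy.data.type.BaseType;
import ptolemy.kernel.CompositeEntity;
import ptolemy.kernel.util.IllegalActionException;
import ptolemy.kernel.util.NameDuplicationException;

public class OctaveExpression extends TypedAtomicActor {
    public TypedIOPort expression;  // Octave code to evaluate
    public TypedIOPort result;      // captured standard output

    public OctaveExpression(CompositeEntity container, String name)
            throws IllegalActionException, NameDuplicationException {
        super(container, name);
        expression = new TypedIOPort(this, "expression", true, false);
        result = new TypedIOPort(this, "result", false, true);
        expression.setTypeEquals(BaseType.STRING);
        result.setTypeEquals(BaseType.STRING);
    }

    public void fire() throws IllegalActionException {
        super.fire();
        String code = ((StringToken) expression.get(0)).stringValue();
        try {
            // Assumes the octave executable is on the path.
            Process octave = new ProcessBuilder("octave", "--eval", code)
                    .redirectErrorStream(true).start();
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(octave.getInputStream()));
            StringBuilder out = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append("\n");
            }
            octave.waitFor();
            result.send(0, new StringToken(out.toString()));
        } catch (Exception ex) {
            throw new IllegalActionException(this, ex, "Failed to run Octave.");
        }
    }
}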
Regards,
Edward Lee