Dear Keplerites:<br><br>I have been evaluating Kepler for use in a major project. The project will perform spatial hydrological analysis for a 30+ year time period using monthly time steps. I should say that, at first glance, I'm very impressed.<br>
<br>By way of background, I developed a large-grained dataflow language and director to process time series satellite imagery through complex workflows in an array processor in the 1980s. The goal was to have a flexible system that would process the imagery in one pass through the array processor (VAXen were notorious for their slow i/o bus). The language used functional expressions to define output ports, input ports, and actors (e.g., (out1, out2, ...) = actor(in1, in2, ...)). In addition, one could specify how many tokens needed to be present on each port before firing, and how many tokens were consumed per firing. This allowed time series image operations such as sliding window means and block means to be computed with a single general purpose actor. Somewhere, there's a conference paper, but I've lost track of the citation.<br>
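<br>To make that firing rule concrete, here is a minimal Python sketch. It is neither Kepler code nor the original 1980s language; the names `windowed_actor`, `read_count`, and `consume_count` are hypothetical and chosen only for illustration. The actor fires once `read_count` tokens are queued on its port and removes `consume_count` of them after each firing.<br>

```python
from collections import deque


def windowed_actor(tokens, func, read_count, consume_count):
    """Apply func to the first read_count queued tokens each time enough
    tokens are available, then drop consume_count tokens from the queue.

    Hypothetical illustration of a "tokens required / tokens consumed"
    firing rule; not part of any Kepler API.
    """
    if not 1 <= consume_count <= read_count:
        raise ValueError("need 1 <= consume_count <= read_count")
    queue = deque()
    outputs = []
    for token in tokens:
        queue.append(token)
        # Fire as long as enough tokens are waiting on the port.
        while len(queue) >= read_count:
            window = [queue[i] for i in range(read_count)]
            outputs.append(func(window))
            for _ in range(consume_count):
                queue.popleft()
    return outputs


def mean(window):
    return sum(window) / len(window)


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Sliding window mean: read 3 tokens, consume 1 per firing.
sliding = windowed_actor(series, mean, read_count=3, consume_count=1)
# Block mean: read 3 tokens, consume all 3 per firing.
block = windowed_actor(series, mean, read_count=3, consume_count=3)
```

With the same actor, only the consumption count changes between the two operations: consuming one token per firing gives the sliding mean, and consuming the whole window gives the block mean.<br>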
<br>So now, some 25+ years later, I'm evaluating Kepler. In some respects it looks like a dream come true. But I have some questions before I can "bet the project" on it. Some are institutional, some are technical. I've plowed through much of the documentation, but quickly. If my questions are naive and clearly answered (but somehow overlooked) in the documentation, I apologize.<br>
<br>1. Is funding and staffing for Kepler development and maintenance secure for the foreseeable future? I don't want to build a major project on a system that may be orphaned before I'm finished.<br><br>2. In my experimentation, the system appears stable and reasonably efficient. But my testing has been on the trivial side. Does the system scale well to large sophisticated image processing workflows with many levels of composite actors and high volume data streams?<br>
<br>3. In my experimentation with the RExpression actor, it has become clear that data is transmitted from output port to input port in character string format. R statements are then used to translate the character string data back into binary data objects for use in R. If the amount of data is large, the system automatically switches to passing data as .Rdata (.sav) files or text files. Is this true for all channels? The application I envision would be processing long time series of global geographic grids, each with 0.5 degree resolution or finer. As a result, repeated saves to disk followed by reads from disk will create serious performance issues. I was hoping that Kepler tokens could be used to pass large binary objects such as 720x360 floating point matrices in memory via some form of IPC (as opposed to disk files). Is that possible? If so, could you point me to the relevant documentation? I've looked, but cannot find it. If this capability does not exist, is it something you would consider adding in future releases? We also routinely process gigapixel imagery. In such cases, it is often desirable to process sequences of scan lines in a pipeline, only reading from disk at the beginning of the pipeline and only writing to disk at the end of the pipeline. Everything in between the end-points stays in memory.<br>
<br>3a. It also appears that every firing of the RExpression actor requires a re-instantiation of R with all of its initialization overhead. Is there a way to avoid this? Is this true with all actors?<br><br>4. In order to perform time series operations such as first differencing, an actor either needs to be able to read multiple items from its input port (in the case of first differencing, one would want to read 2 tokens, but only remove 1 per firing) or have memory that persists from firing to firing. At the moment, I don't see the capability to read more than one token at a time off the input port (while leaving some of them on the port to be re-read on the next firing). But it looks like one can simulate persistent memory by routing an output port back to an input port via the SampleDelay actor. Is this the recommended approach? If so, this exacerbates the i/o problem with large tokens described in question #3 above.<br>
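<br>To illustrate the "persistent memory" alternative in question 4, here is a small Python sketch of a first-differencing actor that keeps the previous grid between firings. This is not Kepler's actor API; the class `FirstDifference` and its `fire` method are hypothetical names, and grids are represented as plain nested lists for simplicity.<br>

```python
class FirstDifference:
    """Hypothetical stateful actor: emits current grid minus previous grid.

    The previous token is held in memory between firings, so each input
    token is read exactly once and nothing is written to disk.
    """

    def __init__(self):
        self.previous = None  # state that persists from firing to firing

    def fire(self, grid):
        previous, self.previous = self.previous, grid
        if previous is None:
            return None  # first firing: nothing to difference against yet
        return [
            [cur - prev for cur, prev in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(grid, previous)
        ]


actor = FirstDifference()
month1 = [[1.0, 2.0], [3.0, 4.0]]
month2 = [[1.5, 2.5], [3.0, 6.0]]
first = actor.fire(month1)   # no output on the first firing
second = actor.fire(month2)  # element-wise month2 - month1
```

The same result could be obtained with a read-2, consume-1 firing rule; the stateful version simply moves the retained token from the port into the actor.<br>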
<br>5. Does anyone have experience using Python to drive ArcGIS geoprocessor tools from within Kepler? <br><br>6. I note you have an actor that provides access to MATLAB. Can the same actor be used for Octave? (see <a href="http://www.gnu.org/software/octave/">http://www.gnu.org/software/octave/</a>). <br>
<br>I appreciate any response I can get to these questions. I'll probably have more questions as I dig into Kepler in more detail. I thank you for your patience as I come up to speed.<br><br>With best regards,<br>Tom Parris<br>
Vice President, ISciences, LLC.<br>