[kepler-dev] Combining the CommandLine and Exec actors
ludaesch at sdsc.edu
Sun Aug 8 22:58:01 PDT 2004
>>>>> "TF" == Tobin Fricke <tobin at splorg.org> writes:
TF> On Sun, 8 Aug 2004, Dan Higgins wrote:
>> On Friday on IRC, you mentioned the possibility of combining the
>> CommandLine and Exec actors. I thought I would just write down some
>> thoughts on that idea.
TF> Hi Dan,
TF> This is a good summary of the differences between the various Exec-like
TF> actors. I've been pondering this lately and agree that they should be
TF> combined. Maybe we could combine them via a set of options:
TF> 1. Keep the process running, or execute it once per firing?
you mean: keep the process running or exit after each firing, right?
(you might still execute the command once per firing *without* exiting)
TF> 2. Execute directly, or via a shell command?
yep -- each one can come in handy
TF> 3. Relations carry string input/output, or filenames?
.. or file handles!
TF> My own view is that it's a little weird to be using files at all, except
TF> perhaps for efficiency reasons ("the third party problem").
yes, efficiency is an issue.
I've been recently thinking about this as follows:
each port/datum should have several "types":
- the conventional "data type" (current PT2 type system)
- a "semantic type" (we're cooking something there ;-)
- a "transport type" (e.g., two processes might communicate natively
as Java objects, or via http, or via gridFTP, or via sockets, or via
CORBA, or via main memory! -- how's that for starters?? ;-)
- an event production/consumption type. This is for scheduling a la
SDF. But this "type" is not a property of an individual port. Rather
such a firing rule is a constraint expression
TF> Additionally, another possible function:
TF> 4. parameters to express how tokens are translated into input for the
TF> program, and (more importantly), how the program's output is tokenized for
TF> input into kepler.
TF> At a minimum, I think we need to be able to specify a regular expression
TF> giving the 'field separator' that will indicate that the program's output
TF> so far can be bundled up into a token. It can default to "\n".
I like that. That's very hands-on. So if you have a string such as
"foo\n bar\n baz\n" this could be configured to either produce one
token or three tokens (with \n the delimiter)...
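A minimal sketch of that separator idea, in Python purely for illustration (the function name `tokenize` and its `separator` parameter are hypothetical, not part of any Kepler actor). It assumes the whole output is already available as a string; a streaming actor would apply the same rule incrementally.

```python
import re

def tokenize(output, separator=r"\n"):
    """Split program output into tokens on a regular-expression separator.

    Passing separator=None (an assumption of this sketch) disables
    splitting, so the whole output becomes a single token.
    """
    if separator is None:
        return [output]
    parts = re.split(separator, output)
    # Output that ends with the separator leaves one empty string at the end.
    if parts and parts[-1] == "":
        parts.pop()
    return parts

text = "foo\n bar\n baz\n"
print(tokenize(text))        # ['foo', ' bar', ' baz']
print(tokenize(text, None))  # ['foo\n bar\n baz\n']
```

A separator such as r"\s*\n\s*" would also strip the blanks around each token.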
Digression: another nice feature is to have "parameterized commands".
For example the command
mycmd -s $1 -t $2 -o $3
could be exposed as an actor with (in this case) say two input ports
for the s and t values, and one output port for the o value.
I guess this can already be done with the cmd line actor as is...
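To make the parameterized-command idea concrete, here is a rough Python sketch. The helper names `placeholders` and `bind` are made up for this example; the point is only that the $n slots in the template tell you which ports to expose, and that values are substituted after the template is split into words.

```python
import re
import shlex

PLACEHOLDER = re.compile(r"\$(\d+)")

def placeholders(template):
    """Return the sorted positional slot numbers ($1, $2, ...) in the template."""
    return sorted({int(n) for n in PLACEHOLDER.findall(template)})

def bind(template, values):
    """Build an argument list, replacing each $n with values[n].

    The template is split into words first, so a value containing
    spaces still ends up as a single argument.
    """
    argv = []
    for word in shlex.split(template):
        argv.append(PLACEHOLDER.sub(lambda m: str(values[int(m.group(1))]), word))
    return argv

template = "mycmd -s $1 -t $2 -o $3"
print(placeholders(template))  # [1, 2, 3]
print(bind(template, {1: "src.txt", 2: "my table", 3: "out.txt"}))
```

Which slots become input ports and which become output ports is not something the template can express by itself; that would need an extra parameter on the actor.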
>> being directly connected. Command line redirection to/from files cannot
>> be done with this approach, although the whole thing is simpler because
>> it doesn't involve an intermediate program (the shell) to run the
TF> I agree that it's good to avoid using the shell when possible, although in
TF> practice the overhead is probably minimal. One case I recently
TF> encountered which makes the shell incredibly handy is if you need to
TF> perform some translation on the output of the process being executed.
TF> For instance, I wanted to provide the results of 'factor' to Kepler as an
TF> expression that would evaluate to an array. This can be done by filtering
TF> through a small Perl script that is given entirely on the command line.
TF> But that filtering requires the shell.
TF> The actor doesn't have to support execution-via-the-shell explicitly. I
TF> used the Exec actor with a command line like:
TF> bash -c "foo -x -y -z | bar > baz"
TF> In fact, you could implement the CommandLine actor as a composite actor,
TF> one that feeds the Exec actor the command:
TF> $SHELL -c "$CMD < $INFILE > $OUTFILE"
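The direct-versus-shell choice can be sketched as a single option that only changes how the argument list is built. This is an illustrative Python sketch, not the Exec or CommandLine implementation; `build_argv`, `use_shell` and `shell` are hypothetical names.

```python
import shlex

def build_argv(command, use_shell=False, shell="/bin/sh"):
    """Turn a command string into the argument list handed to the OS.

    Direct mode splits the string into words and runs the program itself,
    so '|', '<' and '>' are passed through as literal arguments.
    Shell mode hands the whole string to 'shell -c', which interprets
    pipes and redirection.
    """
    if use_shell:
        return [shell, "-c", command]
    return shlex.split(command)

print(build_argv("foo -x -y -z"))
print(build_argv("foo -x -y -z | bar > baz", use_shell=True, shell="bash"))
```

Either list could then be passed to something like subprocess.run; the sketch stops short of executing anything because foo and bar are placeholders.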
>> The Exec actor also has special threads for grabbing io streams (and I
>> believe these are needed, at times, on Windows).
TF> Combining all of these related actors has the advantage of putting all
TF> these various hacks that are necessary to get them to work on all
TF> platforms in one place.
TF> I suspect that the I/O-grabbing threads can be done away with somehow in
TF> the future.
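For readers wondering why the I/O-grabbing threads exist: if nobody reads a child's stdout and stderr while it runs, the child can block once a pipe buffer fills up. A minimal Python sketch of the pattern follows; it is an illustration of the general technique, not the Exec actor's code, and `run_and_capture` is a hypothetical name.

```python
import subprocess
import sys
import threading

def run_and_capture(argv, stdin_text=""):
    """Run a child process, draining stdout and stderr on separate threads."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    captured = {"stdout": [], "stderr": []}

    def drain(stream, sink):
        # Read until end of stream so the child never blocks on a full pipe.
        for line in stream:
            sink.append(line)
        stream.close()

    readers = [
        threading.Thread(target=drain, args=(proc.stdout, captured["stdout"])),
        threading.Thread(target=drain, args=(proc.stderr, captured["stderr"])),
    ]
    for reader in readers:
        reader.start()

    # Feed the input, then close stdin so the child sees end of input.
    proc.stdin.write(stdin_text)
    proc.stdin.close()

    for reader in readers:
        reader.join()
    return proc.wait(), "".join(captured["stdout"]), "".join(captured["stderr"])
```

Calling run_and_capture([sys.executable, "-c", "print('hi')"]) returns the exit code together with both captured streams.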
>> 4) Both the CommandLine and Exec actors wait for the subprocess to
>> complete. The InteractiveExec actor is an attempt to let the underlying
>> subprocess continue operating and thus avoid repeated code startup
>> delays. It still has some problems with when to shut down the
TF> I think this can be given as a boolean option to the actor. Shutdown is
TF> an issue in both cases. One possibility would be to invent an "end of
TF> file" ('record delimiter'?) token, but then we'd want a general way of
TF> escaping it, too, etc. This is definitely a problem that needs to be
TF> addressed RE: the use of 'InteractiveExec'-type processes.
>> 5) Simply adding an input stream to the CommandLine actor would make it
>> quite similar to the Exec actor. There would be some confusion about
>> what the presence of both inputFileHandle and input stream parameters
>> might mean.
TF> Maybe there could be one input port and an option to toggle its semantics?
TF> Or (perhaps better), the actor could look to see what input ports exist,
TF> and declare an error if it has more than one of (inputFileName,
TF> I think 'inputFileHandle' is a misnomer for the port, since it expects a
TF> file name and not a file handle.
>> [I am not sure what it means to tell the commandline shell to use
>> standard input AND redirect the input from a file ?
TF> The shell issues a syntax error: "Ambiguous input redirect."
>> 6) Should we add a parameter to allow the subprocess that is launched to
>> return without completing?
TF> I think this is a good idea.
>> The process would be created on the first 'fire' event, and we would
>> have to figure out how to reliably shut it down later.
TF> Maybe a string parameter can describe the necessary "exit" command, and
TF> the actor can issue this when 'wrapping up'. If that doesn't do the trick
TF> within some period of time, the underlying process can be forcefully
TF> terminated.
TF> The current situation seems to be that Kepler freezes and has to be
TF> forcefully terminated itself.
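The "exit command, then force" idea might look roughly like this in Python. It is a sketch under assumptions: the child reads commands line by line on stdin, and the name `shut_down`, the newline terminator and the timeout value are all made up for illustration.

```python
import subprocess
import sys

def shut_down(proc, exit_command, timeout=2.0):
    """Ask a long-running child to exit; kill it if it does not comply in time.

    Returns "exited" if the child stopped on its own, "killed" otherwise.
    """
    try:
        proc.stdin.write(exit_command + "\n")
        proc.stdin.flush()
    except OSError:
        pass  # The pipe is already broken, so the child is probably gone.
    try:
        proc.wait(timeout=timeout)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.kill()   # Forceful termination as the fallback.
        proc.wait()   # Reap the process so it does not linger.
        return "killed"
```

An actor could call something like this while wrapping up, with the exit command and the timeout supplied as parameters.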
>> 7) Also note that the Exec actor has some 'expert' parameters that allow
>> for setting execution directories and environment variables for the
>> subprocess being launched. These could be useful at times and should
>> probably be included.