[kepler-dev] Combining the CommandLine and Exec actors

Sun Aug 8 22:16:19 PDT 2004

On Sun, 8 Aug 2004, Dan Higgins wrote:

>     On Friday on IRC, you mentioned the possibility of combining the
> CommandLine and Exec actors. I thought I would just write down some
> thoughts on that idea.

Hi Dan,

This is a good summary of the differences between the various Exec-like
actors.  I've been pondering this lately and agree that they should be
combined.  Maybe we could combine them via a set of options:

1. Keep the process running, or execute it once per firing?

2. Execute directly, or via a shell command?

3. Relations carry string input/output, or filenames?

My own view is that it's a little weird to be using files at all, except
perhaps for efficiency reasons ("the third party problem").

Another possible function:

4. Parameters to express how tokens are translated into input for the
program and (more importantly) how the program's output is tokenized for
input into Kepler.

At a minimum, I think we need to be able to specify a regular expression
giving the 'field separator' that will indicate that the program's output
so far can be bundled up into a token.  It can default to "\n".

> being directly connected. Command line redirection to/from files cannot
> be done with this approach, although the whole thing is simpler because
> it doesn't involve an intermediate program (the shell) to run the
> subprocess.

I agree that it's good to avoid using the shell when possible, although in
practice the overhead is probably minimal.  One case I recently
encountered where the shell is incredibly handy is when you need to
perform some translation on the output of the process being executed.
For instance, I wanted to provide the results of 'factor' to Kepler as an
expression that would evaluate to an array.  This can be done by filtering
through a small Perl script that is given entirely on the command line.
But that filtering requires the shell.
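
For illustration, the translation itself is small.  The sketch below is in
Python and stands in for the Perl filter, which is not reproduced in this
message.  It assumes that 'factor' prints lines of the form "12: 2 2 3" and
that the expression language writes arrays as {2, 2, 3}; the function name
is made up.

```python
def factor_output_to_array(line):
    """Turn a line such as '12: 2 2 3' into the string '{2, 2, 3}'."""
    # Everything after the colon is the whitespace-separated factor list.
    _, _, factors = line.partition(":")
    return "{" + ", ".join(factors.split()) + "}"
```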

The actor doesn't have to support execution-via-the-shell explicitly.  I
used the Exec actor with a command line like:

	bash -c "foo -x -y -z | bar > baz"

In fact, you could implement the CommandLine actor as a composite actor,
one that feeds the Exec actor the command:

	$SHELL -c "$CMD < $INFILE > $OUTFILE"
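
A rough sketch of how such a wrapper might assemble that command, written
in Python for brevity.  The function name and the default shell path are
assumptions.  The file names are quoted so that spaces survive, while the
command itself is passed through untouched so that pipelines still work.

```python
import shlex


def shell_command(cmd, infile, outfile, shell="/bin/sh"):
    """Build the argument list for: $SHELL -c "$CMD < $INFILE > $OUTFILE"."""
    # Quote only the file names; cmd is deliberately left for the shell.
    script = f"{cmd} < {shlex.quote(infile)} > {shlex.quote(outfile)}"
    return [shell, "-c", script]
```

The resulting list can be handed to an Exec-style call directly, for
example subprocess.run(shell_command("sort -r", "in.txt", "out.txt")).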

> The Exec actor also has special threads for grabbing io streams (and I
> believe these are needed, at times, on Windows).

Combining all of these related actors has the advantage of putting all
these various hacks that are necessary to get them to work on all
platforms in one place.

I suspect that the I/O-grabbing threads can be done away with somehow in
the future.

> 4) Both the CommandLine and Exec actors wait for the subprocess to
> complete. The InteractiveExec actor is an attempt to let the underlying
> subprocess continue operating and thus avoid repeated code startup
> delays. It still has some problems with when to shut down the
> subprocess.

I think this can be given as a boolean option to the actor.  Shutdown is
an issue in both cases.  One possibility would be to invent an "end of
file" ('record delimiter'?) token, but then we'd want a general way of
escaping it, too, etc.  This is definitely a problem that needs to be
addressed for 'InteractiveExec'-type processes.

> 5) Simply adding an input stream to the CommandLine actor would make it
> quite similar to the Exec actor. There would be some confusion about
> what the presence of both inputFileHandle and input stream parameters
> might mean.

Maybe there could be one input port and an option to toggle its semantics?

Or (perhaps better), the actor could look to see what input ports exist,
and declare an error if it has more than one of (inputFileName,
inputStream).
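
A sketch of that check, in Python for brevity.  The port names come from
this thread; the function and its return convention are hypothetical.

```python
# Input ports of which at most one may be present on the actor.
EXCLUSIVE_INPUTS = ("inputFileName", "inputStream")


def check_input_ports(port_names):
    """Return the input port in use, or None; reject conflicting ports."""
    present = [name for name in EXCLUSIVE_INPUTS if name in port_names]
    if len(present) > 1:
        raise ValueError("conflicting input ports: " + ", ".join(present))
    return present[0] if present else None
```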

I think 'inputFileHandle' is a misnomer for the port, since it expects a
file name and not a file handle.

> [I am not sure what it means to tell the commandline shell to use
> standard input AND redirect the input from a file ?

The shell issues a syntax error: "Ambiguous input redirect."

> 6) Should we add a parameter to allow the subprocess that is launched to
> return without completing?

I think this is a good idea.

> The process would be created on the first 'fire' event, and we would
> have to figure out how to reliably shut it down later.

Maybe a string parameter can describe the necessary "exit" command, and
the actor can issue this when 'wrapping up'.  If that doesn't do the trick
within some period of time, the underlying process can be forcefully
terminated.
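
Roughly, in Python (a sketch only; the function name and the default
timeout are made up, and the process is assumed to have been started with
a text-mode pipe on its standard input):

```python
import subprocess


def shut_down(process, exit_command, timeout=5.0):
    """Ask a running subprocess to exit; kill it if it does not comply.

    Returns True if the process exited on its own, False if it was killed.
    """
    try:
        # Send the user-supplied "exit" command, e.g. "quit\n".
        process.stdin.write(exit_command)
        process.stdin.flush()
    except (OSError, ValueError):
        pass  # stdin already closed or the process is already gone
    try:
        process.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # The polite request did not work in time; terminate forcefully.
        process.kill()
        process.wait()
        return False
```

The actor would call something like this from its wrap-up phase, so a
stuck subprocess cannot hang the whole workflow.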

The current situation seems to be that Kepler freezes and has to be
forcefully terminated itself.

> 7) Also note that the Exec actor has some 'expert' parameters that allow
> for setting execution directories and environment variables for the
> subprocess being launched. These could be useful at times and should
> probably be included.

Definitely.
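
As a sketch of what those two settings amount to (in Python; the helper
name and its parameters are hypothetical and are not the Exec actor's
actual parameters):

```python
import os
import subprocess


def run_in(command, directory=None, extra_env=None):
    """Run command in the given directory with extra environment variables."""
    # Start from the current environment and layer the extras on top.
    env = dict(os.environ)
    env.update(extra_env or {})
    result = subprocess.run(command, cwd=directory, env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout
```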

Tobin