[kepler-dev] Styx for remote data streaming?

Sat Aug 20 23:38:44 PDT 2005

Remember those 3rd-party transfer and "handle" discussions, as a
mechanism to avoid unnecessary data transfers between, say, two remote
service invocations?

Just came across this Styx stuff again -- seems interesting!

Maybe something we could adopt for Kepler!

Bertram

Source: http://www.resc.rdg.ac.uk/publications/Blower_AHM_2005.pdf

[...]
3.3 Data streaming

A key feature of the Styx Grid Service design is the ability to stream
data directly between remote service instances. The simplest way to
achieve this is as follows: The underlying executables are written so
that they read data from their standard input stream and write data to
their standard output, in the manner of a Unix filter. If the
executables were running on the same machine, the output of one could
be streamed into the input of the other using the pipe operator (e.g.
"prog1 | prog2").

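To make the local case concrete, here is a minimal Python sketch of
what "prog1 | prog2" does. PROG1, PROG2 and run_pipeline are made-up
stand-ins for illustration (two tiny filters run through the Python
interpreter), not anything from the paper or from Styx.

```python
import subprocess
import sys

# Hypothetical stand-in filters: each reads stdin and writes stdout.
PROG1 = "import sys\nfor line in sys.stdin: sys.stdout.write(line.upper())"
PROG2 = "import sys\nfor line in sys.stdin: sys.stdout.write('> ' + line)"


def run_pipeline(text):
    """Rough equivalent of `prog1 | prog2`, with `text` fed to prog1's stdin."""
    p1 = subprocess.Popen([sys.executable, "-c", PROG1],
                          stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                          text=True)
    # prog2 reads directly from prog1's stdout: the data never touches a file.
    p2 = subprocess.Popen([sys.executable, "-c", PROG2],
                          stdin=p1.stdout, stdout=subprocess.PIPE,
                          text=True)
    p1.stdout.close()      # the parent keeps no handle on the pipe's read end
    p1.stdin.write(text)   # fine for small inputs; large ones need a writer thread
    p1.stdin.close()       # signal end-of-input to prog1
    output = p2.communicate()[0]
    p1.wait()
    return output
```

Both processes run at the same time here; prog2 sees each line as soon
as prog1 has flushed it.
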
When the executables are wrapped as Styx Grid Services the downstream
service can be instructed to read its input data from the output
stream of the upstream service by writing the URL of the output stream
into the stdin file of the downstream service before the downstream
service is started (step 5 in section 3.2). Note that any URL can be
specified as the input source for an SGS, including HTTP and FTP URLs;
GridFTP support will be added in future. This allows SGSs to be used
with other remote service types and data sources. 
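
The "any URL as input source" idea can be sketched in a few lines of
Python. This is only an illustration of the principle, not the SGS
mechanism or the Styx protocol; stream_from_url is a hypothetical
helper name.

```python
import shutil
import urllib.request


def stream_from_url(url, sink, chunk_size=64 * 1024):
    """Copy whatever `url` points at into `sink`, one chunk at a time."""
    # urlopen handles http://, ftp:// and file:// URLs through the same call,
    # so the consumer does not need to know what kind of source it is.
    with urllib.request.urlopen(url) as source:
        shutil.copyfileobj(source, sink, chunk_size)
```

For example, calling stream_from_url(some_url, sys.stdout.buffer)
would forward the data to standard output, where some_url is whatever
input the caller was handed.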

*** The URL is simply a pointer to a data source and the downstream
*** service does not care whether it represents a live data stream or
*** a pre-existing file. 
[emphasis added]
*** An important feature of data streaming is that a chain of services
*** can execute concurrently: downstream services can start processing
*** data while upstream services are still in progress. This can lead
*** to large performance increases over systems that use intermediate
*** files to transport data between services (in which the upstream
*** service generally has to finish before the downstream service can
*** start).
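
A toy Python sketch of the concurrency point, using two threads and a
bounded queue in place of two remote services and a network stream.
run_chain and everything inside it are assumptions made for
illustration only.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_chain(items, transform, buffer_size=1):
    """Run an upstream producer and a downstream consumer concurrently.

    Returns (results, events); `events` records the order in which items
    were sent and received.
    """
    pipe = queue.Queue(maxsize=buffer_size)
    results, events = [], []

    def upstream():
        for item in items:
            pipe.put(item)                  # blocks while the buffer is full
            events.append(("sent", item))
        pipe.put(_DONE)

    def downstream():
        while True:
            item = pipe.get()               # blocks until data is available
            if item is _DONE:
                break
            events.append(("received", item))
            results.append(transform(item))

    threads = [threading.Thread(target=upstream),
               threading.Thread(target=downstream)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, events
```

With a small buffer the downstream side is already working on the
first item well before the upstream side has sent its last one. With
intermediate files, the first "received" could only come after the
last "sent".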

It is also possible to expose other output files as streams: the SGS
wrapper monitors these files, looking for changes to their length as
the service is running. In this way a service can effectively output
many streams of data in addition to the standard output and error
streams.
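
The file-watching trick can be approximated like this in Python. It is
a guess at the general technique (poll the file length, hand out the
new bytes), not the actual SGS wrapper code; follow and is_running are
hypothetical names.

```python
import os
import time


def follow(path, is_running, poll_interval=0.1):
    """Yield bytes appended to `path` until the producer has stopped.

    `is_running` is a callable that returns False once the producing
    service has finished.
    """
    offset = 0
    while True:
        # Sample the producer state *before* the size, so that a final
        # write just before exit is never missed.
        running = is_running()
        size = os.path.getsize(path)
        if size > offset:
            with open(path, "rb") as f:
                f.seek(offset)
                chunk = f.read(size - offset)
            offset += len(chunk)
            yield chunk
        elif not running:
            return                      # nothing new, producer is done
        else:
            time.sleep(poll_interval)   # nothing new yet, keep watching
```

Each yielded chunk could then be pushed to whoever is reading that
output stream.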