[kepler-dev] Re: Fw: Practical work at Link Up meeting
Bertram Ludaescher
ludaesch at sdsc.edu
Wed Nov 10 10:04:56 PST 2004
Soaplab fans (and others: please chime in!):
I'm still trying to figure out a bit more about the soaplab "trinity":
start, wait, retrieve
when invoking soaplab services.
It seems that this sequence of soaplab services can provide a finer
level of control than what standard web services allow. So that's a
good feature.
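
For illustration only, the three-step sequence could be sketched as below. The class FakeJobService and its methods start, wait and retrieve are hypothetical stand-ins and not the actual Soaplab API; the "remote" job is simulated with a local thread.

```python
import threading
import time
import uuid


class FakeJobService:
    """Hypothetical in-memory stand-in for a remote job service (not Soaplab's API)."""

    def __init__(self):
        self._done = {}      # job id -> Event set when the job finishes
        self._results = {}   # job id -> result value

    def start(self, func, *args):
        """Launch the job in the background and return a job id immediately."""
        job_id = uuid.uuid4().hex
        done = threading.Event()
        self._done[job_id] = done

        def run():
            try:
                self._results[job_id] = func(*args)
            finally:
                done.set()   # signal completion even if the job failed

        threading.Thread(target=run, daemon=True).start()
        return job_id

    def wait(self, job_id, timeout=None):
        """Block until the job finishes; return False if the timeout expires first."""
        return self._done[job_id].wait(timeout)

    def retrieve(self, job_id):
        """Fetch the result of a finished job (KeyError if the job failed)."""
        if not self._done[job_id].is_set():
            raise RuntimeError("job still running")
        return self._results[job_id]


def slow_reverse(sequence):
    time.sleep(0.1)  # pretend this is a long-running analysis
    return sequence[::-1]


service = FakeJobService()
job = service.start(slow_reverse, "ACGT")   # returns at once
# ... the client is free to do other work here ...
service.wait(job)
print(service.retrieve(job))                # prints TGCA
```

The point of the split is that the client decides when to block, and could ask for a timeout in wait instead of relying on the transport's own time-out.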
BTW: How do standard WS deal with long-running invocations (say,
on the order of hours)?
Does the usual request/response pattern have a time-out associated?
Certainly for those long running WS invocations an asynchronous
pattern a la soaplab is preferable. But then again standard WS also
have an asynchronous way of invocation -- some quick links:
http://java.sun.com/blueprints/webservices/using/webservbp3.html
http://www.computerworld.com/developmenttopics/development/story/0,10801,79698,00.html
http://www.15seconds.com/issue/031124.htm
However, I suspect that many service providers might not implement
that pattern!? Is that right?
In either case, as a matter of 'separation of concerns', in a workflow
setting I would think that a component (actor) should not have to
worry much about the communication mechanism (synch or asynch) but
instead this should be left to the overall workflow coordinator
(director in PT2/Kepler speak); this is of course the point of
"actor-oriented modeling" PT2-style.
For example in a process network (PN) workflow, maximal parallelism is
already achieved:
-- any process WS2 sitting downstream of a long running web service
WS1 will have to wait for its inputs anyways: [WS1]-->[WS2]-->..
-- any other processes WS3, WS4, ... on parallel branches will not be
slowed down by [WS1]: [WS3]-->[WS4]-->..
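
The two branches above can be mimicked with threads and queues. This is only a rough sketch, assuming one thread per actor and blocking FIFO queues between them; the helper names (actor, run_network, slow_service) are made up for the example and are not Kepler or PT2 APIs.

```python
import queue
import threading
import time

STOP = object()  # sentinel marking the end of a stream


def actor(name, func, inbox, outbox, log):
    """Run one actor in its own thread: read tokens, apply func, forward results."""
    def loop():
        while True:
            token = inbox.get()          # blocks until upstream delivers
            if token is STOP:
                outbox.put(STOP)
                return
            result = func(token)
            log.append(name)             # record the order in which actors fire
            outbox.put(result)
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


def slow_service(x):
    time.sleep(0.2)                      # stands in for the long-running WS1
    return x * 2


def run_network():
    log = []
    # Branch A: [WS1] --> [WS2]; branch B: [WS3] --> [WS4]
    a_in, a_mid, a_out = queue.Queue(), queue.Queue(), queue.Queue()
    b_in, b_mid, b_out = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        actor("WS1", slow_service, a_in, a_mid, log),
        actor("WS2", lambda x: x + 1, a_mid, a_out, log),
        actor("WS3", lambda x: x * 10, b_in, b_mid, log),
        actor("WS4", lambda x: x - 1, b_mid, b_out, log),
    ]
    for q in (a_in, b_in):
        q.put(5)
        q.put(STOP)
    for thread in threads:
        thread.join()
    return a_out.get(), b_out.get(), log
```

Calling run_network() should show WS3 and WS4 firing while WS1 is still sleeping, and WS2 firing only after WS1 has delivered its token; none of the actor functions contains any synchronisation code of its own.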
So it seems that the details of asynch communication can be hidden at
the WF level. The director, on the other hand, might be dealing with
the details of synch/asynch calls and might have different treatment
for each of synch WS, asynch WS, and soaplab WS ... right?
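
One way to picture such director-side treatment, as a sketch under stated assumptions: the actor is a plain function, and the invocation style is looked up by the coordinator. All names here (shout, invoke_sync, invoke_async, direct) are hypothetical and not part of Kepler or PT2.

```python
from concurrent.futures import ThreadPoolExecutor


def shout(text):
    """Hypothetical actor body: knows nothing about how it is invoked."""
    return text.upper()


def invoke_sync(func, token, pool):
    # Plain request/response: block until the answer is back.
    return func(token)


def invoke_async(func, token, pool):
    # Submit the call, then collect the result once it is needed.
    future = pool.submit(func, token)
    return future.result()


STRATEGIES = {"sync": invoke_sync, "async": invoke_async}


def direct(func, token, style, pool):
    """Director-side dispatch: pick the invocation strategy by service style."""
    return STRATEGIES[style](func, token, pool)


with ThreadPoolExecutor(max_workers=2) as pool:
    print(direct(shout, "acgt", "sync", pool))    # prints ACGT
    print(direct(shout, "acgt", "async", pool))   # prints ACGT
```

A third entry for soaplab-style services could wrap a start/wait/retrieve sequence in the same way, without the actor function changing at all.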
Comments?
Bertram
>>>>> "BL" == Bertram Ludaescher <ludaesch at sdsc.edu> writes:
BL>
>>>>> "TO" == Tom Oinn <tmo at ebi.ac.uk> writes:
TO>
TO> Efrat Jaeger wrote:
>>> Hi all,
>>>
>>> One question that came up from the meeting was about the benefits of SoapLab
>>> services. Could you please elaborate on the advantages of wrapping a command
>>> line application as a soaplab service over standard web services.
TO>
TO> Hi Efrat,
TO>
TO> The primary advantage is exactly that - it's wrapping an existing asset
TO> rather than coding a new one. Bioinformaticians tend to have significant
TO> existing work in the form of shell scripts, native executables, PERL
TO> etc, so we needed a way for these resources to be accessed remotely.
BL>
BL> yep -- understood.
BL>
TO> In addition, we get some level of descriptive information as a byproduct
TO> of the soaplab service creation - free text description and
TO> categorisation for now but we're enhancing soaplab to include semantic
TO> metadata as well regarding input and output types and their
TO> inter-relations. Soaplab exposes these metadata through an explicit
TO> 'describe' operation which toolkits such as Taverna can use to get
TO> information about the service above and beyond that available through
TO> standard WSDL.
BL>
BL> also good.
BL>
BL> Is it also conceivable to use a "WSDL++" for the same purposes? For
BL> example, you might have a standard WSDL (so that standard WS clients
BL> can handle the WSDL as well) and then add the extra (ACD) information
BL> in whatever places you might put them "without harm" (if nothing else
BL> as a "formatted comment") ...
BL> Wouldn't this combine the best of both worlds?
BL> Or do the WSDLs you publish for soaplab services already contain the
BL> additional ACD info somewhere?
BL>
TO> There are also arguably technical advantages from being able to set
TO> inputs individually, most notably that we can have default values and
TO> that we can request only the output data we're interested in, in the case
TO> of multiple potential return types. With a native web service we'd be
TO> forced to supply all parameters (or have multiple versions of each
TO> operation) and also forced to fetch all results in one operation,
TO> possibly resulting in considerably more traffic than is required.
BL>
BL> I see. Wouldn't the use of complex XML Schema types provide a possible
BL> alternative route? For example a WS input, given as a complex XML
BL> Schema type could have optional fields, default values, and even
BL> "list valued" inputs etc...
BL>
BL> Seems that one problem with this is the mapping of such complex XML
BL> Schema types to the programming language types (say in Java)
BL>
TO> We actually had (I'm not sure if it's around anywhere now) a workflow
TO> which did exactly what you suggest - showed the single actor versus
TO> multiple finer grained actors for two equivalent operations, there's a
TO> diagram of the workflow here :
TO> http://twiki.mygrid.info/twiki/pub/Mygrid/WorkFlow/workflow.bound.jpg
TO> although this is obviously a rather elderly version.
BL>
BL> I was looking at the image -- could you point out which parts you
BL> mean here (i.e., where is the "standard" WS and where are the more
BL> fine-grained ones?)
BL>
BL> Coming back to Efrat's original question:
BL> Looking at the soaplab java client, the invocation of a soaplab service
BL> seems to be a sequence of three web service method invocations, start, wait
BL> and getResults. I was wondering why can't the native web service actor be
BL> used for invoking soaplab services.
BL>
BL> Let me make a wild guess: with 'start' you launch a remote
BL> service. I guess 'wait' would do a polling (and if so, should probably
BL> be renamed!?). Then with getResults you retrieve the results.
BL>
BL> Seems this is implementing a simple async protocol so that a client
BL> doesn't have to wait for the completion?
BL>
BL> Is that right?
BL>
BL> An alternative approach in Kepler might be to push such low-level
BL> communication issues to the director level. For example, if a workflow
BL> has something else to do during the 'wait', it would already do so
BL> (since it's a multi-threaded/multi-process network). Conversely, a
BL> downstream process (i.e., "behind" the 'wait') would have to block
BL> anyways until the result is available.
BL> Did I get that right?
BL>
BL> cheers and thanks again for the insightful discussion..
BL>
BL> Bertram
BL>
BL> PS Can I cross post this thread to the kepler-dev list? I think others
BL> might benefit from this discussion as well (and might have
BL> addtl. feedback)