[kepler-dev] Re: Fw: Practical work at Link Up meeting

Wed Nov 10 10:04:56 PST 2004

Soaplab fans (and others: please chime in!): 

I'm still trying to figure out a bit more about the soaplab "trinity":
	start, wait, retrieve
when invoking soaplab services.

It seems that this sequence of soaplab services can provide a finer
level of control than what standard web services allow. So that's a
good feature.  
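
To make the pattern concrete, here is a minimal sketch in Python. It does not use the real soaplab API: the class FakeJobService, its three methods, and the input/output names are all hypothetical stand-ins that only mimic the start / wait / retrieve shape. It is meant to show the two points of control, namely that 'start' returns immediately and that 'retrieve' can ask for a subset of the outputs.

```python
import threading
import time
import uuid


class FakeJobService:
    """Hypothetical in-memory stand-in for a soaplab-style service."""

    def __init__(self):
        self._done = {}      # job id -> Event that is set when the job finishes
        self._results = {}   # job id -> dict of named outputs

    def start(self, inputs):
        """Launch the job in the background and return its id immediately."""
        job_id = str(uuid.uuid4())
        self._done[job_id] = threading.Event()
        threading.Thread(target=self._run, args=(job_id, inputs)).start()
        return job_id

    def _run(self, job_id, inputs):
        time.sleep(0.05)  # pretend this is the long-running command-line tool
        text = inputs.get("text", "")
        self._results[job_id] = {"upper": text.upper(), "length": len(text)}
        self._done[job_id].set()

    def wait(self, job_id, timeout=None):
        """Block until the job has finished; True if it finished in time."""
        return self._done[job_id].wait(timeout)

    def retrieve(self, job_id, names):
        """Return only the requested outputs of a finished job."""
        results = self._results[job_id]
        return {name: results[name] for name in names}


service = FakeJobService()
job = service.start({"text": "acgt"})      # returns at once
# ... the client is free to do other work here ...
finished = service.wait(job, timeout=5)    # block only when the result is needed
output = service.retrieve(job, ["length"])  # fetch just one of the two outputs
print(finished, output)
```

In this sketch 'wait' blocks on an event rather than polling; whether the real soaplab 'wait' blocks or polls is exactly the open question in this thread.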

BTW: How do standard WS deal with long-running invocations (say,
on the order of hours)?  

Does the usual request/response pattern have an associated time-out?

Certainly for those long-running WS invocations an asynchronous
pattern a la soaplab is preferable. But then again, standard WS also
have an asynchronous way of invocation -- some quick links:

   http://java.sun.com/blueprints/webservices/using/webservbp3.html
   http://www.computerworld.com/developmenttopics/development/story/0,10801,79698,00.html
   http://www.15seconds.com/issue/031124.htm

However, I suspect that many service providers might not implement
that pattern!? Is that right?

In either case, as a matter of 'separation of concerns', in a workflow
setting I would think that a component (actor) should not have to
worry much about the communication mechanism (synch or asynch) but
instead this should be left to the overall workflow coordinator
(director in PT2/Kepler speak); this is of course the point of
"actor-oriented modeling" PT2-style. 

For example in a process network (PN) workflow, maximal parallelism is
already achieved: 

-- any process WS2 sitting downstream of a long-running web service
WS1 will have to wait for its inputs anyway: [WS1]-->[WS2]-->..  

-- any other processes WS3, WS4, ... on parallel branches will not be
slowed down by [WS1]: [WS3]-->[WS4]-->..
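
The two observations above can be illustrated with a small sketch that uses plain Python threads and queues in place of a real PN director. Everything here is an assumption made for illustration: the actor helper, the four toy functions ws1..ws4, and the event that stands in for "WS1 has finally finished". It is not Ptolemy II or Kepler code.

```python
import queue
import threading

release_ws1 = threading.Event()   # stands in for "WS1 has finally finished"


def actor(work, inbox, outbox):
    """Run one process: blocking read of a token, apply work, write the result."""
    def loop():
        token = inbox.get()        # blocking read, as in a process network
        outbox.put(work(token))
    threading.Thread(target=loop, daemon=True).start()


def ws1(x):
    release_ws1.wait()             # the long-running call
    return x + 1


def ws2(x):
    return x * 10


def ws3(x):
    return x + 2


def ws4(x):
    return x * 100


# Branch A: [WS1]-->[WS2]      Branch B: [WS3]-->[WS4]
a_in, a_mid, a_out = queue.Queue(), queue.Queue(), queue.Queue()
b_in, b_mid, b_out = queue.Queue(), queue.Queue(), queue.Queue()
actor(ws1, a_in, a_mid)
actor(ws2, a_mid, a_out)
actor(ws3, b_in, b_mid)
actor(ws4, b_mid, b_out)

a_in.put(1)
b_in.put(1)

branch_b = b_out.get(timeout=5)   # branch B completes while WS1 is still running
ws2_waiting = a_out.empty()       # WS2 has produced nothing: it is blocked on WS1
release_ws1.set()                 # now let WS1 finish
branch_a = a_out.get(timeout=5)
print(branch_b, ws2_waiting, branch_a)
```

Neither actor contains any synch/asynch logic; the blocking reads on the queues are what make WS2 wait and leave WS3 and WS4 unaffected.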

So it seems that the details of asynch communication can be hidden at
the WF level. The director, on the other hand, might be dealing with
the details of synch/asynch calls and might have different treatment
for each of synch WS, asynch WS, and soaplab WS ... right?
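
As a rough sketch of that idea, the function below hides two invocation styles behind one call. The classes SyncService and JobService and the function fire are hypothetical names invented for this example; they are not Kepler, Ptolemy II, or soaplab APIs, and the callback-style asynchronous WS case is left out to keep the sketch short.

```python
class SyncService:
    """Hypothetical plain request/response service."""

    def call(self, inputs):
        return {"result": sum(inputs)}


class JobService:
    """Hypothetical start/wait/retrieve service (soaplab-like)."""

    def __init__(self):
        self._jobs = {}

    def start(self, inputs):
        job_id = len(self._jobs)
        self._jobs[job_id] = {"result": sum(inputs)}
        return job_id

    def wait(self, job_id):
        pass  # a real service would block or poll here

    def retrieve(self, job_id):
        return self._jobs[job_id]


def fire(service, inputs):
    """Coordinator-side invocation: same signature whatever the protocol."""
    if hasattr(service, "call"):
        return service.call(inputs)
    job_id = service.start(inputs)
    service.wait(job_id)
    return service.retrieve(job_id)


sync_out = fire(SyncService(), [1, 2, 3])
job_out = fire(JobService(), [1, 2, 3])
print(sync_out, job_out)
```

From the caller's side both services look identical; only fire knows which protocol is being spoken.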

Comments?

Bertram

>>>>> "BL" == Bertram Ludaescher <ludaesch at sdsc.edu> writes:
BL> 
>>>>> "TO" == Tom Oinn <tmo at ebi.ac.uk> writes:
TO> 
TO> Efrat Jaeger wrote:
>>> Hi all,
>>> 
>>> One question that came up from the meeting was about the benefits of SoapLab
>>> services. Could you please elaborate on the advantages of wrapping a command
>>> line application as a soaplab service over standard web services.
TO> 
TO> Hi Efrat,
TO> 
TO> The primary advantage is exactly that - it's wrapping an existing asset 
TO> rather than coding a new one. Bioinformaticians tend to have significant 
TO> existing work in the form of shell scripts, native executables, PERL 
TO> etc, so we needed a way for these resources to be accessed remotely.
BL> 
BL> yep -- understood.
BL> 
TO> In addition, we get some level of descriptive information as a byproduct 
TO> of the soaplab service creation - free text description and 
TO> categorisation for now but we're enhancing soaplab to include semantic 
TO> metadata as well regarding input and output types and their 
TO> inter-relations. Soaplab exposes these metadata through an explicit 
TO> 'describe' operation which toolkits such as Taverna can use to get 
TO> information about the service above and beyond that available through 
TO> standard WSDL.
BL> 
BL> also good. 
BL> 
BL> Is it also conceivable to use a "WSDL++" for the same purposes?  For
BL> example, you might have a standard WSDL (so that standard WS clients
BL> can handle the WSDL as well) and then add the extra (ACD) information
BL> in whatever places you might put them "without harm" (if nothing else
BL> as a "formatted comment") ... 
BL> Wouldn't this combine the best of both worlds? 
BL> Or do the WSDLs you publish for soaplab services already contain the
BL> additional ACD info somewhere?
BL> 
TO> There are also arguably technical advantages from being able to set 
TO> inputs individually, most notably that we can have default values and 
TO> that we can request only the output data we're interested in when 
TO> there are multiple potential return types. With a native web service we'd be 
TO> forced to supply all parameters (or have multiple versions of each 
TO> operation) and also forced to fetch all results in one operation, 
TO> possibly resulting in considerably more traffic than is required.
BL> 
BL> I see. Wouldn't the use of complex XML Schema types provide a possible 
BL> alternative route? For example a WS input, given as a complex XML
BL> Schema type could have optional fields, default values, and even
BL> "list valued" inputs etc... 
BL> 
BL> It seems that one problem with this is the mapping of such complex XML
BL> Schema types to the programming language types (say in Java).
BL> 
TO> We actually had (I'm not sure if it's around anywhere now) a workflow 
TO> which did exactly what you suggest - showed the single actor versus 
TO> multiple finer grained actors for two equivalent operations, there's a 
TO> diagram of the workflow here : 
TO> http://twiki.mygrid.info/twiki/pub/Mygrid/WorkFlow/workflow.bound.jpg 
TO> although this is obviously a rather elderly version.
BL> 
BL> I was looking at the image -- could you point out which parts you
BL> mean here (i.e., where is the "standard" WS and where are the more
BL> fine-grained ones)?
BL> 
BL> Coming back to Efrat's original question: 
BL>     Looking at the soaplab java client, the invocation of a soaplab service
BL>     seems to be a sequence of three web service method invocations: start, wait
BL>     and getResults. I was wondering why the native web service actor can't be
BL>     used for invoking soaplab services. 
BL> 
BL> Let me make a wild guess: with 'start' you launch a remote
BL> service. I guess 'wait' would do polling (and if so, should probably
BL> be renamed!?). Then with getResults you retrieve the results.
BL> 
BL> Seems this is implementing a simple async protocol so that a client
BL> doesn't have to wait for the completion? 
BL> 
BL> Is that right?
BL> 
BL> An alternative approach in Kepler might be to push such low-level
BL> communication issues to the director level. For example, if a workflow 
BL> has something else to do during the 'wait', it would already do so
BL> (since it's a multi-threaded/multi-process network). Conversely, a
BL> downstream process (i.e., "behind" the 'wait') would have to block
BL> anyway until the result is available.
BL> Did I get that right? 
BL> 
BL> cheers and thanks again for the insightful discussion..
BL> 
BL> Bertram
BL> 
BL> PS Can I cross-post this thread to the kepler-dev list? I think others 
BL> might benefit from this discussion as well (and might have
BL> additional feedback)