[seek-dev] EcoGrid Query Service: Instances and Factories

Bertram Ludaescher ludaesch at sdsc.edu
Tue Jan 20 20:50:31 PST 2004


A first look at that new WSRF seems to indicate that those folks try
to handle the "stateless" nature of web services.

Is this related to some of the problems we're running into in the EcoGrid,
i.e., that after an initial invocation of a (standard) web service, we
need to maintain some state information and send around handles to
subsequent invocations? 
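
To make the "state plus handle" idea concrete, here is a minimal sketch in Python. It is not WSRF and not any EcoGrid code; the class name StatefulQueryService and its methods start, next_page and close are hypothetical, chosen only to show a first call that creates server-side state and returns a handle, followed by otherwise stateless calls that carry that handle.

```python
import uuid


class StatefulQueryService:
    """Toy service (hypothetical): state lives server-side, clients hold only a handle."""

    def __init__(self):
        self._sessions = {}  # handle -> per-client state

    def start(self, query_text):
        # Initial invocation: create the state and hand back an opaque handle.
        handle = str(uuid.uuid4())
        self._sessions[handle] = {"query": query_text, "calls": 0}
        return handle

    def next_page(self, handle):
        # Subsequent invocations are ordinary calls that carry the handle.
        state = self._sessions[handle]
        state["calls"] += 1
        return f"page {state['calls']} of {state['query']!r}"

    def close(self, handle):
        # Explicitly discard the state once the client is done with it.
        self._sessions.pop(handle, None)


if __name__ == "__main__":
    service = StatefulQueryService()
    handle = service.start("oak")
    print(service.next_page(handle))
    print(service.next_page(handle))
    service.close(handle)
```

Two clients that call start() get different handles, so their state never mixes even though they talk to the same service object.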

Another question that came up in our local SEEK meeting today: 
How do you query the EcoGrid registry when the catalogs that are
connected (say SRB, Metacat, DiGIR, Xanthoria) all have different
catalog structures?

Seems a data (catalog) integration layer is missing?

Or am I missing something? 

Bertram

>>>>> "MJ" == Matt Jones <jones at nceas.ucsb.edu> writes:
MJ> 
MJ> EcoGrid team,
MJ> I've been working on the registry issues, and have run into some issues 
MJ> with the way Jing and Bing are using the Grid "Factory" pattern.  As 
MJ> they have implemented it, each wrapped server (like a metacat or srb) 
MJ> exposes a "Factory" which has a "createInstance" method.  When called, a 
MJ> new endpoint is created that exposes the methods in our 
MJ> EcoGridQueryLevelOne interface spec.  This endpoint is then queried by 
MJ> only that one client.  However, this pattern makes it very hard for us 
MJ> to discover where the EcoGridQueryLevelOne services reside, because they 
MJ> don't exist at the time that the client is querying the registry.
MJ> 
MJ> I don't see what we gain from using the factory pattern here at all 
MJ> (given our underlying server systems), so I am proposing we eliminate 
MJ> use of Factories and just stick to EcoGridQueryLevelOne instances.  I 
MJ> think that doing so would substantially simplify the demands on clients 
MJ> in the service discovery phase while clients are querying the registry. 
MJ> The details of how I arrived at that proposal with Jing are in the 
MJ> attached IRC conversation.
MJ> 
MJ> Please review this information and let me know if you think it is a good 
MJ> idea to eliminate the use of factories.  Thanks,
MJ> 
MJ> Matt
MJ> 
MJ> PS We're on a tight time schedule, so prompt feedback is appreciated.
MJ> -- 
MJ> -------------------------------------------------------------------
MJ> Matt Jones                                     jones at nceas.ucsb.edu
MJ> http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
MJ> National Center for Ecological Analysis and Synthesis (NCEAS)
MJ> University of California Santa Barbara
MJ> Interested in ecological informatics? http://www.ecoinformatics.org
MJ> -------------------------------------------------------------------
MJ> <matt> hey jing
MJ> <matt> on another issue
MJ> <matt> what's the status on GT3 installs on dev and pine?
MJ> <jing> hi, matt
MJ> <matt> and are the metacat/ecogrid query services there all up to date?
MJ> <jing> yes,
MJ> <jing> they are
MJ> <matt> why don't I see an "EcoGridQueryLevelOne" service in the ogsa/services list?
MJ> <matt> http://dev.nceas.ucsb.edu:8080/ogsa/services
MJ> <jing> one second
MJ> <matt> the one on dev advertises itself as a metacat service according to the wsdl:
MJ> <matt> http://dev.nceas.ucsb.edu:8080/ogsa/services/edu/ucsb/nceas/metacat/MetacatService?wsdl
MJ> <matt> and pine doesn't come up for me, although it used to yesterday
MJ> <matt> http://pine.nceas.ucsb.edu:8080/ogsa/services
MJ> <matt> gives me connection refused
MJ> <jing> yesterday, pine died and I restarted it 
MJ> <jing> but I didn't start tomcat
MJ> <matt> yesterday, when i looked at pine there was a service named "null" that had a  query and get method
MJ> <jing> :)
MJ> <matt> but the wsdl wouldn't come up for it
MJ> <matt> so, are you sure they are both up to date :)  ?
MJ> <jing> you mean up ? sorry, I think you mean updated:)
MJ> <matt> sorry, i meant both
MJ> <matt> hard to test against one that isn't running
MJ> <jing> okay, I will start pine
MJ> <matt> i'm working on the registry stuff, and I need to register some existing ecogrid services
MJ> <jing> okay
MJ> <matt> so it would be good to get a metacat ecogrid interface running on pine and dev, and a srb one running somewhere too
MJ> <matt> then i can register all three and figure out how the service discovery code will work
MJ> <jing> good.
MJ> <matt> can you ask Bing to let us know the location for the srb instance?
MJ> <jing> I called him 5 minutes ago, but he is not in office. I will send him an email
MJ> <matt> ok
MJ> <jing> pine is up
MJ> <matt> thanks
MJ> <jing> sure
MJ> <matt> again, i don't see the ecogrid service in the list
MJ> <matt> is this it: http://pine.nceas.ucsb.edu:8080/ogsa/services/org/ecoinformatics/ecogrid/MetacatFactoryService?wsdl
MJ> <matt> ?
MJ> <-- sam has quit (Quit: )
MJ> <jing> yes, this is the services,
MJ> <jing> I didn't ecogridquerylevelone
MJ> <jing> *call it*
MJ> <matt> the whole Factory thing's going to get in our way a bit
MJ> <matt> we never specified a factory interface for ecogrid
MJ> <matt> so if I want to find all running services that can do "EcoGridQueryLevelOne" methods, what do I look for in a registry?
MJ> <matt> there's no relationship between the factory interface and the instance interface, right?
MJ> <jing> I think so
MJ> <matt> i had been planning on querying the registry for a wsdl of the same namespace as EcoGridQueryLevelOne, and getting the endpoint for that
MJ> <matt> but, if we use a factory, that endpoint is dynamically created
MJ> <matt> and so is exceedingly hard to locate at run time
MJ> <jing> The client can use the factory to create instance by itself
MJ> <matt> ok.
MJ> <jing> so just told the endpoint of factory
MJ> <matt> lets say i want to run an ecogrid "query"
MJ> <jing> our client has the ability
MJ> <matt> i know the namespace of the method i want, say it's "urn:foo:EcoGridQueryNS"
MJ> <matt> what do I ask the registry to find the factory?
MJ> <matt> ie, do factories know what the NS is for the instances they create?
MJ> <matt> i agree our client can do it, but only if it already knows the factory endpoint
MJ> <matt> what if it doesn't, and instead is trying to discover the endpoint from a registry at run time
MJ> <matt> ?
MJ> <matt> does the MetacatFactory and SRBFactory have the same namespace?
MJ> <matt> is it the same namespace as EcoGridQueryLevelOne?
MJ> <jing> can we put the NS in registry service data?
MJ> <jing> yes, they are same.
MJ> <matt> really?
MJ> <matt> that's weird
MJ> <matt> is it the same implementation?
MJ> <jing> no, not same imple
MJ> <jing> what NS do you mean?
MJ> <matt> should it be called "EcoGridQueryLevelOneFactory"?
MJ> <matt> when you create an instance, it shouldn't matter if the instance wraps srb or metacat, because the query api is identical, right?
MJ> <jing> yes.
MJ> <matt> why did you decide we needed a factory?
MJ> <matt> the current metacat doesn't have this concept, right?
MJ> <jing> no
MJ> <jing> metacat doesn't.
MJ> <matt> without the factory, the process would be something like this:
MJ> <matt> client asks registry for services with NS foo, registry returns 6 endpoints, client queries endpoints (or registry) with findServiceData to see what they contain, client calls "query" on endpoints for selected services
MJ> <matt> WITH the factory, its harder:
MJ> <matt> client asks registry for services with NS fooFactory, registry returns 6 endpoints, client requests instances of foo NS be created from those endpoints and somehow gets endpoints for those, client queries endpoints (or registry) with findServiceData to see what they contain, client calls "query" on endpoints for selected services
MJ> <matt> overall, much more complicated process
MJ> <jing> yes, I agree with you.
MJ> <jing> but factory is real grid service.
MJ> <matt> what do we gain from using the Factory pattern?
MJ> <matt> what do you mean by "real"?
MJ> <jing> it is not stateless.
MJ> <matt> stateless?
MJ> <matt> the MetacatStringService has state AFAIK
MJ> <jing> one second
MJ> <matt> ok
MJ> <jing> http://www.casa-sotomayor.net/gt3-tutorial/core/grid_services/index.html
MJ> <jing> http://www.casa-sotomayor.net/gt3-tutorial/core/grid_services/math_factory.html
MJ> <matt> ok, i read the first one in terms of the definition of 'real'
MJ> <matt> when they say stateless, they mean that running a method on the instance will not affect any subsequent runs of the method
MJ> <matt> when they say non-transient, they mean the service outlives the client
MJ> <matt> when you implement the metacatFactory, neither of these things change, right?
MJ> <jing> yes.
MJ> <matt> so, even though we use the Factory pattern, we're not actually achieving what they want, because the underlying metacat server is shared among instances
MJ> <matt> at least that's my initial take on it
MJ> <matt> to do it their way, you would need to start a *separate* metacat server each time you started a new instance
MJ> <jing> we don't do that.
MJ> <matt> and that would introduce some huge, fundamental problems, mainly in locking
MJ> <matt> ie, metacat assumes there's a one metacat per database relationship
MJ> <matt> so, to revisit my earlier question....
MJ> <matt> what do we gain from using the Factory pattern?
MJ> <jing> :)
MJ> <matt> :)
MJ> <matt> does the srb start a new srb for every instance?
MJ> <jing> difficulty
MJ> <jing> I don't think so.
MJ> <matt> that was my guess too
MJ> <matt> the multiple instance thing makes some sense from a computation perspective
MJ> <matt> ie, run the same algorithm independently for lots of clients
MJ> <matt> but doesn't make sense when talking about exposing data, because we want the underlying data source to be shared
MJ> <matt> if metacat weren't wrapped as a factory, we have serial access to the single instance, right?
MJ> <matt> ie, a client has to wait for other clients to complete before it will process?
MJ> <matt> or does tomcat somehow handle multithreading?
MJ> <jing> I think tomcat somehow handles it.
MJ> <matt> can two morpho clients simultaneously hit a single metacat instance now?
MJ> <jing> yes, i think so.
MJ> <matt> i wonder how that works.
MJ> <matt> it seems like it would be the major advantage of factories
MJ> <jing> multithreading.
MJ> <matt> and is just what raja was talking about the other day on the phone
MJ> <matt> yeah, probably
MJ> <matt> you think tomcat just starts up a new thread and calls "doPost" with the HTTPRequest object?
MJ> <jing> yes.
MJ> <matt> that must be what happens
MJ> <matt> we have some synchronization code in metacat to avoid resource conflicts, right?
MJ> <matt> iirc, on the accession # generator
MJ> <matt> and maybe other places
MJ> <jing> yes.
MJ> <matt> how do we avoid 2 clients trying to modify the same docid at the same time?
MJ> <matt> do we rely on db locking?
MJ> <jing> I forgot it :)
MJ> <matt> yeah, me too.
MJ> <matt> probably something we should explore
MJ> <jing> right.
MJ> <matt> my guess: we've got a latent bug that doesn't reveal itself because it's rare for more than one person to write to an eml file
MJ> <jing> sounds reasonable.
MJ> <matt> ok. well, lets stew on this overnight and get back in touch about the whole Factory/registry thing tomorrow
MJ> <jing> sure.
MJ> <matt> i think we can eliminate the factories without loss of functionality, and gain some simplicity
MJ> <matt> but i'll let you digest it some more and see if you think of something
MJ> <matt> maybe we should send this conversation to Bing and Dave and others too?
MJ> <jing> yes, let them know.
MJ> <matt> ok, i'll send an email...
MJ> <matt> thanks
MJ> <jing> thank you!
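
The two discovery flows Matt compares in the chat (registry lookup without a factory versus with one) can be sketched roughly as below. This is a toy in-memory model in Python, not GT3 or the actual EcoGrid registry: the Registry, QueryService and QueryFactory classes, the example.org endpoints, and the factory namespace are all hypothetical, and "urn:foo:EcoGridQueryNS" is just the placeholder namespace used in the conversation.

```python
from dataclasses import dataclass, field

QUERY_NS = "urn:foo:EcoGridQueryNS"            # placeholder namespace from the chat
FACTORY_NS = "urn:foo:EcoGridQueryNSFactory"   # hypothetical factory namespace


@dataclass
class Registry:
    """Toy registry mapping a WSDL namespace to registered endpoints."""
    entries: dict = field(default_factory=dict)

    def register(self, namespace, endpoint):
        self.entries.setdefault(namespace, []).append(endpoint)

    def find(self, namespace):
        return list(self.entries.get(namespace, []))


class QueryService:
    """Stand-in for a persistent endpoint exposing a query method."""

    def __init__(self, endpoint, backend):
        self.endpoint = endpoint
        self.backend = backend

    def query(self, text):
        return f"{self.backend} results for {text!r}"


class QueryFactory:
    """Stand-in for a factory: query endpoints exist only after create_instance."""

    def __init__(self, endpoint, backend):
        self.endpoint = endpoint
        self.backend = backend
        self.created = 0

    def create_instance(self):
        self.created += 1
        return QueryService(f"{self.endpoint}/instance/{self.created}", self.backend)


def discover_without_factory(registry, services, text):
    # One lookup: the registry already lists the query endpoints themselves.
    return [services[ep].query(text) for ep in registry.find(QUERY_NS)]


def discover_with_factory(registry, factories, text):
    # Extra hop: find factories, create an instance on each, then query it.
    results = []
    for ep in registry.find(FACTORY_NS):
        instance = factories[ep].create_instance()
        results.append(instance.query(text))
    return results


if __name__ == "__main__":
    registry = Registry()
    metacat = QueryService("http://example.org/metacat", "metacat")
    registry.register(QUERY_NS, metacat.endpoint)
    print(discover_without_factory(registry, {metacat.endpoint: metacat}, "oak"))

    srb = QueryFactory("http://example.org/srb-factory", "srb")
    registry.register(FACTORY_NS, srb.endpoint)
    print(discover_with_factory(registry, {srb.endpoint: srb}, "oak"))
```

In the factory flow the client must already know a second namespace and pay one create step per endpoint before it can query anything, which is the added complexity the chat points at.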


