[seek-dev] EcoGrid Query Service: Instances and Factories

Jing Tao tao at nceas.ucsb.edu
Wed Jan 21 09:13:09 PST 2004

Hi, Matt:

It seems everyone agrees with the approach of using instances rather than 
factories. Let's do it.

Today I am starting the changes based on resultset.xsd. When that is 
done, I will start changing the factory. I will take Thursday off, so I 
hope everything will be back by late Friday.



> >>>>> "MJ" == Matt Jones <jones at nceas.ucsb.edu> writes:
> MJ> 
> MJ> EcoGrid team,
> MJ> I've been working on the registry issues, and have run into some issues 
> MJ> with the way Jing and Bing are using the Grid "Factory" pattern.  As 
> MJ> they have implemented it, each wrapped server (like a metacat or srb) 
> MJ> exposes a "Factory" which has a "createInstance" method.  When called, a 
> MJ> new endpoint is created that exposes the methods in our 
> MJ> EcoGridQueryLevelOne interface spec.  This endpoint is then queried by 
> MJ> only that one client.  However, this pattern makes it very hard for us 
> MJ> to discover where the EcoGridQueryLevelOne services reside, because they 
> MJ> don't exist at the time that the client is querying the registry.
> MJ> 
> MJ> I don't see what we gain from using the factory pattern here at all 
> MJ> (given our underlying server systems), so I am proposing we eliminate 
> MJ> use of Factories and just stick to EcoGridQueryLevelOne instances.  I 
> MJ> think that doing so would substantially simplify the demands on clients 
> MJ> in the service discovery phase while clients are querying the registry. 
> MJ> The details of how I arrived at that proposal with Jing are in the 
> MJ> attached IRC conversation.
> MJ> 
> MJ> Please review this information and let me know if you think it is a good 
> MJ> idea to eliminate the use of factories.  Thanks,
> MJ> 
> MJ> Matt
> MJ> 
> MJ> PS We're on a tight time schedule, so prompt feedback is appreciated.
> MJ> -- 
> MJ> -------------------------------------------------------------------
> MJ> Matt Jones                                     jones at nceas.ucsb.edu
> MJ> http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
> MJ> National Center for Ecological Analysis and Synthesis (NCEAS)
> MJ> University of California Santa Barbara
> MJ> Interested in ecological informatics? http://www.ecoinformatics.org
> MJ> -------------------------------------------------------------------
> MJ> <matt> hey jing
> MJ> <matt> on another issue
> MJ> <matt> what's the status on GT3 installs on dev and pine?
> MJ> <jing> hi, matt
> MJ> <matt> and are the metacat/ecogrid query services there all up to date?
> MJ> <jing> yes,
> MJ> <jing> they are
> MJ> <matt> why don't I see an "EcoGridQueryLevelOne" service in the ogsa/services list?
> MJ> <matt> http://dev.nceas.ucsb.edu:8080/ogsa/services
> MJ> <jing> one second
> MJ> <matt> the one on dev advertises itself as a metacat service according to the wsdl:
> MJ> <matt> http://dev.nceas.ucsb.edu:8080/ogsa/services/edu/ucsb/nceas/metacat/MetacatService?wsdl
> MJ> <matt> and pine doesn't come up for me, although it used to yesterday
> MJ> <matt> http://pine.nceas.ucsb.edu:8080/ogsa/services
> MJ> <matt> gives me connection refused
> MJ> <jing> yesterday, pine died and I restarted it 
> MJ> <jing> but I didn't start tomcat
> MJ> <matt> yesterday, when i looked at pine there was a service named "null" that had a  query and get method
> MJ> <jing> :)
> MJ> <matt> but the wsdl wouldn't come up for it
> MJ> <matt> so, are you sure they are both up to date :)  ?
> MJ> <jing> you mean up ? sorry, I think you mean updated:)
> MJ> <matt> sorry, i meant both
> MJ> <matt> hard to test against one that isn't running
> MJ> <jing> okay, I will start pine
> MJ> <matt> i'm working on the registry stuff, and I need to register some existing ecogrid services
> MJ> <jing> okay
> MJ> <matt> so it would be good to get a metacat ecogrid interface running on pine and dev, and a srb one running somewhere too
> MJ> <matt> then i can register all three and figure out how the service discovery code will work
> MJ> <jing> good.
> MJ> <matt> can you ask Bing to let us know the location for the srb instance?
> MJ> <jing> I called him 5 minutes ago, but he is not in office. I will send him an email
> MJ> <matt> ok
> MJ> <jing> pine is up
> MJ> <matt> thanks
> MJ> <jing> sure
> MJ> <matt> again, i don't see the ecogrid service in the list
> MJ> <matt> is this it: http://pine.nceas.ucsb.edu:8080/ogsa/services/org/ecoinformatics/ecogrid/MetacatFactoryService?wsdl
> MJ> <matt> ?
> MJ> <-- sam has quit (Quit: )
> MJ> <jing> yes, this is the services,
> MJ> <jing> I didn't ecogridquerylevelone
> MJ> <jing> *call it*
> MJ> <matt> the whole Factory thing's going to get in our way a bit
> MJ> <matt> we never specified a factory interface for ecogrid
> MJ> <matt> so if I want to find all running services that can do "EcoGridQueryLevelOne" methods, what do I look for in a registry?
> MJ> <matt> there's no relationship between the factory interface and the instance interface, right?
> MJ> <jing> I think so
> MJ> <matt> i had been planning on querying the registry for a wsdl of the same namespace as EcoGridQueryLevelOne, and getting the endpoint for that
> MJ> <matt> but, if we use a factory, that endpoint is dynamically created
> MJ> <matt> and so is exceedingly hard to locate at run time
> MJ> <jing> The client can use the factory to create an instance by itself
> MJ> <matt> ok.
> MJ> <jing> so it just needs to be told the endpoint of the factory
> MJ> <matt> lets say i want to run an ecogrid "query"
> MJ> <jing> our client has the ability
> MJ> <matt> i know the namespace of the method i want, say its "urn:foo:EcoGridQueryNS"
> MJ> <matt> what do I ask the registry to find the factory?
> MJ> <matt> ie, do factories know what the NS is for the instances they create?
> MJ> <matt> i agree our client can do it, but only if it already knows the factory endpoint
> MJ> <matt> what if it doesn't, and instead is trying to discover the endpoint from a registry at run time
> MJ> <matt> ?
> MJ> <matt> do the MetacatFactory and SRBFactory have the same namespace?
> MJ> <matt> is it the same namespace as EcoGridQueryLevelOne?
> MJ> <jing> can we put the NS in registry service data?
> MJ> <jing> yes, they are same.
> MJ> <matt> really?
> MJ> <matt> that's weird
> MJ> <matt> is it the same implementation?
> MJ> <jing> no, not same imple
> MJ> <jing> what NS do you mean?
> MJ> <matt> should it be called "EcoGridQueryLevelOneFactory"?
> MJ> <matt> when you create an instance, it shouldn't matter if the instance wraps srb or metacat, because the query api is identical, right?
> MJ> <jing> yes.
> MJ> <matt> why did you decide we needed a factory?
> MJ> <matt> the current metacat doesn't have this concept, right?
> MJ> <jing> no
> MJ> <jing> metacat doesn't.
> MJ> <matt> without the factory, the process would be something like this:
> MJ> <matt> client asks registry for services with NS foo, registry returns 6 endpoints, client queries endpoints (or registry) with findServiceData to see what they contain, client calls "query" on endpoints for selected services
> MJ> <matt> WITH the factory, its harder:
> MJ> <matt> client asks registry for services with NS fooFactory, registry returns 6 endpoints, client requests instances of foo NS be created from those endpoints and somehow gets endpoints for those, client queries endpoints (or registry) with findServiceData to see what they contain, client calls "query" on endpoints for selected services
> MJ> <matt> overall, much more complicated process
> MJ> <jing> yes, I agree with you.
> MJ> <jing> but factory is real grid service.
> MJ> <matt> what do we gain from using the Factory pattern?
> MJ> <matt> what do you mean by "real"?
> MJ> <jing> it is not stateless.
> MJ> <matt> stateless?
> MJ> <matt> the MetacatStringService has state AFAIK
> MJ> <jing> one second
> MJ> <matt> ok
> MJ> <jing> http://www.casa-sotomayor.net/gt3-tutorial/core/grid_services/index.html
> MJ> <jing> http://www.casa-sotomayor.net/gt3-tutorial/core/grid_services/math_factory.html
> MJ> <matt> ok, i read the first one in terms of the definition of 'real'
> MJ> <matt> when they say stateless, they mean that running a method on the instance will not affect any subsequent runs of the method
> MJ> <matt> when they say non-transient, they mean the service outlives the client
> MJ> <matt> when you implement the metacatFactory, neither of these things change, right?
> MJ> <jing> yes.
> MJ> <matt> so, even though we use the Factory pattern, we're not actually achieving what they want, because the underlying metacat server is shared among instances
> MJ> <matt> at least that's my initial take on it
> MJ> <matt> to do it their way, you would need to start a *separate* metacat server each time you started a new instance
> MJ> <jing> we don't do that.
> MJ> <matt> and that would introduce some huge, fundamental problems, mainly in locking
> MJ> <matt> ie, metacat assumes there's a one metacat per database relationship
> MJ> <matt> so, to revisit my earlier question....
> MJ> <matt> what do we gain from using the Factory pattern?
> MJ> <jing> :)
> MJ> <matt> :)
> MJ> <matt> does the srb start a new srb for every instance?
> MJ> <jing> difficulty
> MJ> <jing> I don't think so.
> MJ> <matt> that was my guess too
> MJ> <matt> the multiple instance thing makes some sense from a computation perspective
> MJ> <matt> ie, run the same algorithm independently for lots of clients
> MJ> <matt> but doesn't make sense when talking about exposing data, because we want the underlying data source to be shared
> MJ> <matt> if metacat weren't wrapped as a factory, we have serial access to the single instance, right?
> MJ> <matt> ie, a client has to wait for other clients to complete before it will process?
> MJ> <matt> or does tomcat somehow handle multithreading?
> MJ> <jing> I think tomcat somehow handles it.
> MJ> <matt> can two morpho clients simultaneously hit a single metacat instance now?
> MJ> <jing> yes, i think so.
> MJ> <matt> i wonder how that works.
> MJ> <matt> it seems like it would be the major advantage of factories
> MJ> <jing> multithreading.
> MJ> <matt> and is just what raja was talking about the other day on the phone
> MJ> <matt> yeah, probably
> MJ> <matt> you think tomcat just starts up a new thread and calls "doPost" with the HTTPRequest object?
> MJ> <jing> yes.
> MJ> <matt> that must be what happens
> MJ> <matt> we have some synchronization code in metacat to avoid resource conflicts, right?
> MJ> <matt> iirc, on the accession # generator
> MJ> <matt> and maybe other places
> MJ> <jing> yes.
> MJ> <matt> how do we avoid 2 clients trying to modify the same docid at the same time?
> MJ> <matt> do we rely on db locking?
> MJ> <jing> I forgot it :)
> MJ> <matt> yeah, me too.
> MJ> <matt> probably something we should explore
> MJ> <jing> right.
> MJ> <matt> my guess: we've got a latent bug that doesn't reveal itself because its rare for more than one person to write to an eml file
> MJ> <jing> sounds reasonable.
> MJ> <matt> ok. well, lets stew on this overnight and get back in touch about the whole Factory/registry thing tomorrow
> MJ> <jing> sure.
> MJ> <matt> i think we can eliminate the factories without loss of functionality, and gain some simplicity
> MJ> <matt> but i'll let you digest it some more and see if you think of something
> MJ> <matt> maybe we should send this conversation to Bing and Dave and others too?
> MJ> <jing> yes, let them know.
> MJ> <matt> ok, i'll send an email...
> MJ> <matt> thanks
> MJ> <jing> thank you!
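
A minimal sketch of the two discovery flows compared in the conversation
above, written in Python purely for illustration. The Registry,
QueryService and QueryFactory classes and the factory namespace are
hypothetical stand-ins, not the GT3 or EcoGrid API; only the placeholder
namespace "urn:foo:EcoGridQueryNS" comes from the chat. The
findServiceData step is left out to keep the comparison short.

```python
from dataclasses import dataclass, field

QUERY_NS = "urn:foo:EcoGridQueryNS"            # placeholder namespace from the chat
FACTORY_NS = "urn:foo:EcoGridQueryFactoryNS"   # hypothetical factory namespace


@dataclass
class Registry:
    """Toy registry: maps a namespace to the endpoints registered under it."""
    entries: dict = field(default_factory=dict)

    def register(self, namespace, endpoint):
        self.entries.setdefault(namespace, []).append(endpoint)

    def find(self, namespace):
        return list(self.entries.get(namespace, []))


class QueryService:
    """Stand-in for a persistent endpoint exposing a "query" method."""
    def __init__(self, name):
        self.name = name

    def query(self, text):
        return f"{self.name}: results for {text!r}"


class QueryFactory:
    """Stand-in for a factory that creates a new endpoint per client."""
    def __init__(self, name):
        self.name = name
        self.created = 0

    def create_instance(self):
        self.created += 1
        return QueryService(f"{self.name}#{self.created}")


def discover_instances(registry, text):
    # Without factories: look up the query namespace and call the endpoints.
    return [service.query(text) for service in registry.find(QUERY_NS)]


def discover_via_factories(registry, text):
    # With factories: look up the factory namespace, create an instance
    # from each factory, and only then call "query" on the new endpoint.
    results = []
    for factory in registry.find(FACTORY_NS):
        service = factory.create_instance()
        results.append(service.query(text))
    return results
```

In this sketch, a registry that holds only factories returns an empty
list for find(QUERY_NS), which is the discovery problem raised above: the
query endpoints do not exist until a client asks a factory to create them.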
> _______________________________________________
> seek-dev mailing list
> seek-dev at ecoinformatics.org
> http://www.ecoinformatics.org/mailman/listinfo/seek-dev
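
The conversation above also touches on many clients sharing one service
instance, with synchronization around the accession number generator. A
rough Python sketch of that idea follows. It is an illustration only, not
Metacat's actual code: the AccessionNumberGenerator class, the "demo"
prefix and the handle_requests helper are all hypothetical, and the thread
pool merely stands in for a servlet container running each request on its
own thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AccessionNumberGenerator:
    """Hypothetical shared generator; the lock keeps issued ids unique."""

    def __init__(self, prefix="demo"):
        self._prefix = prefix
        self._next = 1
        self._lock = threading.Lock()

    def next_id(self):
        # Only one thread at a time may read and advance the counter.
        with self._lock:
            value = self._next
            self._next += 1
        return f"{self._prefix}.{value}"


def handle_requests(generator, request_count, workers=8):
    # Each simulated request runs on a pool thread and asks the single
    # shared generator for an id.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda _: generator.next_id(), range(request_count)))
```

A lock of this kind only protects id generation. It does not by itself
stop two clients from modifying the same docid at the same time, which is
the open question left in the conversation.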

Jing Tao
National Center for Ecological
Analysis and Synthesis (NCEAS)
735 State St. Suite 204
Santa Barbara, CA 93101
