[kepler-dev] introduction & distributed ptolemy
Bertram Ludaescher
ludaesch at sdsc.edu
Tue Jun 15 03:38:18 PDT 2004
Edward:
That sounds interesting. I agree that references are not without
problems. However, for certain applications (very large and distributed
data) they may be the only way to avoid unnecessary traffic.
The point about not allowing modifications is a good one. I had
implicitly assumed (but that should be made explicit) that references
are "clean", i.e., referentially transparent. A new id/version-id would be
created for updated data. It is a challenge in its own right to deal
with huge datasets with lots of small variants (an RCS/delta-style
approach might help; the grid community is probably also doing something
there).
Bertram
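A minimal Java sketch of such "clean" references follows. It assumes a hypothetical `DataRef`/`VersionedStore` pair; neither is a Kepler or Ptolemy II class. A reference names a dataset id and a version id. An update never changes what an existing reference denotes. Instead, it returns a new reference to a new version. Records require Java 16+.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handle: a dataset id plus a version id.
record DataRef(String datasetId, int version) {}

// Hypothetical store in which a published version is never overwritten,
// so dereferencing a given DataRef always yields the same bytes.
class VersionedStore {
    private final Map<DataRef, byte[]> contents = new HashMap<>();
    private final Map<String, Integer> latestVersion = new HashMap<>();

    // "Updating" a dataset creates a new version and returns a new reference.
    DataRef put(String datasetId, byte[] data) {
        int next = latestVersion.getOrDefault(datasetId, 0) + 1;
        DataRef ref = new DataRef(datasetId, next);
        contents.put(ref, data.clone()); // copy, so the caller cannot alter it later
        latestVersion.put(datasetId, next);
        return ref;
    }

    byte[] get(DataRef ref) {
        byte[] data = contents.get(ref);
        if (data == null) {
            throw new IllegalArgumentException("unknown reference: " + ref);
        }
        return data.clone(); // copy, so readers cannot alter the stored version
    }
}
```

Keeping every version in full is exactly what becomes expensive for huge datasets with many small variants. An RCS-style store would instead keep a base version plus deltas behind the same kind of reference.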
>>>>> "EAL" == Edward A Lee <eal at eecs.berkeley.edu> writes:
EAL>
EAL> A major potential pitfall in passing around references to data
EAL> instead of the data is the accidental introduction of nondeterminism.
EAL> We have to do this in a way that avoids this nondeterminism.
EAL> Consider for example a model that broadcasts the same reference
EAL> to two distinct destinations, and those destinations use the reference
EAL> to modify the data... Then, the order in which the destinations happen
EAL> to get to the data will change the results...
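The pitfall is easy to reproduce in plain Java, with no Ptolemy classes involved. In this sketch, two hypothetical consumers receive the same array reference and update it in place. Running them in the two possible orders gives two different results. In a concurrent model, which order actually occurs would be up to the scheduler.

```java
import java.util.Arrays;

class SharedReferenceDemo {
    // Hypothetical consumer A: doubles each element in place.
    static void consumerA(double[] data) {
        for (int i = 0; i < data.length; i++) data[i] *= 2;
    }

    // Hypothetical consumer B: adds one to each element in place.
    static void consumerB(double[] data) {
        for (int i = 0; i < data.length; i++) data[i] += 1;
    }

    // "Broadcast" one reference to both consumers, in the given order.
    static double[] run(boolean aFirst) {
        double[] shared = {1.0, 2.0};
        if (aFirst) { consumerA(shared); consumerB(shared); }
        else        { consumerB(shared); consumerA(shared); }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(true)));  // [3.0, 5.0]
        System.out.println(Arrays.toString(run(false))); // [4.0, 6.0]
    }
}
```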
EAL>
EAL> One solution is to prohibit modification of the data (this is
EAL> why Ptolemy II tokens are immutable). But there are other
EAL> solutions.
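The following sketches the immutability approach. It uses a hypothetical `ImmutableDoubleArray` rather than Ptolemy II's actual token classes. The data is copied on the way in and on the way out, and a "modification" returns a new object. One instance can therefore be broadcast to any number of destinations without the order of access mattering.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical immutable token holding an array of doubles.
final class ImmutableDoubleArray {
    private final double[] values;

    ImmutableDoubleArray(double... values) {
        this.values = values.clone(); // the caller cannot mutate our copy later
    }

    double get(int i) { return values[i]; }

    int length() { return values.length; }

    // "Modification" produces a new token; this one is left unchanged.
    ImmutableDoubleArray map(DoubleUnaryOperator f) {
        return new ImmutableDoubleArray(Arrays.stream(values).map(f).toArray());
    }

    double[] toArray() { return values.clone(); }
}
```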
EAL>
EAL> In Ptolemy Classic, this was handled using reference counts, which
EAL> automatically determine when it is necessary to copy the data.
EAL> But it's still a conservative approach.
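The reference-counting idea can be sketched as copy-on-write. The class names are hypothetical and the code is single-threaded. It illustrates the technique, not Ptolemy Classic's implementation. Handing a token to another destination only increments a count. A writer copies the data only if the count shows that someone else may still see it.

```java
// Shared payload plus the number of handles that currently point at it.
final class SharedBuffer {
    final double[] data;
    int refCount = 1;

    SharedBuffer(double[] data) { this.data = data; }
}

// Hypothetical handle: reads share the buffer; writes copy it only if needed.
final class CowHandle {
    private SharedBuffer buf;

    CowHandle(double... data) { buf = new SharedBuffer(data.clone()); }

    private CowHandle(SharedBuffer shared) {
        buf = shared;
        buf.refCount++;
    }

    // Sending the token to another destination just bumps the count.
    CowHandle share() { return new CowHandle(buf); }

    double get(int i) { return buf.data[i]; }

    void set(int i, double v) {
        if (buf.refCount > 1) {            // someone else may still see this data
            buf.refCount--;
            buf = new SharedBuffer(buf.data.clone());
        }
        buf.data[i] = v;                   // sole owner: safe to write in place
    }

    boolean sharesWith(CowHandle other) { return buf == other.buf; }
}
```

The conservatism shows up directly. The writer copies whenever the count exceeds one, even if the other holder never looks at the data again. In this Java sketch, the count is also never decremented when a handle is garbage-collected. That can only cause extra copies, never unsafe sharing.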
EAL>
EAL> A third solution is to go with a different model of computation
EAL> that makes the ordering explicit. To me, this is the most interesting
EAL> one, and the one that effectively amounts to language design...
EAL>
EAL> Edward
EAL>
EAL>
EAL> At 08:01 AM 6/14/2004 -0700, Bertram Ludaescher wrote:
EAL>
>> Indeed the problem of efficient data transfer between actors has come
>> up several times before (including on kepler-dev I think).
>>
>> The problem we face e.g. as part of Roadnet (Tobin's current work) but
>> also as part of data-intensive workflows (SDM/SPA and other projects)
>> is that actor communication should not involve physical token flow
>> through the "master application" (the Ptolemy/Kepler "console" running
>> on an "operator's" laptop say ..) but be directly between the actor
>> processes. Ideally the ports of an actor should (or at least could)
>> have multiple types:
>> - the data type (including say XML Schema type),
>> - the semantic type (e.g. a concept expression describing more formally
>> what else might be known about the data flowing through the port)
>> [[aside for Ferdinando: our "reductionist/separatist approach" does not
>> preclude forever an integrated modeling solution - it's just bottom up
>> to get something useful soon/in finite time ;-]]
>> - the event consumption/production type (useful for scheduling a la
>> SDF)
>> - the communication type (through the Ptolemy/Kepler client, or directly
>> via, say, FTP or HTTP), etc.
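The list above can be read as a port signature with several independent facets. In this minimal sketch, every name (`Transport`, `PortSignature`, `canFeed`) is hypothetical. Plain string equality stands in for real XML Schema or ontology checks.

```java
import java.util.Objects;

// How tokens physically travel to the port (hypothetical enumeration).
enum Transport { VIA_CLIENT, FTP, HTTP, GRIDFTP, SRB }

// Hypothetical port signature with one field per kind of "type" listed above.
record PortSignature(
        String dataType,       // structural type, e.g. an XML Schema type name
        String semanticType,   // concept expression; null if none is declared
        int rate,              // SDF-style tokens consumed or produced per firing
        Transport transport) { // through the client, or directly via FTP/HTTP/...

    PortSignature {
        Objects.requireNonNull(dataType, "dataType");
        Objects.requireNonNull(transport, "transport");
        if (rate < 0) throw new IllegalArgumentException("rate must be non-negative");
    }

    // Whether this output can feed the given input. Rates are not compared:
    // in SDF, unequal rates are allowed and are resolved by the schedule.
    boolean canFeed(PortSignature input) {
        return dataType.equals(input.dataType())
                && Objects.equals(semanticType, input.semanticType())
                && transport == input.transport();
    }
}
```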
>>
>> At some levels of modeling one explicitly hides such detail from
>> the modeler/user, but at other levels exposing it might be a good way of
>> overcoming some scalability issues (if you have terabyte data streams,
>> you want them to go directly where they need to).
>>
>> A related problem with web services (as actors) is that they send results
>> back to the caller (Kepler) and don't forward them to the subsequent
>> actor, making large data transfers virtually impossible.
>>
>> A simple extension to the web service model (does anyone know whether
>> that's already been done?) would allow data to include *references*,
>> so that a process would be able to return to Kepler just a reference to
>> the result data, and that reference would be passed on to the consuming
>> actor, which then understands how to dereference it. This simple
>> extension seems to be an easy solution to what we previously called the
>> 3rd party transfer problem:
>>
>>   --> [Actor A] ---> [Actor B] --> ...
>>
>> To stream a large data set D from A to B without going through
>> Ptolemy/Kepler, one can simply send a handle &D instead; B,
>> upon receiving &D, understands it and dereferences it by calling the
>> appropriate protocol (FTP/gridFTP, HTTP, SRB, ...).
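On the receiving side, dereferencing &D amounts to choosing a protocol handler by URI scheme. In this sketch, only the `http`/`https` handler uses a real API (`java.net.http.HttpClient`, Java 11+). The `Dereferencer` class and its `register` method are hypothetical. FTP, gridFTP, or SRB handlers would have to be plugged in from whatever client libraries an actor actually uses.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.HashMap;
import java.util.Map;

// Hypothetical receiver-side helper that turns a handle &D into the bytes of D.
class Dereferencer {
    interface Fetcher {
        byte[] fetch(URI uri) throws IOException, InterruptedException;
    }

    private final Map<String, Fetcher> byScheme = new HashMap<>();

    Dereferencer() {
        HttpClient http = HttpClient.newHttpClient();
        Fetcher viaHttp = uri -> http.send(HttpRequest.newBuilder(uri).build(),
                HttpResponse.BodyHandlers.ofByteArray()).body();
        byScheme.put("http", viaHttp);
        byScheme.put("https", viaHttp);
        // Hypothetical: FTP/gridFTP/SRB fetchers would be registered the same way.
    }

    void register(String scheme, Fetcher fetcher) { byScheme.put(scheme, fetcher); }

    byte[] dereference(URI handle) throws IOException, InterruptedException {
        Fetcher fetcher = byScheme.get(handle.getScheme());
        if (fetcher == null) {
            throw new IOException("no protocol handler for " + handle);
        }
        return fetcher.fetch(handle);
    }
}
```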
>>
>> Note that there are already explicit Kepler actors (SRBread/SRBwrite,
>> gridFTP) for large data transfers. It would be more elegant to just
>> send handles in the form, e.g., dereference(http://...<ref-to-D>..)
>> Note that the special tag 'dereference' is needed since not every URL
>> should be dereferenced (a URL can be perfectly valid data all by
>> itself).
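The distinction between a URL that is data and a URL that must be fetched can be made part of the token type. The sketch below uses hypothetical names and a Java 17 sealed interface. A `UrlValue` passes through untouched. Only a `Dereference` (the 'dereference' tag above) causes the receiver to fetch.

```java
import java.net.URI;
import java.util.function.Function;

// A token either carries a value or asks to be dereferenced.
sealed interface Payload permits UrlValue, Dereference {}

// A URL that is itself the data (e.g. a link to record, not to fetch).
record UrlValue(URI url) implements Payload {}

// The 'dereference(...)' tag: the receiver should fetch what the URI names.
record Dereference(URI ref) implements Payload {}

class Receiver {
    // 'fetch' stands in for a protocol-specific dereferencer (hypothetical).
    static Object resolve(Payload p, Function<URI, Object> fetch) {
        if (p instanceof Dereference d) return fetch.apply(d.ref());
        if (p instanceof UrlValue v) return v.url();
        throw new IllegalStateException("unknown payload: " + p);
    }
}
```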
>>
>> It would be good if we could (a) define our extensions in line with
>> web services extensions that deal with dereferencing message parts (if
>> such exist) and (b) work on a joint
>> Kepler/Ptolemy/Roadnet/SEEK/SDM etc. approach (in fact, Kepler is such
>> a joint forum for co-designing this together..)
>>
>> Bertram
>>
>> PS Tobin: I recently met Kent and heard good news about ORB access in
>> Kepler already. You can also check with Efrat at SDSC on 3rd party
>> transfer issues while you're at SDSC..
>>
>> >>>>> "EAL" == Edward A Lee <eal at eecs.berkeley.edu> writes:
EAL>
EAL> At 05:48 PM 6/11/2004 -0700, Tobin Fricke wrote:
>> >> A basic question I have is, is there a defined network transport for
>> >> Ptolemy relations? I expect that this question isn't really well-formed
>> >> as I still have some reading to do on how relations actually work.
>> >> Nonetheless, there is the question of, if we have different instances of
>> >> Ptolemy talking to each other across the network, how are the data streams
>> >> transmitted? In our case one option is to use the ORB as the stream
>> >> transport, equipping each sub-model with ORB source and ORB sink
>> >> components; and perhaps this could be done implicitly to automatically
>> >> distribute a model across the network. But this line of thinking is
>> >> strongly tied to the idea of data streams and may not be appropriate for
>> >> the more general notion of relations in Ptolemy.
EAL>
EAL> We have done quite a bit of experimentation with distributed
EAL> Ptolemy II models, but haven't completely settled on any one
EAL> approach... Most of the recent work in this area has been
EAL> done by Yang Zhao, whom I've cc'd for additional comments...
EAL> Here are some notes:
EAL>
EAL> - A model can contain a component that is defined elsewhere
EAL> on the network, referenced at a URL. There is a demo
EAL> in the quick tour that runs a submodel that sits on our
EAL> web server.
EAL>
EAL> - The Corba library provides a mechanism for transporting
EAL> tokens from one model to another using either push or
EAL> pull style interactions. The software is in the
EAL> ptolemy.actor.corba package, but there are currently
EAL> no good (easily run) demos, and documentation is sparse.
EAL>
EAL> - The MobileModel actor accepts a model definition on an
EAL> input port and then executes that model. Yang has used
EAL> this with the Corba actors to build models where one
EAL> model constructs another model and sends it to another
EAL> machine on the network to execute.
EAL>
EAL> - The JXTA library (ptolemy.actor.lib.jxta) uses Sun's
EAL> XML-based P2P mechanism. Yang has used this to construct
EAL> a distributed chat room application.
EAL>
EAL> - The ptolemy.actor.lib.net package has two actors, DatagramReader
EAL> and DatagramWriter, that provide low-level mechanisms for
EAL> models to communicate over the net. Three or four years
EAL> ago Win Williams used this to create a distributed model
EAL> where two computers on the net were connected to
EAL> motor controllers and users could "arm wrestle" over
EAL> the network ... when one of the users turned his motor,
EAL> the other motor would turn, and they could fight each
EAL> other, trying to turn the motors in opposite directions.
EAL>
EAL> - Some years ago we also did some experimentation with
EAL> Sun's JINI P2P mechanism, but this has been largely
EAL> supplanted by JXTA.
EAL>
EAL> - The security library (ptolemy.actor.lib.security)
EAL> provides encryption and decryption and authentication
EAL> based on digital signatures.
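The sign-and-verify step behind signature-based authentication can be sketched with the standard `java.security` API. This is not the interface of the ptolemy.actor.lib.security actors. The `TokenSigning` class is hypothetical, and the algorithm choice (`SHA256withRSA`, 2048-bit RSA keys) is an assumption for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical helper: the sender signs a serialized token, the receiver verifies it.
class TokenSigning {
    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }

    // Sender side: sign the payload bytes with the private key.
    static byte[] sign(byte[] payload, PrivateKey key) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(key);
        s.update(payload);
        return s.sign();
    }

    // Receiver side: true only if the payload is unchanged and was signed by the key's owner.
    static boolean verify(byte[] payload, byte[] signature, PublicKey key)
            throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(key);
        s.update(payload);
        return s.verify(signature);
    }
}
```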
EAL>
EAL> Most of these mechanisms have not been well packaged,
EAL> and we haven't worked out the "lifecycle management" issues
EAL> (how to start up a distributed model systematically, how
EAL> to manage network failures).
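The DatagramReader/DatagramWriter bullet above describes a low-level UDP link between two models. Such a link can be sketched with plain `java.net` sockets. This sketch does not use those actors or their parameters, and the one-`double`-per-datagram wire format is an assumption. The receive timeout hints at the network-failure issue just mentioned: UDP can drop packets silently, so a reader must decide what a missing token means.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;

class DatagramLink {
    // Assumed wire format: one double per datagram, 8 bytes, big-endian.
    static byte[] encode(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    static double decode(byte[] data, int length) {
        if (length != Double.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + length);
        }
        return ByteBuffer.wrap(data, 0, length).getDouble();
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket reader = new DatagramSocket(0, loopback); // any free port
             DatagramSocket writer = new DatagramSocket()) {
            reader.setSoTimeout(2000); // never block forever on a lost packet

            byte[] out = encode(42.5);
            writer.send(new DatagramPacket(out, out.length, loopback, reader.getLocalPort()));

            byte[] in = new byte[Double.BYTES];
            DatagramPacket packet = new DatagramPacket(in, in.length);
            try {
                reader.receive(packet);
                System.out.println("received " + decode(packet.getData(), packet.getLength()));
            } catch (SocketTimeoutException e) {
                System.out.println("no datagram within 2 s; treat as a network failure");
            }
        }
    }
}
```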
EAL>
EAL> In my view, working out these issues is a top priority...
EAL> I would be delighted to work with you or anyone else on this...
EAL>
EAL> Edward
EAL>
EAL>
EAL>
EAL>
EAL>
EAL> ------------
EAL> Edward A. Lee, Professor
EAL> 518 Cory Hall, UC Berkeley, Berkeley, CA 94720
EAL> phone: 510-642-0455, fax: 510-642-2739
EAL> eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal
EAL>
EAL> _______________________________________________
EAL> kepler-dev mailing list
EAL> kepler-dev at ecoinformatics.org
EAL> http://www.ecoinformatics.org/mailman/listinfo/kepler-dev
EAL>