[kepler-dev] (no subject)

Ferdinando Villa fvilla at uvm.edu
Tue Jun 15 07:31:31 PDT 2004


> I see. Indeed, workflows (or dataflows, for that matter) might "smell"
> a bit more procedural than one would like. But don't you think it
> is a suitable formalism for describing analytical processes and
> procedures? What else would one use? Sometimes (often) a scientist
> might know exactly the algorithm, dataflow network, or procedure she
> wants to execute. Why not allow the scientist to express the desired
> flow as such? 

Yes - but I don't think we should let a less-than-entirely-typical
use case determine the design. In fact, I think a better case can be
made for the semantically aware workflow actually BEING (embodying) a
concept - one that can be modeled through a conceptual approach. It doesn't
take much, even conceptually, to substitute the connection arrows with
semantically aware relationships. The mechanics of type checking
(semantic and otherwise, if I'm right in thinking that semantic-type
compatibility subsumes storage-type compatibility) reduce to
instance validation and are naturally handled by SMS. And what you get is a
nice logical diagram that AMS doesn't have to work hard to transpose
into a workflow (the [unpublished] IMA design may illustrate that, if
you're interested). 

Anyway - I also don't see why the user shouldn't see or directly create
the workflow if preferred. I just think that most of our users will
prefer another approach, and that the other approach is cleaner and
more SEEK-like. Coming out of the closet: what I'm thinking of as a
possible, more intuitive visual/processing environment for SEEK is a
two-level view in Kepler, where the top view is essentially the IMA, and
the second level (a tabbed UI could be used to switch, as in STELLA) is
Kepler proper - users may work at whatever level they want. But I think the
roles of SMS and AMS are cleanly separable, and that's what I think is
still a bit muddy in this discussion. I tend to see the separation as a design
advantage. BTW we also discussed having a third "view" - a control view with
sliders for parameters etc. - so it may fit right in, graphical-design-wise.

Here is an IMA-inspired idea of how that semantic view might look:

The top-level semantic view is essentially a GrOWL-like environment
(Kepler's UI works fine, too) for instantiating a concept that embodies
the RESULT of your workflow, using the proper subclass: e.g. "ecological
diversity" is abstract and cannot be instantiated, but "Shannon
diversity" can - and it has associated code or declarations that
create the proper actors when the instance is created. So the user
decides to create an instance of Shannon diversity (or Simpson
diversity), and the semantic editor checks that all relationships are
satisfied and conformant (again, there are a lot of IMA details on the
"conformance" concept if you're interested - it's a bit more than isa: for
the semantic type, having to do with space/time/classification
compatibility and representation), giving access to the Ecogrid to
retrieve instances of related objects to link into the instance. The
semantic equivalent of the "run" button is the "create" button, which
creates the instance and fires up the SMS component that consults the KB
about which actors calculate the states of each instance and connects
them together, creating the proper topological sort and iteration
sequence according to whatever time, space etc. is adopted by the linked
objects, and inserting transformations as needed. The default
visualization mode of the final concept's state is defined by its
semantics and its adopted observation context (do you REALLY want your
innocent ecologist to have to create a visualization actor and connect
it up?)
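To make the "create" button concrete, here is a minimal sketch in Python. The KB entries, concept names, and actor names are all mine for illustration - this is not the actual SMS/AMS API, just the shape of the idea: look up which actors compute each concept's state, chase the dependencies, and emit a topologically sorted firing order.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical KB: for each concept, the actor that computes its state
# and the concepts it depends on (its semantic relationships).
# All identifiers here are illustrative, not real SMS/AMS names.
kb = {
    "SpeciesCounts":    {"actor": "EcogridQueryActor", "needs": []},
    "ShannonDiversity": {"actor": "ShannonActor", "needs": ["SpeciesCounts"]},
}

def create(concept):
    """Sketch of the 'create' button: collect the concept's transitive
    dependencies from the KB, topologically sort them, and return the
    actor firing order."""
    needed, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in needed:
            needed.add(c)
            stack.extend(kb[c]["needs"])
    deps = {c: kb[c]["needs"] for c in needed}
    return [kb[c]["actor"] for c in TopologicalSorter(deps).static_order()]
```

Asking for "ShannonDiversity" would then wire the Ecogrid query actor ahead of the Shannon actor automatically, which is the point: the user instantiates a concept, and the workflow falls out.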

An additional benefit: given that Simpson's index and Shannon's use the same
data (i.e. the concept has the same semantics), the user may later calculate
the other by simply switching the parent class of the main concept (an
operation that can obviously be presented under a different name than
"switching the class"). BTW there are other benefits, but I've written
enough - mainly the "concept-based data mining" I clumsily outlined in the
Word rant I attached earlier.
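The "same data, different subclass" point is easy to see in code: both indices are functions of one species-abundance vector. A minimal sketch (function names are mine, the formulas are the standard ones):

```python
import math

def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over nonzero proportions."""
    total = sum(counts)
    ps = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in ps)

def simpson(counts):
    """Simpson diversity 1 - sum(p_i^2): the probability that two
    randomly drawn individuals belong to different species."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Four equally abundant species: same input serves either concept, so
# "switching the parent class" only swaps which function is wired in.
abundances = [10, 10, 10, 10]
```

For the example vector, Shannon gives ln 4 and Simpson gives 0.75 - computed from identical inputs, which is exactly why the switch is safe.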

Back to workflows: it does not take much to envision a conceptual approach
like this for any explicit workflow - just give names to the arrows, make
them cardinality one, and define the actors in terms of what their state is,
not what they do (usually it's the same thing). We could work out some
workflows as an exercise. In my experience, few workflows need more than
that, because they always serve to calculate something, and that something
is enough to characterize the whole. Even with a fully worked-out workflow,
one difference is that you don't have to loop over a data series or the
polygons of a map, or worry about arrays. If your innocent ecologist
wants to (and believe me, s/he doesn't), s/he can do it in the workflow
view...

The nice thing here is that, messy as it may seem, everything is
actually very clean if done this way: your model, pipeline or whatever
can be expressed and stored in RDF, point to OWL ontologies, and be
translated at will into whatever workflow language you need, as long as
the actor collection is characterized properly. What I'm implementing in
the IMA is a "workflow policy" - a template class that can be
substituted in a runtime system to create, e.g., an "interpreter" that
calculates results right away (say, when modeling collaboratively through
the web), or to compile the workflow into very efficient, template-based C++
that gives you an executable for those large spatial models my folks like
to make. A good intermediate representation goes a long way!

Cheers
ferd

> FV> 
> FV> Maybe we're mixing the sides up somewhat, and if so, is this ok... or is
> FV> it going to postpone the beautiful "moment of clarity" when we all
> FV> realize that we've all been thinking the same thing all along?
> 
> probably that moment of clarity and realization will come along
> sometime soon ... if not, we need further research .. that's also good 
> =B-)
> 
> cheers
> 
> Bertram
> 
> FV> 
> FV> Cheers,
> FV> ferdinand
> FV> 
> FV> 
> >> - the event consumption/production type (useful for scheduling a la
> >> SDF)
> >> - the communication type (through the Ptolemy/Kepler client, directly
> >> via say FTP or HTTP) etc
> >> 
> >> At some levels of modeling one does explicitly hide such detail from 
> >> the modeler/user, but at other levels this might be a good way of
> >> overcoming some scalability issues (if you have terabyte data streams
> >> you want them to go directly where they need to)
> >> 
> >> A related problem of web services (as actors) is that they send results 
> >> back to the caller (Kepler) and don't forward them to the subsequent
> >> actor, making large data transfers virtually impossible..
> >> 
> >> A simple extension to the web service model (does anyone know whether
> >> that's already done???) would allow data to include *references*,
> >> so that a process would be able to return to Kepler just a reference to
> >> the result data, and that reference would be passed on to the consuming 
> >> actor, which then understands how to dereference it. This simple
> >> extension seems to be an easy solution to what we called before the
> >> 3rd-party transfer problem:
> >> 
> >> --> [Actor A] ---> [Actor B] --> ...
> >> 
> >> To stream a large data set D from A to B w/o going through
> >> Ptolemy/Kepler, one can simply send a handle &D instead; then B,
> >> upon receiving &D, understands and dereferences it by calling the
> >> appropriate protocol (FTP/gridFTP, HTTP, SRB, ...)
> >> 
> >> Note that there are already explicit Kepler actors (SRBread/SRBwrite,
> >> gridFTP) for large data transfer. It would be more elegant to just
> >> send handles in the form, e.g., dereference(http://...<ref-to-D>..)
> >> Note that the special tag 'dereference' is needed since not every URL
> >> should be dereferenced (a URL can be perfectly valid data all by
> >> itself)
> >> 
> >> It would be good if we could (a) define our extensions in line with
> >> web services extensions that deal with dereferencing message parts (if 
> >> such exist) and (b) work on a joint
> >> Kepler/Ptolemy/Roadnet/SEEK/SDM etc. approach (in fact, Kepler is such
> >> a joint forum for co-designing this together..)
> >> 
> >> Bertram
> >> 
> >> PS Tobin: I recently met Kent and heard good news about ORB access in
> >> Kepler already. You can also check with Efrat at SDSC on 3rd party
> >> transfer issues while you're at SDSC..
> >> 
> >> >>>>> "EAL" == Edward A Lee <eal at eecs.berkeley.edu> writes:
> EAL> 
> EAL> At 05:48 PM 6/11/2004 -0700, Tobin Fricke wrote:
> >> >> A basic question I have is, is there a defined network transport for
> >> >> Ptolemy relations?  I expect that this question isn't really well-formed
> >> >> as I still have some reading to do on how relations actually work.
> >> >> Nonetheless, there is the question of, if we have different instances of
> >> >> Ptolemy talking to each other across the network, how are the data streams
> >> >> transmitted?  In our case one option is to use the ORB as the stream
> >> >> transport, equipping each sub-model with ORB source and ORB sink
> >> >> components; and perhaps this could be done implicitly to automatically
> >> >> distribute a model across the network.  But this line of thinking is
> >> >> strongly tied to the idea of data streams and may not be appropriate for
> >> >> the more general notion of relations in Ptolemy.
> EAL> 
> EAL> We have done quite a bit of experimentation with distributed
> EAL> Ptolemy II models, but haven't completely settled on any one
> EAL> approach... Most of the recent work in this area has been
> EAL> done by Yang Zhao, whom I've cc'd for additional comments...
> EAL> Here are some notes:
> EAL> 
> EAL> - A model can contain a component that is defined elsewhere
> EAL> on the network, referenced at a URL.  There is a demo
> EAL> in the quick tour that runs a submodel that sits on our
> EAL> web server.
> EAL> 
> EAL> - The Corba library provides a mechanism for transporting
> EAL> tokens from one model to another using either push or
> EAL> pull style interactions.  The software is in the
> EAL> ptolemy.actor.corba package, but there are currently
> EAL> no good (easily run) demos, and documentation is sparse.
> EAL> 
> EAL> - The MobileModel actor accepts a model definition on an
> EAL> input port and then executes that model.  Yang has used
> EAL> this with the Corba actors to build models where one
> EAL> model constructs another model and sends it to another
> EAL> machine on the network to execute.
> EAL> 
> EAL> - The JXTA library (ptolemy.actor.lib.jxta) uses Sun's
> EAL> XML-based P2P mechanism.  Yang has used this to construct
> EAL> a distributed chat room application.
> EAL> 
> EAL> - The ptolemy.actor.lib.net package has two actors, DatagramReader
> EAL> and DatagramWriter, that provide low-level mechanisms for
> EAL> models to communicate over the net.  Three or four years
> EAL> ago Win Williams used this to create a distributed model
> EAL> where two computers on the net were connected to
> EAL> motor controllers and users could "arm wrestle" over
> EAL> the network ... when one of the users turned his motor,
> EAL> the other motor would turn, and they could fight each
> EAL> other, trying to turn the motors in opposite directions.
> EAL> 
> EAL> - Some years ago we also did some experimentation with
> EAL> Sun's JINI P2P mechanism, but this has been largely
> EAL> supplanted by JXTA.
> EAL> 
> EAL> - The security library (ptolemy.actor.lib.security)
> EAL> provides encryption and decryption and authentication
> EAL> based on digital signatures.
> EAL> 
> EAL> Most of these mechanisms have not been well packaged,
> EAL> and we haven't worked out the "lifecycle management" issues
> EAL> (how to start up a distributed model systematically, how
> EAL> to manage network failures).
> EAL> 
> EAL> In my view, working out these issues is a top priority...
> EAL> I would be delighted to work with you or anyone else on this...
> EAL> 
> EAL> Edward
> EAL> 
> EAL> 
> EAL> 
> EAL> 
> EAL> 
> EAL> ------------
> EAL> Edward A. Lee, Professor
> EAL> 518 Cory Hall, UC Berkeley, Berkeley, CA 94720
> EAL> phone: 510-642-0455, fax: 510-642-2739
> EAL> eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal
> EAL> 
> EAL> _______________________________________________
> EAL> kepler-dev mailing list
> EAL> kepler-dev at ecoinformatics.org
> EAL> http://www.ecoinformatics.org/mailman/listinfo/kepler-dev