[seek-kr-sms] Re: Transformation Steps (Bug 1070)

Shawn Bowers bowers at sdsc.edu
Fri Nov 7 10:44:21 PST 2003


Hi, 

I had some comments as well, which I intersperse below.

- Shawn


On Thu, 6 Nov 2003, Matt Jones wrote:

> Hey Rod,
> 
> You raise some interesting points about transformation.  We really 
> haven't talked through the implementation strategies very well.  I agree 
> that implementation can impose some major constraints on design :)  So 
> it's about time that we considered it.  I forwarded this to seek-kr-sms 
> so that Bertram and Shawn and Rich could benefit from the conversation 
> as well.  The bug describing the need for a transformation system 
> (http://bugzilla.ecoinformatics.org/show_bug.cgi?id=1070) is really just 
> a placeholder for what we need -- a thorough design proposal for a 
> transformation system.  My assumption is that Bertram, Shawn, and the 
> SMS group are mainly responsible for that design and implementation, so 
> I have reassigned the bug to Shawn.
> 
> My comments inline...
> 
> Rod Spears wrote:
> > Matt,
> > 
> > I have been doing a lot of thinking (and a little reading) about SM. It 
> > appears there is all this research and theories on how to match up 
> > certain "nodes" between one or more ontologies. And there are these 
> > special algorithms for doing this kind of "matching". Which I am 
> > assuming is a quick summary of Bertram's research/field of study. There 
> > also seem to be some software systems already working that do this. 
> > Is this correct? Has Bertram written software that works? Or is it still 
> > being developed?
> 
> There is a lot of software that does reasoning.  Much of it is 
> proprietary.  I don't think Bertram has written these engines, but I 
> could easily be wrong.  Currently he seems to prefer systems like Prolog 
> for developing prototypes -- I'm not sure if this will scale to our 
> application, but we'll see.

You might want to take a look at the slides that we presented in Santa 
Barbara a few weeks ago.  (Currently, we don't have a write-up of the 
ideas presented there, but we are working towards a few papers that 
describe these ideas in more detail.)  You can get the slides at:

http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/seek/projects/kr-sms/presentations/SantaBarbaraOct2003/semtypes-oct23.ppt

Basically, there are two issues of concern.  One is structural 
transformation, e.g., converting datasets and other information passed 
among components so that we can run a "scientific pipeline."  These 
structural transformations can range from fairly simple to quite complex. 
When possible, we want to perform automatic transformation so that a user 
can simply "hook up" components to form a workflow or ecological model 
without having to worry about all of the transformation details (which, I 
am under the impression, is the major bottleneck for ecologists).

We believe that ontological information can help us move toward automatic 
transformation.  By "attaching" semantic concepts to substructures, we 
have the opportunity to "match" substructures for more complex 
transformations.  These ideas are briefly presented in the slides (e.g., 
see slides 41--47).  

One issue in doing this is determining whether the ontological concepts
are compatible, which, as Matt mentions, is a fairly straightforward
computation.  There are some issues of efficiency and some very
theoretical issues in terms of languages (for ontologies), decidability
(of checking compatibility), and so on.  But for the simple cases all of
this is worked out, and off-the-shelf components exist.
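The compatibility check described above can be sketched as a subsumption
walk over a toy ontology.  This is a minimal, hypothetical illustration:
the concept names and the child-to-parent map are made up for the
example, not taken from any real SEEK ontology or reasoner.

```python
# Toy ontology as a child -> parent subsumption map (illustrative names).
SUBSUMES = {
    "BiomassDensity": "VolumetricDensity",
    "VolumetricDensity": "Density",
    "Density": "PhysicalQuantity",
    "Temperature": "PhysicalQuantity",
}

def ancestors(concept):
    """Yield the concept and every concept that subsumes it."""
    while concept is not None:
        yield concept
        concept = SUBSUMES.get(concept)

def compatible(output_concept, input_concept):
    """An output can feed an input if the output's concept is the
    input's concept or a specialization of it."""
    return input_concept in ancestors(output_concept)

print(compatible("BiomassDensity", "Density"))  # True
print(compatible("Temperature", "Density"))     # False
```

A real reasoner would of course handle richer languages than a tree of
named concepts, which is where the decidability and efficiency questions
come in.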

> 
> > I haven't done enough reading on SM to know if once it "matches" two 
> > "nodes" whether it has any ability or knowledge on how to translate from 
> > one to the other (Bug 1070). Is that aspect of the problem addressed in 
> > any of the papers? It is mentioned in Bug 1070 as item #1:
> > 
> > "/1) use the SMS to locate candidate transformation steps T1..TN based 
> > on type signature and ontologies/"
> > 
> > To me this means SMS is capable of conceptually getting from T1 to TN 
> > but does it imply that there are the necessary conversion implements to 
> > get there?
> 
> Well, first of all, T1...TN were meant to indicate a series of 
> transformations needed to transform some output (e.g., of Step S1) to 
> some input (e.g., of Step S2).  So it could really be represented by one 
> transformation step, but multiple steps were indicated to show that there 
> might be several distinct phases in the transformation (e.g., first 
> convert the units, then scale the values).
> 
> In terms of implementation, it seems to me that we could use any system 
> that can handle the calculation, and we need not limit ourselves to just 
> one.  The transformation step gets inserted in the workflow as just 
> another step, and so the system (in this case Ptolemy) will take care of 
> marshalling values into the right format to deliver them from step to 
> step.  So, for example, we could write a SAS step that does some 
> standard statistical transformations (such as normalizing data), and 
> some Java steps for another series of transforms, and some matlab steps 
> for matrix operations (e.g., identity transform).  Then, when a user 
> tries to link two steps, the reasoning engine can determine which of the 
> transformations needs to be applied.
> 
> Let's refer to the conceptualized set of operations needed to get from an 
> output to an input as the "transformation plan".  This is generated by 
> the reasoning engine.  There is still a need for an "execution plan" 
> which is an exact series of steps to be executed in order to accomplish 
> the transformation plan.  Presumably there are multiple potential 
> execution plans for every transformation plan (e.g., transformation 
> steps can be implemented in multiple languages).  So choosing a 
> particular execution plan isn't trivial either, and it involves both 
> satisfying the transformation plan and optimizing for efficient execution.

I agree here with what Matt says.
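Matt's distinction between a transformation plan and its many possible
execution plans could be sketched roughly as follows.  All of the step
names, implementation languages, and costs here are hypothetical
placeholders, and the "optimization" is just a total-cost minimum:

```python
from itertools import product

# Each abstract transformation step has several candidate
# implementations, each with an estimated execution cost (made up).
IMPLEMENTATIONS = {
    "convert_units": [("matlab", 5), ("java", 3)],
    "scale_values":  [("sas", 4), ("java", 2)],
}

def execution_plans(transformation_plan):
    """Enumerate every concrete execution plan for the abstract plan."""
    options = [IMPLEMENTATIONS[step] for step in transformation_plan]
    for combo in product(*options):
        yield list(combo)

def cheapest_plan(transformation_plan):
    """Pick the execution plan with the lowest total estimated cost."""
    return min(execution_plans(transformation_plan),
               key=lambda plan: sum(cost for _, cost in plan))

print(cheapest_plan(["convert_units", "scale_values"]))
# [('java', 3), ('java', 2)]
```

Real plan selection would need a better cost model (data sizes,
marshalling overhead between languages, etc.), but the combinatorial
shape of the problem is the same.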

> 
> > ------------------------------------------------------------------------
> > 
> > Item #2 - /"determine how to generate transformation steps automatically 
> > for simple transforms such as unit conversions"/
> > 
> > This seems straightforward; it could just be a service with a bunch of 
> > mappings from one to the other.
> > 
> > But it raises the question: once we know the mapping, how do we get it 
> > mapped?
> > 
> > Meaning the service has the knowledge that T1 can be "easily" mapped to 
> > T2, but how do you get the implementation of that mapping to a place 
> > where it can be done efficiently? (sort of like item #1)
> > 
> > Who does the translation of the value? (I assume the SMS module?)
> 
> This is basically what I was discussing.  Let's take the simple example 
> of unit conversions.  EML has a unit dictionary, which is easily 
> translatable into an ontology with quantified relations among the units 
> (e.g., the formula for converting between two compatible units in the 
> dictionary is known or can be derived).  The SMS reasoner would first 
> determine if two units are convertible (e.g., both are 
> VolumetricDensity), and then could write a transformation step to do the 
> conversion.  Writing the transformation step could be as simple as 
> writing a Matlab expression for the Matlab actor in Ptolemy.  Or it 
> might be generating and compiling some custom code.  Either way, a 
> transformation step is generated and inserted into the workflow for the 
> user.
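Matt's unit-dictionary idea could be sketched very roughly like this.
The unit names, dimensions, and factors below are illustrative stand-ins,
not the actual EML unit dictionary contents:

```python
# Toy unit dictionary: each unit carries a dimension and a factor to a
# reference unit for that dimension, so a conversion formula between
# compatible units can be derived mechanically.
UNITS = {
    "meter":            ("length", 1.0),
    "foot":             ("length", 0.3048),
    "gramPerLiter":     ("volumetricDensity", 1.0),
    "kilogramPerLiter": ("volumetricDensity", 1000.0),
}

def derive_conversion(src, dst):
    """Return a conversion function if the units share a dimension."""
    src_dim, src_factor = UNITS[src]
    dst_dim, dst_factor = UNITS[dst]
    if src_dim != dst_dim:
        raise TypeError(f"{src} and {dst} are not convertible")
    return lambda v: v * src_factor / dst_factor

feet_to_m = derive_conversion("foot", "meter")
print(feet_to_m(10))   # approximately 3.048
```

The derived function is exactly the kind of thing that could be emitted
as a Matlab expression or a small generated actor, as Matt suggests.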

I agree with Matt here too.  I would add a few observations.  First, we
envision a library, or repository, of common transformations along with
knowledge of the circumstances under which the transformations can be
applied (much of this should be automatic -- i.e., determining when a
transformation *could* be applied).  Note that transformation becomes
interesting when the transformation cannot be applied in all cases, only
in certain situations.

Unit conversion is a good example. In particular, two units may have
dimensions that admit a functional transformation. However, just because
dimensions are "compatible" via a transformation does not necessarily mean
the conversion is valid in all situations. For example, we can perform a
reciprocal operation to convert from hertz to seconds; however, performing
such an operation isn't appropriate in every case.  I think of these cases
as type casting in programming languages. Languages such as Java only
allow implicit value substitution when going from subclasses to
superclasses. If I want to go in the other direction, I have to explicitly
"cast" the value down to the subclass.  Similarly, there may be a number
of transformations that are analogous to "downcasting," where we need the
user's permission (or additional reasoning) to perform the transformation.
I think these non-straightforward transformations are the more interesting
ones, since they are probably more useful in general. (Everyone knows how
to do the simple unit conversions; however, conversions that rely on laws
of physics or dimensional analysis are probably not as obvious and take
time for a scientist to figure out, making transformation a time-consuming
process.)
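The "downcast" idea above could be sketched like this: dimensionally, a
reciprocal maps hertz (1/s) to seconds, but applying it blindly is not
always meaningful, so the conversion demands explicit consent, much like
a downcast in Java.  The registries and function names are hypothetical:

```python
# Conversions that are always safe vs. ones needing an explicit "cast".
SAFE = {("feet", "meters"): lambda v: v * 0.3048}
UNSAFE = {("hertz", "seconds"): lambda v: 1.0 / v}

def convert(value, src, dst, explicit=False):
    """Apply a registered conversion; unsafe ones require explicit=True,
    standing in for the user's permission (or extra reasoning)."""
    if (src, dst) in SAFE:
        return SAFE[(src, dst)](value)
    if (src, dst) in UNSAFE:
        if not explicit:
            raise ValueError(f"{src} -> {dst} needs an explicit cast")
        return UNSAFE[(src, dst)](value)
    raise KeyError(f"no known conversion {src} -> {dst}")

print(convert(10, "feet", "meters"))                    # approx. 3.048
print(convert(4.0, "hertz", "seconds", explicit=True))  # 0.25
```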

We give a very simple example of this type of conversion on slides 
17--20. Note that the example is for considering types that allow null 
values, not units.


> 
> > ------------------------------------------------------------------------
> > 
> > Item #3 - /"create a simple GUI for creating transformation steps that 
> > map between two existing steps"/
> > 
> > The idea here is that a user can provide a certain level of "missing" 
> > knowledge that T1 can be converted to T2 which can be converted T3. Well 
> > first, it seems that if we have a bunch of mappings that a 
> > lookup-algorithm could just as easy do a bunch of lookups to get from T1 
> > to T3. So to me it seems that this is really a tool for taking some new 
> > specialized "value" in some unknown domain and getting it converted to a 
> > known "domain" so the automatic mapping can take place. Does this sound 
> > correct?
> 
> Sounds right.

The wording in the item is a bit ambiguous -- I am assuming "existing
step" means, e.g., a Ptolemy actor or some computation in an ecological
model. Anyway, I don't have a good feeling for whether a simple lookup
algorithm is all that is needed. In general, it seems like it could be 
more complex. I am not really sure...


> 
> > If that is true, or if that isn't the intent of item #3, certainly what I 
> > have described needs to happen.
> > 
> > So assuming there is a domain of values that currently doesn't have a 
> > conversion to a known domain, how do we get that implementation into the 
> > system? Who would provide the implementation? Maybe the GUI tool 
> > referenced above enables the user to describe "how" the value could be 
> > converted into a known domain value. The tool is then capable of 
> > generating the implementation, compiling it, and registering it. Hmmm, I 
> > can see how this can be done easily for a scripting language or Java, 
> > but C or C++ would be more problematic as a general cross-platform 
> > solution.
> 
> Many transformations will be a combination of casting, simple 
> conversions (such as unit conversions), and schema rearrangement 
> (database operations).  I am hoping that the user won't have to write 
> too many transform steps by hand.  We should talk about this further.

I really think we want to use a repository of existing, known
conversions.  What you describe sounds very ad hoc: either the desired
conversion is not available, or the user has bypassed the system to
create their own conversion.  An interesting question that you bring up is
what should happen when the system does not have the ability to perform a
conversion, but the user knows the items can be converted.  How should
this case be handled? A scripting language (such as Prolog ;-) would
handle what you describe above better than compilation and all of that.

> 
> > ------------------------------------------------------------------------
> > Item #4 - /"determine the pros and cons of having transformation steps 
> > be directly associated with links (e.g., a link property) rather than 
> > simply introducing new transform steps that do the same tasks directly 
> > into the pipeline"/
> > 
> > I don't understand what is meant by "links"
> 
> Links are the edges in the workflow graph.  In terms of modeling the 
> workflows, one could consider the link (edge) as a real object that can 
> "do" computation itself -- i.e., a link could be a step.  Alternatively, 
> the new transformation calculations can be inserted in the graph as new 
> steps.  I think it is more or less a UI issue, but there may be some 
> reasoning implications of doing it one way or another.  I prefer the 
> latter.  Jenny Wang preferred the former, or at least she did a year ago 
> at the San Diego meeting.  Here's an illustration of the two:
> 
>                                     T1         T2
> Transformations are links:    S1 ------> S2 ------> S3
> 
> Transformations are steps:    S1 --> T1 --> S2 --> T2 --> S3

I actually prefer transformations as explicit steps, at least for
ecologists/scientists, since they deal with these transformations all of
the time (so it wouldn't scare people off to have them represented
explicitly).  The only caveat is that for the totally trivial
transformations (like converting feet to meters), it might be a bit of
overkill to treat them as separate steps.  Instead, these could just be
built into Ptolemy directly as special services that occur behind the
scenes.  A separate issue in the debate is, if we consider transformations
as links, how can a user "debug" a malfunctioning workflow?


> 
> > ------------------------------------------------------------------------
> > I think there are some interesting requirements of the translation 
> > implementation:
> > a) Node domain mapper module
> > b) Tool to provide "new domain" to "existing domain" mappings AND 
> > implementations
> > c) Cross platform
> > d) Fairly efficient at runtime.
> > e) Dynamically extensible (see item b)
> > 
> > Although we always hate to let the implementation cloud our thinking 
> > about design, the translation system may be better served by selecting 
> > an implementation language up front, and it seems that a scripting 
> > language may not be best.
> 
> I think we should start with one, but not limit ourselves to one.  We 
> already have a couple available in Ptolemy (the Ptolemy expression 
> language, and Matlab expressions).  We can also write new actors in 
> Ptolemy that support expression languages or more complex code.  Hey, we 
> can even have an actor that dynamically writes, compiles, and executes 
> Java or C code if we want (security implications notwithstanding).
> 
> > I could envision a translation system that was implemented in Java where 
> > all the mappings were individual classes. Certainly there could be a 
> > common interface and/or even XMLSchema to describe a mapping class. Java 
> > would also enable us to use introspection of any given mapping class to 
> > determine what it does and how to register it.  It would be platform 
> > independent and dynamically scalable.
> > 

Java is a nice language. But I don't really understand what you mean in 
terms of introspection.  Java provides a reflection capability, e.g., I 
can see what class an object is an instance of, and what methods an 
object supports, but reflection doesn't tell me what the object computes. 
I think this is a semantic issue, and it brings up a very good 
point.  How do we capture "what" a transformation does, so that we can 
automatically apply the transformation? Is this information buried in the 
"SMS Module", or is there a declarative and extensible way to describe 
this information?  In many ways, this is what we want to capture in 
semantic types. 
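The reflection-versus-semantics distinction can be illustrated in Python
(whose `inspect` module plays the role of Java reflection).  The
decorator and the concept names are hypothetical, just to show the idea
of attaching a declarative semantic type to a transformation:

```python
import inspect

def semantic_type(input_concept, output_concept):
    """Attach declarative semantic metadata to a transformation."""
    def wrap(fn):
        fn.semantics = (input_concept, output_concept)
        return fn
    return wrap

@semantic_type("LengthInFeet", "LengthInMeters")
def feet_to_meters(v):
    return v * 0.3048

# Reflection tells us the *shape* of the call...
print(inspect.signature(feet_to_meters))   # (v)
# ...but only the declared annotation says what the call *means*.
print(feet_to_meters.semantics)   # ('LengthInFeet', 'LengthInMeters')
```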


I would be interested in hearing more of your ideas / intuitions about the 
problem in general.  Thanks for starting the thread!

Shawn



> 
> Sure.
> 
> > 
> > So anyway, I hope these thoughts are helpful.
> > Rod
> > 
> 
> 
> They certainly were.  I think you and I are similar in that we want to 
> build a functional implementation.  So far, the SMS work has been 
> focused on fairly theoretical issues.  Grounding it in implementation 
> now, I think, is very appropriate :-)
> 
> Matt
> 



