[seek-kr-sms] Re: Transformation Steps (Bug 1070)

Thu Nov 6 09:05:59 PST 2003

Hey Rod,

You raise some interesting points about transformation.  We really 
haven't talked through the implementation strategies very well.  I agree 
that implementation can impose some major constraints on design :)  So 
it's about time that we considered it.  I forwarded this to seek-kr-sms 
so that Bertram and Shawn and Rich could benefit from the conversation 
as well.  The bug describing the need for a transformation system 
(http://bugzilla.ecoinformatics.org/show_bug.cgi?id=1070) is really just 
a placeholder for what we need -- a thorough design proposal for a 
transformation system.  My assumption is that Bertram, Shawn, and the 
SMS group are mainly responsible for that design and implementation, so 
I have reassigned the bug to Shawn.

My comments inline...

Rod Spears wrote:
> Matt,
> 
> I have been doing a lot of thinking (and a little reading) about SM. It 
> appears there is all this research and theories on how to match up 
> certain "nodes" between one or more ontologies. And there are these 
> special algorithms for doing this kind of "matching". Which I am 
> assuming is a quick summary of Bertram's research/field of study. There 
> also seems to be some software systems already working that does this. 
> Is this correct? Has Bertram written software that works? Or is it still 
> being developed?

There is a lot of software that does reasoning.  Much of it is 
proprietary.  I don't think Bertram has written these engines, but I 
could easily be wrong.  Currently he seems to prefer systems like Prolog 
for developing prototypes -- I'm not sure if this will scale to our 
application, but we'll see.

> I haven't done enough reading on SM to know if once it "matches" two 
> "nodes" whether it has any ability or knowledge on how to translate from 
> one to the other (Bug 1070). Is that aspect of the problem addressed in 
> any of the papers? It is mentioned in Bug 1070 as item #1:
> 
> "/1) use the SMS to locate candidate transformation steps T1..TN based 
> on type signature and ontologies/"
> 
> To me this means SMS is capable of conceptually getting from T1 to TN 
> but does it imply that there are the necessary conversion implements to 
> get there?

Well, first of all, T1...TN were meant to indicate a series of 
transformations needed to transform some output (e.g., of Step S1) to 
some input (e.g., of Step S2).  So it could really be represented by one 
transformation step, but the multiple were indicated to show that there 
might be several distinct phases in the transformation (e.g., first 
convert the units, then scale the values).
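
As a minimal sketch of that idea (Python is used only for illustration;
the function names and the scaling constant are hypothetical and not
part of any SEEK code), a chain T1..TN can be wrapped up as one
composed step:

```python
from typing import Callable, List

Transform = Callable[[float], float]


def compose(steps: List[Transform]) -> Transform:
    """Wrap an ordered list of transformation phases as one step."""
    def run(value: float) -> float:
        for step in steps:
            value = step(value)
        return value
    return run


def cm_to_m(value: float) -> float:
    """T1 (hypothetical): convert the units, centimeters to meters."""
    return value / 100.0


def rescale(value: float, max_value: float = 2.0) -> float:
    """T2 (hypothetical): scale the value against an assumed maximum."""
    return value / max_value


# The workflow would see a single step between S1 and S2.
s1_to_s2 = compose([cm_to_m, rescale])
```

Whether the phases show up as one step or several is then mostly a
question of how much detail we want to expose in the workflow.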

In terms of implementation, it seems to me that we could use any system 
that can handle the calculation, and we need not limit ourselves to just 
one.  The transformation step gets inserted in the workflow as just 
another step, and so the system (in this case Ptolemy) will take care of 
marshalling values into the right format to deliver them from step to 
step.  So, for example, we could write a SAS step that does some 
standard statistical transformations (such as normalizing data), and 
some Java steps for another series of transforms, and some Matlab steps
for matrix operations (e.g., identity transform).  Then, when a user
tries to link two steps, the reasoning engine can determine which of the
transformations need to be applied.

Let's refer to the conceptualized set of operations needed to get from an 
output to an input as the "transformation plan".  This is generated by 
the reasoning engine.  There is still a need for an "execution plan" 
which is an exact series of steps to be executed in order to accomplish 
the transformation plan.  Presumably there are multiple potential 
execution plans for every transformation plan (e.g., transformation 
steps can be implemented in multiple languages).  So choosing a 
particular execution plan isn't trivial either, and it involves both 
satisfying the transformation plan and optimizing for efficient execution.
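
To make the distinction concrete, here is a rough sketch in Python.
Everything in it is an assumption for illustration: the catalog of
implementations, the relative costs, and the penalty for switching
languages between steps are invented, and a real planner would need a
better cost model than brute-force enumeration.

```python
from itertools import product
from typing import Dict, List, NamedTuple


class Implementation(NamedTuple):
    language: str
    cost: float  # hypothetical relative execution cost


# Hypothetical catalog: each abstract operation in a transformation
# plan may have implementations in several languages.
CATALOG: Dict[str, List[Implementation]] = {
    "convert_units": [Implementation("java", 1.0),
                      Implementation("matlab", 2.0)],
    "normalize": [Implementation("sas", 1.0),
                  Implementation("java", 1.5)],
}

# Assumed cost of marshalling data between two different languages.
SWITCH_PENALTY = 1.0


def plan_cost(candidate) -> float:
    """Total cost of the steps plus a penalty per language switch."""
    cost = sum(impl.cost for impl in candidate)
    switches = sum(1 for a, b in zip(candidate, candidate[1:])
                   if a.language != b.language)
    return cost + SWITCH_PENALTY * switches


def choose_execution_plan(transformation_plan: List[str]) -> List[Implementation]:
    """Enumerate every execution plan that satisfies the
    transformation plan and return the cheapest one."""
    options = [CATALOG[op] for op in transformation_plan]
    return list(min(product(*options), key=plan_cost))


execution_plan = choose_execution_plan(["convert_units", "normalize"])
```

With these made-up numbers the cheapest plan keeps both steps in Java,
even though the SAS normalization is cheaper on its own, because the
switch between languages costs more than it saves.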

> ------------------------------------------------------------------------
> 
> Item #2 - /"determine how to generate transformation steps automatically 
> for simple transforms such as unit conversions"/
> 
> This seems straight forward, it could just be a service with a bunch of 
> mappings from one to the other.
> 
> But it begs the question of once we know the mapping how do we get it 
> mapped?
> 
> Meaning the service has the knowledge that T1 can be "easily" mapped to 
> T2, but how do you get the implementation of that mapping to a place where 
> it can be done efficiently? (sort of like item #1)
> 
> Who does the translation of the value? (I assume the SMS module?)

This is basically what I was discussing.  Let's take the simple example 
of unit conversions.  EML has a unit dictionary, which is easily 
translatable into an ontology with quantified relations among the units 
(e.g., the formula for converting between two compatible units in the 
dictionary is known or can be derived).  The SMS reasoner would first 
determine if two units are convertible (e.g., both are 
VolumetricDensity), and then could write a transformation step to do the 
conversion.  Writing the transformation step could be as simple as 
writing a Matlab expression for the Matlab actor in Ptolemy.  Or it 
might be generating and compiling some custom code.  Either way, a 
transformation step is generated and inserted into the workflow for the 
user.
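
A minimal sketch of those two phases, in Python for illustration only.
The table below is a hypothetical stand-in for the unit dictionary
(unit name, its type, and a factor to an assumed base unit), and the
generated string is just a generic arithmetic expression, not the
actual syntax of any Ptolemy actor:

```python
# Hypothetical stand-in for the unit dictionary:
# unit name -> (unit type, multiplier to the assumed base unit).
UNITS = {
    "kilogramsPerCubicMeter": ("VolumetricDensity", 1.0),
    "gramsPerLiter": ("VolumetricDensity", 1.0),
    "milligramsPerLiter": ("VolumetricDensity", 0.001),
    "meter": ("Length", 1.0),
}


def convertible(src: str, dst: str) -> bool:
    """Phase 1: two units are convertible if they share a unit type."""
    return UNITS[src][0] == UNITS[dst][0]


def generate_expression(src: str, dst: str, var: str = "x") -> str:
    """Phase 2: write the conversion as an expression string that
    could be handed to an expression-evaluating step."""
    if not convertible(src, dst):
        raise ValueError(f"cannot convert {src} to {dst}")
    factor = UNITS[src][1] / UNITS[dst][1]
    return f"{var} * {factor!r}"


expression = generate_expression("milligramsPerLiter", "gramsPerLiter")
```

Units with offsets (temperature scales, for example) would need more
than a single multiplier, so this only covers the simplest case.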

> ------------------------------------------------------------------------
> 
> Item #3 - /"create a simple GUI for creating transformation steps that 
> map between two existing steps"/
> 
> The idea here is that a user can provide a certain level of "missing" 
> knowledge that T1 can be converted to T2 which can be converted to T3. Well 
> first, it seems that if we have a bunch of mappings that a 
> lookup-algorithm could just as easy do a bunch of lookups to get from T1 
> to T3. So to me it seems that this is really a tool for taking some new 
> specialized "value" in some unknown domain and getting it converted to a 
> known "domain" so the automatic mapping can take place. Does this sound 
> correct?

Sounds right.
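
The lookup part could be as simple as a search over the known
mappings.  A sketch in Python, assuming the mappings are kept as a
plain table of "can be converted directly to" entries (the table and
the function name are hypothetical):

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical table of known direct mappings.
MAPPINGS: Dict[str, List[str]] = {
    "T1": ["T2"],
    "T2": ["T3"],
    "T3": [],
}


def find_chain(mappings: Dict[str, List[str]],
               start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of direct mappings
    from start to goal; returns None if no chain exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in mappings.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


chain = find_chain(MAPPINGS, "T1", "T3")
```

When the search returns None, that is the gap the GUI would have to
let the user fill in.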

> If that is true or if that isn't the intent of item #3, certainly what I 
> have described needs to happen.
> 
> So assuming there is a domain of values that currently doesn't have a 
> conversion to a known domain, how do we get that implementation into the 
> system? Who would provide the implementation? Maybe the GUI tool 
> referenced above enables the user to describe "how" the value could be 
> converted into a known domain value. The tool is then capable of 
> generating the implementation, compiling it and registering it. Hmmmm, I 
> can see how this can be done easily for a scripting language or Java, but C 
> or C++ would be more problematic as a general XP solution

Many transformations will be a combination of casting, simple 
conversions (such as unit conversions), and schema rearrangement 
(database operations).  I am hoping that the user won't have to write 
too many transform steps by hand.  We should talk about this further.

> ------------------------------------------------------------------------
> Item #4 - /"determine the pros and cons of having transformation steps 
> be directly associated with links (e.g., a link property) rather than 
> simply introducing new transform steps that do the same tasks directly 
> into the pipeline"/
> 
> I don't understand what is meant by "links"

Links are the edges in the workflow graph.  In terms of modeling the 
workflows, one could consider the link (edge) as a real object that can 
"do" computation itself -- i.e., a link could be a step.  Alternatively, 
the new transformation calculations can be inserted in the graph as new 
steps.  I think it is more or less a UI issue, but there may be some 
reasoning implications of doing it one way or another.  I prefer the 
latter.  Jenny Wang preferred the former, or at least she did a year ago 
at the San Diego meeting.  Here's an illustration of the two:

                                    T1         T2
Transformations are links:    S1 ------> S2 ------> S3

Transformations are steps:    S1 --> T1 --> S2 --> T2 --> S3
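
The two models carry the same information, which a small Python sketch
can show.  The tuple layout and function name are hypothetical; the
point is only that a graph with transformations attached to links can
be expanded mechanically into one where they are ordinary steps:

```python
from typing import List, Optional, Tuple

# "Transformations are links": each edge optionally carries a
# transformation as a property (source, target, transformation).
Link = Tuple[str, str, Optional[str]]

links: List[Link] = [
    ("S1", "S2", "T1"),
    ("S2", "S3", "T2"),
]


def expand_links_to_steps(links: List[Link]) -> List[Tuple[str, str]]:
    """Rewrite each transforming link as two plain edges that pass
    through a new transformation step."""
    edges: List[Tuple[str, str]] = []
    for source, target, transformation in links:
        if transformation is None:
            edges.append((source, target))
        else:
            edges.append((source, transformation))
            edges.append((transformation, target))
    return edges


# "Transformations are steps": S1 --> T1 --> S2 --> T2 --> S3
edges = expand_links_to_steps(links)
```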

> ------------------------------------------------------------------------
> I think there are some interesting requirements of the translation 
> implementation:
> a) Node domain mapper module
> b) Tool to provide "new domain" to "existing domain" mappings AND 
> implementations
> c) Cross platform
> d) Fairly efficient at runtime.
> e) Dynamically extensible (see item b)
> 
> Although we always hate to let the implementation cloud our thinking 
> about design, the translation system may be better served by selecting 
> an implementation language up front and it seems that a scripting 
> language may not be best.

I think we should start with one, but not limit ourselves to one.  We 
already have a couple available in Ptolemy (the Ptolemy expression 
language, and Matlab expressions).  We can also write new actors in 
Ptolemy that support expression languages or more complex code.  Hey, we 
can even have an actor that dynamically writes, compiles, and executes 
Java or C code if we want (security implications notwithstanding).

> I could envision a translation system that was implemented in Java where 
> all the mappings were individual classes. Certainly there could be a 
> common interface and/or even XMLSchema to describe a mapping class. Java 
> would also enable us to use introspection of any given mapping class to 
> determine what it does and how to register it.  It would be platform 
> independent and dynamically scalable.
> 

Sure.
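
Roughly, I picture something like the following.  It is sketched in
Python for brevity rather than Java, and the class, attribute, and
function names are all hypothetical: each mapping is a class with a
common interface, it declares what it converts from and to, and the
system registers it by inspecting those declarations.

```python
class Mapping:
    """Common interface for mapping classes (hypothetical)."""
    registry = {}
    source = None  # name of the domain/unit converted from
    target = None  # name of the domain/unit converted to

    def __init_subclass__(cls, **kwargs):
        # Register each subclass under the (source, target) pair it
        # declares, so new mappings become available when defined.
        super().__init_subclass__(**kwargs)
        Mapping.registry[(cls.source, cls.target)] = cls

    def convert(self, value):
        raise NotImplementedError


class CelsiusToKelvin(Mapping):
    source = "celsius"
    target = "kelvin"

    def convert(self, value):
        return value + 273.15


def lookup(source, target):
    """Return an instance of the mapping registered for this pair."""
    return Mapping.registry[(source, target)]()


kelvin = lookup("celsius", "kelvin").convert(0.0)
```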

> 
> So anyway, I hope these thoughts are helpful.
> Rod
> 

They certainly were.  I think you and I are similar in that we want to 
build a functional implementation.  So far, the SMS work has been 
focused on fairly theoretical issues.  Grounding it in implementation 
now I think is very appropriate :-0

Matt
-- 
-------------------------------------------------------------------
Matt Jones                                     jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California Santa Barbara
Interested in ecological informatics? http://www.ecoinformatics.org
-------------------------------------------------------------------