[seek-kr-sms] Re: Transformation Steps (Bug 1070)
Matt Jones
jones at nceas.ucsb.edu
Thu Nov 6 09:05:59 PST 2003
Hey Rod,
You raise some interesting points about transformation. We really
haven't talked through the implementation strategies very well. I agree
that implementation can impose some major constraints on design :) So
it's about time that we considered it. I forwarded this to seek-kr-sms
so that Bertram and Shawn and Rich could benefit from the conversation
as well. The bug describing the need for a transformation system
(http://bugzilla.ecoinformatics.org/show_bug.cgi?id=1070) is really just
a placeholder for what we need -- a thorough design proposal for a
transformation system. My assumption is that Bertram, Shawn, and the
SMS group are mainly responsible for that design and implementation, so
I have reassigned the bug to Shawn.
My comments inline...
Rod Spears wrote:
> Matt,
> I have been doing a lot of thinking (and a little reading) about SM. It
> appears there is a lot of research and theory on how to match up
> certain "nodes" between one or more ontologies. And there are these
> special algorithms for doing this kind of "matching", which I am
> assuming is a quick summary of Bertram's research/field of study. There
> also seem to be some software systems already working that do this.
> Is this correct? Has Bertram written software that works? Or is it still
> being developed?
There is a lot of software that does reasoning. Much of it is
proprietary. I don't think Bertram has written these engines, but I
could easily be wrong. Currently he seems to prefer systems like Prolog
for developing prototypes -- I'm not sure if this will scale to our
application, but we'll see.
> I haven't done enough reading on SM to know if once it "matches" two
> "nodes" whether it has any ability or knowledge on how to translate from
> one to the other (Bug 1070). Is that aspect of the problem addressed in
> any of the papers? It is mentioned in Bug 1070 as item #1:
> "/1) use the SMS to locate candidate transformation steps T1..TN based
> on type signature and ontologies/"
> To me this means SMS is capable of conceptually getting from T1 to TN,
> but does it imply that the necessary conversion implementations exist
> to get there?
Well, first of all, T1...TN were meant to indicate a series of
transformations needed to transform some output (e.g., of Step S1) to
some input (e.g., of Step S2). So it could really be represented by one
transformation step, but multiple steps were indicated to show that there
might be several distinct phases in the transformation (e.g., first
convert the units, then scale the values).
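To make the "one step or several phases" point concrete, here is a minimal Java sketch, assuming each phase T1..TN can be modeled as a plain function on a numeric value. The `TransformationChain` class is hypothetical and not part of Ptolemy or SMS; it shows that applying the phases in order and collapsing them into a single operator give the same result.

```java
import java.util.List;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: a transformation T1..TN modeled as an ordered list of
// phases, each a simple function on a value (e.g. unit conversion, scaling).
final class TransformationChain {
    private final List<DoubleUnaryOperator> phases;

    TransformationChain(List<DoubleUnaryOperator> phases) {
        this.phases = List.copyOf(phases);
    }

    // Apply T1, then T2, ..., then TN to a single value.
    double apply(double value) {
        double result = value;
        for (DoubleUnaryOperator phase : phases) {
            result = phase.applyAsDouble(result);
        }
        return result;
    }

    // Collapse the whole chain into one operator, i.e. a single transformation step.
    DoubleUnaryOperator asSingleStep() {
        DoubleUnaryOperator combined = DoubleUnaryOperator.identity();
        for (DoubleUnaryOperator phase : phases) {
            combined = combined.andThen(phase);
        }
        return combined;
    }
}
```

For example, a chain of `x -> x / 1000.0` (grams to kilograms) followed by `x -> x * 10.0` (an illustrative rescaling) maps 500.0 to 5.0 whether it is run phase by phase or as one collapsed step.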
In terms of implementation, it seems to me that we could use any system
that can handle the calculation, and we need not limit ourselves to just
one. The transformation step gets inserted in the workflow as just
another step, and so the system (in this case Ptolemy) will take care of
marshalling values into the right format to deliver them from step to
step. So, for example, we could write a SAS step that does some
standard statistical transformations (such as normalizing data), and
some Java steps for another series of transforms, and some Matlab steps
for matrix operations (e.g., identity transform). Then, when a user
tries to link two steps, the reasoning engine can determine which of the
transformations needs to be applied.
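As a much-simplified stand-in for the reasoning engine, the lookup of candidate steps "based on type signature" (item #1 in the bug) might start as a filter over registered steps. This sketch assumes each step is described only by hypothetical input and output type names and matches them by exact string equality; a real SMS reasoner would consult the ontologies instead.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical: a transformation step described only by its type signature.
// The implementation language (SAS, Java, Matlab, ...) is irrelevant here.
record StepSignature(String name, String inputType, String outputType) {}

final class CandidateFinder {
    // Return the registered steps whose signature bridges the producer's output
    // type to the consumer's input type. Exact string matching is an assumption;
    // ontology-based reasoning would also accept compatible types.
    static List<StepSignature> candidates(List<StepSignature> registry,
                                          String producerOutput,
                                          String consumerInput) {
        return registry.stream()
                .filter(s -> s.inputType().equals(producerOutput))
                .filter(s -> s.outputType().equals(consumerInput))
                .collect(Collectors.toList());
    }
}
```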
Let's refer to the conceptualized set of operations needed to get from an
output to an input as the "transformation plan". This is generated by
the reasoning engine. There is still a need for an "execution plan"
which is an exact series of steps to be executed in order to accomplish
the transformation plan. Presumably there are multiple potential
execution plans for every transformation plan (e.g., transformation
steps can be implemented in multiple languages). So choosing a
particular execution plan isn't trivial either, and it involves both
satisfying the transformation plan and optimizing for efficient execution.
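The split between a transformation plan and an execution plan can be illustrated with a greedy selector: for each abstract operation in the plan, pick the cheapest registered implementation. This is a minimal sketch under a strong assumption, namely that implementation costs are independent of one another; it ignores, for example, the cost of marshalling data between SAS, Java and Matlab steps, which a real optimizer would have to include. The `Implementation` record and the cost numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical: one concrete way (language + estimated cost) to run an abstract operation.
record Implementation(String operation, String language, double estimatedCost) {}

final class ExecutionPlanner {
    // Turn a transformation plan (ordered abstract operations) into an execution
    // plan by choosing the cheapest candidate per operation. Returns empty if
    // some operation has no implementation at all.
    static Optional<List<Implementation>> plan(
            List<String> transformationPlan,
            Map<String, List<Implementation>> candidates) {
        List<Implementation> executionPlan = new ArrayList<>();
        for (String operation : transformationPlan) {
            Optional<Implementation> best = candidates
                    .getOrDefault(operation, List.of())
                    .stream()
                    .min(Comparator.comparingDouble(Implementation::estimatedCost));
            if (best.isEmpty()) {
                return Optional.empty(); // the plan cannot be executed
            }
            executionPlan.add(best.get());
        }
        return Optional.of(executionPlan);
    }
}
```

Greedy selection is only optimal when the per-step costs really are independent; once data-transfer costs between languages matter, choosing an execution plan becomes a search over combinations, which is part of why it "isn't trivial".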
> ------------------------------------------------------------------------
> Item #2 - /"determine how to generate transformation steps automatically
> for simple transforms such as unit conversions"/
> This seems straightforward; it could just be a service with a bunch of
> mappings from one to the other.
> But it raises the question: once we know the mapping, how do we get
> the value mapped?
> Meaning the service has the knowledge that T1 can be "easily" mapped to
> T2, but how do you get the implementation of that mapping to a place
> where it can be done efficiently? (sort of like item #1)
> Who does the translation of the value? (I assume the SMS module?)
This is basically what I was discussing. Let's take the simple example
of unit conversions. EML has a unit dictionary, which is easily
translatable into an ontology with quantified relations among the units
(e.g., the formula for converting between two compatible units in the
dictionary is known or can be derived). The SMS reasoner would first
determine if two units are convertible (e.g., both are
VolumetricDensity), and then could write a transformation step to do the
conversion. Writing the transformation step could be as simple as
writing a Matlab expression for the Matlab actor in Ptolemy. Or it
might be generating and compiling some custom code. Either way, a
transformation step is generated and inserted into the workflow for the
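The two stages described above (first decide whether two units are convertible, then write the conversion step) could look roughly like this in Java. This is a minimal sketch, assuming every unit carries a unit type and a multiplicative factor to that type's base unit; the `Unit` and `UnitOntology` names are hypothetical, and in practice the factors would be derived from the EML unit dictionary rather than registered by hand. Units with an offset (such as degrees Celsius) are not handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.DoubleUnaryOperator;

// Hypothetical: a unit with its unit type (e.g. "VolumetricDensity") and a
// multiplicative factor to that type's base unit.
record Unit(String name, String unitType, double factorToBase) {}

final class UnitOntology {
    private final Map<String, Unit> units = new HashMap<>();

    void add(Unit unit) {
        units.put(unit.name(), unit);
    }

    // Stage 1: two units are convertible if both are known and share a unit type.
    boolean convertible(String from, String to) {
        Unit a = units.get(from);
        Unit b = units.get(to);
        return a != null && b != null && a.unitType().equals(b.unitType());
    }

    // Stage 2: generate the transformation step (value in 'from' -> value in 'to').
    Optional<DoubleUnaryOperator> conversionStep(String from, String to) {
        if (!convertible(from, to)) {
            return Optional.empty();
        }
        double factor = units.get(from).factorToBase() / units.get(to).factorToBase();
        return Optional.of(v -> v * factor);
    }
}
```

With kilograms per cubic meter as the base (factor 1.0) and grams per cubic centimeter registered with factor 1000.0, the generated step turns 2.5 into 2500.0, while a request to convert meters into a density yields no step at all.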
> ------------------------------------------------------------------------
> Item #3 - /"create a simple GUI for creating transformation steps that
> map between two existing steps"/
> The idea here is that a user can provide a certain level of "missing"
> knowledge that T1 can be converted to T2, which can be converted to T3.
> Well, first, it seems that if we have a bunch of mappings, a
> lookup algorithm could just as easily do a bunch of lookups to get from T1
> to T3. So to me it seems that this is really a tool for taking some new
> specialized "value" in some unknown domain and getting it converted to a
> known "domain" so the automatic mapping can take place. Does this sound
> correct?
Sounds right.
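Rod's point that a lookup algorithm could chain known mappings (T1 to T2, then T2 to T3) amounts to a path search over a graph whose edges are the known one-hop mappings. The sketch below uses breadth-first search so the shortest chain is found first; the `MappingGraph` class and the type names are hypothetical, and a real system might weight edges by cost or precision loss instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical: known one-hop mappings between types, searched breadth-first.
final class MappingGraph {
    private final Map<String, List<String>> edges = new HashMap<>();

    void addMapping(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Shortest chain of known mappings from source to target, if one exists.
    Optional<List<String>> findPath(String source, String target) {
        Map<String, String> previous = new HashMap<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(source);
        queue.add(source);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(target)) {
                // Walk back through predecessors to rebuild the path.
                List<String> path = new ArrayList<>();
                for (String t = target; t != null; t = previous.get(t)) {
                    path.add(t);
                }
                Collections.reverse(path);
                return Optional.of(path);
            }
            for (String next : edges.getOrDefault(current, List.of())) {
                if (visited.add(next)) {
                    previous.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Optional.empty();
    }
}
```

Under this view, the GUI tool in item #3 only needs to add the missing edge from a new domain into the known graph; the search then does the rest.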
> Whether or not that is the intent of item #3, what I have described
> certainly needs to happen.
> So assuming there is a domain of values that currently doesn't have a
> conversion to a known domain, how do we get that implementation into the
> system? Who would provide the implementation? Maybe the GUI tool
> referenced above enables the user to describe "how" the value could be
> converted into a known domain value. The tool is then capable of
> generating the implementation, compiling it and registering it. Hmmmm, I
> can see how this can be done easily for a scripting language or Java, but C
> or C++ would be more problematic as a general cross-platform (XP) solution.
Many transformations will be a combination of casting, simple
conversions (such as unit conversions), and schema rearrangement
(database operations). I am hoping that the user won't have to write
too many transform steps by hand. We should talk about this further.
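A single generated transform often mixes the three ingredients named above. This minimal sketch combines them on one row of data: it parses text values to numbers (casting), multiplies by a per-column factor (unit conversion), and renames and reorders columns (schema rearrangement). `RowTransform` and all column names and factors are hypothetical, and real schema rearrangement would also cover joins and reshaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RowTransform {
    // Build an output row whose columns follow the iteration order of 'renames'
    // (source column -> target column). Each value is cast from text to double
    // and multiplied by its unit factor (1.0 if none is given).
    static Map<String, Double> apply(Map<String, String> inputRow,
                                     Map<String, String> renames,
                                     Map<String, Double> unitFactors) {
        Map<String, Double> output = new LinkedHashMap<>();
        for (Map.Entry<String, String> rename : renames.entrySet()) {
            String raw = inputRow.get(rename.getKey());
            if (raw == null) {
                throw new IllegalArgumentException("missing column: " + rename.getKey());
            }
            double value = Double.parseDouble(raw.trim());                  // casting
            double factor = unitFactors.getOrDefault(rename.getKey(), 1.0); // unit conversion
            output.put(rename.getValue(), value * factor);                  // schema rearrangement
        }
        return output;
    }
}
```

Passing a `LinkedHashMap` for `renames` is what lets the caller control the output column order.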
> ------------------------------------------------------------------------
> Item #4 - /"determine the pros and cons of having transformation steps
> be directly associated with links (e.g., a link property) rather than
> simply introducing new transform steps that do the same tasks directly
> into the pipeline"/
> I don't understand what is meant by "links"
Links are the edges in the workflow graph. In terms of modeling the
workflows, one could consider the link (edge) as a real object that can
"do" computation itself -- ie, a link could be a step. Alternatively,
the new transformation calculations can be inserted in the graph as new
steps. I think it is more or less a UI issue, but there may be some
reasoning implications of doing it one way or another. I prefer the
latter. Jenny Wang preferred the former, or at least she did a year ago
at the San Diego meeting. Here's an illustration of the two:
                                T1         T2
Transformations are links: S1 ------> S2 ------> S3
Transformations are steps: S1 --> T1 --> S2 --> T2 --> S3
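One reason the choice may be "more or less a UI issue" is that the links form can be rewritten mechanically into the steps form. This minimal sketch assumes a linear workflow given as an ordered chain of links, each optionally labelled with a transformation; the `Link` record and `expandLinks` method are hypothetical, and branching workflows would need a real graph rewrite.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: a workflow edge; 'transform' is null when the link does no computation.
record Link(String from, String to, String transform) {}

final class Workflow {
    // Rewrite "transformations are links" into "transformations are steps" by
    // turning each labelled link into its own node between the two steps.
    static List<String> expandLinks(List<Link> links) {
        List<String> steps = new ArrayList<>();
        for (Link link : links) {
            if (steps.isEmpty()) {
                steps.add(link.from());
            } else if (!steps.get(steps.size() - 1).equals(link.from())) {
                throw new IllegalArgumentException("links do not form a chain at " + link.from());
            }
            if (link.transform() != null) {
                steps.add(link.transform());
            }
            steps.add(link.to());
        }
        return steps;
    }
}
```

Applied to the links S1 to S2 (labelled T1) and S2 to S3 (labelled T2), it yields the step sequence S1, T1, S2, T2, S3 from the second line of the illustration.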
> ------------------------------------------------------------------------
> I think there are some interesting requirements of the translation
> implementation:
> a) Node domain mapper module
> b) Tool to provide "new domain" to "existing domain" mappings AND
> implementations
> c) Cross platform
> d) Fairly efficient at runtime
> e) Dynamically extensible (see item b)
> Although we always hate to let the implementation cloud our thinking
> about design, the translation system may be better served by selecting
> an implementation language up front, and it seems that a scripting
> language may not be best.
I think we should start with one, but not limit ourselves to one. We
already have a couple available in Ptolemy (the Ptolemy expression
language, and Matlab expressions). We can also write new actors in
Ptolemy that support expression languages or more complex code. Hey, we
can even have an actor that dynamically writes, compiles, and executes
Java or C code if we want (security implications notwithstanding).
> I could envision a translation system that was implemented in Java where
> all the mappings were individual classes. Certainly there could be a
> common interface and/or even XMLSchema to describe a mapping class. Java
> would also enable us to use introspection of any given mapping class to
> determine what it does and how to register it. It would be platform
> independent and dynamically scalable.
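Rod's idea of individual mapping classes that are discovered and registered through introspection can be sketched with a runtime annotation and standard Java reflection. The `MapsDomain` annotation, the `DomainMapping` interface, `MappingRegistry`, and the Fahrenheit-to-Celsius class are all hypothetical illustrations; only the reflection calls (`getAnnotation`, `getDeclaredConstructor().newInstance()`) are standard Java APIs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

// Hypothetical annotation describing what a mapping class converts between.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface MapsDomain {
    String from();
    String to();
}

// Common interface every mapping class implements.
interface DomainMapping {
    double map(double value);
}

@MapsDomain(from = "degreesFahrenheit", to = "degreesCelsius")
final class FahrenheitToCelsius implements DomainMapping {
    public double map(double value) {
        return (value - 32.0) * 5.0 / 9.0;
    }
}

final class MappingRegistry {
    private final Map<String, DomainMapping> mappings = new HashMap<>();

    // Introspect the class: read its annotation, instantiate it, and register it.
    void register(Class<? extends DomainMapping> type) throws ReflectiveOperationException {
        MapsDomain info = type.getAnnotation(MapsDomain.class);
        if (info == null) {
            throw new IllegalArgumentException(type.getName() + " lacks @MapsDomain");
        }
        DomainMapping instance = type.getDeclaredConstructor().newInstance();
        mappings.put(key(info.from(), info.to()), instance);
    }

    // Returns null when no mapping is registered for the pair.
    DomainMapping lookup(String from, String to) {
        return mappings.get(key(from, to));
    }

    private static String key(String from, String to) {
        return from + "->" + to;
    }
}
```

Because registration only needs a `Class` object, new mappings could be loaded at runtime (for example from a plugin directory) without changing the registry, which is the "dynamically extensible" requirement in item (e).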
> So anyway, I hope these thoughts are helpful.
> Rod
They certainly were. I think you and I are similar in that we want to
build a functional implementation. So far, the SMS work has been
focused on fairly theoretical issues. Grounding it in implementation
now I think is very appropriate :-0
Matt Jones jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/ Fax: 425-920-2439 Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California Santa Barbara
Interested in ecological informatics? http://www.ecoinformatics.org