[seek-dev] Re: [Fwd: [seek-kr-sms] Taxon/KR integration prototype proposal]

Ferdinando Villa ferdinando.villa at uvm.edu
Wed Apr 21 05:52:04 PDT 2004

Interesting indeed.

Here's a third:

(c) some scientific workflow whose meaning is only that of a workflow (a
connected series of processing steps) and becomes a process model, a
statistical analysis, a food web, etc. when you add the data - i.e.
data-defined more than data-driven (data give it its complete "identity"
by propagating temporal, spatial, ecological semantic contexts through
the workflow)....

Note that only after you "color" the scientific workflow with the data,
thus giving it a complete semantic characterization, is the workflow
capable of being a legitimate part of a larger-scale workflow, because
only from then on is it a "thing" that relates to the domain and can
help define another, larger workflow. Also note that the
transformation strategy (= ordered set of transformations) that makes
the processing workable depends potentially on the semantics of both the
data and the workflow steps - a difficulty that over the years has led
me to want to erase the conceptual distinction between data and
"processing steps" and to adopt the boring idea of a "module" instead.
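
A minimal sketch of the "module" idea, not taken from any SEEK or Kepler
code: the class Module, its fields, and the context keys and values below
are hypothetical names chosen for illustration. Data and processing steps
share one type, and a step only becomes fully characterized once upstream
data propagate their temporal, spatial and ecological context into it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Module:
    """Hypothetical unit standing for either a dataset or a processing step."""
    name: str
    # Semantic context this module declares itself (illustrative keys only).
    context: Dict[str, str] = field(default_factory=dict)
    # Upstream modules whose context is propagated into this one.
    inputs: List["Module"] = field(default_factory=list)

    def resolved_context(self) -> Dict[str, str]:
        """Merge upstream contexts, then let this module's own entries override."""
        merged: Dict[str, str] = {}
        for upstream in self.inputs:
            merged.update(upstream.resolved_context())
        merged.update(self.context)
        return merged

    def is_characterized(
        self, required: Tuple[str, ...] = ("temporal", "spatial", "ecological")
    ) -> bool:
        """True once every required context dimension has been supplied."""
        resolved = self.resolved_context()
        return all(key in resolved for key in required)


# Illustrative values only; they are assumptions, not data from the thread.
data = Module("grassland_sites",
              {"temporal": "annual", "spatial": "site", "ecological": "NPP"})
step = Module("generic_regression")        # a bare workflow step, no domain meaning

before = step.is_characterized()           # nothing has "colored" the step yet
step.inputs.append(data)                   # attach the data to the step
after = step.is_characterized()            # context now propagates from the data
```

Under these assumptions, only a module for which is_characterized() holds
would be allowed to take part in a larger workflow.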

Italian food for thought! ciao f

On Tue, 2004-04-20 at 21:55, Bertram Ludaescher wrote:
> Interesting discussion!
> Btw: I think of two very different things when I hear data-driven:
> (a) what you guys say below: I have some data, let's see what I can do 
> with it, and 
> (b) some scientific workflow that is data-driven (optionally in a very
> technical sense as in Ptolemy/Kepler)
> Bertram
> >>>>> "SB" == Shawn Bowers <bowers at sdsc.edu> writes:
> SB> 
> SB> Comments on one of your comments :-)
> SB> 
> SB> (Also, I CC'd seek-dev in case anyone else is interested in the thread)
> SB> 
> SB> Deana Pennington wrote:
> SB> 
> >>> - In the scenario, interestingly, the researcher first searches for 
> >>> appropriate workflows and once found, searches for the data.  It seems 
> >>> like it could go either way: the researcher may have found roughly the 
> >>> data to support their hypothesis, and then wants to find the right 
> >>> workflow/analysis to use on the data.  The latter seems like it has 
> >>> more potential need for data integration/transformation in that as a 
> >>> researcher looking for data, you wouldn't be restricted by everything 
> >>> being "uniform" just so you could plug it into the right model (of 
> >>> course, you wouldn't necessarily be limited in this way by finding 
> >>> analyses first, but I think it becomes a much harder problem).  
> >>> Instead, you would be looking for "good" data, regardless of whether 
> >>> it is nicely formatted (which seems to be true for the mammal case -- 
> >>> I believe that is the motivation for using IPCC data).
> >> 
> >> 
> >> Yes, it could go either way.  However, I think for most scientists, they 
> >> think of the problem first, then look for data.  The order is more 
> >> likely to be, "I want to compare NPP at grassland sites around the 
> >> world, and there are 4 different ways I could calculate NPP, and each of 
> >> those ways requires different types of data".  The "4 different ways" 
> >> would be expressed as analytical workflows.  It is possible, though, 
> >> that after framing the question "I want to compare NPP", then they would 
> >> decide to look and see what data are available before thinking about the 
> >> appropriate analysis.  In fact, the whole idea of data-driven analyses 
> >> is a new one in ecology (and science in general), and there are whole 
> >> groups of people who think it is a completely wrong approach.
> >> 
> SB> 
> SB> I take back my original statement that the problem is harder in one 
> SB> direction than in the other. Basically, our problem is to match datasets 
> SB> with services.  There is a set of constraints Dq (the query) that we 
> SB> want the datasets to satisfy, and a set of implied constraints Dc 
> SB> (e.g., structural and semantic constraints) in the datasets found by 
> SB> Dq.  Similarly, there is a set of constraints Sq that we want the 
> SB> services to satisfy (e.g., that the services/workflow computes NPP), 
> SB> and a set of implied constraints Ds (e.g., structural and semantic 
> SB> constraints on inputs) in the services found by Sq. So generally, 
> SB> regardless of whether 
> SB> we search for datasets first or services first, our goal is to figure 
> SB> out a way to transform and group the datasets to make the implied 
> SB> constraints on the datasets fit with the implied constraints on the 
> SB> services.  The problem changes (which is what my original point was 
> SB> trying to say) if we assume that the datasets we look for *must* match 
> SB> (without any transformation) the service constraints, which is the 
> SB> current motivation for choosing the IPCC data (which isn't necessarily a 
> SB> bad thing, it just isn't general, which we already know).
> SB> 
> SB> Also, for data-driven analysis, what is the argument as to why people 
> SB> say it is the wrong approach?  Aren't there other "scientific" fields, 
> SB> such as medicine or psychology (I consider these scientific, but that 
> SB> probably isn't the general classification), that are very much "data 
> SB> driven" in this way? I am just curious, and am interested in hearing 
> SB> your opinion ...
> SB> 
> SB> Shawn
> SB> 
> SB> 
> SB> 
> SB> _______________________________________________
> SB> seek-dev mailing list
> SB> seek-dev at ecoinformatics.org
> SB> http://www.ecoinformatics.org/mailman/listinfo/seek-dev
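
The matching problem in Shawn's message quoted above (make the implied
dataset constraints Dc fit the implied service constraints Ds) can be
sketched as a search for an ordered set of transformations. This is only
an illustration under strong assumptions: constraints are modelled as
(property, value) pairs, each transformation rewrites exactly one pair,
and the function names, transformation names and example values are all
hypothetical.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, Optional, Tuple

Pair = Tuple[str, str]                    # one (property, value) constraint
Transformation = Tuple[str, Pair, Pair]   # (name, pair consumed, pair produced)


def satisfies(have: FrozenSet[Pair], want: FrozenSet[Pair]) -> bool:
    """A dataset fits a service if it carries every pair the service needs."""
    return want <= have


def plan(dc: Iterable[Pair], ds: Iterable[Pair],
         transformations: List[Transformation]) -> Optional[List[str]]:
    """Breadth-first search for a shortest transformation sequence from Dc to Ds.

    Returns the ordered transformation names, [] if Dc already fits Ds,
    or None if no sequence of the given transformations works.
    """
    start, goal = frozenset(dc), frozenset(ds)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if satisfies(state, goal):
            return path
        for name, src, dst in transformations:
            if src in state:
                nxt = (state - {src}) | {dst}
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [name]))
    return None


# Hypothetical example values, not taken from the IPCC or mammal use case.
dc = {("unit", "g/m2"), ("resolution", "daily")}
ds = {("unit", "kg/ha"), ("resolution", "annual")}
available = [
    ("convert_units", ("unit", "g/m2"), ("unit", "kg/ha")),
    ("aggregate_yearly", ("resolution", "daily"), ("resolution", "annual")),
]

strategy = plan(dc, ds, available)   # datasets may be transformed to fit
strict = plan(dc, ds, [])            # datasets must match with no transformation
```

The strict call corresponds to the restricted case Shawn describes, where
only datasets that already match the service constraints are acceptable.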
