[seek-dev] Re: [Fwd: [seek-kr-sms] Taxon/KR integration prototype proposal]

Deana Pennington dpennington at lternet.edu
Tue Apr 27 14:16:25 PDT 2004

I forgot to mention, Bertram's (b) is different than what we are talking 
about.  In that case, the scientist will have already figured out what 
analysis they want to do, but the exact form that the workflow takes 
will depend on what transformations have to occur, which is data driven.

Bertram Ludaescher wrote:

>Interesting discussion!
>Btw: I think of two very different things when I hear data-driven:
>(a) what you guys say below: I have some data, let's see what I can do 
>with it, and 
>(b) some scientific workflow that is data-driven (optionally in a very
>technical sense as in Ptolemy/Kepler)
>>>>>> "SB" == Shawn Bowers <bowers at sdsc.edu> writes:
>SB> Comments on one of your comments :-)
>SB> (Also, I CC'd seek-dev in case anyone else is interested in the thread)
>SB> Deana Pennington wrote:
>>>>- In the scenario, interestingly, the researcher first searches for 
>>>>appropriate workflows and once found, searches for the data.  It seems 
>>>>like it could go either way: the researcher may have found roughly the 
>>>>data to support their hypothesis, and then wants to find the right 
>>>>workflow/analysis to use on the data.  The latter seems like it has 
>>>>more potential need for data integration/transformation in that as a 
>>>>researcher looking for data, you wouldn't be restricted by everything 
>>>>being "uniform" just so you could plug it into the right model (of 
>>>>course, you wouldn't necessarily be limited in this way by finding 
>>>>analyses first, but I think it becomes a much harder problem).  
>>>>Instead, you would be looking for "good" data, regardless of whether 
>>>>it is nicely formatted (which seems to be true for the mammal case -- 
>>>>I believe that is the motivation for using IPCC data).
>>>Yes, it could go either way.  However, I think for most scientists, they 
>>>think of the problem first, then look for data.  The order is more 
>>>likely to be, "I want to compare NPP at grassland sites around the 
>>>world, and there are 4 different ways I could calculate NPP, and each of 
>>>those ways requires different types of data".  The "4 different ways" 
>>>would be expressed as analytical workflows.  It is possible, though, 
>>>that after framing the question "I want to compare NPP", then they would 
>>>decide to look and see what data are available before thinking about the 
>>>appropriate analysis.  In fact, the whole idea of data-driven analyses 
>>>is a new one in ecology (and science in general), and there are whole 
>>>groups of people who think it is a completely wrong approach.
>SB> I take back my original statement that the problem is harder in one 
>SB> direction than in the other. Basically, our problem is to match datasets 
>SB> with services.  There is a set of constraints we want the datasets to 
>SB> satisfy Dq (the query) and a set of implied constraints in the datasets 
>SB> found Dc (e.g., structural and semantic constraints) by Dq.  Similarly, 
>SB> there is a set of constraints we want the services to satisfy Sq (e.g., 
>SB> that the services/workflow computes NPP), and a set of implied 
>SB> constraints in the services found Sc (structural and semantic 
>SB> constraints on inputs, e.g.) by Sq. So generally, regardless of whether 
>SB> we search for datasets first or services first, our goal is to figure 
>SB> out a way to transform and group the datasets to make the implied 
>SB> constraints on the datasets fit with the implied constraints on the 
>SB> services.  The problem changes (which is what my original point was 
>SB> trying to say) if we assume that the datasets we look for *must* match 
>SB> (without any transformation) the service constraints, which is the 
>SB> current motivation for choosing the IPCC data (which isn't necessarily a 
>SB> bad thing, it just isn't general, which we already know).
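[Editorial sketch, not part of the original thread.] The matching problem
described above can be illustrated roughly as follows. This is only a minimal
sketch under strong assumptions: the implied constraints on a dataset and on a
service's inputs are modeled as plain key/value properties, and each
transformation rewrites exactly one property. The names TRANSFORMS,
find_transformations, "to_metric", and "to_grid" are hypothetical and are not
SEEK or Kepler APIs.

```python
from collections import deque

# Hypothetical transformations: each rewrites one dataset property
# (before) into another (after).
TRANSFORMS = {
    "to_metric": (("units", "imperial"), ("units", "metric")),
    "to_grid": (("layout", "points"), ("layout", "grid")),
}


def find_transformations(dataset_props, service_needs, transforms=TRANSFORMS):
    """Breadth-first search for a shortest list of transformation names that
    makes dataset_props satisfy every (key, value) pair in service_needs.

    Returns [] if the data already fits, or None if no chain exists.
    """
    start = frozenset(dataset_props.items())
    goal = set(service_needs.items())
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        props, path = queue.popleft()
        if goal <= props:  # all service input constraints are met
            return path
        for name, (before, after) in sorted(transforms.items()):
            if before in props:
                nxt = frozenset((props - {before}) | {after})
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [name]))
    return None


if __name__ == "__main__":
    dataset = {"units": "imperial", "layout": "points"}
    service = {"units": "metric", "layout": "grid"}
    print(find_transformations(dataset, service))
```

In this toy model, the restricted case in which datasets must match the
service constraints without any transformation corresponds to accepting only
datasets for which the function returns an empty list.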
>SB> Also, for data-driven analysis, what is the argument as to why people 
>SB> say it is the wrong approach?  Aren't there other "scientific" fields, 
>SB> such as medicine or psychology (I consider these scientific, but that 
>SB> probably isn't the general classification), that are very much "data 
>SB> driven" in this way? I am just curious, and am interested in hearing 
>SB> your opinion ...
>SB> Shawn
>SB> _______________________________________________
>SB> seek-dev mailing list
>SB> seek-dev at ecoinformatics.org
>SB> http://www.ecoinformatics.org/mailman/listinfo/seek-dev


Deana D. Pennington, PhD
Long-term Ecological Research Network Office

UNM Biology Department
MSC03  2020
1 University of New Mexico
Albuquerque, NM  87131-0001

505-272-7288 (office)
505 272-7080 (fax)
