[seek-dev] Re: [Fwd: [seek-kr-sms] Taxon/KR integration prototype proposal]
Bertram Ludaescher
ludaesch at sdsc.edu
Tue Apr 20 18:55:08 PDT 2004
Interesting discussion!
Btw: I think of two very different things when I hear data-driven:
(a) what you guys say below: I have some data, let's see what I can do
with it, and
(b) some scientific workflow that is data-driven (optionally in a very
technical sense as in Ptolemy/Kepler)
Bertram
>>>>> "SB" == Shawn Bowers <bowers at sdsc.edu> writes:
SB>
SB> Comments on one of your comments :-)
SB>
SB> (Also, I CC'd seek-dev in case anyone else is interested in the thread)
SB>
SB> Deana Pennington wrote:
SB>
>>> - In the scenario, interestingly, the researcher first searches for
>>> appropriate workflows and once found, searches for the data. It seems
>>> like it could go either way: the researcher may have found roughly the
>>> data to support their hypothesis, and then wants to find the right
>>> workflow/analysis to use on the data. The latter seems like it has
>>> more potential need for data integration/transformation in that as a
>>> researcher looking for data, you wouldn't be restricted by everything
>>> being "uniform" just so you could plug it into the right model (of
>>> course, you wouldn't necessarily be limited in this way by finding
>>> analyses first, but I think it becomes a much harder problem).
>>> Instead, you would be looking for "good" data, regardless of whether
>>> it is nicely formatted (which seems to be true for the mammal case --
>>> I believe that is the motivation for using IPCC data).
>>
>>
>> Yes, it could go either way. However, I think most scientists think of
>> the problem first, then look for data. The order is more
>> likely to be, "I want to compare NPP at grassland sites around the
>> world, and there are 4 different ways I could calculate NPP, and each of
>> those ways requires different types of data". The "4 different ways"
>> would be expressed as analytical workflows. It is possible, though,
>> that after framing the question "I want to compare NPP", then they would
>> decide to look and see what data are available before thinking about the
>> appropriate analysis. In fact, the whole idea of data-driven analyses
>> is a new one in ecology (and science in general), and there are whole
>> groups of people who think it is a completely wrong approach.
>>
SB>
SB> I take back my original statement that the problem is harder in one
SB> direction than in the other. Basically, our problem is to match datasets
SB> with services. There is a set of constraints Dq (the query) that we want
SB> the datasets to satisfy, and a set of implied constraints Dc (e.g.,
SB> structural and semantic constraints) in the datasets found by Dq.
SB> Similarly, there is a set of constraints Sq that we want the services to
SB> satisfy (e.g., that the services/workflow computes NPP), and a set of
SB> implied constraints Sc (e.g., structural and semantic constraints on
SB> inputs) in the services found by Sq. So generally, regardless of whether
SB> we search for datasets first or services first, our goal is to figure
SB> out a way to transform and group the datasets to make the implied
SB> constraints on the datasets fit with the implied constraints on the
SB> services. The problem changes (which is what my original point was
SB> trying to say) if we assume that the datasets we look for *must* match
SB> (without any transformation) the service constraints, which is the
SB> current motivation for choosing the IPCC data (which isn't necessarily a
SB> bad thing, it just isn't general, which we already know).
SB>
SB> Also, for data-driven analysis, what is the argument as to why people
SB> say it is the wrong approach? Aren't there other "scientific" fields,
SB> such as medicine or psychology (I consider these scientific, but that
SB> probably isn't the general classification), that are very much "data
SB> driven" in this way? I am just curious, and am interested in hearing
SB> your opinion ...
SB>
SB> Shawn
SB>
SB>
SB>
SB> _______________________________________________
SB> seek-dev mailing list
SB> seek-dev at ecoinformatics.org
SB> http://www.ecoinformatics.org/mailman/listinfo/seek-dev
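
To make the matching problem in Shawn's message a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not anything from SEEK or Kepler: the class names, the property keys ("unit", "format"), the sample datasets, and the table of available transformations are all hypothetical. A dataset query (Dq) and a service query (Sq) each select candidates; the implied constraints of the datasets found are then compared with the implied input constraints of the services found, using transformations where they differ. Setting allow_transforms=False corresponds to requiring that datasets match the service constraints without any transformation.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    topics: set        # what the data is about; the dataset query tests this
    properties: dict   # implied constraints of the dataset (hypothetical keys)


@dataclass
class Service:
    name: str
    computes: str      # what the service produces; the service query tests this
    requires: dict     # implied constraints on the service's input


# Hypothetical transformations: (property, from_value, to_value).
TRANSFORMS = {("unit", "kg/ha", "g/m2"), ("format", "xls", "csv")}


def plan(dataset, service, allow_transforms=True):
    """Return the transformations needed to fit dataset to service, or None."""
    steps = []
    for prop, wanted in service.requires.items():
        have = dataset.properties.get(prop)
        if have == wanted:
            continue
        if allow_transforms and (prop, have, wanted) in TRANSFORMS:
            steps.append((prop, have, wanted))
        else:
            return None  # no way to reconcile this constraint
    return steps


def match(datasets, services, dq, sq, allow_transforms=True):
    """Pair every dataset satisfying dq with every service satisfying sq."""
    pairs = []
    for d in (d for d in datasets if dq(d)):
        for s in (s for s in services if sq(s)):
            steps = plan(d, s, allow_transforms)
            if steps is not None:
                pairs.append((d.name, s.name, steps))
    return pairs


# Made-up sample data, for illustration only.
DATASETS = [
    Dataset("site_a", {"grassland"}, {"unit": "g/m2", "format": "csv"}),
    Dataset("site_b", {"grassland"}, {"unit": "kg/ha", "format": "csv"}),
    Dataset("site_c", {"grassland"}, {"unit": "lb/acre", "format": "csv"}),
    Dataset("forest_x", {"forest"}, {"unit": "g/m2", "format": "csv"}),
]
SERVICES = [Service("npp_harvest", "NPP", {"unit": "g/m2", "format": "csv"})]


def is_grassland(d):   # stands in for the dataset query
    return "grassland" in d.topics


def computes_npp(s):   # stands in for the service query
    return s.computes == "NPP"
```

In this sketch, match(DATASETS, SERVICES, is_grassland, computes_npp) pairs site_a directly and site_b via a unit conversion, while site_c is dropped because no transformation is known for its unit. With allow_transforms=False only site_a remains. The result does not depend on whether datasets or services are searched first, which is the point of the retraction above.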