[seek-dev] Re: [Fwd: [seek-kr-sms] Taxon/KR integration prototype proposal]
Bertram Ludaescher
ludaesch at sdsc.edu
Wed Apr 28 09:37:13 PDT 2004
Deana:
Your examples and vision are very interesting! Hopefully we'll get a
chance to talk about those in Edinburgh (I'm sure we will ;-)
cheers
Bertram
>>>>> "DP" == Deana Pennington <dpennington at lternet.edu> writes:
DP>
DP> Seems like if we have ontologies that represent what is known, and
DP> explicitly link analyses and data to those ontologies, that we could
DP> represent a hypothesis in terms of the formalized ontology, then drive
DP> the data and analysis discovery from that. This seems like a very
DP> intuitive approach to me, and doesn't seem like it would be that
DP> difficult to do, but then, I'm not the one that has to make it happen :)
DP> I'm thinking that, for example, if I generated a hypothesis that
DP> linked sinkhole occurrence with some environmental variables, and
DP> formalized that hypothesis in terms of an existing geologic ontology,
DP> that the geologic ontology would link with statistical and measurement
DP> ontologies from which it could be reasoned that a logistic model of
DP> occurrence data (dependent var) and environmental layers (independent)
DP> would be relevant. The geologic ontology would link with a stat
DP> ontology, which would link with an ecological ontology, where the GARP
DP> model could be found, then used in an entirely different domain than the
DP> one for which it was constructed.
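The chain Deana describes (a geologic concept linking to measurement and statistical ontologies, and from there to an ecological model) can be pictured as a path search over linked concepts. The sketch below is only an illustration under heavy assumptions: the concept names, the links between them, and the `find_path` helper are all hypothetical and do not come from any real SEEK or GEON ontology.

```python
from collections import deque

# Illustrative links only; each key points to concepts it is assumed to link to.
# Prefixes stand for the (hypothetical) ontology a concept belongs to.
LINKS = {
    "geology:SinkholeOccurrence": ["measurement:PresenceAbsenceObservation"],
    "geology:EnvironmentalVariable": ["measurement:EnvironmentalLayer"],
    "measurement:PresenceAbsenceObservation": ["stat:BinaryDependentVariable"],
    "measurement:EnvironmentalLayer": ["stat:IndependentVariable"],
    "stat:BinaryDependentVariable": ["stat:LogisticOccurrenceModel"],
    "stat:IndependentVariable": ["stat:LogisticOccurrenceModel"],
    "stat:LogisticOccurrenceModel": ["ecology:GARP"],
}


def find_path(links, start, goal):
    """Breadth-first search for a chain of links from start to goal.

    Returns the list of concepts on the path, or None if no chain exists.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = find_path(LINKS, "geology:SinkholeOccurrence", "ecology:GARP")
print(" -> ".join(path))
```

Searching in the opposite direction returns `None` here because the toy links are one-way; a real ontology would need typed relations and an actual reasoner rather than a plain graph search.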
DP>
DP> We talked about this some in our breakout group in Santa Barbara, but
DP> ontologically-constructed hypothesis generation is something I have been
DP> thinking quite a bit about. In fact, when I was at NSF in January, I was
DP> asked what one thing I would recommend to enable synthetic research, and
DP> I responded that I would force would-be multidisciplinary teams to
DP> construct ontologies around all of the concepts relevant to their
DP> question of interest then explicitly show how their proposed research
DP> was going to extend/clarify the ontologies. I have also suggested to
DP> Bob Waide that we go for funding to hold some working meetings with some
DP> of the LTER groups who are trying to come up with proposals for
DP> synthetic research, to do exactly that. I haven't heard back from him,
DP> but I think this approach might be an opportunity to do some very
DP> interesting, new research within the kr/sms group.
DP>
DP> Deana
DP>
DP>
DP> Shawn Bowers wrote:
DP>
>>
>>
>> Shawn Bowers wrote:
>>
>>>
>>> (Note that I moved this thread to kr-sms ... and off of seek-dev)
>>>
>>> > Actually, this seems to me to be a fundamental difference in the way
>>> > CIS/IM and domain scientists approach problems.
>>>
>>> Even in computer science, research follows the model you describe
>>> directly below. In addition, I believe that most fields in CS are not
>>> data-driven -- even in database fields (we don't care about the data,
>>> we care about the algorithms and their generality). There are CS
>>> fields that are more closely related to branches of psychology (e.g.,
>>> human-computer interaction, and natural language processing) that are
>>> exceptions. Typically the hypothesis testing in these more
>>> "touchy-feely" CS fields strongly depends on the experimental data,
>>> and uses standard techniques to evaluate the hypotheses ... These
>>> may use available data, or they may require designing new experiments
>>> to get data. I would consider these both data-driven. And, these
>>> are definitely useful endeavors -- e.g., in the field of medicine.
>>
>>
>> I meant to say that medicine is another field that I would say is
>> "data driven" in this way.
>>
>>>
>>> The way you characterize data-driven below reminds me of data mining
>>> -- you have some data and you try to find patterns in the data. I
>>> definitely don't advocate this approach in SEEK ... and this isn't
>>> really what I was suggesting before.
>>>
>>> So, I think we agree that pure tool-driven (not sure of an example)
>>> and data-driven approaches (data mining in the traditional sense) are
>>> out (I don't think we ever thought they were in, but anyway ...), and
>>> that users of SEEK technology generally will have a hypothesis in mind
>>> when they interact with the system. Given that, is it useful to try to
>>> capture / represent hypotheses in the system, and if so, how could they
>>> be exploited and how could they be practically represented?
>>>
>>> For example, could workflows be organized based on their
>>> applicability to certain styles of hypothesis? Or, as a holy grail,
>>> you could imagine a scientist entering a hypothesis and the system
>>> actually trying to organize data and services that could be used to
>>> test the hypothesis (where the hypothesis is like a query, I
>>> suppose). For the latter, GEON is actually designing, and has
>>> designed, many of their test cases and use cases around specific
>>> hypotheses ... as opposed to the approach in SEEK of focusing test
>>> cases on a tool (GARP).
>>>
>>> Shawn
>>>
>>>
>>>
>>> Deana Pennington wrote:
>>>
>>>> Sorry so long to reply...I've been at a conference without e-mail...
>>>>
>>>> The entire scientific process is designed around testing
>>>> hypotheses. You come up with a research question of interest, then
>>>> create an analysis to test it. NSF funding (and other funding
>>>> sources) are completely based on the strength (scientific merit) of
>>>> the question and how well thought out the proposed methodology is.
>>>> The idea of integrating data simply to see if anything comes out of
>>>> it is strongly resisted, as is the idea of tool-driven science. The
>>>> general argument is that science should be directed and focused
>>>> along paths that have been rationally determined. Occasionally a
>>>> tool comes along that changes the way we can think about science
>>>> (like the microscope, for example), and for a short time, some
>>>> exploratory analysis is funded. But that is the exception, not the
>>>> norm. The synthetic work that is being encouraged may depend on
>>>> data integration, but it will have to be proposed as a traditional
>>>> research question to get funded. It's the difference between saying
>>>> you want to put climate and hydrology data together over time to
>>>> look for interesting patterns, and having a focused question that
>>>> requires data integration to do the analysis (hypothesis: drought in
>>>> the western US has resulted in reduced evapotranspiration in high
>>>> elevation forests, which should result in an increase in runoff for
>>>> a given increase in precipitation).
>>>>
>>>> Actually, this seems to me to be a fundamental difference in the way
>>>> CIS/IM and domain scientists approach problems. I've been having a
>>>> long-term discussion about this with Samantha. The RCN classes have
>>>> presented a data-centric view that works well with information
>>>> managers, but did not work well with the domain scientists at the
>>>> new fac/postdoc workshop. They kept wondering what the
>>>> goals/objectives were of the information that was presented early in
>>>> the week (Why are we doing this?). For the distributed graduate
>>>> seminar, we have intentionally changed that order around to a
>>>> research question focus. We'll see what kind of response we get,
>>>> but I think it will resonate with them. Formulating your ideas
>>>> through knowledge representation, pulling together concepts,
>>>> creating approaches to workflows...those are early in the seminar,
>>>> and would occur early in the scientific process, long before a
>>>> scientist thinks about data models, structures, or metadata.
>>>>
>>>> Deana
>>>>
>>>>
>>>> Bertram Ludaescher wrote:
>>>>
>>>>> Interesting discussion!
>>>>>
>>>>> Btw: I think of two very different things when I hear data-driven:
>>>>>
>>>>> (a) what you guys say below: I have some data, let's see what I can
>>>>> do with it, and
>>>>> (b) some scientific workflow that is data-driven (optionally in a very
>>>>> technical sense as in Ptolemy/Kepler)
>>>>>
>>>>> Bertram
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>>>>> "SB" == Shawn Bowers <bowers at sdsc.edu> writes:
>>>>>
>>>>> SB> Comments on one of your comments :-)
>>>>> SB>
>>>>> SB> (Also, I CC'd seek-dev in case anyone else is interested in
>>>>> SB> the thread)
>>>>> SB>
>>>>> SB> Deana Pennington wrote:
>>>>>
>>>>>>>> - In the scenario, interestingly, the researcher first searches
>>>>>>>> for appropriate workflows and once found, searches for the
>>>>>>>> data. It seems like it could go either way: the researcher may
>>>>>>>> have found roughly the data to support their hypothesis, and
>>>>>>>> then wants to find the right workflow/analysis to use on the
>>>>>>>> data. The latter seems like it has more potential need for data
>>>>>>>> integration/transformation in that as a researcher looking for
>>>>>>>> data, you wouldn't be restricted by everything being "uniform"
>>>>>>>> just so you could plug it into the right model (of course, you
>>>>>>>> wouldn't necessarily be limited in this way by finding analyses
>>>>>>>> first, but I think it becomes a much harder problem). Instead,
>>>>>>>> you would be looking for "good" data, regardless of whether it
>>>>>>>> is nicely formatted (which seems to be true for the mammal case
>>>>>>>> -- I believe that is the motivation for using IPCC data).
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Yes, it could go either way. However, I think most scientists
>>>>>>> think of the problem first, then look for data.
>>>>>>> The order is more likely to be, "I want to compare NPP at
>>>>>>> grassland sites around the world, and there are 4 different ways
>>>>>>> I could calculate NPP, and each of those ways requires different
>>>>>>> types of data". The "4 different ways" would be expressed as
>>>>>>> analytical workflows. It is possible, though, that after
>>>>>>> framing the question "I want to compare NPP", then they would
>>>>>>> decide to look and see what data are available before thinking
>>>>>>> about the appropriate analysis. In fact, the whole idea of
>>>>>>> data-driven analyses is a new one in ecology (and science in
>>>>>>> general), and there are whole groups of people who think it is a
>>>>>>> completely wrong approach.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> SB> I take back my original statement that the problem is harder
>>>>> SB> in one direction than in the other. Basically, our problem is
>>>>> SB> to match datasets with services. There is a set of constraints
>>>>> SB> we want the datasets to satisfy, Dq (the query), and a set of
>>>>> SB> implied constraints in the datasets found by Dq, Dc (e.g.,
>>>>> SB> structural and semantic constraints). Similarly, there is a
>>>>> SB> set of constraints we want the services to satisfy, Sq (e.g.,
>>>>> SB> that the services/workflow computes NPP), and a set of implied
>>>>> SB> constraints in the services found by Sq, Ds (e.g., structural
>>>>> SB> and semantic constraints on inputs). So generally, regardless
>>>>> SB> of whether we search for datasets first or services first, our
>>>>> SB> goal is to figure out a way to transform and group the
>>>>> SB> datasets to make the implied constraints on the datasets fit
>>>>> SB> with the implied constraints on the services. The problem
>>>>> SB> changes (which is what my original point was trying to say) if
>>>>> SB> we assume that the datasets we look for *must* match (without
>>>>> SB> any transformation) the service constraints, which is the
>>>>> SB> current motivation for choosing the IPCC data (which isn't
>>>>> SB> necessarily a bad thing, it just isn't general, which we
>>>>> SB> already know).
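Shawn's point about matching datasets with services can be sketched as follows, with a dataset query (Dq), a service query (Sq), and the implied constraints of whatever each query finds. This is a minimal illustration, not SEEK code: the `Dataset` and `Service` classes, the property names, the example records, and the transformation table are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    topics: set        # what the data is about; checked by the dataset query (Dq)
    properties: dict   # implied structural/semantic constraints of the data


@dataclass
class Service:
    name: str
    computes: str      # what the service produces; checked by the service query (Sq)
    requires: dict     # implied constraints on the service's inputs


def plan(dataset, service, transforms):
    """Return the transformation steps that make the dataset fit the service,
    an empty list if it already fits, or None if no known transformation helps."""
    steps = []
    for prop, required in service.requires.items():
        actual = dataset.properties.get(prop)
        if actual == required:
            continue
        if (prop, actual, required) in transforms:
            steps.append((prop, actual, required))
        else:
            return None
    return steps


def match(datasets, services, dataset_query, service_query, transforms=frozenset()):
    """Pair every dataset found by Dq with every service found by Sq
    whose input constraints it can be made to satisfy."""
    found_datasets = [d for d in datasets if dataset_query(d)]
    found_services = [s for s in services if service_query(s)]
    pairs = []
    for d in found_datasets:
        for s in found_services:
            steps = plan(d, s, transforms)
            if steps is not None:
                pairs.append((d.name, s.name, steps))
    return pairs


# Hypothetical example records (not real SEEK datasets or services).
DATASETS = [
    Dataset("site_a_biomass", {"grassland", "biomass"},
            {"format": "table", "time_step": "annual"}),
    Dataset("site_b_biomass", {"grassland", "biomass"},
            {"format": "table", "time_step": "monthly"}),
    Dataset("site_c_imagery", {"grassland"},
            {"format": "grid", "time_step": "annual"}),
]
SERVICES = [
    Service("npp_from_biomass", "NPP", {"format": "table", "time_step": "annual"}),
    Service("species_richness", "richness", {"format": "table"}),
]
# Known transformations as (property, from_value, to_value) triples.
TRANSFORMS = {("time_step", "monthly", "annual")}


def wants_grassland(d):   # Dq
    return "grassland" in d.topics


def computes_npp(s):      # Sq
    return s.computes == "NPP"


strict = match(DATASETS, SERVICES, wants_grassland, computes_npp)
relaxed = match(DATASETS, SERVICES, wants_grassland, computes_npp, TRANSFORMS)
print(strict)
print(relaxed)
```

Because both queries are evaluated independently before pairing, the result is the same whether datasets or services are searched first. Passing an empty transformation set reproduces the stricter case in which datasets must match the service constraints as-is.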
>>>>> SB>
>>>>> SB> Also, for data-driven analysis, what is the argument as to why
>>>>> SB> people say it is the wrong approach? Aren't there other
>>>>> SB> "scientific" fields, such as medicine or psychology (I
>>>>> SB> consider these scientific, but that probably isn't the general
>>>>> SB> classification), that are very much "data driven" in this way?
>>>>> SB> I am just curious, and am interested in hearing your opinion ...
>>>>> SB>
>>>>> SB> Shawn
>>>>> SB>
>>>>> SB> _______________________________________________
>>>>> SB> seek-dev mailing list
>>>>> SB> seek-dev at ecoinformatics.org
>>>>> SB> http://www.ecoinformatics.org/mailman/listinfo/seek-dev
>>>>>
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> seek-kr-sms mailing list
>>> seek-kr-sms at ecoinformatics.org
>>> http://www.ecoinformatics.org/mailman/listinfo/seek-kr-sms
>>
>>
DP>
DP>
DP> --
DP> ********
DP>
DP> Deana D. Pennington, PhD
DP> Long-term Ecological Research Network Office
DP>
DP> UNM Biology Department
DP> MSC03 2020
DP> 1 University of New Mexico
DP> Albuquerque, NM 87131-0001
DP>
DP> 505-272-7288 (office)
DP> 505 272-7080 (fax)
DP>