[kepler-dev] Re: sampling

Wed May 26 12:40:41 PDT 2004

Shawn Bowers wrote:

>
> This sounds like an interesting example, but I am not sure I 
> completely understand the problem ... 

I'll be happy to send you the 3 page write up when I'm finished!

>
>
> > I want to be able to sample in any dimension (space: x,y,z, time t,
> > or other dimensions).
>
> So, 'sample dimension' would be a parameter for the workflow? 

Yes.  There could be multiple sample dimensions in any given run. 

>
>
> > We should be able to figure out which fields in a parsed dataset are
> > which, but I need to also allow for the selection of other dimensions,
> > which will have to be identified by the user.
>
> I am not sure what this implies ... are you meaning that you can 
> determine the 'sample dimension' from the columns of a database? And 
> that one of these dimensions can be selected as the parameter value 
> for the workflow? Or, are these independent: the columns and the 
> 'sample dimension'?

By dimension, I essentially mean a column that contains continuous data,
which can be "located" along a dimensional axis relative to one
another.  The data to be sampled may be in a different column, but how
it is sampled depends on the dimensional column.

For instance, a temporal dimension exists continuously in the past and
future, independent of what exact dates are represented in a dataset.
If the data has dates 19970203, 19980923, and 20000403, a temporal
dimension exists in the data, and I could infer that the endpoints for
an analysis should run from 19970000 to 20010000, even though those
dates are not represented, because I know that date columns represent
the temporal dimension, and I can place the dates along a dimensional
axis relative to one another.  I may decide that I want to sample in 5
year increments.  The sample intervals would be determined by the
dimensional column, but the data would come from an attribute column.

I could have a continuous column that represents temperature, and I
could use that column as a dimension to determine my sample intervals
(say, 10 degree increments), but the data I am sampling would be
something else (pH, or number of species, or anything that could be
related to the temperature column).

In any given dataset, there may be 0 to N columns that could be used in
a dimensional analysis.  The mammals project needs to sample species
occurrence along the x & y dimensions, but this is going to be a generic
actor that could operate on any kind of continuous data.

The metadata has specific tags for fields that represent continuous
spatial and temporal dimensions.  Other continuous columns will not have
special tags, but we could figure out from the metadata that they could
be used as a dimension.  We need to keep track of spatial x,y,z &
temporal t (if they exist in the data), because these have special
meanings in many models.
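
To make the dimension/attribute distinction concrete, here is a minimal
sketch in Python (not Kepler actor code).  It assumes a numeric
dimension column and fixed-width intervals, and it infers the endpoints
by rounding the observed minimum down and the observed maximum up to a
multiple of the step.  The function name, the row layout, and the column
names are all hypothetical.

```python
import math
from collections import defaultdict

def sample_by_dimension(rows, dimension, attribute, step):
    """Group `attribute` values into fixed-width intervals along `dimension`."""
    values = [row[dimension] for row in rows]
    # Infer endpoints that enclose the data even though they are not in it.
    low = math.floor(min(values) / step) * step
    high = math.ceil(max(values) / step) * step
    if high == low:
        high = low + step  # all values identical: keep one non-empty interval
    last_index = int((high - low) // step) - 1
    bins = defaultdict(list)
    for row in rows:
        # A value exactly equal to `high` is placed in the last interval.
        index = min(int((row[dimension] - low) // step), last_index)
        start = low + index * step
        bins[(start, start + step)].append(row[attribute])
    return low, high, dict(bins)

# Temperature is the dimension column, pH is the attribute being sampled.
rows = [
    {"temperature": 12.5, "ph": 6.9},
    {"temperature": 18.0, "ph": 7.2},
    {"temperature": 31.0, "ph": 7.8},
]
low, high, bins = sample_by_dimension(rows, "temperature", "ph", 10)
```

With 10 degree increments the inferred endpoints are 10 and 40, and the
20 to 30 interval is simply empty; whether empty intervals should be
reported is a design choice this sketch leaves open.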

>
> > I thought about setting this up so that we could use your eml
> > ingestion actor to parse a file, then send it to the sampler.
> > However, that requires mapping specific eml output ports to specific
> > sampler input ports, which will not be known until runtime.  What is
> > the best way to set it up so that the user can send a file to the
> > sampler actor, see some information about the fields, and either
> > select or parameterize the correct fields to be sampled?
>
> What do you mean by "parameterize the correct fields" here? 

The user needs to be able to select which columns (dimensions) they want 
to use to construct the sample interval, and also what data they want 
sampled.  That will have to be a parameter(s).
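
One possible shape for those parameters, as a hedged Python sketch: the
user's choices are held in one object and checked against the column
names of the parsed dataset before anything runs.  `SamplerParameters`
and its fields are hypothetical names for illustration, not an existing
Kepler or EML API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SamplerParameters:
    """User choices for one run (hypothetical names, not a Kepler API)."""
    dimension_steps: dict  # dimension column -> interval width
    attributes: tuple      # columns whose values are sampled

    def validate(self, columns):
        """Return a list of problems; an empty list means the choices fit."""
        problems = []
        if not self.dimension_steps:
            problems.append("at least one sample dimension is required")
        for name, step in self.dimension_steps.items():
            if name not in columns:
                problems.append(f"unknown dimension column: {name}")
            if step <= 0:
                problems.append(f"interval width for {name} must be positive")
        for name in self.attributes:
            if name not in columns:
                problems.append(f"unknown attribute column: {name}")
        return problems

# Two sample dimensions (x and y) and one sampled attribute column.
params = SamplerParameters({"x": 0.5, "y": 0.5}, ("species",))
problems = params.validate(["x", "y", "date", "species"])
```

Allowing several entries in `dimension_steps` covers the case of
multiple sample dimensions in a single run.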

>
>
>
> >> Seems like we could have the SMS figure out, at the
> >> beginning of the run, what actors need to be parameterized based on
> >> runtime data, and could prompt the user for those inputs.
>
> In this case, by analyzing the semantic annotations, we would know 
> that the sample actor requires a 'sample dimension' parameter prior to 
> executing the workflow.   So, prior to running the workflow, this port 
> (parameter, or whatever), would need to be bound, and bound to 
> something that provides a 'sample dimension'. If a user has found data 
> to be pushed through the workflow, SMS could tell the user at workflow 
> setup time the ports that are required, possibly offering suggestions 
> as to how the ports could be bound via the given data. Note that this 
> is a static analysis based on semantic annotations, i.e., the analysis 
> is not done at runtime by analyzing the data.

> For this example, SMS could state that the workflow requires a 'sample
> dimension', and present the set of sample dimensions that can be found
> in the data provided by the user (by analyzing the semantic annotations
> over the data), allowing the user to select one of the possible
> dimensions. (I am not sure what the 'other dimensions' might be that you
> list above ...) SMS could then figure out how to feed the information
> correctly to the port, which in this case, doesn't sound that hard.

>
>
> Is this what you were thinking?

This is exactly what I was thinking (but stated far more clearly!)
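
The setup-time check described above can be sketched in Python.  This is
only an illustration under strong assumptions: semantic annotations are
reduced to plain sets of strings, and the port names, concept labels,
and both functions are hypothetical.  A real SMS would reason over
ontology terms rather than string matches.

```python
def candidate_bindings(required_ports, column_annotations):
    """Map each required port to the columns whose annotation matches it."""
    suggestions = {}
    for port, concept in required_ports.items():
        suggestions[port] = [
            column for column, concepts in column_annotations.items()
            if concept in concepts
        ]
    return suggestions

def unbound_ports(suggestions):
    """Ports with no candidate column; the run should not start with these."""
    return [port for port, columns in suggestions.items() if not columns]

# What the sampler actor declares it needs, and what the data offers.
required_ports = {"sampleDimension": "continuous", "sampledData": "measurement"}
column_annotations = {
    "x": {"continuous", "spatial"},
    "y": {"continuous", "spatial"},
    "date": {"continuous", "temporal"},
    "species": {"measurement"},
}
suggestions = candidate_bindings(required_ports, column_annotations)
```

Nothing here looks at the data values, only at the annotations, so the
user could be prompted for a choice among `x`, `y`, and `date` before
the workflow starts rather than in the middle of a long run.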

>
> shawn
>
>
>
>
> Chad Berkley wrote:
>
>> I think this is the ideal solution, but I don't think we're anywhere 
>> close to having that kind of functionality (Shawn and Bertram, 
>> correct me if I'm wrong).  If we want garp and WFs like it to work in 
>> the meantime, we're still going to need user input functionality.  
>> Efrat just pointed out on IRC that we can do the basics with the 
>> browserUI actor, which may be a good stopgap.
>>
>> chad
>>
>> On May 26, 2004, at 11:17 AM, Deana Pennington wrote:
>>
>>> Is this something we could work towards with the SMS workflow 
>>> analysis?  Seems like we could have the SMS figure out, at the 
>>> beginning of the run, what actors need to be parameterized based on 
>>> runtime data, and could prompt the user for those inputs.  That way, 
>>> it wouldn't have to pause in the middle, which is really not a great 
>>> idea.  Some models run for days, and you don't want them pausing in 
>>> the middle of the night, waiting for user input.
>>>
>>> I think we should just add this to Shawn/Bertram's list of things to 
>>> do...doesn't the SMS fix everything??? :-)
>>>
>>> Deana
>>>
>>>
>>> Chad Berkley wrote:
>>>
>>>> Hi Deana,
>>>>
>>>> See my comments below:
>>>>
>>>> On May 26, 2004, at 1:24 AM, Deana Pennington wrote:
>>>>
>>>>> Chad,
>>>>>
>>>>> I worked on this for several hours yesterday, and still have some work
>>>>> to do today.  It would be easy to figure this out just for the mammal
>>>>> project...it's turning out to be more difficult to think through a
>>>>> generic sampling routine.  I think I have it figured out, now, though,
>>>>> and am writing up some instructions for you.  I'll send those by the
>>>>> end of the day.  Then we can talk about it on IRC, or by phone.
>>>>
>>>>
>>>>
>>>>
>>>> Cool.  I'll be on IRC all day so just let me know when you're ready 
>>>> to chat.
>>>>
>>>>>
>>>>> Question:  I want to be able to sample in any dimension (space: x,y,z,
>>>>> time t, or other dimensions).  We should be able to figure out which
>>>>> fields in a parsed dataset are which, but I need to also allow for the
>>>>> selection of other dimensions, which will have to be identified by the
>>>>> user.  I thought about setting this up so that we could use your eml
>>>>> ingestion actor to parse a file, then send it to the sampler.  However,
>>>>> that requires mapping specific eml output ports to specific sampler
>>>>> input ports, which will not be known until runtime.  What is the best
>>>>> way to set it up so that the user can send a file to the sampler actor,
>>>>> see some information about the fields, and either select or
>>>>> parameterize the correct fields to be sampled?
>>>>
>>>>
>>>>
>>>>
>>>> I've been tossing this idea of stopping the execution to ask for 
>>>> input around in my head since the SEV meeting.  I haven't really 
>>>> come up with a good solution other than writing an extension to the 
>>>> pause actor that allows an input dialog to be popped up.  How to 
>>>> configure that dialog or get the actual information you need is a 
>>>> tough question since I'd like to make it generic enough to work 
>>>> with workflows other than GARP.  Edward or Christopher, do you have 
>>>> any examples of workflows that stop execution and ask users for 
>>>> input then continue executing based on that input?  I haven't seen 
>>>> any.  Does anyone have any good ideas on how this could be done 
>>>> generically?
>>>>
>>>> The flow as I see it is:
>>>> 1) execution is paused
>>>> 2) a dialog that is partially preconfigured at design-time and 
>>>> fully configured at run-time is presented to the user
>>>> 3) the user makes a choice, altering the exec-time parameters of 
>>>> the rest of the workflow
>>>>
>>>> Note that in 2, the run-time configuration may include such things 
>>>> as dialog widget configuration with run-time produced data (e.g. 
>>>> populating a list box with a run-time data stream).  The 
>>>> design-time configuration would include issues such as choosing the 
>>>> input/output ports and configuring what the logic of the dialog is 
>>>> (this may be tricky).
>>>>
>>>> I think this is probably a necessary bit of functionality since I 
>>>> have seen a couple of different eco workflow prototypes that want to
>>>> do this.
>>>>
>>>> comments?
>>>>
>>>> chad
>>>>
>>>>>
>>>>> Deana
>>>>>
>>>>>
>>>>> Chad Berkley wrote:
>>>>>
>>>>>> Hi Deana,
>>>>>>
>>>>>> I was going to start working on the sampling actor for garp.  Could
>>>>>> you refresh my memory as to how that should work?  I have the inputs
>>>>>> as a species and a scaling metric and the outputs as the intrinsic and
>>>>>> extrinsic data.  Aren't there different sampling techniques?  I'd like
>>>>>> to build one generic sampling actor that can use one of a number of
>>>>>> different techniques.  I'm on IRC now if you want to chat about this
>>>>>> in real-time.
>>>>>>
>>>>>> thanks,
>>>>>> chad
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> -- 
>>> ********
>>>
>>> Deana D. Pennington, PhD
>>> Long-term Ecological Research Network Office
>>>
>>> UNM Biology Department
>>> MSC03  2020
>>> 1 University of New Mexico
>>> Albuquerque, NM  87131-0001
>>>
>>> 505-272-7288 (office)
>>> 505 272-7080 (fax)
>>>
>>
>> _______________________________________________
>> kepler-dev mailing list
>> kepler-dev at ecoinformatics.org
>> http://www.ecoinformatics.org/mailman/listinfo/kepler-dev
>
>

-- 
********

Deana D. Pennington, PhD
Long-term Ecological Research Network Office

UNM Biology Department
MSC03  2020
1 University of New Mexico
Albuquerque, NM  87131-0001

505-272-7288 (office)
505 272-7080 (fax)