[seek-dev] Today's Ecogrid Call

Dan Higgins higgins at nceas.ucsb.edu
Fri Sep 17 10:42:56 PDT 2004


Hi Shawn,
    I had been thinking along the same lines of using records or arrays 
of records. This is similar to dataframes in R. However, there are some 
desired capabilities that would be nice that I don't see how to carry 
out without some work.

Say a table is a collection of records (columns). How do I sort the 
whole table based on the sort of one column? Or how do I subset the 
table based on values in one column?  For example, in R one can subset a 
datatable with a command like

d[d$col1>1000,]

where 'd' is a datatable name and 'col1' is the name of a column. The 
result is all rows in the table with values in col1 greater than 1000.
I can, of course, write code to do this by examining the record values 
individually and building new records, but it sure would be nice to have 
some simpler expressions for such things. [And we can always use the 
HSQL engine for all such operations, even locally]
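
To make the kind of helper I have in mind concrete, here is a minimal sketch. It is written in Python only for brevity and treats a table as a list of records (one dict per row). The names subset_rows and sort_rows and the sample data are made up for illustration; they are not Kepler or Ptolemy APIs.

```python
# Hypothetical sketch: a table as a list of records (one dict per row).
# None of these helper names come from Kepler or Ptolemy.

def subset_rows(table, column, predicate):
    """Return the rows whose value in `column` satisfies `predicate`."""
    return [row for row in table if predicate(row[column])]


def sort_rows(table, column, reverse=False):
    """Return a new table with whole rows reordered by one column."""
    return sorted(table, key=lambda row: row[column], reverse=reverse)


# Made-up sample data, for illustration only.
d = [
    {"col1": 1500, "col2": "a"},
    {"col1": 200, "col2": "b"},
    {"col1": 1200, "col2": "c"},
]

# Rough equivalent of the R expression d[d$col1>1000,]
big = subset_rows(d, "col1", lambda value: value > 1000)

# Sort the whole table based on the values in col1.
ordered = sort_rows(d, "col1")
```

Both helpers return new lists and leave the original table untouched, which seems a reasonable fit for passing tables between actors, though that is a design assumption on my part.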


Dan


Shawn Bowers wrote:

>
> I think that Ptolemy actually supports tables, through complex 
> structures, pretty well. In particular, every table is simply an array 
> of records.
>
> Let's say I have the following relation schema:
>
> CREATE TABLE ds1
> (
> age int,
> weight double,
> plot int,
> species string
> )
>
> (I'm fudging a bit on the domains since these aren't valid sql, but 
> that is a minor detail.)
>
>
> This can be represented in Ptolemy as the following type definition:
>
> {{age=int, weight=double, plot=int, species=string}}
>
> That is, as a list of 4-tuples.
>
> Of course, this definition doesn't explicitly state that the structure 
> is a table. One could introduce a convention for representing tables a 
> la xml (i.e., through tags), or else, could introduce an explicit 
> ptolemy data type to support tables (not hard given that the data 
> structures exist and in principle ptolemy's type system is extensible).
>
> For the convention approach, we could just wrap the whole structure in 
> a record:
>
> { sql_tbl = { { _attributes here_ } } }
>
> So, for example, the above def would be:
>
> { sql_tbl = { { age=int, weight=double, plot=int, species=string } } }
>
> And the actual table would be passed as:
>
> { sql_tbl = {
>     {age=1, weight=50.0, plot=1, species="ABCD" },
>     {age=1, weight=49.9, plot=1, species="ABCE" },
>     {age=2, weight=50.1, plot=2, species="ABCD" }
>   }
> }
>
>
> Shawn
>
>
>
>
> Dan Higgins wrote:
>
>> Rod,
>>     I haven't been following very closely all the work you and Jing 
>> (and others) are doing, so this may be a silly question. I notice 
>> that you have referred to the EML200DataSource returning a table. 
>> Just what sort of data structure are you referring to?  As far as I 
>> know, Kepler doesn't have a 'TableToken'. Is this a Java class or 
>> some array/vector of column data (strings?) in the Java code? It 
>> would be nice if we could create some table structure like the 'data 
>> frames' of 'R' that could be passed between actors in Kepler and 
>> easily manipulated.
>>
>> Dan
>>
>> Rod Spears wrote:
>>
>>> Things to think about before we meet:
>>>
>>> 1) The Eml200DataSource uses the Ecogrid to get Metadata about an 
>>> item and then returns the data for that item as a single table. The 
>>> QueryBuilder can be used to reduce the number of columns that are 
>>> passed through the ports, but is not necessarily a required part of 
>>> this data object.
>>>
>>> 2) What else will we be using the generic QueryBuilder for? Meaning 
>>> what kind of data object will be returning more than one table that 
>>> is not an Ecogrid Query?
>>>     2.1) I think we have talked about this when the user will be 
>>> accessing local data files; thru HSQL?  JDBC?
>>>         2.1.1) If so, then how do they discover and get their local 
>>> data into Kepler?
>>>
>>> 3) Do we need a more generic EcogridDataSource object that can 
>>> execute generic Ecogrid Queries? And if so, do we need an Ecogrid 
>>> Query specific QueryBuilder instead of a generic one?
>>>
>>> 4) Do we need a DiGIR Data Source object, or would this be covered 
>>> by #2? If it was DiGIR specific then we could get data from nodes 
>>> that may not be registered? (I am not sure)
>>>
>>> Rod
>>>
>>> -- 
>>> Rod Spears
>>> Biodiversity Research Center
>>> University of Kansas
>>> 1345 Jayhawk Boulevard
>>> Lawrence, KS 66045, USA
>>> Tel: 785 864-4082, Fax: 785 864-5335
>>>
>>
>>
>> -- 
>> *******************************************************************
>> Dan Higgins                                  higgins at nceas.ucsb.edu
>> http://www.nceas.ucsb.edu/    Ph: 805-892-2531
>> National Center for Ecological Analysis and Synthesis (NCEAS)
>> 735 State Street - Room 205
>> Santa Barbara, CA 93195
>> *******************************************************************
>>
>
> _______________________________________________
> seek-dev mailing list
> seek-dev at ecoinformatics.org
> http://www.ecoinformatics.org/mailman/listinfo/seek-dev



-- 
*******************************************************************
Dan Higgins                                  higgins at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Ph: 805-892-2531
National Center for Ecological Analysis and Synthesis (NCEAS) 
735 State Street - Room 205
Santa Barbara, CA 93195
*******************************************************************



