[seek-dev] Today's Ecogrid Call

Dan Higgins higgins at nceas.ucsb.edu
Sat Sep 18 14:27:04 PDT 2004


Shawn,
    As far as I can tell, R does not explicitly use any backend SQL 
database. (There are hooks to SQL databases in extensions.) R is 
basically a memory-based system, so it does have limitations with 
handling very large datasets (but modern machines do typically have a 
lot of RAM!)

Dan

Shawn Bowers wrote:

>
>
> Dan Higgins wrote:
>
>> Hi Shawn,
>>    I had been thinking along the same lines of using records or 
>> arrays of records. This is similar to dataframes in R. However, there 
>> are some desired capabilities that would be nice that I don't see how 
>> to carry out without some work.
>>
>> Say a table is a collection of records (columns). How do I sort the 
>> whole table based on the sort of one column? Or how do I subset the 
>> table based on values in one column?  For example, in R one can 
>> subset a datatable with a command like
>>
>> d[d$col1>1000,]
>>
>> where 'd' is a datatable name and 'col1' is the name of a column. The 
>> result is all rows in the table with values in col1 greater than 1000.
>> I can, of course, write code to do this by examining the record values 
>> individually and building new records, but it sure would be nice to 
>> have some simpler expressions for such things. [And we can always use 
>> the HSQL engine for all such operations, even locally]
>>
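A minimal sketch of the two operations asked about above (subsetting and sorting whole rows by one column), treating a table as a list of records. Python dicts stand in for Ptolemy record tokens here; the helper names `subset_rows` and `sort_rows` and the sample values are hypothetical and are not part of Kepler or Ptolemy.

```python
from operator import itemgetter


def subset_rows(table, column, predicate):
    """Keep the rows whose value in `column` satisfies `predicate`."""
    return [row for row in table if predicate(row[column])]


def sort_rows(table, column, reverse=False):
    """Return a new table with whole rows reordered by one column."""
    return sorted(table, key=itemgetter(column), reverse=reverse)


# Hypothetical sample data: one dict per row.
d = [
    {"col1": 1500, "col2": "a"},
    {"col1": 200, "col2": "b"},
    {"col1": 1001, "col2": "c"},
]

big = subset_rows(d, "col1", lambda v: v > 1000)  # like d[d$col1>1000,] in R
ordered = sort_rows(d, "col1")                    # like d[order(d$col1),] in R
```

Both helpers build new lists and leave the input table unchanged, which is roughly how one would expect tokens passed between actors to behave.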
>
> I think the array of record thing is useful for passing tables around, 
> not necessarily for querying them (i.e., the R expression is really 
> select * from d where col1 > 1000).
>
> If you need to query it, then I would think that is best done using an 
> SQL query engine; which as you say could be quickly performed by the 
> HSQL engine.  For example, have an HSQL actor that takes as a 
> parameter (or as an input) an SQL query expression and one or more 
> input tables, and outputs a result table.
>
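The query-actor idea above can be sketched as a function that takes an SQL expression plus one or more named input tables and returns a result table. This is only an illustration: it uses Python's built-in `sqlite3` in-memory database as a stand-in for the HSQL engine, and the function name `query_tables` is hypothetical.

```python
import sqlite3


def query_tables(sql, tables):
    """Load each input table into an in-memory database, run `sql`,
    and return the result as a list of records (dicts).

    `tables` maps a table name to a non-empty list of dicts that all
    share the same keys. Table and column names are trusted here; a
    real actor would need to validate them.
    """
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    try:
        for name, rows in tables.items():
            columns = list(rows[0].keys())
            col_list = ", ".join(columns)
            placeholders = ", ".join("?" for _ in columns)
            conn.execute(f"CREATE TABLE {name} ({col_list})")
            conn.executemany(
                f"INSERT INTO {name} ({col_list}) VALUES ({placeholders})",
                [tuple(row[c] for c in columns) for row in rows],
            )
        return [dict(r) for r in conn.execute(sql)]
    finally:
        conn.close()


# Hypothetical sample data.
d = [{"col1": 1500, "col2": "a"}, {"col1": 200, "col2": "b"}]
result = query_tables("SELECT * FROM d WHERE col1 > 1000", {"d": d})
```

As noted in the thread, this only makes sense while the tables are small enough to copy into memory.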
> I think all of this is predicated on the tables being smallish.
>
> For large tables (that won't fit in main memory reasonably), you need 
> a real database :)  -- Out of curiosity, does R use a db backend?
>
> shawn
>
>
>>
>> Dan
>>
>>
>> Shawn Bowers wrote:
>>
>>>
>>> I think that Ptolemy actually supports tables, through complex 
>>> structures, pretty well. In particular, every table is simply an 
>>> array of records.
>>>
>>> Let's say I have the following relation schema:
>>>
>>> CREATE TABLE ds1
>>> (
>>> age int,
>>> weight double,
>>> plot int,
>>> species string
>>> )
>>>
>>> (I'm fudging a bit on the domains since these aren't valid sql, but 
>>> that is a minor detail.)
>>>
>>>
>>> This can be represented in Ptolemy as the following type definition:
>>>
>>> {{age=int, weight=double, plot=int, species=string}}
>>>
>>> That is, as a list of 4-tuples.
>>>
>>> Of course, this definition doesn't explicitly state that the 
>>> structure is a table. One could introduce a convention for 
>>> representing tables a la xml (i.e., through tags), or else, could 
>>> introduce an explicit ptolemy data type to support tables (not hard 
>>> given that the data structures exist and in principle ptolemy's type 
>>> system is extensible).
>>>
>>> For the convention approach, we could just wrap the whole structure 
>>> in a record:
>>>
>>> { sql_tbl = { { _attributes here_ } } }
>>>
>>> So, for example, the above def would be:
>>>
>>> { sql_tbl = { { age=int, weight=double, plot=int, species=string } } }
>>>
>>> And the actual table would be passed as:
>>>
>>> { sql_tbl = {
>>>     {age=1, weight=50.0, plot=1, species="ABCD" },
>>>     {age=1, weight=49.9, plot=1, species="ABCE" },
>>>     {age=2, weight=50.1, plot=2, species="ABCD" }
>>>   }
>>> }
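As a rough illustration of the `sql_tbl` convention above, the same table can be written as a record wrapping a list of records, with a small check that every row matches the schema. Python types stand in for the Ptolemy types, and `SCHEMA` and `is_table` are hypothetical names, not Ptolemy or Kepler APIs.

```python
# Schema from the CREATE TABLE above, with Python types standing in
# for the Ptolemy types int, double, and string.
SCHEMA = {"age": int, "weight": float, "plot": int, "species": str}

ds1 = {
    "sql_tbl": [
        {"age": 1, "weight": 50.0, "plot": 1, "species": "ABCD"},
        {"age": 1, "weight": 49.9, "plot": 1, "species": "ABCE"},
        {"age": 2, "weight": 50.1, "plot": 2, "species": "ABCD"},
    ]
}


def is_table(value, schema):
    """True if `value` follows the sql_tbl convention and every row
    has exactly the schema's fields with the expected types."""
    if not isinstance(value, dict) or set(value) != {"sql_tbl"}:
        return False
    rows = value["sql_tbl"]
    if not isinstance(rows, list):
        return False
    return all(
        isinstance(row, dict)
        and set(row) == set(schema)
        and all(isinstance(row[k], t) for k, t in schema.items())
        for row in rows
    )
```

A receiving actor could use a check like this to tell a table apart from any other record it is handed.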
>>>
>>>
>>> Shawn
>>>
>>>
>>>
>>>
>>> Dan Higgins wrote:
>>>
>>>> Rod,
>>>>     I haven't been following very closely all the work you and Jing 
>>>> (and others) are doing, so this may be a silly question. I notice 
>>>> that you have referred to the EML200DataSource returning a table. 
>>>> Just what sort of data structure are you referring to?  As far as 
>>>> I know, Kepler doesn't have a 'TableToken'. Is this a Java class or 
>>>> some array/vector of column data (strings?) in the Java code? It 
>>>> would be nice if we could create some table structure like the 'data 
>>>> frames' of 'R' that could be passed between actors in Kepler and 
>>>> easily manipulated.
>>>>
>>>> Dan
>>>>
>>>> Rod Spears wrote:
>>>>
>>>>> Things to think about before we meet:
>>>>>
>>>>> 1) The Eml200DataSource uses the Ecogrid to get Metadata about an 
>>>>> item and then returns the data for that item as a single table. 
>>>>> The QueryBuilder can be used to reduce the number of columns that 
>>>>> are passed through the ports, but is not necessarily a required part 
>>>>> of this data object.
>>>>>
>>>>> 2) What else will we be using the generic QueryBuilder for? 
>>>>> Meaning what kind of data object will be returning more than one 
>>>>> table that is not an Ecogrid Query?
>>>>>     2.1) I think we have talked about this when the user will be 
>>>>> accessing local data files; thru HSQL?  JDBC?
>>>>>         2.1.1) If so, then how do they discover and get their 
>>>>> local data into Kepler?
>>>>>
>>>>> 3) Do we need a more generic EcogridDataSource object that can 
>>>>> execute generic Ecogrid Queries? And if so, do we need an Ecogrid 
>>>>> Query specific QueryBuilder instead of a generic one?
>>>>>
>>>>> 4) Do we need a DiGIR Data Source object, or would this be covered 
>>>>> by #2? If it was DiGIR specific then we could get data from nodes 
>>>>> that may not be registered???? (I am not sure)
>>>>>
>>>>> Rod
>>>>>
>>>>> -- 
>>>>> Rod Spears
>>>>> Biodiversity Research Center
>>>>> University of Kansas
>>>>> 1345 Jayhawk Boulevard
>>>>> Lawrence, KS 66045, USA
>>>>> Tel: 785 864-4082, Fax: 785 864-5335
>>>>>
>>>>
>>>>
>>>> -- 
>>>> *******************************************************************
>>>> Dan Higgins                                  higgins at nceas.ucsb.edu
>>>> http://www.nceas.ucsb.edu/    Ph: 805-892-2531
>>>> National Center for Ecological Analysis and Synthesis (NCEAS) 735 
>>>> State Street - Room 205
>>>> Santa Barbara, CA 93195
>>>> *******************************************************************
>>>>
>>>
>>> _______________________________________________
>>> seek-dev mailing list
>>> seek-dev at ecoinformatics.org
>>> http://www.ecoinformatics.org/mailman/listinfo/seek-dev
>>
>>
>>
>>
>>



