[seek-dev] Today's Ecogrid Call
Dan Higgins
higgins at nceas.ucsb.edu
Sat Sep 18 14:27:04 PDT 2004
Shawn,
As far as I can tell, R does not explicitly use any backend SQL
database. (There are hooks to SQL databases in extensions.) R is
basically a memory-based system, so it does have limitations with
handling very large datasets (but modern machines do typically have a
lot of RAM!)
Dan
Shawn Bowers wrote:
>
>
> Dan Higgins wrote:
>
>> Hi Shawn,
>> I had been thinking along the same lines of using records or
>> arrays of records. This is similar to dataframes in R. However, there
>> are some desired capabilities that would be nice that I don't see how
>> to carry out without some work.
>>
>> Say a table is a collection of records (columns). How do I sort the
>> whole table based on the sort of one column? Or how do I subset the
>> table based on values in one column? For example, in R one can
>> subset a datatable with a command like
>>
>> d[d$col1>1000,]
>>
>> where 'd' is a datatable name and 'col1' is the name of a column. The
>> result is all rows in the table with values in col1 greater than 1000.
>> I can, of course, write code to do this by examining the record values
>> individually and building new records, but it sure would be nice to
>> have some simpler expressions for such things. [And we can always use
>> the HSQL engine for all such operations, even locally]
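
The sort and subset operations asked about above can be sketched over a list of records. This is a minimal Python illustration, not Kepler or Ptolemy code: the helper names subset and sort_by are hypothetical, and the rows reuse the sample data from Shawn's message quoted further down.

```python
# A table modeled as a list of records (one dict per row).
table = [
    {"age": 1, "weight": 50.0, "plot": 1, "species": "ABCD"},
    {"age": 1, "weight": 49.9, "plot": 1, "species": "ABCE"},
    {"age": 2, "weight": 50.1, "plot": 2, "species": "ABCD"},
]


def subset(rows, predicate):
    """Keep the rows for which predicate(row) is true.

    Roughly analogous to the R expression d[d$col1 > 1000, ].
    """
    return [row for row in rows if predicate(row)]


def sort_by(rows, column):
    """Return a new list ordered by one column; each record moves as a unit."""
    return sorted(rows, key=lambda row: row[column])


heavy = subset(table, lambda row: row["weight"] > 50.0)
by_weight = sort_by(table, "weight")
```

Both helpers return new lists and leave the input table untouched, which seems to fit the idea of passing tables between actors.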
>>
>
> I think the array of record thing is useful for passing tables around,
> not necessarily for querying them (i.e., the R expression is really
> select * from d where col1 > 1000).
>
> If you need to query it, then I would think that is best done using an
> SQL query engine; which as you say could be quickly performed by the
> HSQL engine. For example, have an HSQL actor that takes as a
> parameter (or as an input) an SQL query expression and one or more
> input tables, and outputs a result table.
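
The actor described above (an SQL query plus one or more input tables in, a result table out) might look roughly like this. It is a sketch only: Python's built-in sqlite3 module stands in for the HSQL engine, run_query is a hypothetical name, and the 'label' column and all row values are made up for illustration.

```python
import sqlite3


def run_query(query, tables):
    """Load each input table into an in-memory database and run one query.

    `tables` maps a table name to a non-empty list of records (dicts that
    share the same keys). Table and column names are assumed to be trusted,
    since they are interpolated into the SQL text.
    """
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets each result row be read by column name
    try:
        for name, rows in tables.items():
            columns = list(rows[0].keys())
            col_list = ", ".join(f'"{c}"' for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            conn.execute(f'CREATE TABLE "{name}" ({col_list})')
            conn.executemany(
                f'INSERT INTO "{name}" VALUES ({placeholders})',
                [tuple(row[c] for c in columns) for row in rows],
            )
        # Convert the result rows back into plain records.
        return [dict(r) for r in conn.execute(query)]
    finally:
        conn.close()


d = [
    {"col1": 500, "label": "a"},
    {"col1": 1500, "label": "b"},
    {"col1": 2500, "label": "c"},
]
result = run_query("SELECT * FROM d WHERE col1 > 1000", {"d": d})
```

The whole database lives in memory and is discarded after each call, so this only makes sense for the smallish tables discussed in this thread.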
>
> I think all of this is predicated on the tables being smallish.
>
> For large tables (that won't fit in main memory reasonably), you need
> a real database :) -- Out of curiosity, does R use a db backend?
>
> shawn
>
>
>>
>> Dan
>>
>>
>> Shawn Bowers wrote:
>>
>>>
>>> I think that Ptolemy actually supports tables, through complex
>>> structures, pretty well. In particular, every table is simply an
>>> array of records.
>>>
>>> Let's say I have the following relation schema:
>>>
>>> CREATE TABLE ds1
>>> (
>>> age int,
>>> weight double,
>>> plot int,
>>> species string
>>> )
>>>
>>> (I'm fudging a bit on the domains since these aren't valid sql, but
>>> that is a minor detail.)
>>>
>>>
>>> This can be represented in Ptolemy as the following type definition:
>>>
>>> {{age=int, weight=double, plot=int, species=string}}
>>>
>>> That is, as a list of 4-tuples.
>>>
>>> Of course, this definition doesn't explicitly state that the
>>> structure is a table. One could introduce a convention for
>>> representing tables a la xml (i.e., through tags), or else, could
>>> introduce an explicit ptolemy data type to support tables (not hard
>>> given that the data structures exist and in principle ptolemy's type
>>> system is extensible).
>>>
>>> For the convention approach, we could just wrap the whole structure
>>> in a record:
>>>
>>> { sql_tbl = { { _attributes here_ } } }
>>>
>>> So, for example, the above def would be:
>>>
>>> { sql_tbl = { { age=int, weight=double, plot=int, species=string } } }
>>>
>>> And the actual table would be passed as:
>>>
>>> { sql_tbl = {
>>> {age=1, weight=50.0, plot=1, species="ABCD" },
>>> {age=1, weight=49.9, plot=1, species="ABCE" },
>>> {age=2, weight=50.1, plot=2, species="ABCD" }
>>> }
>>> }
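
To illustrate the wrapping convention, here is a small Python sketch that checks rows against the ds1 schema and tags the result with sql_tbl. Python dicts and lists stand in for Ptolemy records and arrays; make_table and is_table are hypothetical helpers, not part of Ptolemy or Kepler.

```python
# Field name -> Python type, mirroring the ds1 schema above.
SCHEMA = {"age": int, "weight": float, "plot": int, "species": str}


def make_table(schema, rows):
    """Check every row against the schema, then wrap the rows in a tagged record."""
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            raise ValueError(
                f"row {i} has fields {sorted(row)}, expected {sorted(schema)}"
            )
        for field, expected in schema.items():
            if not isinstance(row[field], expected):
                raise TypeError(
                    f"row {i}: field {field!r} should be {expected.__name__}"
                )
    return {"sql_tbl": list(rows)}


def is_table(value):
    """True if value follows the convention: a record whose only field is sql_tbl."""
    return (
        isinstance(value, dict)
        and set(value) == {"sql_tbl"}
        and isinstance(value["sql_tbl"], list)
    )


rows = [
    {"age": 1, "weight": 50.0, "plot": 1, "species": "ABCD"},
    {"age": 1, "weight": 49.9, "plot": 1, "species": "ABCE"},
    {"age": 2, "weight": 50.1, "plot": 2, "species": "ABCD"},
]
ds1 = make_table(SCHEMA, rows)
```

A receiving actor could call is_table on its input to tell a wrapped table apart from an ordinary array of records.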
>>>
>>>
>>> Shawn
>>>
>>>
>>>
>>>
>>> Dan Higgins wrote:
>>>
>>>> Rod,
>>>> I haven't been following very closely all the work you and Jing
>>>> (and others) are doing, so this may be a silly question. I notice
>>>> that you have referred to the EML200DataSource returning a table.
>>>> Just what sort of data structure are you referring to? As far as
>>>> I know, Kepler doesn't have a 'TableToken'. Is this a Java class or
>>>> some array/vector of column data (strings?) in the Java code? It
>>>> would be nice if we could create some table structure like the 'data
>>>> frames' of 'R' that could be passed between actors in Kepler and
>>>> easily manipulated.
>>>>
>>>> Dan
>>>>
>>>> Rod Spears wrote:
>>>>
>>>>> Things to think about before we meet:
>>>>>
>>>>> 1) The Eml200DataSource uses the Ecogrid to get Metadata about an
>>>>> item and then returns the data for that item as a single table.
>>>>> The QueryBuilder can be used to reduce the number of columns that
>>>>> are passed through the ports, but is not necessarily a required part
>>>>> of this data object.
>>>>>
>>>>> 2) What else will we be using the generic QueryBuilder for?
>>>>> Meaning what kind of data object will be returning more than one
>>>>> table that is not an Ecogrid Query?
>>>>> 2.1) I think we have talked about this when the user will be
>>>>> accessing local data files; thru HSQL? JDBC?
>>>>> 2.1.1) If so, then how do they discover and get their
>>>>> local data into Kepler?
>>>>>
>>>>> 3) Do we need a more generic EcogridDataSource object that can
>>>>> execute generic Ecogrid Queries? And if so, do we need an Ecogrid
>>>>> Query specific QueryBuilder instead of a generic one?
>>>>>
>>>>> 4) Do we need a DiGIR Data Source object, or would this be covered
>>>>> by #2? If it were DiGIR-specific, then we could get data from nodes
>>>>> that may not be registered? (I am not sure)
>>>>>
>>>>> Rod
>>>>>
>>>>> --
>>>>> Rod Spears
>>>>> Biodiversity Research Center
>>>>> University of Kansas
>>>>> 1345 Jayhawk Boulevard
>>>>> Lawrence, KS 66045, USA
>>>>> Tel: 785 864-4082, Fax: 785 864-5335
>>>>>
>>>>
>>>>
>>>> --
>>>> *******************************************************************
>>>> Dan Higgins higgins at nceas.ucsb.edu
>>>> http://www.nceas.ucsb.edu/ Ph: 805-892-2531
>>>> National Center for Ecological Analysis and Synthesis (NCEAS) 735
>>>> State Street - Room 205
>>>> Santa Barbara, CA 93195
>>>> *******************************************************************
>>>>
>>>
>>> _______________________________________________
>>> seek-dev mailing list
>>> seek-dev at ecoinformatics.org
>>> http://www.ecoinformatics.org/mailman/listinfo/seek-dev
>>
>>
>>
>>
>>