[seek-dev] (no subject)

Jing Tao tao at nceas.ucsb.edu
Thu Sep 23 19:48:48 PDT 2004


Hi, matt, steve and chris:

This afternoon I played a little bit with the query steve submit to 
ecogrid node and found something about number and string.

Metacat stores the metadata in "nodedata" field in xml_nodes table. The 
data type for "nodedata" is char. If you want to compare a number in 
query, db should do casting.

basically, steve's query will look like:
AND 
 condition1 northBoundingCoordinate<=31.5
 condition2 southBoundingCoordinate>=10.0
 condition3 surname=joye
AND

In metacat log, the time consumer is the main query, taking 271 seconds 
(extened  query is trivial, only take 3 seconds). I post the main query 
to sql client, it took 267.4 seconds to run. So the result is almost same.

Then I did some other tests and try to figure it out. Here is some testing:

Query 1: northBoundingCoordinate<=31.5

SQL query look like(simplied, only show different part to quey 2,3):
SELECT DISTINCT docid FROM xml_nodes WHERE nodedata <= 31.5   AND 
parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE 
'geographicCoverage/boundingCoordinates/northBoundingCoordinate'  

Time: 236 seconds(No too much faster than whole query) appro 100 documents


Query 2: southBoundingCoordinate>=10.0
SQL query look like(simplied, only show differenc to query 1, 3)
 SELECT DISTINCT docid FROM xml_nodes WHERE nodedata >= 10.0  AND 
parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE 
'geographicCoverage/boundingCoordinates/southBoundingCoordinate')

Time: 114.9 seconds appro 100 docs

Query 3:
Condition surname=joye

SQL query looks like(only show different part to query 1, 2)
SELECT DISTINCT docid FROM xml_nodes WHERE UPPER(nodedata) = 'JOYE' AND 
parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE 
'/eml/dataset/creator/individualName/surName'  

Time: 2.8 seconds(Much faster) 4 docs

Oh, Third one(2.8 secs) is much faster than first one(236 secs) and second 
one(114.9). Postgres have some problem in casting :)

I also run following query:

Query 4: northBoundingCoordinate<=31.5
The condition is as same as query 1, but the sql query are little bit 
different. It add something to get rid of non-number cells (this is 
metacat does now):
SELECT DISTINCT docid FROM xml_nodes WHERE nodedata <= 31.5  AND 
UPPER(nodedata) = LOWER(nodedata) AND LTRIM(nodedata) != ' '  AND nodedata 
IS NOT NULL AND parentnodeid IN (SELECT nodeid FROM xml_index WHERE path 
LIKE 'geographicCoverage/boundingCoordinates/northBoundingCoordinate'  

Time: 247.1 seconds(Almost to query 1 - 236 seconds)


Query 5: southBoundingCoordinate>=10.0
The condition is as same as query 2, but the sql query are little bit 
different. It add some thing got get rid of nonnumber of cells.

SELECT DISTINCT docid FROM xml_nodes WHERE nodedata >= 10.0 AND 
UPPER(nodedata) = LOWER(nodedata)  AND nodedata IS NOT NULL AND  
LTRIM(nodedata) != ' ' parentnodeid IN (SELECT nodeid FROM xml_index WHERE 
path LIKE 
'geographicCoverage/boundingCoordinates/southBoundingCoordinate' 

Time: 14.1 seconds (much faster than query 2 - 114.9 secs)

The difference between query 2 and query 5 is, query 5 gets rid of 
non-number cell in search space. And query 5(14 secs) is much faster than
query 2(114 secs). So it seems progres has a hard time to ignore 
non-number char during the casting(from char to number).

But why query 4 (247 seconds) and query 1(236 seconds) no big difference 
even we add something to get rid of nonnumber cell in query 4? I guess the 
our adding is not completed, it doesn't get rid of all non-number char in 
cell. We need fixed that.

I also try to run those queries in Oracle in ecoinfo. But I got a 
"ORA-01722: invalid number" error. Durane got the same problem in 
numberica search. Those prove our metacat query couldn't remove all 
non-number character too.

It seems posgres can ingore non-number character in cell during 
string-number casting, but it is very expensive. Oracle just throw a sql 
excpeiton. 

Hope this is helpful.

Jing


-- 
Jing Tao
National Center for Ecological
Analysis and Synthesis (NCEAS)
735 State St. Suite 204
Santa Barbara, CA 93101




More information about the Seek-dev mailing list