[seek-dev] (no subject)

Rod Spears rods at ku.edu
Fri Sep 24 07:21:22 PDT 2004


The QB doesn't do joins right now, it is on my list, but at the moment 
other things are ahead of it.

Rod


Shawn Bowers wrote:

>Just FYI: it is usually much better to write a join than a nested subquery
>if you want the query to run fast.  Most query optimizers don't do very
>well at optimizing nested subqueries ... so you might try writing the
>joins below as explicit joins.
>
>shawn
>
>On Thu, 23 Sep 2004, Jing Tao wrote:
>
>  
>
>>Hi, matt, steve and chris:
>>
>>This afternoon I played a little bit with the query steve submit to
>>ecogrid node and found something about number and string.
>>
>>Metacat stores the metadata in "nodedata" field in xml_nodes table. The
>>data type for "nodedata" is char. If you want to compare a number in
>>query, db should do casting.
>>
>>basically, steve's query will look like:
>>AND
>> condition1 northBoundingCoordinate<=31.5
>> condition2 southBoundingCoordinate>=10.0
>> condition3 surname=joye
>>AND
>>
>>In metacat log, the time consumer is the main query, taking 271 seconds
>>(extened  query is trivial, only take 3 seconds). I post the main query
>>to sql client, it took 267.4 seconds to run. So the result is almost same.
>>
>>Then I did some other tests and try to figure it out. Here is some testing:
>>
>>Query 1: northBoundingCoordinate<=31.5
>>
>>SQL query look like(simplied, only show different part to quey 2,3):
>>SELECT DISTINCT docid FROM xml_nodes WHERE nodedata <= 31.5   AND
>>parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE
>>'geographicCoverage/boundingCoordinates/northBoundingCoordinate'
>>
>>Time: 236 seconds(No too much faster than whole query) appro 100 documents
>>
>>
>>Query 2: southBoundingCoordinate>=10.0
>>SQL query look like(simplied, only show differenc to query 1, 3)
>> SELECT DISTINCT docid FROM xml_nodes WHERE nodedata >= 10.0  AND
>>parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE
>>'geographicCoverage/boundingCoordinates/southBoundingCoordinate')
>>
>>Time: 114.9 seconds appro 100 docs
>>
>>Query 3:
>>Condition surname=joye
>>
>>SQL query looks like(only show different part to query 1, 2)
>>SELECT DISTINCT docid FROM xml_nodes WHERE UPPER(nodedata) = 'JOYE' AND
>>parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE
>>'/eml/dataset/creator/individualName/surName'
>>
>>Time: 2.8 seconds(Much faster) 4 docs
>>
>>Oh, Third one(2.8 secs) is much faster than first one(236 secs) and second
>>one(114.9). Postgres have some problem in casting :)
>>
>>I also run following query:
>>
>>Query 4: northBoundingCoordinate<=31.5
>>The condition is as same as query 1, but the sql query are little bit
>>different. It add something to get rid of non-number cells (this is
>>metacat does now):
>>SELECT DISTINCT docid FROM xml_nodes WHERE nodedata <= 31.5  AND
>>UPPER(nodedata) = LOWER(nodedata) AND LTRIM(nodedata) != ' '  AND nodedata
>>IS NOT NULL AND parentnodeid IN (SELECT nodeid FROM xml_index WHERE path
>>LIKE 'geographicCoverage/boundingCoordinates/northBoundingCoordinate'
>>
>>Time: 247.1 seconds(Almost to query 1 - 236 seconds)
>>
>>
>>Query 5: southBoundingCoordinate>=10.0
>>The condition is as same as query 2, but the sql query are little bit
>>different. It add some thing got get rid of nonnumber of cells.
>>
>>SELECT DISTINCT docid FROM xml_nodes WHERE nodedata >= 10.0 AND
>>UPPER(nodedata) = LOWER(nodedata)  AND nodedata IS NOT NULL AND
>>LTRIM(nodedata) != ' ' parentnodeid IN (SELECT nodeid FROM xml_index WHERE
>>path LIKE
>>'geographicCoverage/boundingCoordinates/southBoundingCoordinate'
>>
>>Time: 14.1 seconds (much faster than query 2 - 114.9 secs)
>>
>>The difference between query 2 and query 5 is, query 5 gets rid of
>>non-number cell in search space. And query 5(14 secs) is much faster than
>>query 2(114 secs). So it seems progres has a hard time to ignore
>>non-number char during the casting(from char to number).
>>
>>But why query 4 (247 seconds) and query 1(236 seconds) no big difference
>>even we add something to get rid of nonnumber cell in query 4? I guess the
>>our adding is not completed, it doesn't get rid of all non-number char in
>>cell. We need fixed that.
>>
>>I also try to run those queries in Oracle in ecoinfo. But I got a
>>"ORA-01722: invalid number" error. Durane got the same problem in
>>numberica search. Those prove our metacat query couldn't remove all
>>non-number character too.
>>
>>It seems posgres can ingore non-number character in cell during
>>string-number casting, but it is very expensive. Oracle just throw a sql
>>excpeiton.
>>
>>Hope this is helpful.
>>
>>Jing
>>
>>
>>
>>    
>>
>
>_______________________________________________
>seek-dev mailing list
>seek-dev at ecoinformatics.org
>http://www.ecoinformatics.org/mailman/listinfo/seek-dev
>  
>


-- 
Rod Spears
Biodiversity Research Center
University of Kansas
1345 Jayhawk Boulevard
Lawrence, KS 66045, USA
Tel: 785 864-4082, Fax: 785 864-5335

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/seek-dev/attachments/20040924/222714ca/attachment.htm


More information about the Seek-dev mailing list