[seek-dev] (no subject)

Shawn Bowers bowers at sdsc.edu
Fri Sep 24 07:15:30 PDT 2004


Just FYI: it is usually much better to write a join than a nested subquery
if you want the query to run fast.  Most query optimizers don't do very
well at optimizing nested subqueries ... so you might try writing the
joins below as explicit joins.

shawn

On Thu, 23 Sep 2004, Jing Tao wrote:

> Hi, matt, steve and chris:
>
> This afternoon I played a little bit with the query steve submit to
> ecogrid node and found something about number and string.
>
> Metacat stores the metadata in "nodedata" field in xml_nodes table. The
> data type for "nodedata" is char. If you want to compare a number in
> query, db should do casting.
>
> basically, steve's query will look like:
> AND
>  condition1 northBoundingCoordinate<=31.5
>  condition2 southBoundingCoordinate>=10.0
>  condition3 surname=joye
> AND
>
> In metacat log, the time consumer is the main query, taking 271 seconds
> (extened  query is trivial, only take 3 seconds). I post the main query
> to sql client, it took 267.4 seconds to run. So the result is almost same.
>
> Then I did some other tests and try to figure it out. Here is some testing:
>
> Query 1: northBoundingCoordinate<=31.5
>
> SQL query look like(simplied, only show different part to quey 2,3):
> SELECT DISTINCT docid FROM xml_nodes WHERE nodedata <= 31.5   AND
> parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE
> 'geographicCoverage/boundingCoordinates/northBoundingCoordinate'
>
> Time: 236 seconds(No too much faster than whole query) appro 100 documents
>
>
> Query 2: southBoundingCoordinate>=10.0
> SQL query look like(simplied, only show differenc to query 1, 3)
>  SELECT DISTINCT docid FROM xml_nodes WHERE nodedata >= 10.0  AND
> parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE
> 'geographicCoverage/boundingCoordinates/southBoundingCoordinate')
>
> Time: 114.9 seconds appro 100 docs
>
> Query 3:
> Condition surname=joye
>
> SQL query looks like(only show different part to query 1, 2)
> SELECT DISTINCT docid FROM xml_nodes WHERE UPPER(nodedata) = 'JOYE' AND
> parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE
> '/eml/dataset/creator/individualName/surName'
>
> Time: 2.8 seconds(Much faster) 4 docs
>
> Oh, Third one(2.8 secs) is much faster than first one(236 secs) and second
> one(114.9). Postgres have some problem in casting :)
>
> I also run following query:
>
> Query 4: northBoundingCoordinate<=31.5
> The condition is as same as query 1, but the sql query are little bit
> different. It add something to get rid of non-number cells (this is
> metacat does now):
> SELECT DISTINCT docid FROM xml_nodes WHERE nodedata <= 31.5  AND
> UPPER(nodedata) = LOWER(nodedata) AND LTRIM(nodedata) != ' '  AND nodedata
> IS NOT NULL AND parentnodeid IN (SELECT nodeid FROM xml_index WHERE path
> LIKE 'geographicCoverage/boundingCoordinates/northBoundingCoordinate'
>
> Time: 247.1 seconds(Almost to query 1 - 236 seconds)
>
>
> Query 5: southBoundingCoordinate>=10.0
> The condition is as same as query 2, but the sql query are little bit
> different. It add some thing got get rid of nonnumber of cells.
>
> SELECT DISTINCT docid FROM xml_nodes WHERE nodedata >= 10.0 AND
> UPPER(nodedata) = LOWER(nodedata)  AND nodedata IS NOT NULL AND
> LTRIM(nodedata) != ' ' parentnodeid IN (SELECT nodeid FROM xml_index WHERE
> path LIKE
> 'geographicCoverage/boundingCoordinates/southBoundingCoordinate'
>
> Time: 14.1 seconds (much faster than query 2 - 114.9 secs)
>
> The difference between query 2 and query 5 is, query 5 gets rid of
> non-number cell in search space. And query 5(14 secs) is much faster than
> query 2(114 secs). So it seems progres has a hard time to ignore
> non-number char during the casting(from char to number).
>
> But why query 4 (247 seconds) and query 1(236 seconds) no big difference
> even we add something to get rid of nonnumber cell in query 4? I guess the
> our adding is not completed, it doesn't get rid of all non-number char in
> cell. We need fixed that.
>
> I also try to run those queries in Oracle in ecoinfo. But I got a
> "ORA-01722: invalid number" error. Durane got the same problem in
> numberica search. Those prove our metacat query couldn't remove all
> non-number character too.
>
> It seems posgres can ingore non-number character in cell during
> string-number casting, but it is very expensive. Oracle just throw a sql
> excpeiton.
>
> Hope this is helpful.
>
> Jing
>
>
>




More information about the Seek-dev mailing list