[seek-dev] (no subject)

Arcot Rajasekar sekar at sdsc.edu
Fri Sep 24 13:11:30 PDT 2004


Bing
	In SRB,  you can deal with numeric data comparison by using
   "num >" etc and it convert to numeric and do the comparison.
	the same can be done in SEEK also.

	raja

On Fri, 24 Sep 2004, Bing Zhu wrote:

> I believe we have the similar issue in SRB when I tried to compose more
> complicated queries for WhyWhere data and had problem dealing with 'integer'
> type meta data.
>
> Note the following condition example which a typical query part in our
> query document.
>
>    <condition operator="LESS THAN OR EQUALS"
> concept="northBoundingCoordinate">31.5</condition>
>
> The missing piece of information to translate it into underneath data
> management
> system such as SRB is the attribute "datatype". If we can have this
> information
> in our query document, the problem of "northBoundingCoordinate<=31.5" can be
> avoided.
>
>
> Bing
>
>
>
>
>
> -----Original Message-----
> From: seek-dev-admin at ecoinformatics.org
> [mailto:seek-dev-admin at ecoinformatics.org]On Behalf Of Jing Tao
> Sent: Thursday, September 23, 2004 7:49 PM
> To: jones at nceas.ucsb.edu
> Cc: seek-dev at ecoinformatics.org
> Subject: [seek-dev] (no subject)
>
>
> Hi, matt, steve and chris:
>
> This afternoon I played a little bit with the query steve submit to
> ecogrid node and found something about number and string.
>
> Metacat stores the metadata in "nodedata" field in xml_nodes table. The
> data type for "nodedata" is char. If you want to compare a number in
> query, db should do casting.
>
> basically, steve's query will look like:
> AND
>  condition1 northBoundingCoordinate<=31.5
>  condition2 southBoundingCoordinate>=10.0
>  condition3 surname=joye
> AND
>
> In metacat log, the time consumer is the main query, taking 271 seconds
> (extened  query is trivial, only take 3 seconds). I post the main query
> to sql client, it took 267.4 seconds to run. So the result is almost same.
>
> Then I did some other tests and try to figure it out. Here is some testing:
>
> Query 1: northBoundingCoordinate<=31.5
>
> SQL query look like(simplied, only show different part to quey 2,3):
> SELECT DISTINCT docid FROM xml_nodes WHERE nodedata <= 31.5   AND
> parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE
> 'geographicCoverage/boundingCoordinates/northBoundingCoordinate'
>
> Time: 236 seconds(No too much faster than whole query) appro 100 documents
>
>
> Query 2: southBoundingCoordinate>=10.0
> SQL query look like(simplied, only show differenc to query 1, 3)
>  SELECT DISTINCT docid FROM xml_nodes WHERE nodedata >= 10.0  AND
> parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE
> 'geographicCoverage/boundingCoordinates/southBoundingCoordinate')
>
> Time: 114.9 seconds appro 100 docs
>
> Query 3:
> Condition surname=joye
>
> SQL query looks like(only show different part to query 1, 2)
> SELECT DISTINCT docid FROM xml_nodes WHERE UPPER(nodedata) = 'JOYE' AND
> parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE
> '/eml/dataset/creator/individualName/surName'
>
> Time: 2.8 seconds(Much faster) 4 docs
>
> Oh, Third one(2.8 secs) is much faster than first one(236 secs) and second
> one(114.9). Postgres have some problem in casting :)
>
> I also run following query:
>
> Query 4: northBoundingCoordinate<=31.5
> The condition is as same as query 1, but the sql query are little bit
> different. It add something to get rid of non-number cells (this is
> metacat does now):
> SELECT DISTINCT docid FROM xml_nodes WHERE nodedata <= 31.5  AND
> UPPER(nodedata) = LOWER(nodedata) AND LTRIM(nodedata) != ' '  AND nodedata
> IS NOT NULL AND parentnodeid IN (SELECT nodeid FROM xml_index WHERE path
> LIKE 'geographicCoverage/boundingCoordinates/northBoundingCoordinate'
>
> Time: 247.1 seconds(Almost to query 1 - 236 seconds)
>
>
> Query 5: southBoundingCoordinate>=10.0
> The condition is as same as query 2, but the sql query are little bit
> different. It add some thing got get rid of nonnumber of cells.
>
> SELECT DISTINCT docid FROM xml_nodes WHERE nodedata >= 10.0 AND
> UPPER(nodedata) = LOWER(nodedata)  AND nodedata IS NOT NULL AND
> LTRIM(nodedata) != ' ' parentnodeid IN (SELECT nodeid FROM xml_index WHERE
> path LIKE
> 'geographicCoverage/boundingCoordinates/southBoundingCoordinate'
>
> Time: 14.1 seconds (much faster than query 2 - 114.9 secs)
>
> The difference between query 2 and query 5 is, query 5 gets rid of
> non-number cell in search space. And query 5(14 secs) is much faster than
> query 2(114 secs). So it seems progres has a hard time to ignore
> non-number char during the casting(from char to number).
>
> But why query 4 (247 seconds) and query 1(236 seconds) no big difference
> even we add something to get rid of nonnumber cell in query 4? I guess the
> our adding is not completed, it doesn't get rid of all non-number char in
> cell. We need fixed that.
>
> I also try to run those queries in Oracle in ecoinfo. But I got a
> "ORA-01722: invalid number" error. Durane got the same problem in
> numberica search. Those prove our metacat query couldn't remove all
> non-number character too.
>
> It seems posgres can ingore non-number character in cell during
> string-number casting, but it is very expensive. Oracle just throw a sql
> excpeiton.
>
> Hope this is helpful.
>
> Jing
>
>
> --
> Jing Tao
> National Center for Ecological
> Analysis and Synthesis (NCEAS)
> 735 State St. Suite 204
> Santa Barbara, CA 93101
>
> _______________________________________________
> seek-dev mailing list
> seek-dev at ecoinformatics.org
> http://www.ecoinformatics.org/mailman/listinfo/seek-dev
>
> _______________________________________________
> seek-dev mailing list
> seek-dev at ecoinformatics.org
> http://www.ecoinformatics.org/mailman/listinfo/seek-dev
>




More information about the Seek-dev mailing list