[seek-dev] EcoGrid Query and problems with pubDate

Matt Jones jones at nceas.ucsb.edu
Tue Oct 19 10:33:55 PDT 2004


Steve,

Yeah, I see your point.  pubDate in EML can be either a year or a date.
A year can be viewed as a lower-precision version of a date-time value
(which it is), so at least theoretically there should be no problem
doing the comparison.  In practice, however, there may be various
casting and roundoff errors that we'll need to look into.  I'm sure this
is a Metacat-specific issue, in that it's related to how Metacat stores
all values as strings rather than in their native type (based on their
XSD type), and the translation to a database type is complicated.  In
addition, it's unlikely that the database will support all of the types
in XSD, much less derived types such as 'yearDate' as defined in EML, so
we're going to need to find a way around this whole issue.
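
To make the "year as a lower-precision date" idea concrete, here is a
rough sketch in Python.  It is not how Metacat works today, and the
function names (pub_date_range, on_or_after) are made up for
illustration.  It assumes pubDate is either YYYY or YYYY-MM-DD, expands
each value to the range of days it covers, and treats GREATER THAN OR
EQUALS as "any part of the value falls on or after the start date":

```python
from datetime import date
from typing import Tuple


def pub_date_range(value: str) -> Tuple[date, date]:
    """Return the (first, last) day covered by a pubDate string.

    Hypothetical helper: accepts only 'YYYY' or 'YYYY-MM-DD'.
    """
    value = value.strip()
    if len(value) == 4 and value.isdigit():
        year = int(value)
        # A bare year covers the whole calendar year.
        return date(year, 1, 1), date(year, 12, 31)
    day = date.fromisoformat(value)
    return day, day


def on_or_after(pub_date: str, start: str) -> bool:
    """True if any day covered by pub_date is on or after the start value."""
    _, last_day = pub_date_range(pub_date)
    first_start, _ = pub_date_range(start)
    return last_day >= first_start


# A plain string comparison drops the year-only value ...
print("2002" >= "2002-01-01")                   # False
# ... while the range-based comparison keeps it.
print(on_or_after("2002", "2002-01-01"))        # True
print(on_or_after("2002-06-21", "2002-01-01"))  # True
print(on_or_after("2001", "2002-01-01"))        # False
```

Whether a year-only value should match when only part of the year falls
in range is a policy choice; the sketch picks the inclusive reading.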

Matt

Steve Tekell wrote:
> My concern isn't simply that pubDate wasn't being properly typecast, but
> that its type was ambiguous and couldn't be cast to anything other than a
> string without creating some problems.  
> 
> 
> 
>>-----Original Message-----
>>From: Matt Jones [mailto:jones at nceas.ucsb.edu] 
>>Sent: Thursday, October 14, 2004 4:13 PM
>>To: Steve Tekell
>>Cc: seek-dev at ecoinformatics.org; 'Duane Costa'; Saurabh Garg
>>Subject: Re: [seek-dev] EcoGrid Query and problems with pubDate
>>
>>Steve,
>>
>>My (educated) guess is that Metacat isn't casting the values 
>>properly to dates. This is closely related to the casting 
>>problems you've been having with latitude and longitude 
>>values.  Somebody at NCEAS, probably Sid, will be looking 
>>into these issues for Metacat.  We'll be simultaneously 
>>trying to deal with the query performance issues which are 
>>closely related.
>>
>>Matt
>>
>>Steve Tekell wrote:
>>
>>>There seem to be some potential problems with pubDate and maybe other
>>>dates.
>>>
>>>eml/dataset/pubDate 
>>>can be a year (2002) or a date (2002-06-21)
>>>(maybe year+month and no day is ok, too, but I'll ignore that case for now)
>>>So, for EML, the datatype of the field is ambiguous.
>>>(I haven't begun to look into the other schemas that I'll be searching yet).
>>>
>>>If a user enters for start date 2002-01-01, it generates the condition
>>><condition operator="GREATER THAN OR EQUALS"
>>>concept="/eml/dataset/pubDate">2002-01-01</condition>
>>>
>>>which logically should return anything published in 2002 or later, but it
>>>doesn't.  Items with pubDate=2002 won't be returned.  I assume if an item
>>>had a pubDate of 2002-06-21 it would be returned, but I am only getting
>>>items where the pubDate is a Year instead of Date.
>>>
>>>I am guessing that it's doing a String compare on pubDate.  Whereas maybe
>>>collection date is actually comparing dates.  Collection date searches all
>>>time out like the geographic boundary searches since it's storing everything
>>>as strings and doing type conversions on the fly.
>>>
>>>I guess one solution is for me to cripple the app to only allow pubDate to
>>>be a Year instead of Date and treat it separately from collection dates.
>>>However, this is an EML/Metacat-specific solution.  If other schemas store
>>>pubDate as a Date, then using only Year could cause other problems (invalid
>>>input).
>>>
>>>
>>>I put up a build, a snapshot of my work in progress, on my dev server so
>>>that you can see this problem as well as see the various performance
>>>problems.  
>>>http://lternet-163.lternet.edu:8080/ecogrid/query.jsp 
>>>Remember this is just an early stage test app.
>>>The results screen currently shows the execution time for the EcoGrid client
>>>query as well as the generated Query XML.  So you can grab the XML of
>>>queries that time out and do other tests.
>>>
>>>Steve
>>>
>>>
>>>_______________________________________________
>>>seek-dev mailing list
>>>seek-dev at ecoinformatics.org
>>>http://www.ecoinformatics.org/mailman/listinfo/seek-dev
>>
>>-- 
>>-------------------------------------------------------------------
>>Matt Jones                                     jones at nceas.ucsb.edu
>>http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
>>National Center for Ecological Analysis and Synthesis (NCEAS)
>>University of California Santa Barbara
>>Interested in ecological informatics? http://www.ecoinformatics.org
>>-------------------------------------------------------------------
>>



