[seek-dev] Query deserialization does not work as designed.

Kevin Ruland kruland at ku.edu
Thu Sep 15 08:20:56 PDT 2005

Hi all,

After looking at the query.xsd and examining the hand-coded typeDesc for
the ANDType object, I suspected that the deserializer does not work
as intended.  The problem is that the query.xsd specifies (correctly,
according to the standards):

    <xs:complexType name="ANDType">
            <xs:documentation>A type of logical operator that requires all of
            its child conditions to evaluate to true in order for the whole
            clause to evaluate to true.</xs:documentation>
        <xs:choice maxOccurs="unbounded">
            <xs:element name="AND" type="ANDType"/>
            <xs:element name="OR" type="ORType"/>
            <xs:element name="condition" type="ConditionType"/>

ORType is similarly defined.

However, Apache Axis cannot correctly process xsd:choice or
xsd:sequence elements with maxOccurs > 1.  This has been an outstanding
bug in Axis since the dawn of time (2003, anyway).  Even the newest
version of Axis (1.2.1) does not process them correctly.

The hand-coded type description for ANDType is identical to the one
generated by Axis when it is given the following schema:

    <xs:complexType name="ANDType">
            <xs:documentation>A type of logical operator that requires all of
            its child conditions to evaluate to true in order for the whole
            clause to evaluate to true.</xs:documentation>
            <xs:element name="AND" type="ANDType" maxOccurs="unbounded"/>
            <xs:element name="OR" type="ORType" maxOccurs="unbounded"/>
            <xs:element name="condition" type="ConditionType

This made me think that the deserializer does not accept every message
that conforms to the schema.  To test this, I pushed messages
through my test digir server and printed out the QueryType objects
passed into it.  I invoked the query service using this query:

   <query xmlns="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
          queryId="query-digir.1.1" system="http://knb.ecoinformatics.org">
    <namespace prefix="darwin"
    <returnfield xmlns="">/Species</returnfield>
    <returnfield xmlns="">/ScientificName</returnfield>
    <returnfield xmlns="">/Collector</returnfield>
    <returnfield xmlns="">/YearCollected</returnfield>
    <returnfield xmlns="">/InstitutionCode</returnfield>
    <returnfield xmlns="">/CollectionCode</returnfield>
    <returnfield xmlns="">/CatalogNumber</returnfield>
    <returnfield xmlns="">/CatalogNumberText</returnfield>
    <returnfield xmlns="">/DecimalLatitude</returnfield>
    <returnfield xmlns="">/DecimalLongitude</returnfield>
    <title xmlns="">mephitis macroura Query</title>
    <AND xmlns="">
     <condition concept="ScientificName" operator="LIKE"
     <OR xmlns="">
       <condition concept="ScientificName" operator="LIKE"
       <AND xmlns="">
         <condition concept="ScientificName" operator="LIKE"
       <condition concept="ScientificName" operator="LIKE"
     <AND xmlns="">
       <condition concept="ScientificName" operator="LIKE"
     <condition concept="ScientificName" operator="LIKE"

(Don't ask me why there are all the xmlns="" attributes.  I figured out
that this was required by sniffing the port 8080 traffic from a
known good query from Kepler quick search... A whole other story...)

What the service saw was this:

[09/15/2005 10:13:27:949 ]
org.ecoinformatics.ecogrid.digir.impl.DigirProxyImpl [query:454] DEBUG:
EcogridQueryTransformer result: <query queryId="query-digir.1.1"
  <title>mephitis macroura Query</title>
      <condition operator="LIKE" concept="ScientificName">5</condition>
        <condition operator="LIKE" concept="ScientificName">3</condition>
      <condition operator="LIKE" concept="ScientificName">4</condition>
    <condition operator="LIKE" concept="ScientificName">6</condition>

Notice that conditions 1 & 2 are missing.  If these clauses are removed
from the source message, then it satisfies the revised schema.
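In this one test, the clauses that lost a condition are exactly the ones whose condition children are split up by other elements.  As a rough aid for spotting such clauses in other messages, here is a small Python sketch.  It is not part of the ecogrid code: the name `interleaved` is made up for illustration, the AND clause is reconstructed from the indentation of the test query above (the closing tags and the condition text 1..6 are assumptions), and treating interleaving as the trigger is a guess based only on this log, not a confirmed description of what Axis does internally.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# AND clause of the test query, reconstructed from its indentation.
# Closing tags and the condition text 1..6 are assumptions.
SAMPLE = """
<AND>
  <condition concept="ScientificName" operator="LIKE">1</condition>
  <OR>
    <condition concept="ScientificName" operator="LIKE">2</condition>
    <AND>
      <condition concept="ScientificName" operator="LIKE">3</condition>
    </AND>
    <condition concept="ScientificName" operator="LIKE">4</condition>
  </OR>
  <AND>
    <condition concept="ScientificName" operator="LIKE">5</condition>
  </AND>
  <condition concept="ScientificName" operator="LIKE">6</condition>
</AND>
"""


def interleaved(clause, path="AND"):
    """Yield (path, tag) for every clause in which children named `tag`
    are split into more than one contiguous run."""
    # Collapse consecutive same-named children into one entry per run.
    runs = [tag for tag, _ in groupby(child.tag for child in clause)]
    for tag in sorted(set(runs)):
        if runs.count(tag) > 1:
            yield path, tag
    # Recurse into nested AND/OR clauses.
    for child in clause:
        if child.tag in ("AND", "OR"):
            yield from interleaved(child, path + "/" + child.tag)


root = ET.fromstring(SAMPLE)
for path, tag in interleaved(root):
    print(path, "has interleaved", tag, "children")
```

For the reconstructed message this flags the outer AND and the OR inside it, which are the two clauses that lost conditions 1 and 2.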

What does this mean?  Even though the query.xsd is what is desired, the
services as implemented cannot process every message satisfying this
schema.  We have two options now: either we adopt the modified schema and
use Axis-generated code directly, or we write custom deserializers that
function correctly.  Even though the revised schema is not as
expressive, it still captures the same logical conditions because AND
and OR are associative and commutative, so the children of a clause can
be regrouped and reordered without changing its meaning.  My preference
is to accept this limitation in Axis and use the simplified schema.  This
results in less code to maintain, and we can rely on generated stubs
and not cram them into the repository.
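To make the point about equivalence concrete, here is a minimal Python sketch of how a client could rewrite a clause before sending it.  It assumes the revised schema expects the children of each clause grouped as AND elements, then OR elements, then conditions (the wrapper tags around the element declarations are missing from the schema excerpt above, so this ordering is an assumption).  The names `ORDER` and `normalize` are hypothetical, and the sample clause drops the attributes for brevity.

```python
import xml.etree.ElementTree as ET

# Assumed child order under the revised schema: AND*, OR*, condition*.
ORDER = {"AND": 0, "OR": 1, "condition": 2}

# Same nesting as the test query, attributes omitted for brevity.
SAMPLE = (
    "<AND>"
    "<condition>1</condition>"
    "<OR><condition>2</condition><AND><condition>3</condition></AND>"
    "<condition>4</condition></OR>"
    "<AND><condition>5</condition></AND>"
    "<condition>6</condition>"
    "</AND>"
)


def normalize(clause):
    """Reorder the children of an AND/OR clause in place, recursively.

    Only the order of siblings changes, so the logical meaning is kept
    because AND and OR do not depend on operand order.
    """
    # sorted() is stable, so siblings with the same name keep their order.
    children = sorted(clause, key=lambda child: ORDER[child.tag])
    for child in list(clause):
        clause.remove(child)
    clause.extend(children)
    for child in children:
        if child.tag != "condition":
            normalize(child)
    return clause


root = normalize(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
print([c.text for c in root.iter("condition")])
# ['5', '3', '2', '4', '1', '6']
```

All six conditions are still present after the rewrite; only their position within each clause has changed.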

