[seek-kr-sms] Re: Comments on Shawn Bowers [Notes from VLDB] message posted on Sept 06, 2004

Fri Oct 8 16:20:17 PDT 2004

Dear Shawn,

We accept your apologies. We didn't take your critiques personally, and we
certainly appreciate your interest in our work, but posting the comments
online was not what we would have expected (someone might glance at them and
get a completely wrong impression of our work). I would like to take a
moment to discuss some of your remarks.

> As another note on determining the overhead, it seems like you could 
> actually make a stronger statement concerning overhead than the 
> experimental estimates. In particular, it seems like you can give a 
> fairly precise characterization of the overhead because you know 
> precisely the SQL-part of a pSQL query, and you know precisely how the 
> pSQL queries are rewritten into equivalent SQL expressions. It 
> probably gets somewhat complex with query optimization, but perhaps 
> that could be factored out of the equation.  Just a thought.

You might be right. In fact, we do know the precise number of SQL
(sub)queries that are ultimately sent to the SQL engine for each pSQL query
that is posed by the user. So we do know the exact overhead in this sense.
However, as you have observed, it gets messy with query optimization. Any
reasonable characterization of overhead would necessarily involve
understanding the query plans that are generated by the query engine. It was
not our goal, at least not yet, to provide an exact characterization of the
overhead in this sense.

> My comments on the "accuracy" of your approach are a bit muddled. As I 
> said, I don't think these details change your results (i.e., I wasn't 
> arguing that your implementation is incorrect). I am just confused a 
> bit about your formalization.
> 
> Where I saw, and still see, a problem is in the way the conjunctive 
> queries are used in the formalization. My thinking is described below.
> These comments are all based on Section 3 of your paper.
> 
> Given a default-all pSQL query Q, the rewriting of Q to an equivalent 
> SQL query (that "correctly" propagates annotations) B(Q) consists of 
> two steps: (1) computing the "representative" query Q0 of Q, (2) 
> computing the "auxiliary" queries Q1 ... Qn. This rewriting you call 
> "generate-query-basis".
> 
> My interpretation of the "representative" query Q0 is that for each 
> tuple produced by the query, each value of the tuple has all of the 
> annotations from the particular values used to generate the result.
> 
> Your example, in the CQ notation, is:
> 
> A0(x,y,z) :- Mapping_Table(w,x,u,v), SWISS-PROT(x,z), PIR(u,y).
> 
> So, e.g., for a tuple in A0, the x-value of the tuple takes 
> annotations from the corresponding x-value of the Mapping_Table AND 
> the corresponding x-value of the SWISS-PROT table that were used to 
> generate the particular result.

You are correct: this is the interpretation of the representative query.
Note, however, that this interpretation is not particular to the
representative query. It is the interpretation that we give in general to CQ
rules that propagate annotations. For example, in the CQ rule
Ans(x) :- R(x,y), R(x,z), the x in the answer takes annotations from the x's
in both R subgoals, according to our interpretation (please refer to the
second paragraph on page 7 of our paper). This modified semantics for CQ
rules with annotations is discussed in detail in reference [24].
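
To make this concrete, here is a small Python sketch of how the rule
Ans(x) :- R(x,y), R(x,z) can be read under this interpretation. It is only an
illustration, not our implementation: the data, the annotation labels "a1"
and "a2", and the function name ans are made up, and the sketch assumes that
duplicate answer tuples are merged by unioning their annotations.

```python
from itertools import product

# Each cell is (value, frozenset_of_annotations).
# Hypothetical data: two R tuples share x = 5 but carry different annotations.
R = [
    ((5, frozenset({"a1"})), (10, frozenset())),
    ((5, frozenset({"a2"})), (20, frozenset())),
]

def ans(R):
    """Evaluate Ans(x) :- R(x, y), R(x, z) with annotation propagation."""
    result = {}
    for t1, t2 in product(R, R):
        (x1, ann1), _ = t1
        (x2, ann2), _ = t2
        if x1 == x2:  # join on the shared variable x
            # x in the answer takes annotations from the x's in both subgoals;
            # duplicate answers are merged by unioning annotations (assumption).
            result[x1] = result.get(x1, frozenset()) | ann1 | ann2
    return result

print({x: sorted(a) for x, a in ans(R).items()})  # {5: ['a1', 'a2']}
```

Under the ordinary (annotation-free) semantics the second subgoal is
redundant; in this reading it matters, because it decides which annotations
reach the answer.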

> The "auxiliary" queries go one step further, by also including all 
> the annotations for the same x-values in the Mapping_Table and the 
> SWISS-PROT table.

Yes, this is correct.

> For example, the following may be the facts that "generated" a 
> particular A0 result tuple <5,0,0>.
> 
> Mapping_Table(0,5,0,0), SWISS-PROT(5,0), PIR(0,0).
> 
> But, there may also be a tuple/fact Mapping_Table(1,5,1,1), where 
> there is a different annotation for this occurrence of the value 5.
> The goal of the "auxiliary" queries is to also include this 
> annotation in the result <5,0,0>. (This was what I meant by "scoop-up"
> before.)
> 
> Assuming this is an accurate interpretation of the "auxiliary" 
> queries, here is where I see a problem.
> 
> You then show that you generate a number of auxiliary CQ queries, 
> e.g., here is the one you give for the above example.
> 
> A1(x,y,z) :- Mapping_Table(w,x,u,v), SWISS-PROT(x,z), PIR(u,y),
>              Mapping_Table(w1,x,w2,w3).
> 
> I understand that this query is an equivalent query to A0. In fact, 
> that is where I think the problem comes in with your formalization.
> 
> Semantically, the query above states:
> 
> (FORALL x,y,z) [A1(x,y,z) =>
>   (EXISTS w,u,v,w1,w2,w3)
>   Mapping_Table(w,x,u,v) & SWISS-PROT(x,z) & PIR(u,y) &
>   Mapping_Table(w1,x,w2,w3)]
> 
> In words, for every x, y, and z, A1(x,y,z) holds if there exists a w, 
> u, v, w1, w2, and w3 such that Mapping_Table(w,x,u,v), 
> SWISS-PROT(x,z), PIR(u,y), and Mapping_Table(w1,x,w2,w3) also holds.
> Clearly, if the first mapping table formula is satisfied, then the
> second is as well.
> 
> Thus, because of the existential quantifiers over w1, w2, and w3, 
> there is no implied "iteration" over these values for the second 
> mapping table formula. Hence, they are not necessarily "collected" by 
> the rule, as shown in the first-order semantics of the rule.

According to our modified semantics, the auxiliary CQ rules do collect these
annotations that might have been missed by the representative query.
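
As an illustration only, the following Python sketch evaluates A0 and A1 on
the facts from your example under this reading. The labels "a1", "a2", and
"a3" and the function x_annotations are hypothetical, only annotations on x
are tracked, and the code is a simplified sketch rather than what our system
actually executes.

```python
# Rows are (values, annotations); annotations map a column index to a set of
# labels. The facts come from the example above; the labels are hypothetical.
MAPPING = [
    ((0, 5, 0, 0), {1: {"a1"}}),
    ((1, 5, 1, 1), {1: {"a2"}}),
]
SWISS_PROT = [((5, 0), {0: {"a3"}})]
PIR = [((0, 0), {})]

def x_annotations(auxiliary):
    """Annotations on x for each answer (x, y, z) of A0 or, if auxiliary, A1."""
    out = {}
    for (w, x, u, v), m_ann in MAPPING:
        for (sx, z), s_ann in SWISS_PROT:
            for (pu, y), _ in PIR:
                if sx != x or pu != u:
                    continue  # join conditions of the rule body
                # x takes annotations from every subgoal in which it occurs.
                anns = set(m_ann.get(1, set())) | s_ann.get(0, set())
                if auxiliary:
                    # Extra subgoal Mapping_Table(w1, x, w2, w3): every
                    # matching tuple contributes the annotations on its own x.
                    for (w1, x2, w2, w3), m2_ann in MAPPING:
                        if x2 == x:
                            anns |= m2_ann.get(1, set())
                out.setdefault((x, y, z), set()).update(anns)
    return out

a0 = x_annotations(auxiliary=False)
a1 = x_annotations(auxiliary=True)
print({t: sorted(a) for t, a in a0.items()})  # {(5, 0, 0): ['a1', 'a3']}
print({t: sorted(a) for t, a in a1.items()})  # {(5, 0, 0): ['a1', 'a2', 'a3']}
```

Both rules return the same tuple <5,0,0>, but only A1 picks up "a2" from
Mapping_Table(1,5,1,1), which does not join with PIR and is therefore never
seen by A0.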

> Basically, I might be being dense, but I don't see how in your 
> formalization these auxiliary rules perform the desired result of the 
> "generate-query-basis" as stated, without giving a different 
> interpretation to CQ rules.  I think on the SQL side, the rewritings 
> make sense because you are adding the _a columns and so on...

Our CQ rules are given a different interpretation. That's precisely the
point.

> As a side note: I think the "representative" query Q0 could also make 
> a useful propagation scheme in itself, and actually begs the question 
> of the purpose of the original annotations.  For example, if one gives 
> an annotation to a value v in column c in one tuple, and in a 
> different tuple a different annotation to v in c, what is the reason a 
> user made, or a database contains, these distinct annotations for the 
> same value?  It could just be a mistake, or possibly the data was 
> "unioned" from multiple sources. But could it also be that the tuple 
> denotes some type of context in which the annotation is applicable? If 
> so, it might not be appropriate to go further and do the auxiliary 
> part ... From your experience looking at annotations, why did you find 
> the propagate-all a more natural semantics for interpreting the purpose
> of the annotations?

Yes, that's why we also allow the user to specify the propagation scheme
(the "effect" of the representative query can easily be obtained with a
user-defined propagation scheme). The default scheme is designed to propagate
annotations based on where data is copied from, and the default-all scheme is
designed to give users an invariant interpretation of a pSQL query among all
equivalent queries. As you have correctly pointed out, there could be many
other schemes, probably application-dependent, but our schemes are designed
without any specific application in mind. That is, we wanted a
general-purpose annotation management system that does not depend on
specific applications.

I am curious why you are interested in our work. Are you planning to apply
any of these ideas to your research? Please do not hesitate to let us know
if you have any more questions.

Thank you,
Laura