This commit is contained in:
sougou 2016-01-11 14:29:15 -08:00
Родитель c1d550c602
Коммит a9f5b5c4ad
1 изменённых файлов: 38 добавлений и 54 удалений

Просмотреть файл

@ -226,17 +226,11 @@ limit 10
The steps performed by the SQL engine are as follows:
1. Scan t (FROM)
2. Filter rows where c=10 (WHERE)
3. Remove all columns other than a & b, and compute a+1 (SELECT)
4. Count and group unique a+1, b values (GROUP BY)
5. Filter counts that are > 10 (HAVING)
6. order the rows by descending count (ORDER BY)
7. Return the top 10 results (LIMIT)
The one thing thats not obvious in a SELECT is that the select list is not the first clause thats executed. Its the FROM.
@ -245,25 +239,15 @@ The one thing thats not obvious in a SELECT is that the select list is not th
If a SELECT statement was used just to fetch results, things would be very simple. However, it can be used in other parts of a query, and can produce 5 kinds of values:
1. A simple value:
1. A simple value: `select * from t where id = (select 1 from dual)`
`select * from t where id = (``_select 1 from dual_``)`
2. A value tuple: multiple values of the same type: `select * from t1 where id in (select val from t2)`
2. A value tuple: multiple values of the same type:
3. A row tuple: multiple values of different types. A row tuple can also be used as a value: `select * from t1 where (a,b) = (select b,c from t2 where pk=1)`
`select * from t1 where id in (``_select val from t2_``)`
4. Rows: results of a regular query. Note that rows can be treated as a value tuple of row tuples. For example, this statement is valid: `select * from t1 where (a,b) in (select b,c from t2)`
3. A row tuple: multiple values of different types. A row tuple can also be used as a value:
`select * from t1 where (a,b) = (``_select b,c from t2 where pk=1_``)`
4. Rows: results of a regular query. Note that rows can be treated as a value tuple of row tuples. For example, this statement is valid:
`select * from t1 where (a,b) in (``_select b,c from t2_``)`
5. A virtual table: If you dont strip out the field info from the original query, then a select can act as a table, where the field names act as column names. For example, this statement is valid:
`select * from (``_select a, b from t2_``) as t where t.a=1`
5. A virtual table: If you dont strip out the field info from the original query, then a select can act as a table, where the field names act as column names. For example, this statement is valid: `select * from (select a, b from t2) as t where t.a=1`
This opens the door to some powerful expressibility, the other edge of the sword being indefinite complexity.
@ -586,7 +570,7 @@ Now that all operators are defined, we have to see if theyre complete. Let
```
01: (select count(*), ta1.col+1, tb1.col
02: from ta1 join tb1 on ta1.id=tb1.id
03: where ta1.``c``ol2=10 and ta1.col2 > any
03: where ta1.col2=10 and ta1.col2 > any
04: (select c from tc1 where tc1.id=ta1.id)
05: group by ta1.col+1, tb1.col
06: having count(*) > 10
@ -610,7 +594,7 @@ The above query can be represented as a single expression. But lets start wit
04: SubqueryCScanResult = Select(
Filter(
Scan(tc1),
tc1.id=``JoinResult1.ta1.id``,
tc1.id=JoinResult1.ta1.id,
),
tc1.c)
03: FilterResult1 = Filter(
@ -628,7 +612,7 @@ The above query can be represented as a single expression. But lets start wit
10: SelectResult2 = Select(FilterResult2, (1, ta2.col+1, tb2.col))
13: Result2 = Aggregate(SelectResult2, count(1))
// union
09: UnionResult = ``Aggregate``(Merge(Result1, Result2), NULL)
09: UnionResult = Aggregate(Merge(Result1, Result2), NULL)
14: OrderResult = Sort(UnionResult, (1 desc))
15: Result = Limit(OrderResult, 0, 5)
```
@ -646,7 +630,7 @@ Result = Limit(
Aggregate(
Select(
Filter(
``JoinResult1`` = InnerJoin(
JoinResult1 = InnerJoin(
Scan(ta1),
Scan(tb1),
ta1.id=tb1.id,
@ -655,7 +639,7 @@ Result = Limit(
Select(
Filter(
Scan(tc1),
tc1.id=``JoinResult1``.ta1.id,
tc1.id=JoinResult1.ta1.id,
),
tc1.c,
),
@ -1061,16 +1045,16 @@ Also, the inner query cannot access any columns of the outer t2 because it has i
A corollary: since subquery nesting can have many levels, an inner query can still reference results from any of the outer scopes. For example, this is a valid query:
```
select`` ``t1.a``, t1.b
select t1.a, t1.b
from t1
where t1.id in (
``select`` t2.id
select t2.id
from t2
where t2.id in (
select t3.id
from t3
where t3.id = ``t1.a
``)
where t3.id = t1.a
)
)
```
@ -1086,7 +1070,7 @@ The result of such a subquery will be a new virtual table with its own vindex co
select sq.a, t2.b
from
(select a from t1 where id = :id) as sq
join t2` on`` t2.b = sq.a
join t2 on t2.b = sq.a
```
The main symbol table will look like this:
@ -1120,11 +1104,11 @@ The symbol table for the `from t1` subquery will be in its own scope. Its
*Conceptually speaking, theres no strong reason to prevent the FROM clause from referencing columns of an outer query. MySQL probably disallows this because it leads to confusing scoping rules. Lets take a look at this example:*
```
select id from ``a`` where id in (
select id from a where id in (
select b.id, a.id
from
(select id from b where b.id=``a.id``) b
join ``d as a`` on ``a.id``=b.id)
(select id from b where b.id=a.id) b
join d as a on a.id=b.id)
```
*If we allowed an outer scope, then b.id=a.id becomes ambiguous. We dont know if it should refer to the outer a, or the one that is yet to be created as part of the current scope. Needless to say, one shouldnt be writing such unreadable queries.*
@ -1133,11 +1117,11 @@ select id from ``a`` where id in (
By the fact that the ON clause is part of a FROM clause, you cannot refer to an element of an outer query. This is an invalid query:
`select id from a ``where id in (select b.id from b join c on b.id=``a.id``)`
`select id from a where id in (select b.id from b join c on b.id=a.id)`
But this is valid:
`select id from a where id in (select b.id from b join c where b.id=``a.id``)`
`select id from a where id in (select b.id from b join c where b.id=a.id)`
Furthermore, you can only refer to elements that participate in the current JOIN. But the following query is still valid:
@ -1147,7 +1131,7 @@ Because of associativity rules, c is actually joined with a join b.
But this query is not valid:
`select * from a join ``(b join c on c.id = a.id)`
`select * from a join (b join c on c.id = a.id)`
The ON clause participants are only b and c because of the parenthesis. So, youre not allowed to reference a.id in that join. In other words, analysis of the ON clause requires a list of participants. The analyzer will need to check with this list before looking inside the symbol table, or we could exploit Go interfaces here by sending a subset symbol table that has the same interface as the original one.
@ -1229,7 +1213,7 @@ For the sake of simplicity, well not allow dependencies to use non-standard c
This should instead be rewritten as:
`select * from (select a, count(*) ``c`` from t1) t where ``t.c`` > 0`
`select * from (select a, count(*) c from t1) t where t.c > 0`
This way, we can build a bind variable like `:_t_c` for the outer query if we have split the above query into two.
@ -1421,11 +1405,11 @@ But left joins are slightly different, because the join condition is applied *to
is same as:
`a LeftJoin (b LeftJoin c where c.id=b.id) ``where c.id=a.id``)`
`a LeftJoin (b LeftJoin c where c.id=b.id) where c.id=a.id)`
But its not the same as:
`a LeftJoin (b left join c where c.id=b.id ``and c.id=a.id``)`
`a LeftJoin (b left join c where c.id=b.id and c.id=a.id)`
If the above joins were normal joins, then all three would be equivalent.
@ -1525,16 +1509,16 @@ At the end of the analysis, the output is a symbol table and a reduced tree repr
Heres an example:
`a join b on b.id=a.id ``join c on c.id=b.id2 join d on d.id2=c.id2 ``join e on e.id=a.id`
`a join b on b.id=a.id join c on c.id=b.id2 join d on d.id2=c.id2 join e on e.id=a.id`
will be represented as:
```
``J2
J2
/ \
``J1`` e ``_where e.id=a.id_
J1 e where e.id=a.id
/ \
a join b on b.id=a.id ``c join d on d.id2=c.id2 ``_where c.id=b.id2_
a join b on b.id=a.id c join d on d.id2=c.id2 where c.id=b.id2
```
### WHERE clause
@ -1559,17 +1543,17 @@ the ON clause is processed, which is not until `LeftJoin` returns the results.
*There are some practical use cases where youd want to add conditions in the WHERE clause, typically for NULL checks. To support this in the future, we can use this technique that converts a LEFT JOIN into a normal JOIN:*
`_from a left join b on b.id=a.id where b.id is NULL_`
`from a left join b on b.id=a.id where b.id is NULL`
*After split, the query on b should have become:*
`_from b where b.id=a.id_`
`from b where b.id=a.id`
*We cannot push down the null check in the above case. But we can, if we rewrote it as:*
`_from (select 1) as t left join b on b.id=a.id _``_where b.id is NULL_`
`from (select 1) as t left join b on b.id=a.id where b.id is NULL`
*The (select 1) acts as a fake driver row for the left join, which allows us to tack on the where clause and also change the` `*_LeftJoin_`* to a `*_Join_`*.*
*The (select 1) acts as a fake driver row for the left join, which allows us to tack on the where clause and also change the `LeftJoin` to a `Join`.*
###### Subqueries
@ -1588,9 +1572,9 @@ If a, b and c where in different groups, the output would be:
```
J
/ \
J c where (c.id=``b.id``) and ``(cond2(a.col, c.col))
J c where (c.id=b.id) and (cond2(a.col, c.col))
/ \
a b where (b.id=``a.id``) and ``(cond1(a.col, b.col))
a b where (b.id=a.id) and (cond1(a.col, b.col))
```
The cond2 expression gets pushed into the the where clause for table c because its the right-most group thats referenced by the condition. External references will be changed to appropriate bind variables by the rewiring phase.
@ -1618,9 +1602,9 @@ If an expression references columns from more than one group, then the expressio
*can be resolved with the following join*
```
_select a.col, a.bar from a // produce _a_col & _a_bar_
_Join_
_select :_a_col+b.col from b where b.foo=:_a_bar_
select a.col, a.bar from a // produce _a_col & _a_bar
Join
select :_a_col+b.col from b where b.foo=:_a_bar
```
*However, such queries are not very efficient for VTTablet. It cant cache field info, or use rowcache. Such push-downs dont benefit MySQL either. So, its better to wait till VTGate implements the ability to evaluate expressions.*
@ -1771,7 +1755,7 @@ In order to resolve dependencies correctly, we need the symbol tables that were
1. Change the external reference to the corresponding bind var name.
2. If that column is not already in the SELECT list of the other group, we add it.
2. If that column is not already in the select list of the other group, we add it.
3. Mark the column number to be exported as the bind var name that we chose.