Merge pull request #2583 from jf205/advanced-ql

CodeQL documentation: reorganize 'Advanced QL' topics
2020-01-03 16:02:28 +00:00 · 2020-01-03 16:02:28 +00:00 · 9b9d7121e8
--- a/docs/language/learn-ql/advanced/abstract-classes.rst
+++ b/docs/language/learn-ql/advanced/abstract-classes.rst
@ -1,110 +0,0 @@
-Semantics of abstract classes
-=============================
-
-Concrete classes
----------------
-
-Concrete classes, as described in the QL language handbook topic on `Classes <https://help.semmle.com/QL/ql-handbook/types.html#classes>`__, lend themselves well to top-down modeling. We start from general superclasses representing large sets of values, and carve out individual subclasses representing more restricted sets of values.
-
-A classic example where this approach is useful is when modeling ASTs (Abstract Syntax Trees): the node types of an AST form a natural inheritance hierarchy, where, for example, there is a class ``Expr`` representing all expression nodes, with many different subclasses for different categories of expressions. There might be a class ``ArithmeticExpr`` representing arithmetic expressions, which in turn could have subclasses ``AddExpr`` and ``SubExpr``.
-
-Each value in a concrete class satisfies a particular logical property - the *characteristic predicate* (or *character* for short) of that class. This characteristic predicate consists of the conjunction (``and``) of its own body (if any) and the characteristic predicates of its superclasses.
-
-For example, we could derive a subclass ``MainMethod`` from the standard QL class ``Method`` that contains precisely those Java functions called ``"main"``:
-
-.. code-block:: ql
-
-   class MainMethod extends Method {
-       MainMethod() {
-           hasName("main")
-       }
-   }
-
-.. pull-quote::
-
-   Note
-
-   -  A class ``A`` *extends* a class ``B`` if and only if ``A`` is a subclass of ``B``.
-   -  For a class in QL, the *body* of the characteristic predicate is the logical formula enclosed in curly braces that defines (membership of) the class. In the example, the body of the characteristic predicate of ``MainMethod`` is ``hasName("main")``.
-
-Letting ``cp(C)`` denote the characteristic predicate of class ``C``, it is clear that:
-
-.. code-block:: ql
-
-   cp(MainMethod) = cp(Method) and hasName("main")
-
-That is, entities are *main* methods if and only if they are methods that are also called ``"main"``.
-
-Abstract classes
----------------
-
-In some cases, you might prefer to think of a class as being the union of its subclasses. This can be useful if you want to group multiple existing classes together under a common header and define member predicates on all these classes.
-
-For example, the security queries in LGTM are interested in identifying all expressions that may be interpreted as SQL queries. We could define an abstract class
-
-.. code-block:: ql
-
-   abstract class SqlExpr extends Expr {
-       ...
-   }
-
-with various subclasses that identify expressions of interest for different database access libraries. For example, there could be a subclass ``class PostgresSqlExpr extends SqlExpr`` whose character specifies that this must be an expression passed to some Postgres API that performs a database query, and similarly for MySQL and other kinds of database management systems.
-
-We can simply use ``SqlExpr`` to refer to all of those different expressions. If we want to add support for another database system later on, we can simply add a new subclass to ``SqlExpr``; there is no need to update the queries that rely on it.
-
-Like a concrete class, an abstract class has one or more superclasses and a characteristic predicate. However, for a value to be in an abstract class, it must not only satisfy the character of the class itself, but it must also satisfy the character of a subclass. In particular, an abstract class without subclasses is empty – since there are no subclasses, there are no values that satisfy the characteristic predicate of one of the subclasses.
-
-Example
-~~~~~~~
-
-The following example is taken from the CodeQL library for Java:
-
-.. code-block:: ql
-
-   abstract class SwitchCase extends Stmt {
-   }
-
-   /** A constant case of a switch statement. */
-   class ConstCase extends SwitchCase, @case {
-     ConstCase() { exists(Expr e | e.getParent() = this) }
-
-     ...
-   }
-
-   /** A default case of a switch statement. */
-   class DefaultCase extends SwitchCase, @case {
-     DefaultCase() { not exists(Expr e | e.getParent() = this) }
-
-     ...
-   }
-
-It models the two different types of ``case`` in a ``switch`` statement: constant cases of the form ``case e`` that have an expression ``e``, and default cases ``default`` that do not.
-
-The characteristic predicate of ``SwitchCase`` here is as follows:
-
-.. code-block:: ql
-
-   cp(SwitchCase) = cp(Stmt) and (
-                    cp(@case) and exists(Expr e | e.getParent() = this)
-                    or
-                    cp(@case) and not exists(Expr e | e.getParent() = this)
-                    )
-
-You must take care when you add a new subclass to an existing abstract class. Adding a subclass is not an isolated change: it also extends the abstract class, since an abstract class is the union of its subclasses. An extreme example would be extending the ``Call`` class as follows:
-
-.. code-block:: ql
-
-   class CallEx extends Call {
-       predicate somethingUseful()
-       {
-            ...
-       }
-   } 
-
-In this situation, ``cp(CallEx) = cp(Call)``, and then:
-
-.. code-block:: ql
-
-   cp(Call) = cp(Expr) and (cp(FunctionCall) or ... or cp(DestructorCall) or cp(Call)) = cp(Expr)
-
-So by adding a bad subclass of ``Call``, we have actually extended ``Call`` to include everything in ``Expr``. This is surprising and completely undesirable. Whilst the specific situation of extending an abstract class without providing any further constraints is now checked for by the QL compiler, extending abstract classes in general is still potentially hazardous. You should think carefully about the effects on the abstract parent class when doing so.
--- a/docs/language/learn-ql/advanced/advanced-ql.rst
+++ b/docs/language/learn-ql/advanced/advanced-ql.rst
@ -10,9 +10,6 @@ Advanced QL

 Topics on advanced uses of QL. These topics assume that you are familiar with QL and the basics of query writing.

-  :doc:`Semantics of abstract classes <abstract-classes>`
 -  :doc:`Choosing appropriate ways to constrain types <constraining-types>`
 -  :doc:`Determining the most specific types of a variable <determining-specific-types-variables>`
-  :doc:`Folding predicates <folding-predicates>`
-  :doc:`Understanding the difference between != and not(=) <equivalence>`
 -  :doc:`Monotonic aggregates in QL <monotonic-aggregates>`
--- a/docs/language/learn-ql/advanced/equivalence.rst
+++ b/docs/language/learn-ql/advanced/equivalence.rst
@ -1,44 +0,0 @@
-Understanding the difference between != and not(=)
-==================================================
-
-The two expressions:
-
-#. ``a() != b()``
-#. ``not(a() = b())``
-
-look equivalent - so much so that inexperienced (and even experienced) programmers have been known to rewrite one as the other. However, they are not equivalent due to the quantifiers involved.
-
-Thinking of ``a()`` and ``b()`` as sets of values, the first expression says that there is a pair of values (one from each side of the inequality) which are different.
-
-**Using !=**
-
-::
-
-   exists x, y | x in a() and y in b() | x != y
-
-The second expression, however, says that it is *not* the case that there is a pair of values which are the *same* - that is, *all* pairs of values are different:
-
-**Using not(=)**
-
-::
-
-   not exists x, y | x in a() and y in b() | x = y
-
-This is equivalent to: ``forall x, y | x in a() and y in b() | x != y``. The meaning is very different from the first expression.
-
-Examples
--------
-
-``a() = {1, 2}`` and ``b() = {1}``:
-
-#. ``a() != b()`` is true, because ``2 != 1``
-#. ``a() = b()`` is true, because ``1 = 1``
-#. Therefore: ``not(a() = b())`` is false - a different answer from the comparison ``a() != b()``
-
-Similarly with ``a() = {}`` and ``b() = {1}``:
-
-#. ``a() != b()`` is false, because there is no value in ``a()`` that is not equal to ``1``
-#. ``a() = b()`` is also false, because there is no value in ``a()`` that is equal to ``1`` either
-#. Therefore: ``not(a() = b())`` is true - again a different answer from the comparison ``a() != b()``
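-
-As a minimal sketch, the first example can be checked with a small query. The predicates ``a`` and ``b`` below are hypothetical helpers introduced only for illustration, holding the values ``{1, 2}`` and ``{1}`` respectively:
-
-.. code-block:: ql
-
-   // Hypothetical predicate holding the values 1 and 2.
-   int a() { result = 1 or result = 2 }
-
-   // Hypothetical predicate holding the single value 1.
-   int b() { result = 1 }
-
-   from string comparison
-   where
-     // Holds if some pair of values differs (here 2 != 1).
-     a() != b() and comparison = "a() != b()"
-     or
-     // Holds only if no pair of values is equal (fails here, since 1 = 1).
-     not(a() = b()) and comparison = "not(a() = b())"
-   select comparison
-
-Under these assumptions, the query should return only the row for ``a() != b()``, in line with the first example.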
-
-In summary, the QL expressions ``a() != b()`` and ``not(a() = b())`` may look similar, but their meaning is quite different.
--- a/docs/language/learn-ql/advanced/folding-predicates.rst
+++ b/docs/language/learn-ql/advanced/folding-predicates.rst
@ -1,36 +0,0 @@
-Folding predicates
-==================
-
-Sometimes you can assist the query optimizer by "folding" parts of predicates out into their own predicates.
-
-The general principle is to split off chunks of work that are "linear" - that is, there is not too much branching - and tightly bound, such that the chunks then join with each other on as many variables as possible.
-
-Example
-------
-
-.. code-block:: ql
-
-   predicate similar(Element e1, Element e2) {
-       e1.getName() = e2.getName() and
-       e1.getFile() = e2.getFile() and
-       e1.getLocation().getStartLine() = e2.getLocation().getStartLine()
-   }
-
-Here we explore some lookups on ``Element``\ s. Going from ``Element -> File`` and ``Element -> Location -> StartLine`` are linear: there is only one ``File`` for each ``Element``, and one ``Location`` for each ``Element``, etc. However, as written it is difficult for the optimizer to pick out the best ordering here. We want to do the quick, linear parts first, and then join on the resultant larger tables, rather than joining first and then doing the linear lookups. We can precipitate this kind of ordering by rewriting the above predicate as follows:
-
-.. code-block:: ql
-
-   predicate locInfo(Element e, string name, File f, int startLine) {
-       name = e.getName() and
-       f = e.getFile() and
-       startLine = e.getLocation().getStartLine()
-   }
-
-   predicate sameLoc(Element e1, Element e2) {
-       exists(string name, File f, int startLine |
-           locInfo(e1, name, f, startLine) and
-           locInfo(e2, name, f, startLine)
-       )
-   }
-
-Now the structure we want is clearer: we've separated out the easy part into its own predicate ``locInfo``, and the main predicate is just a larger join.
--- a/docs/language/learn-ql/writing-queries/debugging-queries.rst
+++ b/docs/language/learn-ql/writing-queries/debugging-queries.rst
@ -81,6 +81,48 @@ That is, you should define a *base case* that allows the predicate to *bottom ou
   The query optimizer has special data structures for dealing with `transitive closures <https://help.semmle.com/QL/ql-handbook/recursion.html#transitive-closures>`__.
   If possible, use a transitive closure over a simple recursive predicate, as it is likely to be computed faster.

+Fold predicates
+~~~~~~~~~~~~~~~~~~
+
+Sometimes you can assist the query optimizer by "folding" parts of large predicates out into smaller predicates.
+
+The general principle is to split off chunks of work that are:
+
+- **linear**, so that there is not too much branching.
+- **tightly bound**, so that the chunks join with each other on as many variables as possible.
+
+
+In the following example, we explore some lookups on two ``Element``\ s:
+
+.. code-block:: ql
+
+   predicate similar(Element e1, Element e2) {
+     e1.getName() = e2.getName() and
+     e1.getFile() = e2.getFile() and
+     e1.getLocation().getStartLine() = e2.getLocation().getStartLine()
+   }
+
+Going from ``Element -> File`` and ``Element -> Location -> StartLine`` is linear: there is only one ``File``, one ``Location``, and so on, for each ``Element``.
+
+However, as written, it is difficult for the optimizer to pick out the best ordering. Joining first and then doing the linear lookups later would likely result in poor performance. Generally, we want to do the quick, linear parts first, and then join on the resultant larger tables. We can encourage this kind of ordering by splitting the above predicate as follows:
+
+.. code-block:: ql
+
+   predicate locInfo(Element e, string name, File f, int startLine) {
+     name = e.getName() and
+     f = e.getFile() and
+     startLine = e.getLocation().getStartLine()
+   }
+   
+   predicate sameLoc(Element e1, Element e2) {
+     exists(string name, File f, int startLine |
+       locInfo(e1, name, f, startLine) and
+       locInfo(e2, name, f, startLine)
+     )
+   }
+
+Now the structure we want is clearer. We've separated out the easy part into its own predicate ``locInfo``, and the main predicate ``sameLoc`` is just a larger join.
+
 Further information
 -------------------

--- a/docs/language/ql-handbook/types.rst
+++ b/docs/language/ql-handbook/types.rst
@ -208,7 +208,9 @@ by declaring them in the ``from`` part.
 You can also annotate predicates and fields. See the list of :ref:`annotations <annotations-overview>`
 that are available.

-Kinds of classes
+.. _concrete-classes:
+
+Concrete classes
 ================

 The classes in the above examples are all **concrete** classes. They are defined by 
@ -218,6 +220,9 @@ values in the intersection of the base types that also satisfy the

 .. _abstract-classes:

+Abstract classes
+================
+
 A class :ref:`annotated <abstract>` with ``abstract``, known as an **abstract** class, is also a restriction of 
 the values in a larger type. However, an abstract class is defined as the union of its 
 subclasses. In particular, for a value to be in an abstract class, it must satisfy the 
@ -247,6 +252,13 @@ The abstract class ``SqlExpr`` refers to all of those different expressions. If
 support for another database system later on, you can simply add a new subclass to ``SqlExpr``;
 there is no need to update the queries that rely on it.

+.. pull-quote:: Important
+
+
+   You must take care when you add a new subclass to an existing abstract class. Adding a subclass
+   is not an isolated change: it also extends the abstract class, since an abstract class is the
+   union of its subclasses.
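+
+As an illustration, the sketch below uses a hypothetical subclass, ``StringLiteralSqlExpr``, and assumes the CodeQL library for Java (``import java``) for ``Expr`` and ``StringLiteral``. The body of ``SqlExpr`` is left empty for brevity:
+
+.. code-block:: ql
+
+   import java
+
+   abstract class SqlExpr extends Expr { }
+
+   // Hypothetical, overly broad subclass: every string literal now
+   // belongs to SqlExpr, whether or not it is ever used as a SQL query.
+   class StringLiteralSqlExpr extends SqlExpr {
+     StringLiteralSqlExpr() { this instanceof StringLiteral }
+   }
+
+Any query that selects from ``SqlExpr`` would then also report plain string literals, even though the query itself has not changed.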
+
 .. _overriding-member-predicates:

 Overriding member predicates