Merge pull request #1842 from jf205/add-java-slides/sd-3762

docs: add rst versions of java training slides
Felicity Chapman 2019-09-04 13:53:13 +01:00 committed by GitHub
Parent cdcc716675 64c4548aca
Commit ef7984d1cb
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
32 changed files with 1440 additions and 421 deletions

[Three image files added (224 KiB, 292 KiB, 224 KiB); file diffs suppressed because one or more lines are too long.]

View file

@ -134,7 +134,7 @@ URL: https://code.google.com/p/io-2012-slides
{% endblock %}
</head>
<body style="opacity: 0">
<div class="wrapper" id="wrapper">
<slides class="layout-widescreen" id="slides">
<!-- {% include "title_slide.html" %} -->
@ -146,7 +146,7 @@ URL: https://code.google.com/p/io-2012-slides
<slide class="backdrop"></slide>
</slides>
</div>
<!--[if IE]>

View file

@ -285,7 +285,8 @@ slides > slide.current .gdbar {
}
/* line 112, ../scss/default.scss */
slides > slide.next {
display: block;
/*display: block;*/
display: none;
opacity: 0;
pointer-events: none;
}
@ -407,7 +408,7 @@ slides.layout-faux-widescreen > slide.current {
/* line 238, ../scss/default.scss */
slides.layout-widescreen > slide.next,
slides.layout-faux-widescreen > slide.next {
display: block;
/*display: block;*/
opacity: 0;
pointer-events: none;
}
@ -744,11 +745,7 @@ table tr:nth-child(odd) {
table th {
color: white;
font-size: 1em;
background: url('') no-repeat;
background: -webkit-gradient(linear, 50% 0%, 50% 100%, color-stop(40%, #4387fd), color-stop(80%, #2a7cdf)) no-repeat;
background: -moz-linear-gradient(top, #4387fd 40%, #2a7cdf 80%) no-repeat;
background: -webkit-linear-gradient(top, #4387fd 40%, #2a7cdf 80%) no-repeat;
background: linear-gradient(to bottom, #4387fd 40%, #2a7cdf 80%) no-repeat;
background: grey;
}
/* line 494, ../scss/default.scss */
table td, table th {
@ -758,17 +755,16 @@ table td, table th {
/* line 499, ../scss/default.scss */
table td.highlight {
color: #515151;
background: url('') no-repeat;
background: -webkit-gradient(linear, 50% 0%, 50% 100%, color-stop(40%, #ffd14d), color-stop(80%, #f6c000)) no-repeat;
background: -moz-linear-gradient(top, #ffd14d 40%, #f6c000 80%) no-repeat;
background: -webkit-linear-gradient(top, #ffd14d 40%, #f6c000 80%) no-repeat;
background: linear-gradient(to bottom, #ffd14d 40%, #f6c000 80%) no-repeat;
background: grey;
}
/* line 504, ../scss/default.scss */
table.rows {
border-bottom: none;
border-right: 1px solid #797979;
}
table td {
background: white;
}
/* line 510, ../scss/default.scss */
q {
@ -1013,18 +1009,24 @@ article.smaller q:before, article.smaller q:after {
background-image: -webkit-radial-gradient(50% 50%, #b1dfff 0%, #4387fd 600px);
background-image: radial-gradient(50% 50%, #b1dfff 0%, #4387fd 600px);
}
/* the popup class is used to display the speaker notes when 'presenter' view
is enabled. This view is not currently optimal, so certain selectors have been commented-out,
with a view to improving the styles at a later date */
/* line 684, ../scss/default.scss */
.with-notes.popup slide.next {
/*.with-notes.popup slide.next {
-moz-transform: translate3d(570px, 80px, 0) scale(0.35);
-ms-transform: translate3d(570px, 80px, 0) scale(0.35);
-webkit-transform: translate3d(570px, 80px, 0) scale(0.35);
transform: translate3d(570px, 80px, 0) scale(0.35);
opacity: 1 !important;
}
}*/
/* line 688, ../scss/default.scss */
.with-notes.popup slide.next .note {
/*.with-notes.popup slide.next .note {
display: none !important;
}
}*/
/* line 694, ../scss/default.scss */
.with-notes.popup .note {
width: 109%;
@ -1168,7 +1170,7 @@ article.smaller q:before, article.smaller q:after {
/* Clickable/tappable areas */
/* line 773, ../scss/default.scss */
.slide-area {
/*.slide-area {
z-index: 1000;
position: absolute;
left: 0;
@ -1179,7 +1181,7 @@ article.smaller q:before, article.smaller q:after {
top: 50%;
cursor: pointer;
margin-top: -350px;
}
}*/
/* line 790, ../scss/default.scss */
#prev-slide-area {
@ -1469,6 +1471,15 @@ hgroup .pre {
color: white;
}
.subheading {
position: absolute;
top: 62.5%;
}
.subheading p {
position: relative;
}
/* purple background slides (new section)*/
.background2 {
@ -1593,7 +1604,7 @@ p.first.admonition-title {
width: inherit;
}
/* images */
/********* images ************/
/* general styles to scale and centre images*/
.image-box {
@ -1606,7 +1617,7 @@ img {
margin: auto;
}
/* deck-specific styles for individual images*/
/********* deck-specific styles for individual images *********/
/* intro to ql */
img.analysis {
width: 90%;
@ -1619,6 +1630,26 @@ img.analysis {
right: -10%;
}
.java-expression-ast {
background-image: url("../../java-expression-ast.svg");
background-size: cover;
}
/* java data flow code example */
.java-data-flow-code-example {
background-image: url("../../java-data-flow-code-example.svg");
background-size: cover;
}
/* extra global data flow slides */
.mismatched-calls-and-returns {
background-image: url("../../mismatched-calls-and-returns.svg");
background-size: cover;
}
/******* Other custom styles *******/
/* custom styles for lists*/
ol {

View file

@ -24,7 +24,11 @@ For this example you should download:
You can query the project in `the query console <https://lgtm.com/query/project:2034240708/lang:cpp/>`__ on LGTM.com.
Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revision of each project that has been added, whereas the snapshot available to download above is based on a historical version of the code base.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
Checking for overflow in C
==========================

View file

@ -26,7 +26,11 @@ For this example you should download:
You can query the project in `the query console <https://lgtm.com/query/project:2034240708/lang:cpp/>`__ on LGTM.com.
Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revision of each project that has been added, whereas the snapshot available to download above is based on a historical version of the code base.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
.. rst-class:: agenda

View file

@ -24,7 +24,11 @@ For this example you should download:
You can query the project in `the query console <https://lgtm.com/query/projects:1505958977333/lang:cpp/>`__ on LGTM.com.
Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revision of each project that has been added, whereas the snapshot available to download above is based on a historical version of the code base.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
.. rst-class:: agenda
@ -114,162 +118,11 @@ We need something better.
What we need is a way to determine whether the format argument is ever set to something that is not constant.
Data flow analysis
==================
.. include general data flow slides
- Models flow of data through the program.
- Implemented in the module ``semmle.code.cpp.dataflow.DataFlow``.
- Class ``DataFlow::Node`` represents program elements that have a value, such as expressions and function parameters.
.. include:: ../slide-snippets/local-data-flow.rst
- Nodes of the data flow graph.
- Various predicates represent flow between these nodes.
- Edges of the data flow graph.
.. note::
The solution here is to use *data flow*. Data flow is, as the name suggests, about tracking the flow of data through the program. It helps answer questions like: *does this expression ever hold a value that originates from a particular other place in the program*?
We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in the program, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two nodes.
Data flow graphs
================
.. container:: column-left
Example:
.. code-block:: cpp
int func(int tainted) {
int x = tainted;
if (someCondition) {
int y = x;
callFoo(y);
} else {
return x;
}
return -1;
}
.. container:: column-right
Data flow graph:
.. graphviz::
digraph {
graph [ dpi = 1000 ]
node [shape=polygon,sides=4,color=blue4,style="filled,rounded", fontname=consolas,fontcolor=white]
a [label=<tainted<BR /><FONT POINT-SIZE="10">ParameterNode</FONT>>]
b [label=<tainted<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
c [label=<x<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
d [label=<x<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
e [label=<y<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
a -> b
b -> {c, d}
c -> e
}
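The diagram above can be read as a reachability problem. The following is a minimal C++ sketch of that idea, not part of the QL libraries: the `FlowGraph` type and the `exampleGraph` and `reachableFrom` helpers are hypothetical names introduced here, and the node names `a` to `e` are taken from the diagram.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical adjacency-list type: each node maps to its direct successors.
using FlowGraph = std::map<std::string, std::vector<std::string>>;

// Edges copied from the diagram: a -> b, b -> {c, d}, c -> e.
FlowGraph exampleGraph() {
    return {
        {"a", {"b"}},
        {"b", {"c", "d"}},
        {"c", {"e"}},
    };
}

// Returns every node reachable from `source` in zero or more steps,
// using a breadth-first traversal of the graph.
std::set<std::string> reachableFrom(const FlowGraph& graph,
                                    const std::string& source) {
    std::set<std::string> seen{source};
    std::queue<std::string> pending;
    pending.push(source);
    while (!pending.empty()) {
        std::string current = pending.front();
        pending.pop();
        auto it = graph.find(current);
        if (it == graph.end()) continue;  // node has no outgoing edges
        for (const std::string& next : it->second) {
            // insert() reports whether the node was newly added
            if (seen.insert(next).second) pending.push(next);
        }
    }
    return seen;
}
```

Calling `reachableFrom(exampleGraph(), "a")` visits all five nodes, whereas starting from `d` never reaches `e`, because the `return x` branch does not feed the assignment to `y`.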
Local vs global data flow
=========================
- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
- Different APIs, so discussed separately
- This slide deck focuses on the former.
.. note::
For further information, see:
- `Introduction to data flow analysis in QL <https://help.semmle.com/QL/learn-ql/ql/intro-to-data-flow.html>`__
- `Analyzing data flow in C/C++ <https://help.semmle.com/QL/learn-ql/ql/cpp/dataflow.html>`__
.. rst-class:: background2
Local data flow
===============
Importing data flow
===================
To use the data flow library, add the following import:
.. code-block:: ql
import semmle.code.cpp.dataflow.DataFlow
**Note**: this library contains an explicit “module” declaration:
.. code-block:: ql
module DataFlow {
class Node extends ... { ... }
predicate localFlow(Node source, Node sink) {
localFlowStep*(source, sink)
}
...
}
So all references will need to be qualified (for example, ``DataFlow::Node``).
.. note::
A **query library** is a file with the extension ``.qll``. Query libraries do not contain a query clause, but may contain modules, classes, and predicates. For example, the `C/C++ data flow library <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/DataFlow.qll/module.DataFlow.html>`__ is contained in the ``semmle/code/cpp/dataflow/DataFlow.qll`` QLL file, and can be imported as shown above.
A **module** is a way of organizing QL code by grouping together related predicates, classes, and (sub-)modules. They can be either explicitly declared or implicit. A query library implicitly declares a module with the same name as the QLL file.
For further information on libraries and modules in QL, see the chapter on `Modules <https://help.semmle.com/QL/ql-handbook/modules.html>`__ in the QL language handbook.
For further information on importing QL libraries and modules, see the chapter on `Name resolution <https://help.semmle.com/QL/ql-handbook/name-resolution.html>`__ in the QL language handbook.
Data flow graph
===============
- Class ``DataFlow::Node`` represents data flow graph nodes
- Predicate ``DataFlow::localFlowStep`` represents local data flow graph edges, ``DataFlow::localFlow`` is its transitive closure
- Data flow graph nodes are *not* AST nodes, but they correspond to AST nodes, and there are predicates for mapping between them:
- ``Expr Node.asExpr()``
- ``Parameter Node.asParameter()``
- ``DataFlow::Node DataFlow::exprNode(Expr e)``
- ``DataFlow::Node DataFlow::parameterNode(Parameter p)``
- ``etc.``
.. note::
The ``DataFlow::Node`` class is shared between both the local and global data flow graphs–the primary difference is the edges, which in the “global” case can link different functions.
``localFlowStep`` is the “single step” flow relation–that is, it describes single edges in the local data flow graph. ``localFlow`` represents the `transitive <https://help.semmle.com/QL/ql-handbook/recursion.html#transitive-closures>`__ closure of this relation–in other words, it contains every pair of nodes where the second node is reachable from the first in the data flow graph.
The data flow graph is separate from the `AST <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`__, to allow for flexibility in how data flow is modeled. There are a small number of data flow node types–expression nodes, parameter nodes, uninitialized variable nodes, and definition by reference nodes. Each node provides mapping functions to and from the relevant AST (for example ``Expr``, ``Parameter`` etc.) or symbol table (for example ``Variable``) classes.
Taint tracking
==============
- Usually, we want to generalise slightly by not only considering plain data flow, but also “taint” propagation, that is, whether a value is influenced by or derived from another.
- Examples:
.. code-block:: cpp
sink = source; // source -> sink: data and taint
strcat(sink, source); // source -> sink: taint, not data
- Library ``semmle.code.cpp.dataflow.TaintTracking`` provides predicates for tracking taint:
- ``TaintTracking::localTaintStep`` represents one (local) taint step
- ``TaintTracking::localTaint`` is its transitive closure.
.. note::
Taint tracking can be thought of as another type of data flow graph. It usually extends the standard data flow graph for a problem by adding edges between nodes where one node influences or *taints* another.
The `API <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/TaintTracking.qll/module.TaintTracking.html>`__ is almost identical to that of the local data flow. All we need to do to switch to taint tracking is ``import semmle.code.cpp.dataflow.TaintTracking`` instead of ``semmle.code.cpp.dataflow.DataFlow``, and instead of using ``localFlow``, we use ``localTaint``.
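To make the difference between data flow and taint concrete, here is a small C++ sketch. It uses ``std::string`` rather than ``strcat`` so that it is safe to run, and the helper names `copyValue` and `appendValue` are hypothetical, chosen only for this illustration.

```cpp
#include <string>

// Plain data flow: the sink holds exactly the value of the source.
std::string copyValue(const std::string& source) {
    std::string sink = source;  // source -> sink: data and taint
    return sink;
}

// Taint only: the sink is derived from the source, but is a different value.
std::string appendValue(const std::string& prefix, const std::string& source) {
    std::string sink = prefix;
    sink += source;             // source -> sink: taint, not data
    return sink;
}
```

In the first function the sink is equal to the source, so a plain data flow analysis connects them. In the second the sink merely contains the source, so only a taint analysis would report the connection.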
.. resume language-specific slides
Exercise: source nodes
======================
@ -343,4 +196,4 @@ Beyond local data flow
- Results are still underwhelming.
- Dealing with parameter passing becomes cumbersome.
- Instead, let's turn the problem around and find user-controlled data that flows into a ``printf`` format argument, potentially through calls.
- This needs :doc:`global data flow <global-data-flow-cpp>`.
- This needs :doc:`global data flow <global-data-flow-cpp>`.

View file

@ -24,7 +24,11 @@ For this example you should download:
You can query the project in `the query console <https://lgtm.com/query/projects:1505958977333/lang:cpp/>`__ on LGTM.com.
Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revision of each project that has been added, whereas the snapshot available to download above is based on a historical version of the code base.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
.. rst-class:: agenda
@ -36,57 +40,12 @@ Agenda
- Path queries
- Data flow models
Information flow
================
.. insert common global data flow slides
- Many security problems can be phrased as an information flow problem:
.. include:: ../slide-snippets/global-data-flow.rst
Given a (problem-specific) set of sources and sinks, is there a path in the data flow graph from some source to some sink?
.. resume language-specific global data flow slides
- Some examples:
- SQL injection: sources are user-input, sinks are SQL queries
- Reflected XSS: sources are HTTP requests, sinks are HTTP responses
- We can solve such problems using the data flow and taint tracking libraries.
Global data flow and taint tracking
===================================
- Recap:
- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
- For global data flow (and taint tracking), we must therefore provide restrictions to ensure the problem is tractable.
- Typically, this involves specifying the *source* and *sink*.
.. note::
As we mentioned in the previous slide deck, while local data flow is feasible to compute for all functions in a snapshot, global data flow is not. This is because the number of paths becomes exponentially larger for global data flow.
The global data flow (and taint tracking) library avoids this problem by requiring the query author to specify which ``sources`` and ``sinks`` are applicable. This allows the implementation to compute paths between the restricted set of nodes, rather than the full graph.
Global taint tracking library
=============================
The ``semmle.code.cpp.dataflow.TaintTracking`` library provides a framework for implementing solvers for global taint tracking problems:
#. Subclass ``TaintTracking::Configuration`` following this template:
.. code-block:: ql
class Config extends TaintTracking::Configuration {
Config() { this = "<some unique identifier>" }
override predicate isSource(DataFlow::Node nd) { ... }
override predicate isSink(DataFlow::Node nd) { ... }
}
#. Use ``Config.hasFlow(source, sink)`` to find inter-procedural paths.
.. note::
In addition to the taint tracking configuration described here, there is also an equivalent *data flow* configuration in ``semmle.code.cpp.dataflow.DataFlow``, ``DataFlow::Configuration``. Data flow configurations are used to track whether the exact value produced by a source is used by a sink, whereas taint tracking configurations are used to determine whether the source may influence the value used at the sink. Whether you use taint tracking or data flow depends on the analysis problem you are trying to solve.
Finding tainted format strings (outline)
========================================
@ -164,30 +123,11 @@ Using the ``FormattingFunction`` class, we can write the sink as:
When we run this query, we should find a single result. However, it is tricky to determine whether this result is a true positive (a “real” result) because our query only reports the source and the sink, and not the path through the graph between the two.
Path queries
============
.. insert path queries slides
Path queries provide information about the identified paths from sources to sinks. Paths can be examined in the Path Explorer view.
.. include:: ../slide-snippets/path-queries.rst
Use this template:
.. code-block:: ql
/**
* ...
* @kind path-problem
*/
import semmle.code.cpp.dataflow.TaintTracking
import DataFlow::PathGraph
...
from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink, source, sink, "<message>"
.. note::
To see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work–we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
.. resume language-specific global data flow slides
Defining additional taint steps
===============================
@ -252,86 +192,4 @@ Data flow models
Extra slides
============
Exercise: How not to do global data flow
========================================
Implement a ``flowStep`` predicate extending ``localFlowStep`` with steps through function calls and returns. Why might we not want to use this?
.. code-block:: ql
predicate stepIn(Call c, DataFlow::Node arg, DataFlow::ParameterNode parm) {
exists(int i | arg.asExpr() = c.getArgument(i) |
parm.asParameter() = c.getTarget().getParameter(i))
}
predicate stepOut(Call c, DataFlow::Node ret, DataFlow::Node res) {
exists(ReturnStmt retStmt | retStmt.getEnclosingFunction() = c.getTarget() |
ret.asExpr() = retStmt.getExpr() and res.asExpr() = c)
}
predicate flowStep(DataFlow::Node pred, DataFlow::Node succ) {
DataFlow::localFlowStep(pred, succ) or
stepIn(_, pred, succ) or
stepOut(_, pred, succ)
}
Mismatched calls and returns
============================
.. container:: column-left
.. code-block:: cpp
char *logFormat(char *fmt) {
log("Observed format string %s.", fmt);
return fmt;
}
...
char *dangerousFmt = unvalidatedUserData();
printf(logFormat(dangerousFmt), args);
...
char *safeFmt = "Hello %s!";
printf(logFormat(safeFmt), name);
.. container:: column-right
Infeasible path due to mismatched call/return pair!
Balancing calls and returns
===========================
- If we simply take ``flowStep*``, we might mismatch calls and returns, causing imprecision, which in turn may cause false positives.
- Instead, make sure that matching ``stepIn``/``stepOut`` pairs talk about the same call site:
.. code-block:: ql
predicate balancedPath(DataFlow::Node src, DataFlow::Node snk) {
src = snk or DataFlow::localFlowStep(src, snk) or
exists(DataFlow::Node m | balancedPath(src, m) | balancedPath(m, snk)) or
exists(Call c, DataFlow::Node parm, DataFlow::Node ret |
stepIn(c, src, parm) and
balancedPath(parm, ret) and
stepOut(c, ret, snk)
)
}
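The effect of balancing can be sketched outside QL on a hand-built graph for the ``logFormat`` example. This is an illustrative C++ model under stated assumptions, not how the QL library is implemented: the `Edge` type, the node numbering, and the `reachable` helper are all hypothetical, and the search assumes a graph without recursion.

```cpp
#include <set>
#include <vector>

// Hypothetical edge kinds for a tiny inter-procedural flow graph.
enum class Kind { Local, CallIn, ReturnOut };

struct Edge {
    int from;
    int to;
    Kind kind;
    int callSite;  // only meaningful for CallIn and ReturnOut edges
};

// Node numbering assumed for this sketch:
// 0 dangerousFmt, 1 safeFmt, 2 parameter fmt, 3 returned fmt,
// 4 format argument of the first printf, 5 format argument of the second.
std::vector<Edge> exampleEdges() {
    return {
        {0, 2, Kind::CallIn, 1},
        {1, 2, Kind::CallIn, 2},
        {2, 3, Kind::Local, 0},
        {3, 4, Kind::ReturnOut, 1},
        {3, 5, Kind::ReturnOut, 2},
    };
}

// Depth-first search. When `balanced` is true, a ReturnOut edge is followed
// only if it matches the most recently entered call site.
// Assumes the graph is acyclic, which holds for this example.
void search(const std::vector<Edge>& edges, int node,
            std::vector<int> callStack, bool balanced,
            std::set<int>& reached) {
    reached.insert(node);
    for (const Edge& e : edges) {
        if (e.from != node) continue;
        std::vector<int> nextStack = callStack;
        if (e.kind == Kind::CallIn) {
            nextStack.push_back(e.callSite);
        } else if (e.kind == Kind::ReturnOut && balanced) {
            // Skip returns that do not match the call we came in through.
            if (nextStack.empty() || nextStack.back() != e.callSite) continue;
            nextStack.pop_back();
        }
        search(edges, e.to, nextStack, balanced, reached);
    }
}

std::set<int> reachable(const std::vector<Edge>& edges, int source,
                        bool balanced) {
    std::set<int> reached;
    search(edges, source, {}, balanced, reached);
    return reached;
}
```

With `balanced` set to `false`, node 0 (``dangerousFmt``) reaches node 5, the format argument of the second ``printf``, which is the infeasible path. With `balanced` set to `true`, it reaches only node 4.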
Summary-based global data flow
==============================
- To avoid traversing the same paths many times, we compute *function summaries* that record if a function parameter flows into a return value:
.. code-block:: ql
predicate returnsParameter(Function f, int i) {
exists (Parameter p, ReturnStmt retStmt, Expr ret |
p = f.getParameter(i) and
retStmt.getEnclosingFunction() = f and
ret = retStmt.getExpr() and
balancedPath(DataFlow::parameterNode(p), DataFlow::exprNode(ret))
)
}
- Use this predicate in ``balancedPath`` instead of ``stepIn``/``stepOut`` pairs.
.. include:: ../slide-snippets/global-data-flow-extra-slides.rst

View file

@ -24,7 +24,11 @@ For this example you should download:
You can also query the project in `the query console <https://lgtm.com/query/project:1506532406873/lang:cpp/>`__ on LGTM.com.
Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revision of each project that has been added, whereas the snapshot available to download above is based on a historical version of the code base.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
.. Include language-agnostic section here

View file

@ -19,68 +19,11 @@ Agenda
- Variables
- Functions
Abstract syntax trees
=====================
.. insert abstract-syntax-tree.rst
The basic representation of an analyzed program is an *abstract syntax tree (AST)*.
.. include:: ../slide-snippets/abstract-syntax-tree.rst
.. container:: column-left
.. code-block:: cpp
try {
...
} catch (AnException e) {
}
.. container:: ast-graph
.. graphviz::
digraph {
graph [ dpi = 1000 ]
node [shape=polygon,sides=4,color=blue4,style="filled,rounded", fontname=consolas,fontcolor=white]
a [label=<TryStmt>]
b [label=<CatchBlock>]
c [label=<...>,color=white,fontcolor=black]
d [label=<Parameter>]
e [label=<...>,color=white,fontcolor=black]
f [label=<...>,color=white,fontcolor=black]
g [label=<...>,color=white,fontcolor=black]
a -> {b, c}
b -> {d, e}
d -> {f, g}
}
.. note::
When writing queries in QL it is important to have in mind the underlying representation of the program which is stored in the database. Typically queries make use of the “AST” representation of the program–a tree structure where program elements are nested within other program elements.
The “Introducing the C/C++ libraries” help topic contains a more complete overview of important AST classes and the rest of the C++ QL libraries: https://help.semmle.com/QL/learn-ql/ql/cpp/introduce-libraries-cpp.html
Database representations of ASTs
================================
AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaque–all one can do with them is to check their equality.
Each entity belongs to an entity type. Entity types have names starting with “@” and are defined in the database schema (not in QL).
Properties of AST nodes and their relationships to each other are encoded by database relations, which are predicates defined in the database (not in QL).
Entity types are rarely used directly; the usual pattern is to define a QL class that extends the type and exposes properties of its entities through member predicates.
.. note::
ASTs are a typical example of the kind of data representation one finds in object-oriented programming, with data-carrying nodes that reference each other. At first glance, QL, which can only work with atomic values, does not seem to be well suited for working with this kind of data. However, ultimately all that we require of the nodes in an AST is that they have an identity. The relationships among nodes, usually implemented by reference-valued object fields in other languages, can just as well (and arguably more naturally) be represented as relations over nodes. Attaching data (such as strings or numbers) to nodes can also be represented with relations over nodes and primitive values. All we need is a way for relations to reference nodes. This is achieved in QL (as in other database languages) by means of *entity values* (or entities, for short), which are opaque atomic values, implemented as integers under the hood.
It is the job of the extractor to create entity values for all AST nodes and populate database relations that encode the relationship between AST nodes and any values associated with them. These relations are *extensional*, that is, explicitly stored in the database, unlike the relations described by QL predicates, which we also refer to as *intensional* relations. Entity values belong to *entity types*, whose name starts with “@” to set them apart from primitive types and classes.
The interface between entity types and extensional relations on the one hand and QL predicates and classes on the other hand is provided by the *database schema*, which defines the available entity types and the schema of each extensional relation, that is, how many columns the relation has, and which entity type or primitive type the values in each column come from. QL programs can refer to entity types and extensional relations just as they would refer to QL classes and predicates, with the restriction that entity types cannot be directly selected in a “select” clause, since they do not have a well-defined string representation.
For example, the database schema for C++ snapshot databases is here: https://github.com/Semmle/ql/blob/master/cpp/ql/src/semmlecode.cpp.dbscheme
.. resume slides
AST QL classes
==============
@ -93,10 +36,6 @@ Important AST classes include:
These three (and all other AST classes) are subclasses of ``Element``.
.. note::
The “Introducing the C/C++ libraries” help topic contains a more complete overview of important AST classes and the rest of the C++ QL libraries: https://help.semmle.com/QL/learn-ql/ql/cpp/introduce-libraries-cpp.html
Symbol table
============
@ -108,10 +47,6 @@ The database also includes information about the symbol table associated with a
- ``Type``: built-in and user-defined types
.. note::
The “Introducing the C/C++ libraries” help topic contains a more complete overview of important symbol table classes and the rest of the C++ QL libraries: https://help.semmle.com/QL/learn-ql/ql/cpp/introduce-libraries-cpp.html
Working with variables
======================

View file

@ -24,7 +24,11 @@ For this example you should download:
You can also query the project in `the query console <https://lgtm.com/query/project:1506087977050/lang:cpp/>`__ on LGTM.com.
Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revision of each project that has been added, whereas the snapshot available to download above is based on a historical version of the code base.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
``snprintf``
============

View file

@ -10,17 +10,5 @@ QL training and variant analysis examples
:maxdepth: 1
:hidden:
./cpp/intro-ql-cpp
./cpp/bad-overflow-guard
./cpp/program-representation-cpp
./cpp/data-flow-cpp
./cpp/snprintf
./cpp/global-data-flow-cpp
./cpp/control-flow-cpp
.. toctree::
:glob:
:maxdepth: 1
:hidden:
./java/intro-ql-java
*
*/**

View file

@ -0,0 +1,141 @@
=======================
Exercise: Apache Struts
=======================
.. container:: subheading
Unsafe deserialization leading to an RCE
CVE-2017-9805
.. container:: semmle-logo
Semmle :sup:`TM`
.. rst-class:: setup
Setup
=====
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `Apache Struts snapshot <https://downloads.lgtm.com/snapshots/java/apache/struts/apache-struts-7fd1622-CVE-2018-11776.zip>`__
.. note::
For this example, we will be analyzing `Apache Struts <https://github.com/apache/struts>`__.
You can also query the project in `the query console <https://lgtm.com/query/project:1878521151/lang:java/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
Unsafe deserialization in Struts
================================
Apache Struts provides a ``ContentTypeHandler`` interface, which can be implemented for specific content types. It defines the following interface method:
.. code-block:: java
void toObject(Reader in, Object target);
which is intended to populate the ``target`` object with data from the reader, usually through deserialization. However, the ``in`` parameter should be considered untrusted, and should not be deserialized without sanitization.
RCE in Apache Struts
====================
- Vulnerable code looked like this (`original <https://lgtm.com/projects/g/apache/struts/snapshot/b434c23f95e0f9d5bde789bfa07f8fc1d5a8951d/files/plugins/rest/src/main/java/org/apache/struts2/rest/handler/XStreamHandler.java?sort=name&dir=ASC&mode=heatmap#L45>`__):
.. code-block:: java
public void toObject(Reader in, Object target) {
XStream xstream = createXStream();
xstream.fromXML(in, target);
}
- XStream allows deserialization of **dynamic proxies**, which permit remote code execution.
- Disclosed as `CVE-2017-9805 <http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9805>`__
- Blog post: https://lgtm.com/blog/apache_struts_CVE-2017-9805
Finding the RCE yourself
========================
#. Create a QL class to find the interface ``org.apache.struts2.rest.handler.ContentTypeHandler``
**Hint**: Use predicate ``hasQualifiedName(...)``
#. Identify methods called ``toObject``, which are defined on direct subtypes of ``ContentTypeHandler``
**Hint**: Use ``Method.getDeclaringType()`` and ``Type.getASupertype()``
#. Implement a ``DataFlow::Configuration``, defining the source as the first parameter of a ``toObject`` method, and the sink as an instance of ``UnsafeDeserializationSink``.
**Hint**: Use ``Node::asParameter()``
#. Construct the query as a path-problem query, and verify you find one result.
Model answer, step 1
====================
.. code-block:: ql
import java
/** The interface `org.apache.struts2.rest.handler.ContentTypeHandler`. */
class ContentTypeHandler extends RefType {
ContentTypeHandler() {
this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
}
}
Model answer, step 2
====================
.. code-block:: ql
/** A `toObject` method on a subtype of `org.apache.struts2.rest.handler.ContentTypeHandler`. */
class ContentTypeHandlerDeserialization extends Method {
ContentTypeHandlerDeserialization() {
this.getDeclaringType().getASupertype() instanceof ContentTypeHandler and
this.hasName("toObject")
}
}
Model answer, step 3
====================
.. code-block:: ql
import UnsafeDeserialization
import semmle.code.java.dataflow.DataFlow::DataFlow
/**
* Configuration that tracks the flow of taint from the first parameter of
* `ContentTypeHandler.toObject` to an instance of unsafe deserialization.
*/
class StrutsUnsafeDeserializationConfig extends Configuration {
StrutsUnsafeDeserializationConfig() { this = "StrutsUnsafeDeserializationConfig" }
override predicate isSource(Node source) {
source.asParameter() = any(ContentTypeHandlerDeserialization des).getParameter(0)
}
override predicate isSink(Node sink) { sink instanceof UnsafeDeserializationSink }
}
Model answer, step 4
====================
.. code-block:: ql
import PathGraph
...
from PathNode source, PathNode sink, StrutsUnsafeDeserializationConfig conf
where conf.hasFlowPath(source, sink)
and sink.getNode() instanceof UnsafeDeserializationSink
select sink.getNode().(UnsafeDeserializationSink).getMethodAccess(), source, sink, "Unsafe deserialization of $@.", source, "user input"
More full-featured version: https://github.com/Semmle/demos/tree/master/ql_demos/java/Apache_Struts_CVE-2017-9805


@ -0,0 +1,146 @@
=========================
Introduction to data flow
=========================
.. container:: semmle-logo
Semmle :sup:`TM`
Finding SPARQL injection vulnerabilities in Java
.. rst-class:: setup
Setup
=====
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `VIVO Vitro snapshot <http://downloads.lgtm.com/snapshots/java/vivo-project/Vitro/vivo-project_Vitro_java-srcVersion_47ae42c01954432c3c3b92d5d163551ce367f510-dist_odasa-lgtm-2019-04-23-7ceff95-linux64.zip>`__
.. note::
For this example, we will be analyzing `VIVO Vitro <https://github.com/vivo-project/Vitro>`__.
You can also query the project in `the query console <https://lgtm.com/query/project:14040005/lang:java/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
.. rst-class:: agenda
Agenda
======
- SPARQL injection
- Data flow
- Modules and libraries
- Local data flow
- Local taint tracking
Motivation
==========
`SPARQL <https://en.wikipedia.org/wiki/SPARQL>`__ is a language for querying data stored in RDF format, and queries can suffer from SQL injection-like vulnerabilities:
.. code-block:: none
sparqlAskQuery("ASK { <" + individualURI + "> ?p ?o }")
``individualURI`` is provided by a user, allowing an attacker to close the ``<...>`` URI reference prematurely with a ``>`` and provide additional content.
**Goal**: Find query strings that are created by concatenation.
.. note::
If you have completed the “Example: Query injection” slide deck which was part of the previous course, this example will look familiar to you.
To understand the scope of this vulnerability, consider what would happen if a malicious user could provide the following as the content of the ``individualURI`` variable:
``“http://vivoweb.org/ontology/core#FacultyMember> ?p ?o . FILTER regex("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!", "(.*a){50}") } #``
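The effect of such input can be reproduced with plain string concatenation. The sketch below mirrors the concatenation pattern from the slide. The class and method names are hypothetical and this is not code from Vitro. A shortened version of the payload is used to keep the example readable.

```java
// Illustration only: hypothetical names, not code from VIVO Vitro.
class SparqlConcatDemo {
    // Builds the query the same way as the vulnerable pattern on the slide.
    static String buildAskQuery(String individualURI) {
        return "ASK { <" + individualURI + "> ?p ?o }";
    }

    public static void main(String[] args) {
        // Benign input stays inside the <...> URI reference.
        System.out.println(buildAskQuery("http://example.org/individual/1"));

        // Malicious input closes the URI reference early, adds a FILTER with
        // an expensive regular expression, and comments out the remainder.
        String payload = "http://vivoweb.org/ontology/core#FacultyMember> ?p ?o . "
            + "FILTER regex(\"aaaa!\", \"(.*a){50}\") } #";
        System.out.println(buildAskQuery(payload));
    }
}
```

In the second query the trailing ``#`` starts a SPARQL comment, so the original ``> ?p ?o }`` suffix is ignored and the attacker-supplied ``FILTER`` becomes part of the query.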
Example: SPARQL injection
=========================
We can write a simple query that finds string concatenations that occur in calls to SPARQL query APIs.
.. rst-class:: build
.. literalinclude:: ../query-examples/java/data-flow-java-1.ql
:language: ql
.. note::
This is similar, but not identical, to the formulation we had in the previous training deck. It has been rewritten to make it easier for the next step.
Success! But also missing results...
====================================
Query finds a CVE reported by Semmle (CVE-2019-6986), plus one other result, but misses other opportunities where:
- String concatenation occurs on a different line in the same method.
- String concatenation occurs in a different method.
- String concatenation occurs through a ``StringBuilder`` or similar.
- Entirety of user input is provided as the query.
We want to improve our query to catch more of these cases.
.. note::
For more details of the CVE, see: https://github.com/Semmle/SecurityExploits/tree/master/vivo-project/CVE-2019-6986
As an example, consider this SPARQL query call:
.. code-block:: none
String queryString = "ASK { <" + individualURI + "> ?p ?o }";
sparqlAskQuery(queryString);
Here the concatenation occurs before the call, so the existing query would miss this - the string concatenation does not occur *directly* as the first argument of the call.
.. include general data flow slides
.. include:: ../slide-snippets/local-data-flow.rst
.. resume language-specific slides
Exercise: revisiting SPARQL injection
=====================================
Refine the query to find string concatenation that occurs in the same method, but a different line.
**Hint**: Use ``DataFlow::localFlow`` to assert that the result flows to the SPARQL call argument, using ``DataFlow::exprNode`` to get the data flow nodes for the relevant expression nodes.
.. rst-class:: build
.. literalinclude:: ../query-examples/java/data-flow-java-2.ql
:language: ql
Refinements (take home exercise)
================================
In Java, strings are often created using ``StringBuilder`` and ``StringBuffer`` classes. For example:
.. code-block:: java
StringBuilder queryBuilder = new StringBuilder();
queryBuilder.append("ASK { <");
queryBuilder.append(individualURI);
queryBuilder.append("> ?p ?o }");
sparqlAskQuery(queryBuilder.toString());
**Exercise**: Refine the query to consider strings created from ``StringBuilder`` and ``StringBuffer`` classes as sources of concatenation.
Beyond local data flow
======================
- We are still missing possible results.
- Concatenation that occurs outside the enclosing method.
- Instead, let's turn the problem around and find user-controlled data that flows into a SPARQL query argument, potentially through calls.
- This needs :doc:`global data flow <global-data-flow-java>`.


@ -0,0 +1,190 @@
================================
Introduction to global data flow
================================
QL for Java
.. container:: semmle-logo
Semmle :sup:`TM`
.. rst-class:: setup
Setup
=====
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `Apache Struts snapshot <https://downloads.lgtm.com/snapshots/java/apache/struts/apache-struts-7fd1622-CVE-2018-11776.zip>`__
.. note::
For this example, we will be analyzing `Apache Struts <https://github.com/apache/struts>`__.
You can also query the project in `the query console <https://lgtm.com/query/project:1878521151/lang:java/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
.. rst-class:: agenda
Agenda
======
- Global taint tracking
- Sanitizers
- Path queries
- Data flow models
.. insert common global data flow slides
.. include:: ../slide-snippets/global-data-flow.rst
.. resume language-specific global data flow slides
Code injection in Apache Struts
===============================
- In April 2018, Man Yue Mo, a security researcher at Semmle, reported 5 remote code execution (RCE) vulnerabilities (CVE-2018-11776) in Apache Struts.
- These vulnerabilities were caused by untrusted, unsanitized data being evaluated as an OGNL (Object-Graph Navigation Language) expression, allowing malicious users to perform remote code execution.
- Conceptually, this is a global taint tracking problem - does untrusted remote input flow to a method call which evaluates OGNL?
.. note::
More details on the CVE can be found here: https://lgtm.com/blog/apache_struts_CVE-2018-11776 and
https://github.com/Semmle/demos/tree/master/ql_demos/java/Apache_Struts_CVE-2018-11776
More details on OGNL can be found here: https://commons.apache.org/proper/commons-ognl/
.. rst-class:: java-data-flow-code-example
Code example
============
Finding RCEs (outline)
======================
.. literalinclude:: ../query-examples/java/global-data-flow-java-1.ql
:language: ql
Defining sources
================
We want to look for method calls where the method name is ``getNamespace()``, and the declaring type of the method is a class called ``ActionProxy``.
.. code-block:: ql
import semmle.code.java.security.Security
class TaintedOGNLConfig extends TaintTracking::Configuration {
override predicate isSource(DataFlow::Node source) {
exists(Method m |
m.getName() = "getNamespace" and
m.getDeclaringType().getName() = "ActionProxy" and
source.asExpr() = m.getAReference()
)
}
...
}
.. note::
We first define what it means to be a *source* of tainted data for this particular problem. In this case, we are interested in the value returned by calls to ``getNamespace()``.
Exercise: Defining sinks
========================
Fill in the definition of ``isSink``.
**Hint**: We want to find the first argument of calls to the method ``compileAndExecute``.
.. code-block:: ql
import semmle.code.java.security.Security
class TaintedOGNLConfig extends TaintTracking::Configuration {
override predicate isSink(DataFlow::Node sink) {
/* Fill me in */
}
...
}
.. note::
The second part is to define what it means to be a *sink* for this particular problem. The queries from :doc:`Introduction to data flow <data-flow-java>` will be useful for this exercise.
Solution: Defining sinks
========================
Find a method access to ``compileAndExecute``, and mark the first argument.
.. code-block:: ql
import semmle.code.java.security.Security
class TaintedOGNLConfig extends TaintTracking::Configuration {
override predicate isSink(DataFlow::Node sink) {
exists(MethodAccess ma |
ma.getMethod().getName() = "compileAndExecute" and
ma.getArgument(0) = sink.asExpr()
)
}
...
}
.. insert path queries slides
.. include:: ../slide-snippets/path-queries.rst
.. resume language-specific global data flow slides
Defining sanitizers
===================
A sanitizer allows us to *prevent* flow through a particular node in the graph. For example, flows that go via ``ValueStackShadowMap`` are not particularly interesting, because it is a class that is rarely used in practice. We can exclude them like so:
.. code-block:: ql
class TaintedOGNLConfig extends TaintTracking::Configuration {
override predicate isSanitizer(DataFlow::Node nd) {
nd.getEnclosingCallable()
.getDeclaringType()
.getName() = "ValueStackShadowMap"
}
...
}
Defining additional taint steps
===============================
Add an additional taint step that (heuristically) treats a value assigned to a field as flowing to any access of that field, provided the assignment and the access occur in callables declared in the same type.
.. code-block:: ql
class TaintedOGNLConfig extends TaintTracking::Configuration {
override predicate isAdditionalTaintStep(DataFlow::Node pred,
DataFlow::Node succ) {
exists(Field f, RefType t |
pred.asExpr() = f.getAnAssignedValue() and
succ.asExpr() = f.getAnAccess() and
pred.asExpr().getEnclosingCallable().getDeclaringType() = t and
succ.asExpr().getEnclosingCallable().getDeclaringType() = t
)
}
...
}
.. rst-class:: end-slide
Extra slides
============
.. include:: ../slide-snippets/global-data-flow-extra-slides.rst


@ -24,7 +24,11 @@ For this example you should download:
You can also query the project in `the query console <https://lgtm.com/query/project:1878521151/lang:java/>`__ on LGTM.com.
Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
.. Include language-agnostic section here
@ -47,7 +51,7 @@ Oops
}
- The return statement has been commented out (during debugging?)
- The if statement is now dead code
- The ``if`` statement is now dead code
- No explicit bounds checking, will throw ``ArrayIndexOutOfbounds``
.. note::
@ -182,7 +186,7 @@ Iterative query refinement
- **Common workflow**: Start with a simple query, inspect a few results, refine, repeat.
- For example, empty then branches are not a problem if there is an else.
- For example, empty ``then`` branches are not a problem if there is an ``else``.
- **Exercise**: How can we refine the query to take this into account?


@ -0,0 +1,98 @@
======================
Program representation
======================
QL for Java
.. container:: semmle-logo
Semmle :sup:`TM`
.. rst-class:: agenda
Agenda
======
- Abstract syntax trees
- Database representation
- Program elements
- AST classes
.. insert abstract-syntax-tree.rst
.. include:: ../slide-snippets/abstract-syntax-tree.rst
.. resume slides
Program elements
================
- The QL class ``Element`` represents program elements with a name.
- This includes: packages (``Package``), compilation units (``CompilationUnit``), types (``Type``), methods (``Method``), constructors (``Constructor``), and variables (``Variable``).
- It is often convenient to refer to an element that might either be a method or a constructor; the class ``Callable``, which is a common superclass of ``Method`` and ``Constructor``, can be used for this purpose.
AST
===
There are two primary AST classes, used within ``Callables``:
- ``Expr``: expressions such as assignments, variable references, function calls, ...
- ``Stmt``: statements such as conditionals, loops, try statements, ...
Operations are provided for exploring the AST:
- ``Expr.getAChildExpr`` returns a sub-expression of a given expression.
- ``Stmt.getAChild`` returns a statement or expression that is nested directly inside a given statement.
- ``Expr.getParent`` and ``Stmt.getParent`` return the parent node of an AST node.
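As a rough mental model of parent and child navigation, the sketch below builds a toy tree for the expression ``x = a + b``. The classes ``AstNode`` and ``AstDemo`` are hypothetical and are not part of the QL libraries. They only illustrate the idea of nodes nested inside other nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a toy tree, not the QL library's AST classes.
class AstNode {
    final String kind;
    AstNode parent;
    final List<AstNode> children = new ArrayList<>();

    AstNode(String kind) {
        this.kind = kind;
    }

    // Attaches a child and records this node as its parent.
    AstNode add(AstNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }
}

class AstDemo {
    // Builds the tree for the expression "x = a + b".
    static AstNode build() {
        AstNode sum = new AstNode("AddExpr")
            .add(new AstNode("VarAccess a"))
            .add(new AstNode("VarAccess b"));
        return new AstNode("AssignExpr")
            .add(new AstNode("VarAccess x"))
            .add(sum);
    }

    public static void main(String[] args) {
        AstNode root = build();
        for (AstNode child : root.children) {
            System.out.println(child.kind + " is a child of " + child.parent.kind);
        }
    }
}
```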
Types
=====
The database also includes information about the types used in a program:
- ``PrimitiveType`` represents a `primitive type <http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html>`__, that is, one of ``boolean``, ``byte``, ``char``, ``double``, ``float``, ``int``, ``long``, ``short``. QL also classifies ``void`` and ``<nulltype>`` (the type of the ``null`` literal) as primitive types.
- ``RefType`` represents a reference type; it has several subclasses:
- ``Class`` represents a Java class.
- ``Interface`` represents a Java interface.
- ``EnumType`` represents a Java enum type.
- ``Array`` represents a Java array type.
Working with variables
======================
``Variable`` represents program variables, including locally scoped variables (``LocalScopeVariable``), fields (``Field``), and parameters (``Parameter``):
- ``string Variable.getName()``
- ``Type Variable.getType()``
``Access`` represents references to declared entities such as methods (``MethodAccess``) and variables (``VariableAccess``), including fields (``FieldAccess``).
- ``Declaration Access.getTarget()``
``VariableDeclarationEntry`` represents declarations or definitions of a variable.
- ``Variable VariableDeclarationEntry.getVariable()``
Working with callables
======================
Callables are represented by the ``Callable`` QL class.
Calls to callables are modeled by the QL class ``Call`` and its subclasses:
- ``Call.getCallee()`` gets the declared target of the call
- ``Callable.getAReference()`` gets a call to this callable
Typically, callables are identified by name:
- ``string Callable.getName()``
- ``string Callable.getQualifiedName()``
.. rst-class:: java-expression-ast
Example: Java expression AST
============================
.. diagram copied from google slides


@ -0,0 +1,148 @@
========================
Example: Query injection
========================
QL for Java
.. container:: semmle-logo
Semmle :sup:`TM`
.. rst-class:: setup
Setup
=====
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `VIVO Vitro snapshot <http://downloads.lgtm.com/snapshots/java/vivo-project/Vitro/vivo-project_Vitro_java-srcVersion_47ae42c01954432c3c3b92d5d163551ce367f510-dist_odasa-lgtm-2019-04-23-7ceff95-linux64.zip>`__
.. note::
For this example, we will be analyzing `VIVO Vitro <https://github.com/vivo-project/Vitro>`__.
You can also query the project in `the query console <https://lgtm.com/query/project:14040005/lang:java/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
SQL injection
=============
- Occurs when user input is used to construct an SQL query without any sanitization or escaping.
- Classic example involves constructing a query using string concatenation:
.. code-block:: sql
runQuery("SELECT * FROM users WHERE id='" + userId + "'");
- If the ``userId`` can be provided by a user, and is not sanitized, then a malicious user can provide input that manipulates the intended query.
- For example, providing the input ``"' OR '1'='1"`` would allow the attacker to return all records in the users table.
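The manipulation can be made concrete by printing the string that the database would receive. This is a minimal sketch. ``SqlConcatDemo`` and ``buildQuery`` are hypothetical names, and no database is involved.

```java
// Illustration only: hypothetical names, no real database access.
class SqlConcatDemo {
    // Builds the query text by concatenation, as in the slide's example.
    static String buildQuery(String userId) {
        return "SELECT * FROM users WHERE id='" + userId + "'";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("42"));
        // The injected quote characters turn the condition into one that is
        // true for every row.
        System.out.println(buildQuery("' OR '1'='1"));
    }
}
```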
.. note::
`SQL <https://en.wikipedia.org/wiki/SQL>`__ is a database query language, which is often used from within other programming languages to interact with a database. The typical case is that a query is to be executed to find some data, based on some input provided by the user - for example, the user's ID. However, the interface between the host programming language and SQL is typically implemented by passing a string containing the query to some API.
SPARQL injection
================
- `SPARQL <https://en.wikipedia.org/wiki/SPARQL>`__ is a language for querying data stored in RDF format.
- The same type of vulnerability can occur for SPARQL as for SQL: if the SPARQL query is constructed through string concatenation, a malicious user can subvert the query:
.. code-block:: sql
sparqlAskQuery("ASK { <" + individualURI + "> ?p ?o }");
- SPARQL is used by many projects, but we will be looking at `VIVO Vitro <https://github.com/vivo-project/Vitro/>`__.
.. rst-class:: background2
Developing a QL query
======================
Finding a query concatenation
QL query: find SPARQL methods
=============================
Let's start by looking for calls to methods with names of the form ``sparql*Query``, using the classes ``Method`` and ``MethodAccess`` from the Java library.
.. rst-class:: build
.. literalinclude:: ../query-examples/java/query-injection-java-1.ql
.. note::
- When performing `variant analysis <https://semmle.com/variant-analysis>`__, it is usually helpful to write a simple query that finds the simple syntactic pattern, before trying to go on to describe the cases where it goes wrong.
- In this case, we start by looking for all the method calls which appear to run a SPARQL query, before trying to refine the query to find cases which are vulnerable to query injection.
- The ``select`` clause defines what this query is looking for:
- a ``MethodAccess``: the call to a SPARQL query method
- a ``Method``: the SPARQL query method.
- The ``where`` part of the query ties these two QL variables together using `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ defined in the `standard QL for Java library <https://help.semmle.com/qldoc/java/>`__.
QL query: find string concatenation
===================================
- We now need to define what would make these API calls unsafe.
- A simple heuristic would be to look for string concatenation used in the query argument.
- We may want to reuse this logic, so let us create a separate predicate.
Looking at autocomplete suggestions, we see that we can get the type of an expression using the ``getType()`` method.
.. rst-class:: build
.. code-block:: ql
predicate isStringConcat(AddExpr ae) {
ae.getType() instanceof TypeString
}
.. note::
- An important part of the query is to determine whether a given expression is string concatenation.
- We therefore write a helper predicate for finding string concatenation.
- This predicate effectively represents the set of all ``add`` expressions in the database where the type of the expression is ``TypeString`` - that is, the addition produces a ``String`` value.
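The type check matters because ``+`` in Java is only string concatenation when the expression has type ``String``. The following sketch uses a hypothetical class name, ``AddExprDemo``, and shows both kinds of addition in one expression.

```java
// Illustration only: a hypothetical demo class.
class AddExprDemo {
    // "1 + 2" is int addition; only the outer "+" produces a String.
    static String numericThenConcat() {
        return 1 + 2 + "x";
    }

    // Both "+" operators produce a String, because the left operand is one.
    static String concatThroughout() {
        return "x" + 1 + 2;
    }

    public static void main(String[] args) {
        System.out.println(numericThenConcat());
        System.out.println(concatThroughout());
    }
}
```

In ``numericThenConcat`` the inner addition has type ``int``, so a predicate that requires the expression type to be ``String`` would match only the outer addition.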
QL query: SPARQL injection
==========================
We can now combine our predicate with the existing query.
Note that we do not need to specify that the argument of the method access is an ``AddExpr`` - this is implied by the ``isStringConcat`` requirement.
Now our query becomes:
.. rst-class:: build
.. literalinclude:: ../query-examples/java/query-injection-java-2.ql
:language: ql
The final query
===============
.. literalinclude:: ../query-examples/java/query-injection-java-3.ql
:language: ql
There are two results, one of which was assigned **CVE-2019-6986**.
.. note::
Full write up and exploit can be found here: https://github.com/Semmle/SecurityExploits/tree/master/vivo-project/CVE-2019-6986
Follow up
=========
- Our query successfully finds cases where the concatenation occurs in the argument to the SPARQL API.
- However, in general, the concatenation could occur before the method call.
- For this, we would need to use :doc:`local data flow <data-flow-java>`, which is the topic of the next set of training slides.


@ -0,0 +1,11 @@
import java
class StringConcat extends AddExpr {
StringConcat() { getType() instanceof TypeString }
}
from MethodAccess ma
where
ma.getMethod().getName().matches("sparql%Query") and
ma.getArgument(0) instanceof StringConcat
select ma, "SPARQL query vulnerable to injection."


@ -0,0 +1,8 @@
import java
import semmle.code.java.dataflow.DataFlow::DataFlow
from MethodAccess ma, StringConcat stringConcat
where
ma.getMethod().getName().matches("sparql%Query") and
localFlow(exprNode(stringConcat), exprNode(ma.getArgument(0)))
select ma, "SPARQL query vulnerable to injection."


@ -0,0 +1,14 @@
import java
import semmle.code.java.dataflow.TaintTracking
class TaintedOGNLConfig extends TaintTracking::Configuration {
TaintedOGNLConfig() { this = "TaintedOGNLConfig" }
override predicate isSource(DataFlow::Node source) { /* TBD */ }
override predicate isSink(DataFlow::Node sink) { /* TBD */ }
}
from TaintedOGNLConfig cfg, DataFlow::Node source, DataFlow::Node sink
where cfg.hasFlow(source, sink)
select source,
"This untrusted input is evaluated as an OGNL expression $@.",
sink, "here"


@ -0,0 +1,7 @@
import java
from Method m, MethodAccess ma
where
m.getName().matches("sparql%Query") and
ma.getMethod() = m
select ma, m


@ -0,0 +1,8 @@
import java
from Method m, MethodAccess ma
where
m.getName().matches("sparql%Query") and
ma.getMethod() = m and
isStringConcat(ma.getArgument(0))
select ma, m


@ -0,0 +1,12 @@
import java
predicate isStringConcat(AddExpr ae) {
ae.getType() instanceof TypeString
}
from Method m, MethodAccess ma
where
m.getName().matches("sparql%Query") and
ma.getMethod() = m and
isStringConcat(ma.getArgument(0))
select ma, "SPARQL query vulnerable to injection."


@ -0,0 +1,70 @@
Abstract syntax trees
=====================
The basic representation of an analyzed program is an *abstract syntax tree (AST)*.
.. container:: column-left
.. code-block:: java
try {
...
} catch (AnException e) {
}
.. container:: ast-graph
.. graphviz::
digraph {
graph [ dpi = 1000 ]
node [shape=polygon,sides=4,color=blue4,style="filled,rounded", fontname=consolas,fontcolor=white]
a [label=<TryStmt>]
b [label=<CatchClause>]
c [label=<...>,color=white,fontcolor=black]
d [label=<LocalVariable<BR />DeclExpr>]
e [label=<...>,color=white,fontcolor=black]
f [label=<...>,color=white,fontcolor=black]
g [label=<...>,color=white,fontcolor=black]
a -> {b, c}
b -> {d, e}
d -> {f, g}
}
.. note::
When writing queries in QL it is important to have in mind the underlying representation of the program which is stored in the database. Typically queries make use of the “AST” representation of the program - a tree structure where program elements are nested within other program elements.
The following topics contain overviews of the important AST classes and QL libraries for C/C++, C#, and Java:
- `Introducing the C/C++ libraries <https://help.semmle.com/QL/learn-ql/cpp/introduce-libraries-cpp.html>`__
- `Introducing the C# libraries <https://help.semmle.com/QL/learn-ql/csharp/introduce-libraries-csharp.html>`__
- `Introducing the Java libraries <https://help.semmle.com/QL/learn-ql/java/introduce-libraries-java.html>`__
Database representations of ASTs
================================
AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaque - all one can do with them is to check their equality.
Each entity belongs to an entity type. Entity types have names starting with “@” and are defined in the database schema (not in QL).
Properties of AST nodes and their relationships to each other are encoded by database relations, which are predicates defined in the database (not in QL).
Entity types are rarely used directly; the usual pattern is to define a QL class that extends the type and exposes properties of its entities through member predicates.
.. note::
ASTs are a typical example of the kind of data representation one finds in object-oriented programming, with data-carrying nodes that reference each other. At first glance, QL, which can only work with atomic values, does not seem to be well suited for working with this kind of data. However, ultimately all that we require of the nodes in an AST is that they have an identity. The relationships among nodes, usually implemented by reference-valued object fields in other languages, can just as well (and arguably more naturally) be represented as relations over nodes. Attaching data (such as strings or numbers) to nodes can also be represented with relations over nodes and primitive values. All we need is a way for relations to reference nodes. This is achieved in QL (as in other database languages) by means of *entity values* (or entities, for short), which are opaque atomic values, implemented as integers under the hood.
It is the job of the extractor to create entity values for all AST nodes and populate database relations that encode the relationship between AST nodes and any values associated with them. These relations are *extensional*, that is, explicitly stored in the database, unlike the relations described by QL predicates, which we also refer to as *intensional* relations. Entity values belong to *entity types*, whose name starts with “@” to set them apart from primitive types and classes.
The interface between entity types and extensional relations on the one hand and QL predicates and classes on the other hand is provided by the *database schema*, which defines the available entity types and the schema of each extensional relation, that is, how many columns the relation has, and which entity type or primitive type the values in each column come from. QL programs can refer to entity types and extensional relations just as they would refer to QL classes and predicates, with the restriction that entity types cannot be directly selected in a ``select`` clause, since they do not have a well-defined string representation.
For example, the database schemas for C/C++, C#, and Java snapshot databases are here:
- https://github.com/Semmle/ql/blob/master/cpp/ql/src/semmlecode.cpp.dbscheme
- https://github.com/Semmle/ql/blob/master/csharp/ql/src/semmlecode.csharp.dbscheme
- https://github.com/Semmle/ql/blob/master/java/ql/src/config/semmlecode.dbscheme


@ -0,0 +1,67 @@
Exercise: How not to do global data flow
========================================
Implement a ``flowStep`` predicate extending ``localFlowStep`` with steps through function calls and returns. Why might we not want to use this?
.. code-block:: ql
predicate stepIn(Call c, DataFlow::Node arg, DataFlow::ParameterNode parm) {
exists(int i | arg.asExpr() = c.getArgument(i) |
parm.asParameter() = c.getTarget().getParameter(i))
}
predicate stepOut(Call c, DataFlow::Node ret, DataFlow::Node res) {
exists(ReturnStmt retStmt | retStmt.getEnclosingFunction() = c.getTarget() |
ret.asExpr() = retStmt.getExpr() and res.asExpr() = c)
}
predicate flowStep(DataFlow::Node pred, DataFlow::Node succ) {
DataFlow::localFlowStep(pred, succ) or
stepIn(_, pred, succ) or
stepOut(_, pred, succ)
}
.. rst-class:: mismatched-calls-and-returns
Mismatched calls and returns
============================
.. diagram copied from google slides
Balancing calls and returns
===========================
- If we simply take ``flowStep*``, we might mismatch calls and returns, causing imprecision, which in turn may cause false positives.
- Instead, make sure that matching ``stepIn``/``stepOut`` pairs talk about the same call site:
.. code-block:: ql
predicate balancedPath(DataFlow::Node src, DataFlow::Node snk) {
src = snk or DataFlow::localFlowStep(src, snk) or
exists(DataFlow::Node m | balancedPath(src, m) | balancedPath(m, snk)) or
exists(Call c, DataFlow::Node parm, DataFlow::Node ret |
stepIn(c, src, parm) and
balancedPath(parm, ret) and
stepOut(c, ret, snk)
)
}
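A small Java program shows the kind of imprecision that unbalanced paths introduce. The names below are hypothetical. The helper ``id`` is called from two call sites, and only the first receives untrusted data.

```java
// Illustration only: hypothetical names.
class CallSiteDemo {
    // A helper whose parameter flows directly to its return value.
    static String id(String s) {
        return s;
    }

    static String[] run(String untrusted) {
        String a = id(untrusted);   // call site 1: untrusted data goes in and comes out
        String b = id("constant");  // call site 2: only the constant goes in and comes out
        return new String[] { a, b };
    }

    public static void main(String[] args) {
        String[] result = run("user input");
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```

An analysis that follows call and return steps without matching them could enter ``id`` at call site 1 and leave it at call site 2, wrongly concluding that ``untrusted`` reaches ``b``. Requiring the step in and the step out to refer to the same call rules out that path.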
Summary-based global data flow
==============================
- To avoid traversing the same paths many times, we compute function summaries that record if a function parameter flows into a return value:
.. code-block:: ql
predicate returnsParameter(Function f, int i) {
exists (Parameter p, ReturnStmt retStmt, Expr ret |
p = f.getParameter(i) and
retStmt.getEnclosingFunction() = f and
ret = retStmt.getExpr() and
balancedPath(DataFlow::parameterNode(p), DataFlow::exprNode(ret))
)
}
- Use this predicate in ``balancedPath`` instead of ``stepIn``/``stepOut`` pairs.


@ -0,0 +1,51 @@
Information flow
================
- Many security problems can be phrased as an information flow problem:
Given a (problem-specific) set of sources and sinks, is there a path in the data flow graph from some source to some sink?
- Some examples:
- SQL injection: sources are user-input, sinks are SQL queries
- Reflected XSS: sources are HTTP requests, sinks are HTTP responses
- We can solve such problems using the data flow and taint tracking libraries.
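The source-to-sink question is, at its core, graph reachability. The sketch below is a simplified illustration with hypothetical names and string-labelled nodes. It is not how the QL data flow libraries are implemented. It performs a breadth-first search from the sources and reports whether any sink is reached.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: a toy flow graph with string-labelled nodes.
class FlowGraphDemo {
    // Returns true if some sink is reachable from some source along the edges.
    static boolean reaches(Map<String, List<String>> edges,
                           Set<String> sources, Set<String> sinks) {
        Deque<String> work = new ArrayDeque<>(sources);
        Set<String> seen = new HashSet<>(sources);
        while (!work.isEmpty()) {
            String node = work.poll();
            if (sinks.contains(node)) {
                return true;
            }
            for (String next : edges.getOrDefault(node, List.of())) {
                if (seen.add(next)) {  // add() is false for already visited nodes
                    work.add(next);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = Map.of(
            "httpParameter", List.of("localVariable"),
            "localVariable", List.of("sqlQuery"));
        System.out.println(reaches(edges, Set.of("httpParameter"), Set.of("sqlQuery")));
    }
}
```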
Global data flow and taint tracking
===================================
- Recap:
- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
- For global data flow (and taint tracking), we must therefore provide restrictions to ensure the problem is tractable.
- Typically, this involves specifying the *source* and *sink*.
.. note::
As we mentioned in the previous slide deck, while local data flow is feasible to compute for all functions in a snapshot, global data flow is not. This is because the number of paths becomes exponentially larger for global data flow.
The global data flow (and taint tracking) library avoids this problem by requiring that the query author specifies which ``sources`` and ``sinks`` are applicable. This allows the implementation to compute paths between the restricted set of nodes, rather than the full graph.
Global taint tracking library
=============================
The ``semmle.code.<language>.dataflow.TaintTracking`` library provides a framework for implementing solvers for global taint tracking problems:
#. Subclass ``TaintTracking::Configuration`` following this template:
.. code-block:: ql
class Config extends TaintTracking::Configuration {
Config() { this = "<some unique identifier>" }
override predicate isSource(DataFlow::Node nd) { ... }
override predicate isSink(DataFlow::Node nd) { ... }
}
#. Use ``Config.hasFlow(source, sink)`` to find inter-procedural paths.
.. note::
In addition to the taint tracking configuration described here, there is also an equivalent *data flow* configuration in ``semmle.code.<language>.dataflow.DataFlow``, ``DataFlow::Configuration``. Data flow configurations are used to track whether the exact value produced by a source is used by a sink, whereas taint tracking configurations are used to determine whether the source may influence the value used at the sink. Whether you use taint tracking or data flow depends on the analysis problem you are trying to solve.
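The template above might be filled in as follows. This is a minimal sketch, assuming the C/C++ libraries; the configuration name ``EnvToSystemConfig`` and the choice of ``getenv`` as source and ``system`` as sink are illustrative assumptions, not part of the original slides.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.dataflow.DataFlow
   import semmle.code.cpp.dataflow.TaintTracking

   /** Illustrative configuration: environment variables reaching shell commands. */
   class EnvToSystemConfig extends TaintTracking::Configuration {
     EnvToSystemConfig() { this = "EnvToSystemConfig" }

     // Sources: the result of any call to a function named `getenv`.
     override predicate isSource(DataFlow::Node nd) {
       nd.asExpr().(FunctionCall).getTarget().hasName("getenv")
     }

     // Sinks: the first argument of any call to a function named `system`.
     override predicate isSink(DataFlow::Node nd) {
       exists(FunctionCall fc |
         fc.getTarget().hasName("system") and
         nd.asExpr() = fc.getArgument(0)
       )
     }
   }

   from EnvToSystemConfig cfg, DataFlow::Node source, DataFlow::Node sink
   where cfg.hasFlow(source, sink)
   select sink, "This command may depend on $@.", source, "an environment variable"

Restricting ``isSource`` and ``isSink`` to a handful of calls is what keeps the global computation tractable: only paths between these nodes are explored.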

View file

@ -0,0 +1,160 @@
Data flow analysis
==================
- Models flow of data through the program.
- Implemented in the module ``semmle.code.<language>.dataflow.DataFlow``.
- Class ``DataFlow::Node`` represents program elements that have a value, such as expressions and function parameters.
- Nodes of the data flow graph.
- Various predicates represent flow between these nodes.
- Edges of the data flow graph.
.. note::
The solution here is to use *data flow*. Data flow is, as the name suggests, about tracking the flow of data through the program. It helps answer questions like: *does this expression ever hold a value that originates from a particular other place in the program*?
We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in the program, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two nodes.
Data flow graphs
================
.. container:: column-left
Example:
.. code-block:: cpp
int func(int tainted) {
int x = tainted;
if (someCondition) {
int y = x;
callFoo(y);
} else {
return x;
}
return -1;
}
.. container:: column-right
Data flow graph:
.. graphviz::
digraph {
graph [ dpi = 1000 ]
node [shape=polygon,sides=4,color=blue4,style="filled,rounded", fontname=consolas,fontcolor=white]
a [label=<tainted<BR /><FONT POINT-SIZE="10">ParameterNode</FONT>>]
b [label=<tainted<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
c [label=<x<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
d [label=<x<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
e [label=<y<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
a -> b
b -> {c, d}
c -> e
}
Local vs global data flow
=========================
- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
- Different APIs, so discussed separately
- This slide deck focuses on the former
.. note::
For further information, see:
- `Introduction to data flow analysis in QL <https://help.semmle.com/QL/learn-ql/ql/intro-to-data-flow.html>`__
.. rst-class:: background2
Local data flow
===============
Importing data flow
===================
To use the data flow library, add the following import:
.. code-block:: ql
import semmle.code.<language>.dataflow.DataFlow
**Note**: this library contains an explicit “module” declaration:
.. code-block:: ql
module DataFlow {
class Node extends ... { ... }
predicate localFlow(Node source, Node sink) {
localFlowStep*(source, sink)
}
...
}
So all references will need to be qualified (for example, ``DataFlow::Node``).
.. note::
A **query library** is a file with the extension ``.qll``. Query libraries do not contain a query clause, but may contain modules, classes, and predicates.
For further information on the data flow libraries, see the following links:
- `Java data flow library <https://help.semmle.com/qldoc/java/semmle/code/java/dataflow/DataFlow.qll/module.DataFlow.html>`__
- `C/C++ data flow library <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/DataFlow.qll/module.DataFlow.html>`__
- `C# data flow library <https://help.semmle.com/qldoc/csharp/semmle/code/csharp/dataflow/DataFlow.qll/module.DataFlow.html>`__
A **module** is a way of organizing QL code by grouping together related predicates, classes, and (sub-)modules. They can be either explicitly declared or implicit. A query library implicitly declares a module with the same name as the QLL file.
For further information on libraries and modules in QL, see the chapter on `Modules <https://help.semmle.com/QL/ql-handbook/modules.html>`__ in the QL language handbook.
For further information on importing QL libraries and modules, see the chapter on `Name resolution <https://help.semmle.com/QL/ql-handbook/name-resolution.html>`__ in the QL language handbook.
Data flow graph
===============
- Class ``DataFlow::Node`` represents data flow graph nodes
- Predicate ``DataFlow::localFlowStep`` represents local data flow graph edges, ``DataFlow::localFlow`` is its transitive closure
- Data flow graph nodes are *not* AST nodes, but they correspond to AST nodes, and there are predicates for mapping between them:
- ``Expr Node.asExpr()``
- ``Parameter Node.asParameter()``
- ``DataFlow::Node DataFlow::exprNode(Expr e)``
- ``DataFlow::Node DataFlow::parameterNode(Parameter p)``
- ``etc.``
.. note::
The ``DataFlow::Node`` class is shared between both the local and global data flow graphs–the primary difference is the edges, which in the “global” case can link different functions.
``localFlowStep`` is the “single step” flow relation–that is, it describes single edges in the local data flow graph. ``localFlow`` represents the `transitive <https://help.semmle.com/QL/ql-handbook/recursion.html#transitive-closures>`__ closure of this relation–in other words, it contains every pair of nodes where the second node is reachable from the first in the data flow graph.
The data flow graph is separate from the `AST <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`__, to allow for flexibility in how data flow is modeled. There are a small number of data flow node types–expression nodes, parameter nodes, uninitialized variable nodes, and definition by reference nodes. Each node provides mapping functions to and from the relevant AST (for example ``Expr``, ``Parameter`` etc.) or symbol table (for example ``Variable``) classes.
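A short query can show how these pieces fit together. The sketch below assumes the C/C++ libraries and looks for call arguments that may hold the value of a parameter of the enclosing function; the alert message is illustrative.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.dataflow.DataFlow

   from Function f, Parameter p, FunctionCall call, Expr arg
   where
     // Restrict to parameters and calls that belong to the same function.
     p = f.getAParameter() and
     call.getEnclosingFunction() = f and
     arg = call.getAnArgument() and
     // Map the AST elements to data flow nodes and ask for a local path.
     DataFlow::localFlow(DataFlow::parameterNode(p), DataFlow::exprNode(arg))
   select arg, "This argument may hold the value of parameter $@.", p, p.getName()

On the earlier ``func`` example, a query of this shape would be expected to report the argument ``y`` in ``callFoo(y)``, because the graph contains a path from the ``tainted`` parameter node through ``x`` to ``y``.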
Taint tracking
==============
- Usually, we want to generalise slightly by not only considering plain data flow, but also “taint” propagation, that is, whether a value is influenced by or derived from another.
- Examples:
.. code-block:: java
sink = source; // source -> sink: data and taint
strcat(sink, source); // source -> sink: taint, not data
- Library ``semmle.code.<language>.dataflow.TaintTracking`` provides predicates for tracking taint; ``TaintTracking::localTaintStep`` represents one (local) taint step, ``TaintTracking::localTaint`` is its transitive closure.
.. note::
Taint tracking can be thought of as another type of data flow graph. It usually extends the standard data flow graph for a problem by adding edges between nodes where one node influences or *taints* another.
The taint-tracking API is almost identical to that of the local data flow. All we need to do to switch to taint tracking is ``import semmle.code.<language>.dataflow.TaintTracking`` instead of ``semmle.code.<language>.dataflow.DataFlow``, and instead of using ``localFlow``, we use ``localTaint``.
- `Java taint-tracking library <https://help.semmle.com/qldoc/java/semmle/code/java/dataflow/TaintTracking.qll/module.TaintTracking.html>`__
- `C/C++ taint-tracking library <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/TaintTracking.qll/module.TaintTracking.html>`__
- `C# taint-tracking library <https://help.semmle.com/qldoc/csharp/semmle/code/csharp/dataflow/TaintTracking.qll/module.TaintTracking.html>`__
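For illustration, a local taint query might look like the sketch below. It assumes the C/C++ libraries; treating ``getenv`` as the source and the first argument of ``system`` as the sink is an assumption made for this example only.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.dataflow.DataFlow
   import semmle.code.cpp.dataflow.TaintTracking

   from FunctionCall sourceCall, FunctionCall sinkCall
   where
     sourceCall.getTarget().hasName("getenv") and
     sinkCall.getTarget().hasName("system") and
     // localTaint only follows steps inside a single function.
     TaintTracking::localTaint(DataFlow::exprNode(sourceCall),
       DataFlow::exprNode(sinkCall.getArgument(0)))
   select sinkCall, "This command may be influenced by $@.", sourceCall, "an environment variable"

Replacing ``TaintTracking::localTaint`` with ``DataFlow::localFlow`` would narrow the query to cases where the exact value returned by ``getenv`` reaches the sink.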

View file

@ -0,0 +1,24 @@
Path queries
============
Path queries provide information about the identified paths from sources to sinks. Paths can be examined in the Path Explorer view.
Use this template:
.. code-block:: ql
/**
* ...
* @kind path-problem
*/
import semmle.code.<language>.dataflow.TaintTracking
import DataFlow::PathGraph
...
from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink, source, sink, "<message>"
.. note::
To see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work–we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
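A filled-in version of the template might look like this. It is a sketch assuming the C/C++ libraries; the configuration name, the metadata values, and the choice of parameters of ``main`` as sources and ``printf`` format strings as sinks are all hypothetical.

.. code-block:: ql

   /**
    * @name Command-line input used as format string (illustrative)
    * @kind path-problem
    * @problem.severity warning
    * @id example/argv-to-format
    */

   import cpp
   import semmle.code.cpp.dataflow.DataFlow
   import semmle.code.cpp.dataflow.TaintTracking
   import DataFlow::PathGraph

   class ArgvToFormatConfig extends TaintTracking::Configuration {
     ArgvToFormatConfig() { this = "ArgvToFormatConfig" }

     // Sources: any parameter of a function named `main`.
     override predicate isSource(DataFlow::Node nd) {
       nd.asParameter().getFunction().hasName("main")
     }

     // Sinks: the format-string argument of calls to `printf`.
     override predicate isSink(DataFlow::Node nd) {
       exists(FunctionCall fc |
         fc.getTarget().hasName("printf") and
         nd.asExpr() = fc.getArgument(0)
       )
     }
   }

   from ArgvToFormatConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink
   where cfg.hasFlowPath(source, sink)
   select sink.getNode(), source, sink, "This format string may depend on $@.",
     source.getNode(), "a command-line argument"

The second and third ``select`` columns carry the ``PathNode`` values that let the path view reconstruct each route from source to sink.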

View file

@ -0,0 +1 @@
Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revision of each project that has been added, whereas the snapshot available to download above is based on a historical version of the codebase.

View file

@ -0,0 +1,175 @@
.. Template for rst slide shows
.. Key points:
- Each heading marks the start of a new slide
- The default slide style is a plain white-ish background with minimal company branding
- Different slide designs have been preconfigured. To choose a different layout
use the appropriate .. rst-class:: directive. For examples of the different designs,
see the template below. This directive can also be used to create custom classes for individual
images and slide backgrounds if necessary. Additional CSS styles may also be required when using custom
class directives. Search for 'deck-specific styles for individual images' in default.css for examples
of how to implement custom class styles.
- Additional notes can be added to a slide using a .. note:: directive
- Press P to access the additional notes on the rendered slides.
- Press F to go into full screen mode when viewing the rendered slides.
.. Title slide. Includes the deck title, subtitles, and the company logo
===================
Template slide deck
===================
.. container:: subheading
First subheading
Second subheading
.. container:: semmle-logo
Semmle :sup:`TM`
.. Set up slide. Include link to QL4E snapshots required for examples
.. rst-class:: setup
Setup
=====
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- A snapshot
.. note::
Some notes about the project, perhaps a link to the project page on LGTM.
.. Agenda slide. Explaining what is to be covered in the presentation
.. rst-class:: Agenda
Agenda
======
- Item 1
- Item 2
- Item 3
- Item 4
- Item 5
Text
====
If you don't specify an rst-class, you default to the 'basic' slide design.
You can fit about this much text on a slide:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.
Code sample
===========
Use a ``.. code-block::`` directive to include a code snippet.
Specify the language after the directive to add syntax highlighting.
.. code-block:: ql
import cpp
from AddExpr a, Variable v, RelationalOperation cmp
where
a.getAnOperand() = v.getAnAccess() and
cmp.getAnOperand() = a and
cmp.getAnOperand() = v.getAnAccess()
select cmp, "Overflow check."
Columns and graphs
==================
.. container:: column-left
``.. container:: column-left`` sets up a column on the left of the slide.
``.. container:: column-right`` sets up a column on the right of the slide.
Code can be included in columns:
.. code-block:: ql
import cpp
from IfStmt ifstmt, Block block
where
block = ifstmt.getThen() and
block.isEmpty()
select ifstmt, "This if-statement is redundant."
.. container:: column-right
Graphs can be built from text using a ``.. graphviz::`` directive.
See the source file for details.
.. graphviz::
digraph {
graph [ dpi = 1000 ]
node [shape=polygon,sides=4,color=blue4,style="filled,rounded", fontname=consolas,fontcolor=white]
a [label=<tainted<BR /><FONT POINT-SIZE="10">ParameterNode</FONT>>]
b [label=<tainted<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
c [label=<x<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
d [label=<x<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
a -> b
b -> {c, d}
}
.. You can indicate a new concept by using a purple slide background
.. rst-class:: background2
Purple background
=================
Subheading on purple slide
Including snippets
==================
rst snippets can be included using:
.. code-block:: none
.. include:: path/to/file.rst
Code snippets can be included using:
.. code-block:: none
.. literalinclude:: path/to/file.ql
:language: ql
:emphasize-lines: 3-6
Specify the language to apply syntax highlighting and the lines of the fragment that you want to emphasize.
Further details
===============
- For more information on writing in reStructuredText, see http://docutils.sourceforge.net/rst.html.
- For more information on Sphinx, see https://www.sphinx-doc.org.
- For more information about hieroglyph, the Sphinx extension used to generate these slides, see https://github.com/nyergler/hieroglyph.
- For more information about creating graphs, see https://build-me-the-docs-please.readthedocs.io/en/latest/Using_Sphinx/UsingGraphicsAndDiagramsInSphinx.html.
.. The final slide with the company details is generated automatically.