This recursive predicate is made faster by working around a known
optimizer problem (QL-796) that causes the optimizer to insert extra
type checks in recursive case even when they are only needed in the
base case.
This reduces the number of bounds computed, and will simplify use of the
library. The resulting locations in the tests may be slightly strange,
because the example `Instruction` for a `ValueNumber` is the first
appearing in the IR, regardless of source order, and may not be the most
closely related `Instruction` to the bounded value. I think that's worth
doing for the performance and usability benefits.
I changed to detect any logical operation usage (i.e. !, ==), but I kept usage in a conditional directly as a separate detection condition. I found no false positives on the projects you shared with me previously.
This implements calculation of the control-flow graph in QL. The new
code is not enabled yet as we'll need more extractor changes first.
The `SyntheticDestructorCalls.qll` file is a temporary solution that can
be removed when the extractor produces this information directly.
The existing unreachable IR removal code only retargeted an infeasible edge to an `Unreached` instruction if the successor of the edge was an unreachable block. This is too conservative, because it doesn't remove an infeasible edge that targets a block that is still reachable via other paths. The trivial example of this is `do { } while (false);`, where the back edge is infeasible, but the body block is still reachable from the loop entry.
This change retargets all infeasible edges to `Unreached` instructions, regardless of the reachability of the successor block.
With these changes we can run `odasa prepareQueries --check-only
--fail-on-warnings` on the C++ query directory. Two changes were needed:
1. The `Metrics/queries.xml` file had to be deleted. It existed because
the built distribution has a different file layout, where `Metrics`
is moved to the top-level query dir `odasa-cpp-metrics`. Since
internal PR 28230 this file is created as needed as part of the dist
build process, so it doesn't need to be checked in with the sources.
2. All uses of the `deprecated` and stubbed-out Objective C classes were
removed.
The AST-based data flow libraries and the IR-based ones both define
modules `DataFlow`, `DataFlow2`, etc. This caused
`ImportAdditionalLibraries.ql` to fail in compilation.
This change removes any IR instructions that can be statically proven unreachable. To detect unreachable IR, we first run a simple constant value analysis on the IR. Then, any `ConditionalBranch` with a constant condition has the appropriate edge marked as "infeasible". We define a class `ReachableBlock` as any `IRBlock` with a path from the entry block of the function. SSA construction has been modified to operate only on `ReachableBlock` and `ReachableInstruction`, which ensures that only reachable IR gets translated into SSA form. For any infeasible edge where its predecessor block is reachable, we replace the original target of the branch with an `Unreached` instruction, which lets us preserve the invariant that all `ConditionalBranch` instructions have both a true and a false edge, and allows guard inference to still work.
The changes to `SSAConstruction.qll` are not as scary as they look. They are almost entirely a mechanical replacement of `OldIR::IRBlock` with `OldBlock`, which is just an alias for `ReachableBlock`.
Note that the `constant_func.ql` test can determine that the two new test functions always return 0.
Removing unreachable code helps get rid of some common FPs in IR-based dataflow analysis, especially for constructs like `while(true)`.
This change moves the simple constant analysis that was used by the const_func test into a pyrameterized module for use on any stage of the IR. This will be used to detect unreachable code.
This sort of fixes one FP and causes a new FN, but for the wrong reasons. The IR dataflow is tracking the reference itself, rather than the referred-to object. Once we can better model indirections, we can make this work correctly.
This change is still the right thing to do, because it ensures that the dataflow is looking at actual expression being computed by the instruction.
I've separated the model interface for memory side effects from the model for escaped addresses. It will be fairly common for a given model to extend both interfaces, but they are used for two different purposes.
I've also put each model interface and the non-member predicates that query it into a named module, which seemed cleaner than having predicates named `functionModelReadsMemory()` and `getFunctionModelParameterAliasBehavior()`.
These type checks were overlapping with `assignOperatorWithWrongType` is
are no longer needed now that `assignOperatorWithWrongType` is improved.
They were causing FPs and misleading error messages on uninstantiated
templates.
Adding this call to `getUnspecifiedType` makes the error message better
in the presence of typedefs and qualifiers on an assignment operator
return type. It's also needed to avoid losing valid results in the
commit that comes after this.
The predicate `AlwaysTrueUponEntryLoop.getARelevantVariable` was very
sensitive to join ordering, and with the 1.19 QL engine it got an
unfortunate join order that made it explode on certain snapshots. With
this change, it goes from taking minutes to taking less than a second on
a libretro-uae snapshot.
Made `Node::getType()`, `Node::asParameter()`, and `Node::asUninitialized()` operate directly on the IR. This actually fixed several diffs compared to the AST dataflow, because `getType()` wasn't holding for nodes that weren't `Exprs`.
Made `Uninitialized` a `VariableInstruction`. This makes it consistent with `InitializeParameter`.
This commit adds a new model interface that describes the known side effects (or lack thereof) of a library function. Does it read memory, does it write memory, and do any of its parameters escape? Initially, we have models for just two Standard Library functions: `std::move` and `std::forward`, which neither read nor write memory, and do not escape their parameter.
IR construction has been updated to insert the correct side effect instruction (or no side effect instruction) based on the model.
This fixes a subtle bug in the construction of aliased SSA. `getResultMemoryAccess` was failing to return a `MemoryAccess` for a store to a variable whose address escaped. This is because no `VirtualIRVariable` was being created for such variables. The code was assuming that any access to such a variable would be via `UnknownMemoryAccess`. The result is that accesses to such variables were not being modeled in SSA at all.
Instead, the way to handle this is to have a `VariableMemoryAccess` even when the variable being accessed has escaped, and to have `VariableMemoryAccess::getVirtualVariable()` return the `UnknownVirtualVariable` for escaped variables. In the future, this will also let us be less conservative about inserting `Chi` nodes, because we'll be able to determine that there's an exact overlap between two accesses to the same escaped variable in some cases.
The AST dataflow library essentially ignores conversions, which is probably the right behavior. Converting an `int` to a `long` preserves the value, even if the bit pattern might be different. It's arguable whether narrowing conversions should be treated as dataflow, but we'll do so for now. We can revisit that if we see it cause problems.
I noticed that queries using the data flow library spent significant
time in `#Dominance::bbIDominates#fbPlus`, which is the body of the
`bbStrictlyDominates` predicate. That predicate took 28 seconds to
compute on Wireshark.
The `b` in the predicate name means that magic was applied, and the
application of magic meant that it could not be evaluated with the
built-in `fastTC` HOP but became an explicit recursion instead. Applying
`pragma[nomagic]` to this predicate means that we will always get it
evaluated with `fastTC`, and that takes less than a second in my test
case.
This predicate was fast with the queries and engine from 1.18. With the
queries from `master` it got a bad join order in the
`UninitializedLocal.ql` query, which made it take 2m34s on Wireshark.
This commit decomposes `bbEntryReachesLocally` into two predicates that
together take only 4s.
The `nullValue` predicate performs a slow custom data-flow analysis to
find possible null values. It's so slow that it timed out after 1200s on
Wireshark.
In `UnsafeCreateProcessCall.ql`, the values found with `nullValue` were
used as sources in another data-flow analysis. By using the `NullValue`
class as sink instead of `nullValue`, we avoid the slow-down of doing
data flow twice. The `NullValue` class is essentially the base case of
`nullValue`. Confusing names, yes.
This commit adds Chi nodes to the successor relation and accounts for
them in the CFG, but does not add them to the SSA data graph. Chi nodes
are inserted for partial writes to any VirtualVariable, regardless of
whether the partial write reaches any uses.
This enables the addition of new instructions in later phases of IR
construction; in particular, aliasing write instructions and inference
instructions.
This test was failing due to a semantic merge conflict between #509,
which added `UninitializedInstruction`, and #517, which added new test
code that would get `UninitializedInstruction`s in it after merging with #509.
This adds a `NewOrNewArrayExpr.getPlacementPointer` predicate and uses
it in `Alloc.qll` to detect when a `new`-expression is not an
allocation.
User-defined replacements for `operator new` may not be allocations
either, but the code continues to assume that they are. It's possible
that we want to change this assumption in the future or leave it up to
individual queries to decide on which side to err. It's hard to
statically tell whether `operator new` has been overloaded in a
particular file because it can be overloaded by a definition that is not
in scope but is only linked together with that file.