Граф коммитов

206 Коммитов

Автор SHA1 Сообщение Дата
Tom Hvitved d5b8c9728c Update `identifal-files.json` 2020-06-10 09:40:44 +02:00
Dave Bartolomeo 0ae98e78a2 Merge remote-tracking branch 'github/master' into github/codeql-c-analysis-team/69_union 2020-06-08 11:20:14 -04:00
Dave Bartolomeo 0666a2e587 Remove usage of f-string 2020-06-04 08:48:14 -04:00
Dave Bartolomeo a18eba2c4c Allow missing files in `sync-files --latest`
When running `sync-files` (or `sync-identical-files`) with the `--latest` switch, if one or more of the files in a group does not exist, the script will crash. This happens all the time when I add a new group, or add a new file path in an existing group. This has bothered me for a long time, so I finally fixed it when I ran into it again today.

I've changed the script as follows:
- If _none_ of the paths in the group exist, print an error message listing the paths in the group. This happens with or without `--latest`.
- If `--latest` is specified, copy the master file to the paths of the missing files.
2020-06-03 14:53:31 -04:00
Dave Bartolomeo bbadf4b4bb C#: Port `TInstruction`-sharing support from C++
This updates C#'s IR to share `TInstruction` across stages the same way C++ does. The only interesting part is that, since we have not yet ported full alias analysis to C#, I stubbed out the required parts of the aliased SSA interface in `AliasedSSAStub.qll`.
2020-06-03 13:52:19 -04:00
Dave Bartolomeo 1e863ac40b C++: Share `TInstruction` across IR stages
Each stage of the IR reuses the majority of the instructions from previous stages. Previously, we've been wrapping each reused old instruction in a branch of the `TInstruction` type for the next stage. This causes use to create roughly three times as many `TInstruction` objects as we actually need.

Now that IPA union types are supported in the compiler, we can share a single `TInstruction` IPA type across stages. We create a single `TInstruction` IPA type, with individual branches of this type for instructions created directly from the AST (`TRawInstruction`) and for instructions added by each stage of SSA construction (`T*PhiInstruction`, `T*ChiInstruction`, `T*UnreachedInstruction`). Each stage then defines a `TStageInstruction` type that is a union of all of the branches that can appear in that particular stage. The public `Instruction` class for each phase extends the `TStageInstruction` type for that stage.

The interface that each stage exposes to the pyrameterized modules in the IR is now split into three pieces:
- The `Raw` module, exposed only by the original IR construction stage. This module identifies which functions have IR, which `TRawInstruction`s exist, and which `IRVariable`s exist.
- The `SSA` module, exposed only by the two SSA construction stages. This identifiers which `Phi`, `Chi`, and `Unreached` instructions exist.
- The global module, exposed by all three stages. This module has all of the predicates whose implementation is different for each stage, like gathering definitions of `MemoryOperand`s.

Similarly, there is now a single `TIRFunction` IPA type that is shared across all three stages. There is a single `IRFunctionBase` class that exposes the stage-indepdendent predicates; the `IRFunction` class for each stage extends `IRFunctionBase`.

Most of the other changes are largely mechanical.
2020-06-01 11:15:29 -04:00
Dave Bartolomeo 09d1da2f7a C++/C#: Rename `sanity` -> `consistency`
I did both of these languages together because they share some of the changed code via `identical-files.json`.
2020-05-11 13:29:52 -04:00
Sauyon Lee 972551edd7
sync-files.py: cast line to string before concat 2020-04-23 15:32:28 -07:00
Mathias Vorreiter Pedersen 1e73528102 C++/C#: Add synchronization 2020-04-03 10:08:00 +02:00
Jonas Jensen 93f7c950ea
Merge pull request #3152 from dbartol/dbartol/sync-files
Move `sync-identical-files.py` into public repo as `sync-files.py`
2020-03-31 08:31:00 +02:00
Dave Bartolomeo 3eef2747d5 Fix LGTM alerts 2020-03-29 03:12:27 -04:00
Dave Bartolomeo 0952064eb3 Move `sync-identical-files.py` into public repo as `sync-files.py`
We currently use a script to keep certain duplicate QL files in sync across the repo. For historical reasons, this script has lived in the private repo alongside the rest of CodeQL, even though it's only used for files in the public `ql` repo. This PR moves the script into the public `ql` repo. It is still invoked by Jenkins scripts that live in the private repo during CI, but it can also be invoked directly without having a checkout of the private repo. This is useful for anyone who is modifying the dataflow or IR libraries with only a QL checkout.
2020-03-29 02:59:14 -04:00
Anders Schack-Mulligen 67d386b5ba C++/C#: Add synchronization. 2020-02-27 14:10:16 +01:00
Mathias Vorreiter Pedersen d4c6f487bc C++/C#: Fix sync config file for value numbering sharing 2020-02-13 22:32:52 +01:00
Robert Marsh ffaaed0550 C++: separate IR ValueNumber newtype and interface 2020-02-06 15:35:20 +01:00
Dave Bartolomeo d12b140921 C++/C#: Update shared file list 2020-01-28 10:55:38 -07:00
Dave Bartolomeo 9d35ff73c4 C++/C#: Make escape analysis unsound by default
When building SSA, we'll be assuming that stack variables do not escape, at least until we improve our alias analysis. I've added a new `IREscapeAnalysisConfiguration` class to allow the query to control this, and a new `UseSoundEscapeAnalysis.qll` module that can be imported to switch to the sound escape analysis. I've cloned the existing IR and SSA tests to have both sound and unsound versions. There were relatively few diffs in the IR dump tests, and the sanity tests still give the same results after one change described below.

Assuming that stack variables do not escape exposed an existing bug where we do not emit an `Uninitialized` instruction for the temporary variables used by `return` statements and `throw` expressions, even if the initializer is a constructor call or array initializer. I've refactored the code for handling elements that initialize a variable to share a common base class. I added a test case for returning an object initialized by constructor call, and ensured that the IR diffs for the existing `throw` test cases are correct.
2020-01-22 00:15:30 -07:00
Tom Hvitved 82c368e13e C#: Sync `XML.qll` with other languages 2019-12-19 10:26:08 +01:00
Max Schaefer 81f51e4e2b Ensure that XML libraries for C++, Java, JavaScript and Python stay in sync. 2019-12-17 10:15:43 +00:00
Dave Bartolomeo 51ff262cbc C++/C#: Add IR SSA sanity tests 2019-11-22 13:16:05 -07:00
Dave Bartolomeo bcd987cdf1 Merge from master and share value numbering 2019-09-27 17:40:43 -07:00
Dave Bartolomeo f76334c24a C++, C#: Share unaliased SSA files between languages
Most of the C# diffs are from bringing those files in sync with the latest C++ files.
2019-09-27 13:46:42 -07:00
Dave Bartolomeo c389432922 C++, C#: Sync `IRType.qll` between languages 2019-09-26 22:11:24 -07:00
Jonas Jensen 80a0027808 C++: Shared TaintTrackingImpl for IR TaintTracking 2019-09-10 09:40:27 +02:00
Jonas Jensen e9a029cba3 C++: Local field flow using global library
This commit removes fields from the responsibilities of `FlowVar.qll`.
The treatment of fields in that file was slow and imprecise.

It then adds another copy of the shared global data flow library, used
only to find local field flow, and it exposes that local field flow
through `localFlow` and `localFlowStep`.

This has a performance cost. It adds two cached stages to any query that
uses `localFlow`: the stage from `DataFlowImplCommon`, which is shared
with all queries that use global data flow, and a new stage just for
`localFlowStep`.
2019-09-02 11:17:27 +02:00
Geoffrey White 1215da2d6c
Merge pull request #1827 from jbj/sbb-tidy
C++: Tidy up SubBasicBlocks.qll
2019-08-29 15:42:40 +01:00
AndreiDiaconu1 c74898ec9f Synced files
Synced the files that are needed for this PR
2019-08-28 12:25:14 +01:00
AndreiDiaconu1 de6f547088 Synced more files 2019-08-28 12:25:13 +01:00
Dave Bartolomeo 609ca034c0 C#/C++: Share IR implementation 2019-08-28 12:25:13 +01:00
Jonas Jensen 17ee3f555c C++: Sync the two copies of SubBasicBlocks.qll
These files are now added to `identical-files.json` so they will remain
in sync.
2019-08-26 16:01:36 +02:00
Jonas Jensen 25701f203d C++/C#/Java: Shared TaintTrackingImpl.qll
This file is now identical in all languages. Unifying this file led to
the following changes:
- The documentation spelling fixes and example from the C++ version
  were copied to the other versions and updated.
- The steps through `NonLocalJumpNode` from C# were abstracted into a
  `globalAdditionalTaintStep` predicate that's empty for C++ and Java.
- The `defaultTaintBarrier` predicate from Java is now present but empty
  on C++ and C#.
- The C++ `isAdditionalFlowStep` predicate on
  `TaintTracking::Configuration` no longer includes `localFlowStep`.
  That should avoid some unnecessary tuple copying.
2019-08-21 14:55:54 +02:00
Jonas Jensen 11583b69e0 C#: Use pyrameterized modules for TaintTracking
To keep the code changes minimal, and to keep the implementation similar
to C++ and Java, the `TaintTracking{Public,Private}` files are now
imported together through `TaintTrackingUtil`. This has the side effect
of exposing `localAdditionalTaintStep`. The corresponding predicate for
Java was already exposed.
2019-08-20 13:45:38 +02:00
Jonas Jensen aeb2323128 Java: Use pyrameterized modules for TaintTracking 2019-08-20 13:45:37 +02:00
Jonas Jensen d388be7d3b C++: Use pyrameterized modules for TaintTracking 2019-08-20 13:45:37 +02:00
Tom Hvitved 081ee9944d C#: Add more copies of the data flow library 2019-08-07 10:41:39 +02:00
Robert Marsh 6bd22b01b3
Merge pull request #1607 from dave-bartolomeo/dave/CrossLanguageIR
C++: Start preparing IR for supporting multiple languages
2019-07-29 12:34:21 -07:00
Anders Schack-Mulligen 3d340d4fba Java: Delete deprecated dependency DataFlowImplDepr. 2019-07-25 11:18:01 +02:00
Dave Bartolomeo efa854ea3e C++: Add `*Imports.qll` files to identical-files.json 2019-07-19 15:38:11 -07:00
Tom Hvitved a6fa6dfd74 C#: Add shared data flow files 2019-05-06 14:54:11 +02:00
Dave Bartolomeo b5a3edfdae C++: FunctionIR -> IRFunction 2019-03-12 11:28:22 -07:00
Dave Bartolomeo b40fd95b8e C++: Better tracking of SSA memory accesses
This change fixes a few key problems with the existing SSA implementations:

For unaliased SSA, we were incorrectly choosing to model a local variable that had accesses that did not cover the entire variable. This has been changed to ensure that all accesses to the variable are at offset zero and have the same type as the variable itself. This was only possible to fix now that every `MemoryOperand` has its own type.

For aliased SSA, we now correctly track the offset and size of each memory access using an interval of bit offsets covered by the access. The offset interval makes the overlap computation more straightforward. Again, this is only possible now that operands have types.
The `getXXXMemoryAccess` predicates are now driven by the `MemoryAccessKind` on the operands and results, instead of by specific opcodes.

This change does fix an existing false negative in the IR dataflow tests.

I added a few simple test cases to the SSA IR tests, covering the various kinds of overlap (MustExcactly, MustTotally, and MayPartially).

I added "PrintSSA.qll", which can dump the SSA memory accesses as part of an IR dump.
2019-02-13 10:44:39 -08:00
Dave Bartolomeo 56bb9dcde0 C++: Remove infeasible edges to reachable blocks
The existing unreachable IR removal code only retargeted an infeasible edge to an `Unreached` instruction if the successor of the edge was an unreachable block. This is too conservative, because it doesn't remove an infeasible edge that targets a block that is still reachable via other paths. The trivial example of this is `do { } while (false);`, where the back edge is infeasible, but the body block is still reachable from the loop entry.

This change retargets all infeasible edges to `Unreached` instructions, regardless of the reachability of the successor block.
2018-12-14 12:13:22 -08:00
Dave Bartolomeo 5ba51e32f0 C++: Remove aliased_ssa instantiation of IR reachability
We never actually consumed this iteration, since SSA construction only depends on the reachability instantiation of the previous IR layer.
2018-12-10 21:22:55 -08:00
Dave Bartolomeo 99d33f9623 C++: Remove unreachable IR
This change removes any IR instructions that can be statically proven unreachable. To detect unreachable IR, we first run a simple constant value analysis on the IR. Then, any `ConditionalBranch` with a constant condition has the appropriate edge marked as "infeasible". We define a class `ReachableBlock` as any `IRBlock` with a path from the entry block of the function. SSA construction has been modified to operate only on `ReachableBlock` and `ReachableInstruction`, which ensures that only reachable IR gets translated into SSA form. For any infeasible edge where its predecessor block is reachable, we replace the original target of the branch with an `Unreached` instruction, which lets us preserve the invariant that all `ConditionalBranch` instructions have both a true and a false edge, and allows guard inference to still work.

The changes to `SSAConstruction.qll` are not as scary as they look. They are almost entirely a mechanical replacement of `OldIR::IRBlock` with `OldBlock`, which is just an alias for `ReachableBlock`.

Note that the `constant_func.ql` test can determine that the two new test functions always return 0.

Removing unreachable code helps get rid of some common FPs in IR-based dataflow analysis, especially for constructs like `while(true)`.
2018-12-10 21:22:55 -08:00
Dave Bartolomeo 59fc77f066 C++: Simple constant analysis
This change moves the simple constant analysis that was used by the const_func test into a pyrameterized module for use on any stage of the IR. This will be used to detect unreachable code.
2018-12-10 21:22:54 -08:00
Dave Bartolomeo 58f7596519 C++: IR-based dataflow 2018-11-30 12:15:11 -08:00
Robert Marsh a33b59103a C++: insert Chi nodes in the IR successor relation
This commit adds Chi nodes to the successor relation and accounts for
them in the CFG, but does not add them to the SSA data graph. Chi nodes
are inserted for partial writes to any VirtualVariable, regardless of
whether the partial write reaches any uses.
2018-11-26 12:08:18 -08:00
Robert Marsh f9ed39915f C++: recompute IRBlock membership at each stage
This enables the addition of new instructions in later phases of IR
construction; in particular, aliasing write instructions and inference
instructions.
2018-11-26 12:08:18 -08:00
Dave Bartolomeo f278f4fa47 C++: Operands as IPA types
@rdmarsh2 has been working on various queries and libraries on top of the IR, and has pointed out that having to always refer to an operand of an instruction by the pair of (instruction, operandTag) makes using the IR a bit clunky. This PR adds a new `Operand` IPA type that represents an operand of an instruction. `OperandTag` still exists, but is now an internal type used only in the IR implementation.
2018-10-23 14:58:44 -07:00
Dave Bartolomeo 5a25602c28 C++: Move GVN out of "internal" directory 2018-09-20 08:21:15 -07:00
Dave Bartolomeo 46b2c19c66 C++: Initial attempt at IR-based value numbering 2018-09-17 17:19:05 -07:00
Dave Bartolomeo fce7a5fccb C++: Final IR reshuffle
Moved IR flavors into "implementation", with internal files under "implementation/internal". Made `IRBlockConstruction` just a nested module of `IRConstruction`/`SSAConstruction`, so it gets picked up from the `Construction` parameter of the `IR` module, rather than being picked up just from being in the same directory as `IRBlock`.
2018-09-04 09:05:27 -07:00
Dave Bartolomeo 9fd5f26e2e C++: Remove unnecessary Impl suffix from some files 2018-09-04 09:05:10 -07:00
Dave Bartolomeo aacee8fecf C++: Reshuffle IR files into a consistent directory structure
There are no real code changes here, other than to fix up `import`s. All tests still hae the same output, as expected.

A future commit will hide the IR flavors other than the one we want queries to use directly.
2018-09-04 09:05:03 -07:00
Jonas Jensen d81e8081be C++: IRBlockConstruction in identical-files.json
These two files have been identical since dc2283325.
2018-08-24 11:41:06 +02:00
Jonas Jensen dabbd83848 C++: Move the identical-files.json spec to ql repo
This file is used by the `sync-identical-files.py` pull-request check in
our internal repo, which can hopefully soon start running on this repo
as well. This initial commit moves over all the file group definitions
whose files are entirely within this repository.

This change is not synchronized with the internal repo, so the file
groups will appear in both repositories until they sync up. That should
not cause any problems.
2018-08-24 11:32:29 +02:00