`getLocation()` will now exists for all entities. When there is no
valid location, the location will still not be emitted in the DB, but
on the QL side we will then assign a special `UnknownLocation` with
empty filename and 0 for line/column start/end.
This unknown location is currently emitted (with a unique `@` key) at
the start of every extraction, but we can move it elsewhere (and
possibly in a unique global trap file) at a later stage, possibly after
or when we rework the trap file strategy.
This should solve flakiness that was observed on the control flow tests,
which is probably caused by the `nodes` predicate in the `TestOutput`
class in `ControlFlowGraphImplShared.qll` not able to assign a proper
rank when the node does not have a location.
This adds:
* a base `README.md` file to `codegen`
* module docstrings for the modules in `generators`
* help strings on all command line flags
Moreover some unneeded command line flags (`--namespace`,
`--include-dir` and `--trap-affix`) have been dropped.
While the (removed) comment is correct and the frontend can be called in
different modes, both `swift build` and `xcodebuild` always use
`-primary-files` when compiling Swift projects.
The other mode was present only within our test runner (`qltest.sh`), so
removing it and doing what the official build systems do simplifies our
code base.
Additionally, file archival is now a separate function/operation.
Python code was simplified, and now a `--generate` option can be used
to drive what can be generated.
The extractor pack creation now will use an internally generated
dbscheme. This should be the same as the checked in one, but doing so
allows `bazel run create-extractor-pack` and `bazel run codegen` to be
run independently from one another, while previously the former had to
follow the latter in case of a schema change. This is the change that
triggered the above simplification, as in order for the two dbscheme
files to be identical, the first `// generated` line had to state the
same generator script.
By explicitly marking children in the `schema.yml` file, an internal
`getAChild` predicate is implemented, that is in turn used in `AstNode`
to implement `getParent`.
This is yet to be used in the control flow library to replace the
hand-rolled implementation.
A further, more complex step is to use the same information to fully
generate the core implementation of `PrintAst` (including the
accessor string). This will be done later.
The `parent` tests use the same swift code as the extractor tests, and
this is currently enforced by `sync-files.py`. Notice that `qltest.sh`
had to be modified to deal with multiple files, which was not working
yet.
While we need universal references to catch different value categories,
we don't need perfect forwarding as `fetchLabel` does not behave
differently on lvalue and rvalues.
That class was meant to allow aggregate initialization of generated
C++ entries having the label `id` as first argument.
As aggregate initialization turned out to be undesirable (names of
fields are not explicit, and `{}` must be inserted for empty
superclasses), this commit removes it and disallows aggregate
initialization altogether by defining empty constructors for generated
classes.
necessary because the CFG library doesn't support the following
two requirements simultaneously:
1. Traverse AST classes by virtual dispatch
2. Construct ControlFlowElements from non-AST classes
Because the CFG trees derive from the a base type that must be a
subtype of `ControlFlowElement`. So if we make `ControlFlowElement`
an IPA type, we cannot write:
```
class AssignTree extends PostOrderTree instanceof AssignExpr { ... }
```
because `AssignExpr` is not a subtype of PostOrderTree (since
PostOrderTree is now a subtype of the new IPA type).
To fix this, Tom suggested the following (which is implemented in
this PR):
1. Create a copy of the CFG tree classes (i.e., Pre/PostOrderTree,
LeafTree, etc.) and call them AstPreOrderTree/AstPostOrderTree,
AstLeafTree, etc.
2. For each tree AstTree from step 1, create a instance of the
internal CFG library's appropriate class.
3. In `ControlFlowGraphImpl`, proceed as normal with virtual
dispatch using `instanceof`, but extend the AstTree classes
from step 1 instead of the CFG's own tree classes.
This works because each AstTree implements one of the CFG
library's tree classes (as per step 2).
This commit performs step 1 and 2. Step 3 will be the next commit.
This transfers the current state of `StmtVisitor` in the PoC, plus some
changes required for the update to swift 5.6.
Also `getLabel` in `SwiftDispatcher` got renamed to `createLabel`, and
is now correctly outputting the label assignment to the trap file.
This transfers the current state of `DeclVisitor` from the
proof-of-concept.
TODO: make the `declarations` tests in `extractor-tests` more
comprehensive.
This allows to avoid bypassing label type correcness in the extractor,
and allows to independently resolve TBD extractions, as with this
approach TBD nodes do have the correctly typed trap label. The TBD
status is now a predicate on the QL side.
This requires:
* a default visit using the correct type, which is achieved via macro
metaprogramming in `VisitorBase.h`, following the way
`swift::ASTVisitor` is programmed
* a mapping from labels to corresponding binding trap entries. The
functor is defined in `TrapTagTraits.h` and instantiated in generated
`TrapEntries.h`
* Binding trap entries for TBD unknown entities must not have any other
field than the `id` (after all, we are supposed to not extract them
yet). This is why all unextracted fields in `schema.yml` have been
commented out, and will be uncommentend when visitors are added
This is solving a papercut, where the C++ build was relying on the
local dbscheme file to be up-to-date, even if all the information for
building is actually in `schema.yml`. This made a pure C++ development
cycle with changes to `schema.yml` clumsy, as it required a further
dbscheme generation step.
Now for C++ the dbscheme is generated internally in the build files, and
thus a change in `schema.yml` is reflected immediately in the C++ build.
A `swift/codegen` step for checked in generated code (including the
dbscheme) is still required, but a developer can do it just before
running QL tests or committing, instead of during each C++
recompilation.
Some directory reorganization was also carried out, moving specific
generator modules to a new `generators` python package, and only leaving
the two drivers at the top level.
This allows to avoid bypassing label type correcness in the extractor,
and allows to independently resolve TBD extractions, as with this
approach TBD nodes do have the correctly typed trap label. The TBD
status is now a predicate on the QL side.
This requires:
* a default visit using the correct type, which is achieved via macro
metaprogramming in `VisitorBase.h`, following the way
`swift::ASTVisitor` is programmed
* a mapping from labels to corresponding binding trap entries. The
functor is defined in `TrapTagTraits.h` and instantiated in generated
`TrapEntries.h`
* Binding trap entries for TBD unknown entities must not have any other
field than the `id` (after all, we are supposed to not extract them
yet). This is why all unextracted fields in `schema.yml` have been
commented out, and will be uncommentend when visitors are added
This turned out easier than expected previously. `llvm::PointerUnion`
was also considered, which would have less memory footprint, but it
would require more effort as it is lacking the same implicit conversions
and operators that `std::variant` provides.
Also renamed `ToTag<E>` to `TrapTagOf<E>` and introduced a derived
convenience functor `TrapLabelOf<E>`.
These changes are required to allow a new type-safe approach to TBD
nodes, that will come in a separate commit.
This introduces:
* the possibility to add properties to the root `Element`
* a functor taking tags to the corresponding binding trap entry
* `hasProp()` methods for optional properties in QL
* `getPrimaryQlClass()` method
Properties marked with `predicate` in the schema are now accepted.
* in the dbscheme, they will translate to a table with a single `id`
column (and the table name will not be pluralized)
* in C++ classes, they will translate to `bool` fields
* in QL classes, they will translate to predicates
Closes https://github.com/github/codeql-c-team/issues/1016
Providing `--dynamic_mode=fully` (for example setting it in
`local.bazelrc`) will now work.
All runfiles are now copied in the extractor pack: in dynamic mode,
those will be the executable and the dynamic libraries, while in static
mode only the executable will be part of the runfiles.
Setting the correct `LD_LIBRARY_PATH` in `qltest.sh` then allows to
run tests with this pakcage. If we need something more, we can switch to
a wrapper script in place of `extractor` in the future.
Notice that `LD_LIBRARY_PATH` is also set in static mode, but that has
no consequence.
This checks in the trapgen script generating trap entries in C++.
The codegen suite has been slightly reorganized, moving the templates
directory up one level and chopping everything into smaller bazel
packages. Running tests is now done via
```
bazel run //swift/codegen/test
```
With respect to the PoC, the nested `codeql::trap` namespace has been
dropped in favour of a `Trap` prefix (or suffix in case of entries)
within the `codeql` namespace. Also, generated C++ code is not checked
in in git any more, and generated during build. Finally, labels get
printed in hex in the trap file.
`TrapLabel` is for the moment only default-constructible, so only one
single label is possible. `TrapArena`, that is responsible for creating
disjoint labels will come in a later commit.
Tests can be run with
```
bazel test //swift/codegen:tests
```
Coverage can be checked installing `pytest-cov` and running
```
pytest --cov=swift/codegen swift/codegen/test
```
This patch introduces the basic infrastructure of the code generation
suite and the `dbscheme` generator.
Notice that the checked in `schema.yml` should reflect swift 5.6 but
might need some tweaking.
Closes https://github.com/github/codeql-c-team/issues/979
* fixed 5.0.0 as bazel version
* made dependencies better loadable
* moved `//swift/install` to `//swift:create-extractor-pack` (following
the clearer ruby naming)
* renamed `extractor_pack` to `extractor-pack` for consistency with Ruby
This adds a first dummy extractor for swift.
Running `bazel run //swift:install` will create an `extractor_pack`
directory in `swift`. From that moment providing `--search-path=swift`
will pick up the extractor.