Having that module in `Instruction.qll` slowed down the parsing of that
file both humans and the compiler.
This commit moves the `InstructionSanity` module to `IRSanity.qll`
without making any changes to its contents apart from adding some
imports.
On Wireshark with 6GB RAM, I've observed `definitionReachesRank` to be
the slowest predicate in the IR. It seems that the implementation was
slow because the optimizer failed to eliminate the common
`reachesRank - 1` subexpression. This led to context being pushed into
the `not`, which got implemented as `MATERIALIZE`. That wouldn't
normally be a disaster, but this is one of the largest predicates in the
IR SSA construction, and iteration 2 was very slow.
Before:
(1505s) Starting to evaluate predicate SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta/4[1]@93f592 (iteration 1)
(1535s) Tuple counts for SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta:
130670697 ~0% {4} r1 = SCAN project#SSAConstruction::DefUse::hasDefinitionAtRank#fffff AS I OUTPUT I.<0>, I.<1>, I.<2>, (I.<2> + 1)
130670697 ~6% {5} r2 = JOIN r1 WITH SSAConstruction::DefUse::exitRank#fff AS R ON FIRST 2 OUTPUT r1.<0>, r1.<1>, r1.<2>, r1.<3>, R.<2>
130670697 ~6% {5} r3 = SELECT r2 ON r2.<3> <= r2.<4>
130670697 ~0% {4} r4 = SCAN r3 OUTPUT r3.<0>, r3.<1>, r3.<2>, r3.<3>
return r4
(1535s) - SSAConstruction::DefUse::definitionReachesRank#ffff_delta has 130670697 rows (order for disjuncts: delta=<standard>).
(1535s) Starting to evaluate predicate SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta/4[2]@866c14 (iteration 2)
(1626s) Tuple counts for SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta:
261341394 ~107% {4} r1 = JOIN SSAConstruction::DefUse::definitionReachesRank#ffff#prev_delta AS L WITH SSAConstruction::DefUse::definitionReachesRank#ffff#join_rhs AS R ON FIRST 3 OUTPUT R.<0>, R.<1>, R.<2>, (1 + L.<3>)
261341394 ~107% {4} r2 = r1 AND NOT SSAConstruction::DefUse::definitionReachesRank#ffff#prev AS R(r1.<0>, r1.<1>, r1.<2>, r1.<3>)
130670697 ~0% {5} r3 = SCAN r2 OUTPUT r2.<0>, r2.<1>, (r2.<3> - 1), r2.<2>, r2.<3>
106034590 ~1% {4} r4 = JOIN r3 WITH project#SSAConstruction::DefUse::hasDefinitionAtRank#fffff AS R ON FIRST 3 OUTPUT r3.<0>, r3.<1>, r3.<3>, r3.<4>
106034590 {4} r5 = MATERIALIZE r4 AS antijoin_rhs
24636107 ~3% {4} r6 = r2 AND NOT r5(r2.<0>, r2.<1>, r2.<2>, r2.<3>)
24636107 ~0% {5} r7 = JOIN r6 WITH SSAConstruction::DefUse::exitRank#fff AS R ON FIRST 2 OUTPUT r6.<0>, r6.<1>, r6.<2>, r6.<3>, R.<2>
2749441 ~0% {5} r8 = SELECT r7 ON r7.<3> <= r7.<4>
2749441 ~4% {4} r9 = SCAN r8 OUTPUT r8.<0>, r8.<1>, r8.<2>, r8.<3>
return r9
(1626s) - SSAConstruction::DefUse::definitionReachesRank#ffff_delta has 2749441 rows (order for disjuncts: delta=<standard>).
After:
(12s) Tuple counts for SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta:
130670697 ~0% {4} r1 = SCAN project#SSAConstruction::DefUse::hasDefinitionAtRank#fffff AS I OUTPUT I.<0>, I.<1>, I.<2>, (I.<2> + 1)
return r1
(12s) - SSAConstruction::DefUse::definitionReachesRank#ffff_delta has 130670697 rows (order for disjuncts: delta=<standard>).
(12s) Starting to evaluate predicate SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta/4[2]@fff64c (iteration 2)
(34s) Tuple counts for SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta:
108784031 ~0% {4} r1 = SSAConstruction::DefUse::definitionReachesRank#ffff#prev_delta AS L AND NOT SSAConstruction::DefUse::exitRank#fff AS R(L.<0>, L.<1>, L.<3>)
2749441 ~5% {4} r2 = r1 AND NOT project#SSAConstruction::DefUse::hasDefinitionAtRank#fffff AS R(r1.<0>, r1.<1>, r1.<3>)
2749441 ~4% {4} r3 = SCAN r2 OUTPUT r2.<0>, r2.<1>, r2.<2>, (r2.<3> + 1)
2749441 ~4% {4} r4 = r3 AND NOT SSAConstruction::DefUse::definitionReachesRank#ffff#prev AS R(r3.<0>, r3.<1>, r3.<2>, r3.<3>)
return r4
(34s) - SSAConstruction::DefUse::definitionReachesRank#ffff_delta has 2749441 rows (order for disjuncts: delta=<standard>).
Note that the row counts are exactly the same before and after.
The charpred of class `GVN` in `ASTValueNumbering.qll` got inlined into
the member predicate `getAnInstruction` and caused a tuple explosion on
Wireshark in the query `StrncpyFlippedArgs.ql`.
I interrupted the predicate after 10 minutes and got these intermediate
tuple counts:
(5208s) Tuple counts for ASTValueNumbering::GVN::getAnInstruction_dispred#ff:
8754900909 ~5% {3} r1 = JOIN ValueNumberingInternal::tvalueNumber#ff_10#join_rhs AS L WITH ValueNumberingInternal::tvalueNumber#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, L.<1>, L.<0>
4390274632 ~150085% {2} r2 = JOIN r1 WITH project#SSAConstruction::Cached::getInstructionUnconvertedResultExpression AS R ON FIRST 1 OUTPUT r1.<2>, r1.<1>
return r2
After this change, the `getAnInstruction` predicate is itself inlined,
like it should be. The new non-inlined charpred takes 2.1s and has these
tuple counts:
(2s) Tuple counts for ASTValueNumbering::GVN#f:
9158442 ~117% {1} r1 = JOIN project#SSAConstruction::Cached::getInstructionUnconvertedResultExpression AS L WITH ValueNumberingInternal::tvalueNumber#ff@staged_ext AS R ON FIRST 1 OUTPUT R.<1>
return r1
The code is now quadratic in the number of statements in a basic block,
whereas before it was quadratic in the number of _control-flow nodes_ in
a basic block.
The `iDominates` relation is directly on control-flow nodes, and its
transitive closure is far too large. It got compiled into a recursion
rather than `fastTC`, and I've observed that recursion to take about an
hour on a medium-size customer snapshot.
The fix is to check for dominance at the basic-block level.
The `cpp/missing-case-in-switch` performed badly on some snapshots, to
the extent where it was as slow as the most expensive IR stages
(example: ChakraCore). This commit makes it faster, removing a
`pragma[noopt]` along the way.
The intermediate tuple counts on a customer codebase drop from 84M to
3M, while the content hash of `getAMissingCase` is the same.
Before:
(124s) Tuple counts for Stmt::EnumSwitch::getAMissingCase#ff#antijoin_rhs:
20867789 ~0% {3} r1 = JOIN Stmt::SwitchStmt::getASwitchCase_dispred#ff AS L WITH Stmt::EnumSwitch::getAMissingCase#ff#shared AS R ON FIRST 1 OUTPUT L.<1>, R.<0>, R.<1>
20122830 ~0% {3} r2 = JOIN r1 WITH Stmt::SwitchCase::getExpr_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>, r1.<2>
20122830 ~0% {3} r3 = JOIN r2 WITH Expr::Expr::getValue_dispred#ff AS R ON FIRST 1 OUTPUT r2.<2>, r2.<1>, R.<1>
83961918 ~0% {4} r4 = JOIN r3 WITH Enum::EnumConstant::getInitializer_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r3.<1>, r3.<0>, r3.<2>
83961918 ~0% {4} r5 = JOIN r4 WITH initialisers AS R ON FIRST 1 OUTPUT R.<2>, r4.<3>, r4.<1>, r4.<2>
234348 ~185% {2} r6 = JOIN r5 WITH Expr::Expr::getValue_dispred#ff AS R ON FIRST 2 OUTPUT r5.<2>, r5.<3>
return r6
...
(124s) Tuple counts for Stmt::EnumSwitch::getAMissingCase#ff:
663127 ~4% {2} r1 = Stmt::EnumSwitch::getAMissingCase#ff#shared AS L AND NOT Stmt::EnumSwitch::getAMissingCase#ff#antijoin_rhs AS R(L.<0>, L.<1>)
return r1
(124s) Registering Stmt::EnumSwitch::getAMissingCase#ff + [] with content 2060ff326cvhihcsvoph6k9divuv4
(124s) >>> Wrote relation Stmt::EnumSwitch::getAMissingCase#ff with 663127 rows and 2 columns.
After:
(5s) Tuple counts for Stmt::EnumSwitch::getAMissingCase_dispred#ff#antijoin_rhs:
746029 ~0% {2} r1 = JOIN Stmt::EnumSwitch::getAMissingCase_dispred#ff#shared AS L WITH Enum::Enum::getAnEnumConstant_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, L.<1>
3116197 ~2% {3} r2 = JOIN r1 WITH Enum::EnumConstant::getInitializer_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>, r1.<0>
3116197 ~0% {3} r3 = JOIN r2 WITH initialisers AS R ON FIRST 1 OUTPUT R.<2>, r2.<1>, r2.<2>
3116197 ~311% {3} r4 = JOIN r3 WITH Expr::Expr::getValue_dispred#ff AS R ON FIRST 1 OUTPUT r3.<1>, R.<1>, r3.<2>
234348 ~185% {2} r5 = JOIN r4 WITH Stmt::EnumSwitch::matchesValue#ff AS R ON FIRST 2 OUTPUT r4.<0>, r4.<2>
return r5
(5s) Registering Stmt::EnumSwitch::getAMissingCase_dispred#ff#antijoin_rhs + [] with content 173483d71508vl534mvlr1g0ehi12
(5s) >>> Wrote relation Stmt::EnumSwitch::getAMissingCase_dispred#ff#antijoin_rhs with 82902 rows and 2 columns.
(5s) Starting to evaluate predicate Stmt::EnumSwitch::getAMissingCase_dispred#ff/2@ae4c0b
(5s) Tuple counts for Stmt::EnumSwitch::getAMissingCase_dispred#ff:
746029 ~2% {2} r1 = JOIN Stmt::EnumSwitch::getAMissingCase_dispred#ff#shared AS L WITH Enum::Enum::getAnEnumConstant_dispred#ff AS R ON FIRST 1 OUTPUT L.<1>, R.<1>
663127 ~4% {2} r2 = r1 AND NOT Stmt::EnumSwitch::getAMissingCase_dispred#ff#antijoin_rhs AS R(r1.<0>, r1.<1>)
return r2
(5s) Registering Stmt::EnumSwitch::getAMissingCase_dispred#ff + [] with content 2060ff326cvhihcsvoph6k9divuv4
(5s) >>> Wrote relation Stmt::EnumSwitch::getAMissingCase_dispred#ff with 663127 rows and 2 columns.
This query used two fastTC operations that were already somewhat
inefficient on their own but could send the evaluator into an OOM loop
when run in parallel without enough RAM.
The fix is to recurse manually, starting just from the expressions that
are potential candidates for alerts.