Replacing `Value.booleanValue`. We wanted to match `Object.booleanValue` that
only gives a result if it is either `true` or `false`, but also wanted to keep
the flexibility to see if the Value _could_ be `true`/`false`. We don't have a
motivating usecase, so let's see if we ever need it :P
+ fix modernisation regression on py/jinja2/autoescape-false
Since `IntObjectInternal` extends `TInt`, and `TInt` is defined for all
instances of `Builtin.intValue`, and `Builtin.intValue` includes both `int` and
`long`, we don't need to handles Longs in a special manner, as we did in NumericObject.
Adds
- `StringValue` as a new class,
- `Value::booleanValue` which returns the boolean interpretation of the given
value, and
- `ClassValue::str` which returns the value of the `str` class, depending on the
Python version.
Some of the tests currently fail, since they can't reproduce the old tests
results (since the sinks/sources defined in the library code are not
HttpResponseTaintSink/HttpRequestTaintSource)
Our definition of `toString` for the internal tuple objects we create during the
points-to analysis may have been a _tad_ too ambitious. In particular, it can
easily lead to non-termination, e.g. using the following piece of code:
```python
x = ()
while True:
x = (x, x)
```
This commit cuts off the infinite recursion by replacing _nested_ tuples with
the string "...". In particular this means even non-recursive tuples will be cut
off at that point, so that the following tuples
```python
(1, "2")
((3, 4), [5, 6])
(1, 2, 3, 4, 5)
```
Get the following string representations.
```
"(int 1, '2', )"
"(..., List, )"
"(int 1, int 2, int 3, 2 more...)"
```
They are not tainted in assignment, only in use.
I also adopted an attempt at a better test-setup, where it's easy to see if
everything is the way you hoped for, instead of browsing through 100 of lines of
taint-step output :P
Original code:
exists(Expr e, For forloop | forloop = loop and e.pointsTo(_, _, capturing) |
not loop.contains(e)
)
The new version will preserve the same semantics. The problem with the first
rewrite was that `not loop.(For).somethingMore` would hold for any AstNode that
was not a For
Performance on `taers232c/GAMADV-X` (which exhibited pathological behaviour in
the most recent dist upgrade) went from ~670s to ~313s on
`py/hardcoded-credentials`.
There are still a few tuple counts in the 10-100 million range, but this commit
takes care of all of the ones that numbered in the billions. (A single tuple
count in the 100-1000 million range remains, but it appears to be less critical,
taking only two seconds to calculate.)
This used to take 30s on `cpython`.
```
Tuple counts for StatementNoEffect::side_effecting_binary#f:
46522 ~0% {2} r1 = ClassObject::ClassObject::hasAttribute_dispred#fb AS L AND NOT StatementNoEffect::side_effecting_binary#f#antijoin_rhs AS R(L.<0>, L.<1>)
46522 ~2% {2} r2 = SCAN r1 OUTPUT r1.<1>, r1.<0>
950960 ~2% {2} r3 = JOIN r2 WITH Operations::Operator::getSpecialMethodName_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>
950960 ~2% {2} r4 = JOIN r3 WITH py_operators AS R ON FIRST 1 OUTPUT R.<2>, r3.<1>
950960 ~0% {3} r5 = JOIN r4 WITH AstGenerated::BinaryExpr_::getLeft_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r4.<1>, r4.<0>
122934382 ~0% {2} r6 = JOIN r2 WITH Operations::Cmpop::getSpecialMethodName_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>
122934382 ~3% {3} r7 = JOIN r6 WITH project#Operations::Compare::compares_dispred#ffff#3_201#join_rhs AS R ON FIRST 1 OUTPUT R.<2>, r6.<1>, R.<1>
123885342 ~3% {3} r8 = r5 \/ r7
300 ~8% {1} r9 = JOIN r8 WITH project#Exprs::Expr::refersTo_dispred#ffff AS R ON FIRST 2 OUTPUT r8.<2>
return r9
```
With this commit, it takes a few milliseconds.
Should fix#2426.
Essentially, we disregard expressions used inside annotations, if these
annotations occur in a file that has `from __future__ import annotations`, as
this prevents the annotations from being evaluated.
* Adds fix for __init_subclass__ bug.
* Adds test case.
* Move test on name.
I think it makes more sense here, alongside the other "special" method names.
Fixes#1969.
The points-to analysis does not know that the assignment `input = raw_input`
cannot fail under Python 2, and so there are two possible values that `input`
could point-to after exiting the exception handler: the built-in `input`, or the
built-in `raw_input`. In the latter case we do not want to report the alert, and
so adding a check that the given function does not point-to the built-in
`raw_input` suffices.
Should fix#1833, #2137, and #2187.
Internally, comprehensions are (at present) elaborated into local functions and
iterators as described in [PEP-289](https://www.python.org/dev/peps/pep-0289/).
That is, something like:
```
g = (x**2 for x in range(10))
```
becomes something akin to
```
def __gen(exp):
for x in exp:
yield x**2
g = __gen(iter(range(10)))
```
In the context of the top-level of a class, this means `__gen` looks as if it is
a method of the class, and in particular `exp` looks like it's the `self`
argument of this method, which leads the points-to analysis to think that `exp`
is an instance of the surrounding class itself.
The fix in this case is pretty simple: we look for occurrences of `exp` (in fact
called `.0` internally -- carefully chosen to _not_ be a valid Python
identifier) and explicitly exclude this parameter from being classified as a
`self` parameter.
Having `toString()` defined to be `none()` is a major headache when debugging,
as `toString`-less results are silently elided. This PR puts dummy `toString`s
in place of the `none()`s.
(I am mostly creating this to see if it impacts our tests and/or the
performance. If not, we may as well merge it.)
Fixes#1832.
In the taint sink, we add an additional check that the given control-flow node
can indeed point to a value that is mutable. This takes care of the guard on the
type.
If and when we get around to adding configurations for all of the taint
analyses, we may want to implement this as a barrier instead, pruning any steps
that go through a type test where the type is not mutable.
Fixes https://github.com/Semmle/ql/issues/1572
Adjust mock so it's more aligned with what the flask code actually does. Tests
were passing before, even though we didn't handle the case in real code :\
If the legacy configuration is only enabled if there are no other
configurations, defining a configuration in an imported library can lead to
unwanted results. For example, code that uses `any(MyTaintKind t).taints(node)`
would *stop* working, if it did not define its own configuration. (this actually
happened to us)
We performed a dist-compare to ensure there is not a performance deg ration by
doing this. Results at https://git.semmle.com/gist/rasmuswl/a1eca07f3a92f5f65ee78d733e5d260e
Tests that were affected by this:
- RockPaperScissors + Simple: new edges because no configuration was defined for
SqlInjectionTaint or CommandInjectionTaint
- CleartextLogging + CleartextStorage: new edges because no configuration was
defined before, AND duplicate deges.
- TestNode: new edges because no configuration was defined before
- PathInjection: Duplicate edges
- TarSlip: Duplicate edges
- CommandInjection: Duplicate edges
- ReflectedXss: Duplicate edges
- SqlInjection: Duplicate edges
- CodeInjection: Duplicate edges
- StackTraceExposure: Duplicate edges
- UnsafeDeserialization: Duplicate edges
- UrlRedirect: Duplicate edges
We don't use a locked-down version of six, so some internal things probably
changed from the version used last time, and the versoin I have installed.
Long term fix would be to use a specific version of six for tests!
Due to internal PR#35123 we now actually run the tests under
`python/ql/test/2/...`
These seems like a regression, since the tests state that N is ok, but A and J
should not be allowed.
For now we can accept them, so we don't block all other Python PRs
Due to internal PR#35123 we now actually run the tests under
`python/ql/test/2/...`
Since we haven't done this in a while, test output has changed a bit. These
changes look perfectly fine.
The `multiple_invocation_paths` predicate had a bad join order where
we (essentially) joined `i1` with `i2` and only then joined `i1` and `i2`
separately to reduce the number of tuples. The join coming from `i1 != i2` had
little impact, but `i1.getFunction() = multi` made a big difference (and
similarly for `i2`). I factored out the code so that these joins would be done
more eagerly. Thus, we went from
```
[2019-11-06 16:53:05] (38s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths#ffff/4@2ce75a
[2019-11-06 16:53:35] (68s) Tuple counts for MethodCallOrder::multiple_invocation_paths#ffff:
134547 ~9% {2} r1 = SCAN CallGraph::TInvocation#fff AS I OUTPUT I.<0>, I.<2>
235284431 ~3% {4} r2 = JOIN r1 WITH CallGraph::TInvocation#fff AS R ON FIRST 1 OUTPUT r1.<0>, r1.<1>, R.<1>, R.<2>
235149884 ~3% {4} r3 = SELECT r2 ON r2.<3> != r2.<1>
235149884 ~4% {3} r4 = SCAN r3 OUTPUT r3.<1>, r3.<0>, r3.<3>
166753634 ~5% {4} r5 = JOIN r4 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus#swapped AS R ON FIRST 1 OUTPUT R.<1>, r4.<2>, r4.<1>, r4.<0>
129778 ~0% {4} r6 = JOIN r5 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus AS R ON FIRST 2 OUTPUT r5.<0>, r5.<3>, r5.<1>, r5.<2>
return r6
[2019-11-06 16:53:35] (68s) Registering MethodCallOrder::multiple_invocation_paths#ffff + [] with content 1705dcbc08kd9aa40rp2g2e9civhv
[2019-11-06 16:53:35] (68s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths#ffff with 129778 rows and 4 columns.
```
to
```
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths_helper#ffff/4@586aec
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths_helper#ffff:
134547 ~0% {2} r1 = SCAN CallGraph::TInvocation#fff AS I OUTPUT I.<2>, I.<0>
88111 ~4% {3} r2 = JOIN r1 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus#swapped AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>, r1.<0>
761305 ~0% {4} r3 = JOIN r2 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus AS R ON FIRST 1 OUTPUT r2.<1>, r2.<2>, r2.<0>, R.<1>
673194 ~0% {4} r4 = SELECT r3 ON r3.<3> != r3.<1>
673194 ~0% {4} r5 = SCAN r4 OUTPUT r4.<2>, r4.<1>, r4.<3>, r4.<0>
return r5
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths_helper#ffff + [] with content 20edaaecf25nldgp24d9c4et8m3kv
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths_helper#ffff with 673194 rows and 4 columns.
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs/4@9e5441
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs:
673194 ~0% {4} r1 = SCAN MethodCallOrder::multiple_invocation_paths_helper#ffff AS I OUTPUT I.<2>, I.<3>, I.<0>, I.<1>
return r1
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs + [] with content 2069301e655fi9mcovngg9hetfqas
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs with 673194 rows and 4 columns.
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths#ffff/4@2f7c34
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths#ffff:
134547 ~0% {2} r1 = SCAN CallGraph::TInvocation#fff AS I OUTPUT I.<2>, I.<0>
129778 ~0% {4} r2 = JOIN r1 WITH MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs AS R ON FIRST 2 OUTPUT R.<2>, R.<3>, r1.<0>, r1.<1>
return r2
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths#ffff + [] with content 1705dcbc08kd9aa40rp2g2e9civhv
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths#ffff with 129778 rows and 4 columns.
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs/4@9f9146
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs:
129778 ~0% {4} r1 = SCAN MethodCallOrder::multiple_invocation_paths#ffff AS I OUTPUT I.<0>, I.<3>, I.<1>, I.<2>
return r1
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs + [] with content 17c3fe1fcbf6ghhdr7hiukqp41rst
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs with 129778 rows and 4 columns.
```
Execution time on `salt` went from 29.5s to somewhere below 299ms (the predicate
was not listed in the timing report).
As it turns out, there was a further bad join-order in the `global_name_used`
predicate. In this case, there was a common subexpression in the RA that was
being factored out and evaluated separately, producing a large number of tuples.
Before this change, any function that has a parameter that was called
password/credentials would be treated as returning sensitive data of that
kind. `py/clear-text-logging-sensitive-data` would alert if one of these are
logged, which has a LOT of false-positives.
This was brought up on the LGTM.com forums here:
https://discuss.lgtm.com/t/warn-when-always-failing-assert-is-reachable-rather-than-unreachable/2436
Essentially, in a complex chain of `elif` statements, like
```python
if x < 0:
...
elif x >= 0:
...
else:
...
```
the `else` clause is redundant, since the preceding conditions completely
exhaust the possible values for `x` (assuming `x` is an integer). Rather than
promoting the final `elif` clause to an `else` clause, it is common to instead
raise an explicit exception in the `else` clause. During execution, this
exception will never actually be raised, but its presence indicates that the
preceding conditions are intended to cover all possible cases.
I think it's a fair point. This is a clear instance where the alert, even if it
is technically correct, is not useful for the end user.
Also, I decided to make the exclusion fairly restrictive: it only applies if
the unreachable statement is an `assert False, ...` or `raise ...`, and only
if said statement is the first in the `else` block. Any other statements will
still be reported.
This would find instances of `thing = MyThing.objects.get(field=userinput)`, and
what seems to be a query that wants to match on `thing = MyThing();
thing.field=userinput`. Both are not vulnerable to user-input, due to the
build-in escaping by django.
The DjangoModelFieldWrite actually matches on `MyThing.field=userinput` and not
`thing.field=userinput`. I suspect this to be a mistake.
Matching on `thing.field=userinput`, would require this CodeQL:
attr.getObject(_).pointsTo().getClass() = model
This is vulnerable to SQL injection because of the quotes around %s -- added
some code that highlights this in test.py
Since our examples did this in the safe query, I ended up rewriting them
completely, causing a lot of trouble for myself :D
Hopefully it is more clear that you can get multiple results from getABaseType
because of multiple inheritance, and not because we are following the chain of
inheritance
Like the one existing in ControlFlowNode.
This is useful for checking class of value being poitned to, as
expr.pointsTo().getClass() = someClass
Without this you need to do
exists(Value v | v.getClass() = someClass | expr.pointsTo(v))
Eventually these files will subsume the current `queries.xml` files
at the top of query-containing and library directories. For now they're
just here to support internal testing of the tooling support for them
we're writing on.
Format and contents is a work in progress. If you're not in Semmle,
don't depend on anything here making sense (or staying stable) until
you see the version tags increase to something nonzero.
We could find no reason for using `Builtin::builtin` instead of
`Builtin::special`. Since all the other base types use `special`, and the old
Object API is using `special`, let's also do that :)
This commit fixes the results for
0d8a429b7e/files/mayaTools/cgm/lib/classes/AttrFactory.py?sort=name&dir=ASC&mode=heatmap#L90
```
def __init__(...):
if error_case:
return guiFactory.warning(...)
```
that was wrongly reporting _Explicit return in __init__ method._ as an error.