Chris Smowton ee63e60bb7 qlpacks: libraryPathDependencies -> dependencies		2022-10-28 16:07:36 +01:00
example	Python: CG trace: Don't abuse example dir	2020-07-22 14:22:04 +02:00
ql	qlpacks: libraryPathDependencies -> dependencies	2022-10-28 16:07:36 +01:00
src/cg_trace	spelling: processing	2022-10-13 11:21:09 -04:00
tests	Python: CG trace: Better handling of builtins without __module__	2020-07-24 19:13:53 +02:00
.flake8	Python: CG trace: blackify	2020-07-17 13:49:25 +02:00
.gitignore	Python: CG trace: Add helper.sh to run tracing against real projects	2020-07-23 17:37:01 +02:00
.isort.cfg	Python: CG trace: Make code modular	2020-07-17 14:40:54 +02:00
README.md	Python: Fix grammar	2020-09-07 14:59:07 +02:00
helper.sh	Python: CG trace: Make `./helper.sh` show help again	2020-07-24 18:59:29 +02:00
projects.json	Python: CG trace: Add support for flask	2020-07-24 20:06:53 +02:00
requirements.txt	Python: CG trace: blackify	2020-07-17 13:49:25 +02:00
setup.py	Python: CG trace: reconstruct call expr from bytecode	2020-07-20 11:28:05 +02:00

README.md

Recorded Call Graph Metrics

Also known as call graph tracing.

Execute a Python program and, for each call being made, record the call and its callee. This allows us to compare call graph resolution from static analysis with actual data: can static analysis correctly determine the target of each call that actually happens?

Using the call graph tracer incurs a heavy performance cost. Expect the program to take about 10x longer to execute.

The number of calls recorded varies a little from run to run. I have not been able to pinpoint why.
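To make the idea concrete, here is a minimal sketch of recording calls with `sys.setprofile`, which this README names further down as the recording mechanism. It is not the cg_trace implementation: it only records function names, whereas the real tracer works with call expressions. The names `recorded`, `profiler`, and `trace` are made up for illustration.

```python
import sys

recorded = []  # (caller name, callee name) pairs, in call order


def profiler(frame, event, arg):
    if event == "call":
        # A Python function was entered; frame.f_back is the calling frame.
        caller = frame.f_back.f_code.co_name if frame.f_back else "<none>"
        recorded.append((caller, frame.f_code.co_name))
    elif event == "c_call":
        # A C function is about to be called; arg is the function object.
        callee = getattr(arg, "__name__", repr(arg))
        recorded.append((frame.f_code.co_name, callee))


def trace(func):
    """Run func with the profiler installed (hypothetical helper)."""
    sys.setprofile(profiler)
    try:
        return func()
    finally:
        sys.setprofile(None)


def helper():
    return len("abc")


def main():
    return helper()


trace(main)
```

After this runs, `recorded` should contain the pairs `("main", "helper")` and `("helper", "len")`, which is the kind of ground truth a static call graph can be checked against.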

Running against real projects

Currently it's possible to gather metrics from traced runs of the standard test suite of a few projects (defined in projects.json): youtube-dl, wcwidth, and flask.

To run against all projects, use

$ ./helper.sh all $(./helper.sh projects)

To view the results, use

$ head -n 100 projects/*/Metrics.txt

Expanding set of projects

It should be fairly straightforward to expand the set of projects. Most projects use tox for running their tests against multiple Python versions. I didn't look into any kind of integration with tox, but have manually picked out the instructions required to get going.

As an example, compare the tox.ini file from flask with its configuration in projects.json:

    "flask": {
        "repo": "https://github.com/pallets/flask.git",
        "sha": "21c3df31de4bc2f838c945bd37d185210d9bab1a",
        "module_command": "pytest -c /dev/null tests examples",
        "setup": [
            "pip install -r requirements/tests.txt",
            "pip install -q -e examples/tutorial[test]",
            "pip install -q -e examples/javascript[test]"
        ]
    }
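The sketch below shows one way to read such an entry and list the steps it implies. How `helper.sh` actually consumes these fields is not described here, so the interpretation is an assumption, in particular that the file is a top-level mapping from project name to entry and that `module_command` is a module name followed by its arguments. The `describe` helper is hypothetical.

```python
import json
import shlex

# The flask entry from above, wrapped in a top-level object (assumed layout).
PROJECTS_JSON = """
{
    "flask": {
        "repo": "https://github.com/pallets/flask.git",
        "sha": "21c3df31de4bc2f838c945bd37d185210d9bab1a",
        "module_command": "pytest -c /dev/null tests examples",
        "setup": [
            "pip install -r requirements/tests.txt",
            "pip install -q -e examples/tutorial[test]",
            "pip install -q -e examples/javascript[test]"
        ]
    }
}
"""


def describe(entry):
    """List the steps an entry implies (hypothetical helper, assumed semantics)."""
    steps = [f"clone {entry['repo']} at {entry['sha']}"]
    steps += [f"setup: {cmd}" for cmd in entry["setup"]]
    # Assumption: the first word is a module name, the rest are its arguments.
    module, *args = shlex.split(entry["module_command"])
    steps.append(f"trace module {module!r} with arguments {args}")
    return steps


projects = json.loads(PROJECTS_JSON)
plan = describe(projects["flask"])
for step in plan:
    print(step)
```

A new project would need the same four fields: a repository URL, a pinned commit, the setup commands picked out of its tox.ini, and the command that runs its test suite.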

Local development

Setup

Ensure you have at least Python 3.7.
Create a virtual environment with `python3 -m venv venv` and activate it.
Install dependencies with `pip install --upgrade -r requirements.txt`.
Install this codebase as an editable package with `pip install -e .`.
Set up your editor. If you're using VS Code, create a new project for this folder, and use these settings for correct autoformatting of code on save:

{
    "python.pythonPath": "venv/bin/python",
    "python.linting.enabled": true,
    "python.linting.flake8Enabled": true,
    "python.formatting.provider": "black",
    "editor.formatOnSave": true,
    "[python]": {
        "editor.codeActionsOnSave": {
            "source.organizeImports": true
        }
    },
    "python.autoComplete.extraPaths": [
        "src"
    ]
}

Enjoy writing code, and being able to run cg-trace on your command line 🎉

Using it

After following the setup instructions above, you should be able to reproduce the example trace by running

cg-trace --xml example/simple.xml example/simple.py

You can also run traces for all tests and build a database by running tests/create-test-db.sh. Then run the queries inside the ql/ directory.

Tracing Limitations

Multi-threading

Calls made in other threads are not traced. This should be possible by using threading.setprofile, but that hasn't been done yet.
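As a rough sketch of what that could look like (not something the tracer does today), `threading.setprofile` registers a profile function for threads started afterwards through the `threading` module, while `sys.setprofile` only covers the current thread. The names `seen`, `profiler`, `work`, and `run_traced` are illustrative.

```python
import sys
import threading

seen = set()  # (thread name, function name) pairs
seen_lock = threading.Lock()


def profiler(frame, event, arg):
    if event == "call":
        entry = (threading.current_thread().name, frame.f_code.co_name)
        with seen_lock:
            seen.add(entry)


def work():
    return sum([1, 2, 3])


def run_traced():
    threading.setprofile(profiler)  # used by threads started from now on
    sys.setprofile(profiler)        # covers the current thread
    try:
        worker = threading.Thread(target=work, name="worker")
        worker.start()
        worker.join()
        work()
    finally:
        sys.setprofile(None)
        threading.setprofile(None)


run_traced()
```

With both hooks installed, `seen` should contain `work` once for the thread named `worker` and once for the thread that called `run_traced`. Threads created outside the `threading` module would still be missed.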

Code that uses `sys.setprofile`

Since that is our mechanism for recording calls, any code that uses sys.setprofile will not work together with the call-graph tracer.
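The conflict arises because a thread has only one profile function at a time, so a later `sys.setprofile` call replaces the earlier one. The sketch below illustrates this with a stand-in `tracer` function. It is not the real cg_trace code, and all names in it are made up.

```python
import sys

tracer_events = []  # names of Python functions seen by the stand-in tracer


def tracer(frame, event, arg):
    if event == "call":
        tracer_events.append(frame.f_code.co_name)


def other_profiler(frame, event, arg):
    pass  # whatever the traced program wants to do with its own profiler


def before():
    pass


def after():
    pass


def traced_program():
    before()
    sys.setprofile(other_profiler)  # the program installs its own profiler
    after()                         # from here on, tracer sees nothing


def run():
    sys.setprofile(tracer)
    try:
        traced_program()
    finally:
        sys.setprofile(None)


run()
```

Here `tracer_events` includes `before` but not `after`, since the stand-in tracer is silently replaced in the middle of the run.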

Class instantiation

Class instantiation does not always fire an event in the sys.setprofile callback (nor in sys.settrace), so such calls are not recorded. Example:

r = range(10)

when disassembled (python -m dis <file>):

  9          48 LOAD_NAME                7 (range)
             50 LOAD_CONST               5 (10)
             52 CALL_FUNCTION            1
             54 STORE_NAME               8 (r)

but no event 😞
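The missing event can be observed directly. The sketch below collects the `arg` of every `c_call` event while running a function that calls both `range` (a class) and `len` (a builtin function). It assumes CPython, and other interpreters may behave differently. The names `c_calls`, `profiler`, `sample`, and `run` are illustrative.

```python
import sys

c_calls = []  # callables reported through c_call events


def profiler(frame, event, arg):
    if event == "c_call":
        c_calls.append(arg)


def sample():
    r = range(10)  # instantiating a class implemented in C
    return len(r)  # calling a builtin function


def run():
    sys.setprofile(profiler)
    try:
        sample()
    finally:
        sys.setprofile(None)


run()
print("len reported:", len in c_calls)
print("range reported:", range in c_calls)
```

On the CPython versions I am aware of, `len` shows up in `c_calls` while `range` does not, which matches the limitation described above.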