`require_serial: true` is a better choice than `pass_filenames: false`, as it
can speed up mypy for single-file changes.
Significant gains can be achieved for single-file changes, even when no cache
has been built for all the other files. This is the majority of cases for our
users who have pre-commit installed as a git hook, because most people change
only a few files and never run the check with --all-files.
When just one file is changed and no cache is built, the difference is drastic:
- `require_serial: true` = 4s
- `pass_filenames: false` = 13s
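For context, here is a minimal sketch of the two alternatives in
`.pre-commit-config.yaml`; the hook ids, entries and target directories are
hypothetical placeholders, not the exact Airflow hooks:
```yaml
repos:
  - repo: local
    hooks:
      # Option A: one process, pre-commit passes the changed filenames to it
      - id: mypy-serial            # hypothetical hook id
        name: Run mypy (serial)
        entry: mypy
        language: system
        types: [python]
        require_serial: true       # a single mypy process gets all filenames
      # Option B: filenames are ignored and the whole tree is checked
      - id: mypy-all-files         # hypothetical hook id
        name: Run mypy (all files)
        entry: mypy airflow tests  # hypothetical target directories
        language: system
        types: [python]
        pass_filenames: false      # pre-commit appends no filenames to entry
```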
This PR does two things:
1. It enables the mypy cache (default folder name `.mypy_cache`) so that
repeated local runs are quicker.
2. It _disables_ passing only the changed files to mypy.
Point 2 seems counter-intuitive, but in my testing, running with all
files (airflow, docs, tests) was about twice as fast as passing the changed
filenames. My hypothesis for why this happens is that when mypy checks file x,
it has to check its dependencies/imports too, and when we have
pass_filenames set, pre-commit runs multiple processes in parallel, and each
of them has to redo this work!
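A sketch of what the two changes could look like in the hook definition (the
entry and target directories are illustrative, not the exact Airflow setup):
```yaml
- id: mypy
  name: Run mypy
  # --cache-dir keeps mypy's incremental state between runs
  # (.mypy_cache is mypy's default location; keep it gitignored)
  entry: mypy --cache-dir=.mypy_cache airflow tests  # hypothetical dirs
  language: system
  types: [python]
  pass_filenames: false  # check the whole tree instead of the changed files
```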
Timings before and after:
- Before:
For all files
```
❯ time pre-commit run mypy -a
Run mypy.................................................................Passed
pre-commit run mypy -a 0.31s user 0.07s system 2% cpu 17.140 total
```
With only a single file
```
❯ time pre-commit run mypy --files airflow/configuration.py
Run mypy.................................................................Passed
pre-commit run mypy --files airflow/configuration.py 0.30s user 0.07s system 5% cpu 6.724 total
```
- After:
With a clean cache (`rm -rf .mypy_cache`):
```
$ time pre-commit run mypy -a
Run mypy.................................................................Passed
pre-commit run mypy -a 0.26s user 0.10s system 2% cpu 17.226 total
```
Clean cache with single file:
```
$ time pre-commit run mypy --files airflow/version.py
Run mypy.................................................................Passed
pre-commit run mypy --files airflow/version.py 0.23s user 0.07s system 4% cpu 7.091 total
```
Repeated run (cache folder exists):
```
$ time pre-commit run mypy --files airflow/version.py
Run mypy.................................................................Passed
pre-commit run mypy --files airflow/version.py 0.23s user 0.05s system 6% cpu 4.178 total
```
and for all files
```
$ time pre-commit run mypy -a
Run mypy.................................................................Passed
pre-commit run mypy -a 0.25s user 0.09s system 6% cpu 4.833 total
```
While working on improving the way we run Kubernetes tests, we found out that
we need to fix the handling of parameters - when we change the Kubernetes
version used via kind, the old version is no longer valid, but it was not
properly removed/saved.
We use the opportunity to add automated tests for that feature.
(cherry picked from commit 38dea9132d1fa36f4fbe871e2ab037be5ad3fab2)
We have far too much bash code around that is not automatically tested.
This is the first step towards changing that (simplifications and more tests
are coming soon).
There are cyclic imports detected seemingly at random by pylint checks when
some of the PRs are run in CI.
It was not deterministic because pylint usually uses as many processors as
are available and splits the list of .py files between separate pylint
processes - depending on how the split is done, the pylint check might or
might not detect the cycle. The cycle is always detected when all files are
checked together.
In order to make it deterministic, all pylint and mypy errors were resolved
in the whole executors package and in dag_processor.
At the same time, plugins_manager was also moved out of the executors
and out of all the operators/hooks/sensors/macros, because it was also causing
cyclic dependencies, and it is far easier to untangle those dependencies
in the executors when we move the initialisation of all plugins to plugins_manager.
Additionally, require_serial is set in the pre-commit configuration to
make sure cycle detection is deterministic.
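A sketch of the relevant knob (hook id and entry are illustrative): with
`require_serial: true`, pre-commit hands the full file list to a single pylint
process instead of splitting it across parallel invocations, so the cycle is
always visible to one process:
```yaml
- id: pylint               # hypothetical hook id
  name: Run pylint
  entry: pylint
  language: system
  types: [python]
  require_serial: true     # one process sees all files -> deterministic
```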
The slim image gave only a very small gain when executing the tests in CI. The
image was significantly smaller, but for local development and testing
you needed both the full CI and the SLIM-CI image.
This made the scripts and docker images needlessly complex - especially
in the wake of the coming production image it turned out to be premature
optimisation. While it sped up some static check jobs in Travis slightly
(by 10-20 seconds), it increased the time needed by developers
to get a working environment and to keep it updated whenever needed
(by minutes).
Also, having two separately managed images made it rather complex to join
some of the Travis CI jobs (there is a follow-up change that gets rid
of the Checklicence image).
With this change, both static checks and tests are executed using a single
image. That also opens the door to further simplification of the scripts
and easier implementation of the production image.
This change moves the build step of pre-commit as late as possible in the
pre-commit chain. This is useful if you do not want to rebuild the docker
images needed to run pylint/mypy/flake8 and still want to see the results of
the other checks immediately after you run commit.
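pre-commit runs hooks in the order they appear in the configuration, so the
resulting layout is roughly as follows (hook ids and script paths are
hypothetical):
```yaml
repos:
  - repo: local
    hooks:
      # cheap checks that need no docker image come first
      - id: yamllint
        name: Lint YAML files
        entry: yamllint
        language: system
        types: [yaml]
      # the image build is placed as late as possible
      - id: build
        name: Build CI image
        entry: ./scripts/ci/pre_commit_build.sh    # hypothetical path
        language: system
        pass_filenames: false
      # only then come the hooks that need the image
      - id: mypy
        name: Run mypy
        entry: ./scripts/ci/pre_commit_mypy.sh     # hypothetical path
        language: system
        types: [python]
```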
This is needed to keep breeze --help in sync with the documentation.
It makes it easier for the follow-up changes needed for the production
image to keep the docs in sync with the code.
This PR reimplements Kubernetes integration testing using kind,
a tool for running local Kubernetes clusters using Docker container
"nodes". The "nodes" are deployed to a separate docker daemon
(dind) started through docker-compose.
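A minimal docker-compose sketch of such a setup, assuming hypothetical service
and image names (the real compose files differ):
```yaml
version: "2.2"
services:
  dind:
    image: docker:19.03-dind       # separate daemon hosting the kind "nodes"
    privileged: true               # dind requires privileged mode
    environment:
      - DOCKER_TLS_CERTDIR=        # disable TLS so port 2375 is plain tcp
  tests:
    image: apache/airflow:ci       # hypothetical CI image
    environment:
      - DOCKER_HOST=tcp://dind:2375  # point the docker client at dind
    depends_on:
      - dind
```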
This commit adds full interactivity to pre-commits. Whenever you run pre-commit
and it detects that the image should be rebuilt, an interactive question will
pop up instead of failing the build and asking you to rebuild with REBUILD=yes.
This is much nicer from the user's perspective. You can choose whether to:
1) Rebuild the image (which will take some time)
2) Not rebuild the image (this will use the old image in the hope it's OK)
3) Quit.
The answer to that question is carried across all the images that need to be
rebuilt. There is a special "build" pre-commit hook that takes care of that.
Note that this interactive question cannot be asked if you run only a
single pre-commit hook with a Dockerfile, because pre-commit can run multiple
processes and builds could start in parallel. This is not desired, so instead
we fail such builds.
We have fairly complex python version detection in our CI scripts.
They have to handle several cases:
1) Running builds on DockerHub. We cannot pass different environment
variables there, so we detect the python version based on the name of
the image being built (airflow:master-python3.7 -> PYTHON_VERSION=3.7).
2) Running builds on Travis CI. We use the python version determined
from the default python3 version available on the path. This way we
do not have to specify PYTHON_VERSION separately in each job;
we just specify which host python version is used for that job.
This makes for a nice UI experience where you see the python version
in the Travis UI.
3) Running builds locally via scripts, where we can pass PYTHON_VERSION
as an environment variable.
4) Running builds locally for the first time with Breeze. By default
we determine the version based on the default python3 version we have
in the host system (3.5, 3.6 or 3.7) and we use that one.
5) Selecting the python version with Breeze's --python switch. This
overrides the python version, but it also stores the last used python
version in the .build directory so that it is automatically used next
time.
This change adds the necessary explanations to the code that handles
all these cases and fixes some of the edge cases we had. It also
extracts the code to a common directory.
The fuzzy licence matching implemented by Jarek Potiuk
was accepted and merged by Lucas-C into his pre-commit
hooks implementation (released today as ver. 1.1.7),
so we can switch back to it.
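Switching back then amounts to pointing the hook at the upstream repo again,
roughly like this (the args and the template path are illustrative):
```yaml
- repo: https://github.com/Lucas-C/pre-commit-hooks
  rev: v1.1.7                # the release mentioned above
  hooks:
    - id: insert-license
      name: Add licence for python files
      files: \.py$
      args:
        - --license-filepath
        - license-templates/LICENSE.txt   # hypothetical template path
```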
The licence check with RAT runs too often (every time the pre-commit
configuration is modified) and it should only be run locally when any of the
*LICEN[S|C]E* files change. We run the full check on CI anyhow, so this is a
local optimisation (the check takes too long while you iterate on
.pre-commit-config.yaml), and we will probably still be able to detect some
problems locally when new modules are added.
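In pre-commit terms this means adding a `files:` filter so the hook triggers
only on licence-related paths, roughly (hook id and entry are hypothetical):
```yaml
- id: rat-check                            # hypothetical hook id
  name: Run RAT licence check
  entry: ./scripts/ci/pre_commit_rat.sh    # hypothetical script path
  language: system
  files: LICEN[SC]E                        # only licence-related changes
  pass_filenames: false
```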