incubator-airflow/BREEZE.rst

1062 строки
43 KiB
ReStructuredText

.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
.. image:: images/AirflowBreeze_logo.png
:align: center
:alt: Airflow Breeze Logo
.. contents:: :local:
Airflow Breeze
==============
Airflow Breeze is an easy-to-use integration test environment managed via
`Docker Compose <https://docs.docker.com/compose/>`_ .
The environment is easy to use locally and it is also used by Airflow's CI Travis tests.
It's called *Airflow Breeze* as in **It's a Breeze to develop Airflow**
The advantages and disadvantages of using the environment vs. other ways of testing Airflow
are described in `CONTRIBUTING.md <CONTRIBUTING.md#integration-test-development-environment>`_.
Here is the short 10 minute video about Airflow Breeze
.. image:: http://img.youtube.com/vi/ffKFHV6f3PQ/0.jpg
:width: 480px
:height: 360px
:scale: 100 %
:alt: Airflow Breeze Simplified Development Workflow
:align: center
:target: http://www.youtube.com/watch?v=ffKFHV6f3PQ
Prerequisites
=============
Docker
------
You need latest stable Docker Community Edition installed and on the PATH. It should be
configured to be able to run ``docker`` commands directly and not only via root user. Your user
should be in the ``docker`` group. See `Docker installation guide <https://docs.docker.com/install/>`_
When you develop on Mac OS you usually have not enough disk space for Docker if you start using it
seriously. You should increase disk space available before starting to work with the environment.
Usually you have weird problems of docker containers when you run out of Disk space. It might not be
obvious that space is an issue. At least 128 GB of Disk space is recommended. You can also get by with smaller space but you should more
often clean the docker disk space periodically.
If you get into weird behaviour try `Cleaning up the images <#cleaning-up-the-images>`_.
See also `Docker for Mac - Space <https://docs.docker.com/docker-for-mac/space>`_ for details of increasing
disk space available for Docker on Mac.
Docker compose
--------------
Latest stable Docker Compose installed and on the PATH. It should be
configured to be able to run ``docker-compose`` command.
See `Docker compose installation guide <https://docs.docker.com/compose/install/>`_
Getopt and gstat
----------------
* If you are on MacOS
* you need gnu ``getopt`` and ``gstat`` to get Airflow Breeze running.
* Typically you need to run ``brew install gnu-getopt coreutils`` and then follow instructions (you need to link the gnu getopt
version to become first on the PATH). Make sure to re-login after yoy make the suggested changes.
* Then (with brew) link the gnu-getopt to become default as suggested by brew.
* If you use bash, you should run this command (and re-login):
.. code-block:: bash
echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.bash_profile
. ~/.bash_profile
* If you use zsh, you should run this command (and re-login):
.. code-block:: bash
echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.zprofile
. ~/.zprofile
* If you are on Linux
* run ``apt install util-linux coreutils`` or equivalent if your system is not Debian-based.
Memory
------
Minimum 4GB RAM is required to run the full ``docker`` environment.
On MacOS, the default 2GB of RAM available for your docker containers, but more memory is recommended
(4GB should be comfortable). For details see
`Docker for Mac - Advanced tab <https://docs.docker.com/v17.12/docker-for-mac/#advanced-tab>`_
How Breeze works
================
Entering Breeze
---------------
Your entry point for Airflow Breeze is `./breeze <./breeze>`_ script. You can run it with ``--help``
option to see the list of available flags. See `Airflow Breeze flags <#airflow-breeze-flags>`_ for details.
You can also `Set up autocomplete <#setting-up-autocomplete>`_ for the command and add the
checked-out airflow repository to your PATH to run breeze without the ./ and from any directory.
First time you run Breeze, it will pull and build local version of docker images.
It will pull latest Airflow CI images from `Airflow DockerHub <https://hub.docker.com/r/apache/airflow>`_
and use them to build your local docker images.
Stopping Breeze
---------------
After starting up, the environment runs in the background and takes precious memory.
You can always stop it via:
.. code-block:: bash
./breeze --stop-environment
Using the Airflow Breeze environment for testing
================================================
Setting up autocomplete
-----------------------
The ``breeze`` command comes with built-in bash/zsh autocomplete for its flags. When you start typing
the command you can use <TAB> to show all the available switches
nd to get autocompletion on typical values of parameters that you can use.
You can setup auto-complete automatically by running:
.. code-block:: bash
./breeze --setup-autocomplete
You get autocomplete working when you re-enter the shell.
Zsh autocompletion is currently limited to only autocomplete flags. Bash autocompletion also completes
flag values (for example python version or static check name).
Entering the environment
------------------------
You enter the integration test environment by running the ``./breeze`` script.
What happens next is the appropriate docker images are pulled, local sources are used to build local version
of the image and you are dropped into bash shell of the airflow container -
with all necessary dependencies started up. Note that the first run (per python) might take up to 10 minutes
on a fast connection to start. Subsequent runs should be much faster.
.. code-block:: bash
./breeze
Once you enter the environment you are dropped into bash shell and you can run tests immediately.
Choosing environment
--------------------
You can choose the optional flags you need with ``breeze``
You can specify for example python version to use, backend to use and environment
for testing - you can recreate the same environments as we have in matrix builds in Travis CI.
For example you could choose to run python 3.6 tests with mysql as backend and in docker
environment by:
.. code-block:: bash
./breeze --python 3.6 --backend mysql --env docker
The choices you made are persisted in ``./.build/`` cache directory so that next time when you use the
``breeze`` script, it will use the values that were used previously. This way you do not
have to specify them when you run the script. You can delete the ``.build/`` directory in case you want to
restore default settings.
The defaults when you run the environment are reasonable (python 3.6, sqlite, docker).
Mounting local sources to Breeze
--------------------------------
Important sources of airflow are mounted inside the ``airflow-testing`` container that you enter,
which means that you can continue editing your changes in the host in your favourite IDE and have them
visible in docker immediately and ready to test without rebuilding images. This can be disabled by specifying
``--skip-mounting-source-volume`` flag when running breeze, in which case you will have sources
embedded in the container - and changes to those sources will not be persistent.
After you run Breeze for the first time you will have an empty directory ``files`` in your source code
that will be mapped to ``/files`` in your docker container. You can pass any files there you need
to configure and run docker and they will not be removed between docker runs.
Running tests in Airflow Breeze
-------------------------------
Once you enter Airflow Breeze environment you should be able to simply run
`run-tests` at will. Note that if you want to pass extra parameters to nose
you should do it after '--'
For example, in order to just execute the "core" unit tests, run the following:
.. code-block:: bash
run-tests tests.core:TestCore -- -s --logging-level=DEBUG
or a single test method:
.. code-block:: bash
run-tests tests.core:TestCore.test_check_operators -- -s --logging-level=DEBUG
The tests will run ``airflow db reset`` and ``airflow db init`` the first time you
run tests in running container, so you can count on database being initialized.
All subsequent test executions within the same container will run without database
initialisation.
You can also optionally add --with-db-init flag if you want to re-initialize
the database.
.. code-block:: bash
run-tests --with-db-init tests.core:TestCore.test_check_operators -- -s --logging-level=DEBUG
Adding/modifying dependencies
-----------------------------
If you change apt dependencies in the ``Dockerfile`` or add python pacakges in ``setup.py`` or
javascript dependencies in ``package.json``. You can add dependencies temporarily for one Breeze
session or permanently in ``setup.py``, ``Dockerfile``, ``package.json``.
Installing dependencies for one Breeze session
..............................................
You can install dependencies inside the container using 'sudo apt install', 'pip install' or 'npm install'
(in airflow/www folder) respectively. This is useful if you want to test something quickly while in the
container. However, those changes are not persistent - they will disappear once you
exit the container (except npm dependencies in case your sources are mounted to the container). Therefore
if you want to persist a new dependency you have to follow with the second option.
Adding dependencies permanently
...............................
You can add the dependencies to the Dockerfile, setup.py or package.json and rebuild the image. This
should happen automatically if you modify any of setup.py, package.json or update Dockerfile itself.
After you exit the container and re-run ``breeze`` the Breeze detects changes in dependencies,
ask you to confirm rebuilding of the image and proceed to rebuilding the image if you confirm (or skip it
if you won't confirm). After rebuilding is done, it will drop you to shell. You might also provide
``--build-only`` flag to only rebuild images and not go into shell - it will then rebuild the image
and will not enter the shell.
Optimisation for apt dependencies during development
....................................................
During development, changing dependencies in apt-get closer to the top of the Dockerfile
will invalidate cache for most of the image and it will take long time to rebuild the image by Breeze.
Therefore it is a recommended practice to add new dependencies initially closer to the end
of the Dockerfile. This way dependencies will be incrementally added.
However before merge, those dependencies should be moved to the appropriate ``apt-get install`` command
which is already in the Dockerfile.
Debugging with ipdb
-------------------
You can debug any code you run in the container using ``ipdb`` debugger if you prefer console debugging.
It is as easy as copy&pasting this line into your code:
.. code-block:: python
import ipdb; ipdb.set_trace()
Once you hit the line you will be dropped into interactive ipdb debugger where you have colors
and auto-completion to guide your debugging. This works from the console where you started your program.
Note that in case of ``nosetest`` you need to provide ``--nocapture`` flag to avoid nosetests
capturing the stdout of your process.
Airflow directory structure inside Docker
-----------------------------------------
When you are in the container note that following directories are used:
.. code-block:: text
/opt/airflow - here sources of Airflow are mounted from the host (AIRFLOW_SOURCES)
/root/airflow - all the "dynamic" Airflow files are created here: (AIRFLOW_HOME)
airflow.db - sqlite database in case sqlite is used
dags - folder where non-test dags are stored (test dags are in /opt/airflow/tests/dags)
logs - logs from airflow executions are created there
unittest.cfg - unit test configuration generated when entering the environment
webserver_config.py - webserver configuration generated when running airflow in the container
Note that when run in your local environment ``/root/airflow/logs`` folder is actually mounted from your
``logs`` directory in airflow sources, so all logs created in the container are automatically visible in the host
as well. Every time you enter the container the logs directory is cleaned so that logs do not accumulate.
Port forwarding
---------------
When you run Airflow Breeze, the following ports are automatically forwarded:
* 28080 -> forwarded to airflow webserver -> airflow-testing:8080
* 25433 -> forwarded to postgres database -> postgres:5432
* 23306 -> forwarded to mysql database -> mysql:3306
You can connect to those ports/databases using:
* Webserver: ``http://127.0.0.1:28080``
* Postgres: ``jdbc:postgresql://127.0.0.1:25433/airflow?user=postgres&password=airflow``
* Mysql: ``jdbc:mysql://localhost:23306/airflow?user=root``
Note that you need to start the webserver manually with ``airflow webserver`` command if you want to connect
to the webserver (you can use ``tmux`` to multiply terminals).
For databases you need to run ``airflow db reset`` at least once (or run some tests) after you started
Airflow Breeze to get the database/tables created. You can connect to databases
with IDE or any other Database client:
.. image:: images/database_view.png
:align: center
:alt: Database view
You can change host port numbers used by setting appropriate environment variables:
* ``WEBSERVER_HOST_PORT``
* ``POSTGRES_HOST_PORT``
* ``MYSQL_HOST_PORT``
When you set those variables, next time when you enter the environment the new ports should be in effect.
Cleaning up the images
----------------------
You might need to cleanup your Docker environment occasionally. The images are quite big
(1.5GB for both images needed for static code analysis and CI tests). And if you often rebuild/update
images you might end up with some unused image data.
Cleanup can be performed with ``docker system prune`` command.
Make sure to `Stop Breeze <#stopping-breeze>`_ first with ``./breeze --stop-environment``.
If you run into disk space errors, we recommend you prune your docker images using the
``docker system prune --all`` command. You might need to restart the docker
engine before running this command.
You can check if your docker is clean by running ``docker images --all`` and ``docker ps --all`` - both
should return an empty list of images and containers respectively.
If you are on Mac OS and you end up with not enough disk space for Docker you should increase disk space
available for Docker. See `Prerequsites <#prerequisites>`_.
Troubleshooting
---------------
If you are having problems with the Breeze environment - try the following (after each step you
can check if your problem is fixed)
1. Check if you have enough disks space in Docker if you are on MacOS.
2. Stop Breeze - use ``./breeze --stop-environment``
3. Delete ``.build`` directory and run ``./breeze --force-pull-images``
4. `Clean up docker images <#cleaning-up-the-images>`_
5. Restart your docker engine and try again
6. Restart your machine and try again
7. Remove and re-install Docker CE and try again
In case the problems are not solved, you can set VERBOSE variable to "true" (`export VERBOSE="true"`)
and rerun failing command, and copy & paste the output from your terminal, describe the problem and
post it in [Airflow Slack](https://apache-airflow-slack.herokuapp.com/) #troubleshooting channel.
Using Breeze for other tasks
============================
Running static code checks
--------------------------
We have a number of static code checks that are run in Travis CI but you can run them locally as well.
All these tests run in python3.5 environment. Note that the first time you run the checks it might take some
time to rebuild the docker images required to run the tests, but all subsequent runs will be much faster -
the build phase will just check if your code has changed and rebuild as needed.
The checks below are run in a docker environment, which means that if you run them locally,
they should give the same results as the tests run in TravisCI without special environment preparation.
You run the checks via ``-S``, ``--static-check`` flags or ``-F``, ``--static-check-all-files``.
The former will run appropriate checks only for files changed and staged locally, the latter will run it
on all files. It can take a lot of time to run check for all files in case of pylint on MacOS due to slow
filesystem for Mac OS Docker. You can add arguments you should pass them after -- as extra arguments.
You cannot pass ``--files`` flage if you selected ``--static-check-all-files`` option.
You can see the list of available static checks via --help flag or use autocomplete. Most notably ``all``
static check runs all static checks configured. Also since pylint tests take a lot of time you can
also run special ``all-but-pylint`` check which will skip pylint checks.
Run mypy check in currently staged changes:
.. code-block:: bash
./breeze --static-check mypy
Run mypy check in all files:
.. code-block:: bash
./breeze --static-check-all-files mypy
Run flake8 check for tests.core.py file with verbose output:
.. code-block:: bash
./breeze --static-check flake8 -- --files tests/core.py --verbose
Run flake8 check for tests.core package with verbose output:
.. code-block:: bash
./breeze --static-check mypy -- --files tests/hooks/test_druid_hook.py
Run all tests on currently staged files:
.. code-block:: bash
./breeze --static-check all
Run all tests on all files:
.. code-block:: bash
./breeze --static-check-all-files all
Run all tests but pylint on all files:
.. code-block:: bash
./breeze --static-check-all-files all-but-pylint
Run pylint checks for all changed files:
.. code-block:: bash
./breeze --static-check pylint
Run pylint checks for selected files:
.. code-block:: bash
./breeze --static-check pylint -- --files airflow/configuration.py
Run pylint checks for all files:
.. code-block:: bash
./breeze --static-check-all-files pylint
The ``license`` check is also run via separate script and separate docker image containing
Apache RAT verification tool that checks for Apache-compatibility of licences within the codebase.
It does not take pre-commit parameters as extra args.
.. code-block:: bash
./breeze --static-check-all-files licenses
Building the documentation
--------------------------
The documentation is build using ``-O``, ``--build-docs`` command:
.. code-block:: bash
./breeze --build-docs
Results of the build can be found in ``docs/_build`` folder. Often errors during documentation generation
come from the docstrings of auto-api generated classes. During the docs building auto-api generated
files are stored in ``docs/_api`` folder - so that in case of problems with documentation you can
find where the problems with documentation originated from.
Running tests directly from host
--------------------------------
If you wish to run tests only and not drop into shell, you can run them by providing
-t, --test-target flag. You can add extra nosetest flags after -- in the commandline.
.. code-block:: bash
./breeze --test-target tests/hooks/test_druid_hook.py -- --logging-level=DEBUG
You can run the whole test suite with special '.' test target:
.. code-block:: bash
./breeze --test-target .
You can also specify individual tests or group of tests:
.. code-block:: bash
./breeze --test-target tests.core:TestCore
Pulling the latest images
-------------------------
Sometimes the image on DockerHub is rebuilt from the scratch. This happens for example when there is a
security update of the python version that all the images are based on.
In this case it is usually faster to pull latest images rather than rebuild them
from the scratch.
You can do it via ``--force-pull-images`` flag to force pull latest images from DockerHub.
In the future Breeze will warn you when you are advised to force pull images.
Running commands inside Docker
------------------------------
If you wish to run other commands/executables inside of Docker environment you can do it via
``-x``, ``--execute-command`` flag. Note that if you want to add arguments you should specify them
together with the command surrounded with " or ' or pass them after -- as extra arguments.
.. code-block:: bash
./breeze --execute-command "ls -la"
.. code-block:: bash
./breeze --execute-command ls -- --la
Running Docker Compose commands
-------------------------------
If you wish to run docker-compose command (such as help/pull etc. ) you can do it via
``-d``, ``--docker-compose`` flag. Note that if you want to add extra arguments you should specify them
after -- as extra arguments.
.. code-block:: bash
./breeze --docker-compose pull -- --ignore-pull-failures
Using your host IDE
===================
Configuring local virtualenv
----------------------------
In order to use your host IDE (for example IntelliJ's PyCharm/Idea) you need to have virtual environments
setup. Ideally you should have virtualenvs for all python versions that Airflow supports (3.5, 3.6, 3.7).
You can create the virtualenv using ``virtualenvwrapper`` - that will allow you to easily switch between
virtualenvs using workon command and mange your virtual environments more easily.
Typically creating the environment can be done by:
.. code-block:: bash
mkvirtualenv <ENV_NAME> --python=python<VERSION>
After the virtualenv is created, you must initialize it. Simply enter the environment
(using workon) and once you are in it run:
.. code-block:: bash
./breeze --initialize-local-virtualenv
Once initialization is done, you should select the virtualenv you initialized as the project's default
virtualenv in your IDE.
Running unit tests via IDE
--------------------------
After setting it up - you can use the usual "Run Test" option of the IDE and have all the
autocomplete and documentation support from IDE as well as you can debug and click-through
the sources of Airflow - which is very helpful during development. Usually you also can run most
of the unit tests (those that do not require prerequisites) directly from the IDE:
Running unit tests from IDE is as simple as:
.. image:: images/running_unittests.png
:align: center
:alt: Running unit tests
Some of the core tests use dags defined in ``tests/dags`` folder - those tests should have
``AIRFLOW__CORE__UNIT_TEST_MODE`` set to True. You can set it up in your test configuration:
.. image:: images/airflow_unit_test_mode.png
:align: center
:alt: Airflow Unit test mode
You cannot run all the tests this way - only unit tests that do not require external dependencies
such as postgres/mysql/hadoop etc. You should use
`Running tests in Airflow Breeze <#running-tests-in-airflow-breeze>`_ in order to run those tests. You can
still use your IDE to debug those tests as explained in the next chapter.
Debugging Airflow Breeze Tests in IDE
-------------------------------------
When you run example DAGs, even if you run them using UnitTests from within IDE, they are run in a separate
container. This makes it a little harder to use with IDE built-in debuggers.
Fortunately for IntelliJ/PyCharm it is fairly easy using remote debugging feature (note that remote
debugging is only available in paid versions of IntelliJ/PyCharm).
You can read general description `about remote debugging
<https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html>`_
You can setup your remote debug session as follows:
.. image:: images/setup_remote_debugging.png
:align: center
:alt: Setup remote debugging
Not that if you are on ``MacOS`` you have to use the real IP address of your host rather than default
localhost because on MacOS container runs in a virtual machine with different IP address.
You also have to remember about configuring source code mapping in remote debugging configuration to map
your local sources into the ``/opt/airflow`` location of the sources within the container.
.. image:: images/source_code_mapping_ide.png
:align: center
:alt: Source code mapping
Airflow Breeze flags
====================
These are the current flags of the `./breeze <./breeze>`_ script
.. code-block:: text
Usage: breeze [FLAGS] \
[-k]|[-S <STATIC_CHECK>]|[-F <STATIC_CHECK>]|[-O]|[-e]|[-a]|[-b]|[-t <TARGET>]|[-x <COMMAND>]|[-d <COMMAND>] \
-- <EXTRA_ARGS>
The swiss-knife-army tool for Airflow testings. It allows to perform various test tasks:
* Enter interactive environment when no command flags are specified (default behaviour)
* Stop the interactive environment with -k, --stop-environment command
* Run static checks - either for currently staged change or for all files with
-S, --static-check or -F, --static-check-all-files commanbd
* Build documentation with -O, --build-docs command
* Setup local virtualenv with -e, --setup-virtualenv command
* Setup autocomplete for itself with -a, --setup-autocomplete command
* Build docker image with -b, --build-only command
* Run test target specified with -t, --test-target connad
* Execute arbitrary command in the test environmenrt with -x, --execute-command command
* Execute arbitrary docker-compose command with -d, --docker-compose command
** Commands
By default the script enters IT environment and drops you to bash shell,
but you can also choose one of the commands to run specific actions instead:
-k, --stop-environment
Bring down running docker compose environment. When you start the environment, the docker
containers will continue running so that startup time is shorter. But they take quite a lot of
memory and CPU. This command stops all running containers from the environment.
-O, --build-docs
Build documentation.
-S, --static-check <STATIC_CHECK>
Run selected static checks for currently changed files. You should specify static check that
you would like to run or 'all' to run all checks. One of
[ all all-but-pylint check-hooks-apply check-merge-conflict check-executables-have-shebangs check-xml detect-private-key doctoc end-of-file-fixer flake8 forbid-tabs insert-license check-apache-license lint-dockerfile mixed-line-ending mypy pylint shellcheck].
You can pass extra arguments including options to to the pre-commit framework as
<EXTRA_ARGS> passed after --. For example:
'./breeze --static-check mypy' or
'./breeze --static-check mypy -- --files tests/core.py'
You can see all the options by adding --help EXTRA_ARG:
'./breeze --static-check mypy -- --help'
-F, --static-check-all-files <STATIC_CHECK>
Run selected static checks for all applicable files. You should specify static check that
you would like to run or 'all' to run all checks. One of
[ all all-but-pylint check-hooks-apply check-merge-conflict check-executables-have-shebangs check-xml detect-private-key doctoc end-of-file-fixer flake8 forbid-tabs insert-license check-apache-license lint-dockerfile mixed-line-ending mypy pylint shellcheck].
You can pass extra arguments including options to the pre-commit framework as
<EXTRA_ARGS> passed after --. For example:
'./breeze --static-check-all-files mypy' or
'./breeze --static-check-all-files mypy -- --verbose'
You can see all the options by adding --help EXTRA_ARG:
'./breeze --static-check-all-files mypy -- --help'
-e, --initialize-local-virtualenv
Initializes locally created virtualenv installing all dependencies of Airflow.
This local virtualenv can be used to aid autocompletion and IDE support as
well as run unit tests directly from the IDE. You need to have virtualenv
activated before running this command.
-a, --setup-autocomplete
Sets up autocomplete for breeze commands. Once you do it you need to re-enter the bash
shell and when typing breeze command <TAB> will provide autocomplete for parameters and values.
-b, --build-only
Only build docker images but do not enter the airflow-testing docker container.
-t, --test-target <TARGET>
Run the specified unit test target. There might be multiple
targets specified separated with comas. The <EXTRA_ARGS> passed after -- are treated
as additional options passed to nosetest. For example:
'./breeze --test-target tests.core -- --logging-level=DEBUG'
-x, --execute-command <COMMAND>
Run chosen command instead of entering the environment. The command is run using
'bash -c "<command with args>" if you need to pass arguments to your command, you need
to pass them together with command surrounded with " or '. Alternatively you can pass arguments as
<EXTRA_ARGS> passed after --. For example:
'./breeze --execute-command "ls -la"' or
'./breeze --execute-command ls -- --la'
-d, --docker-compose <COMMAND>
Run docker-compose command instead of entering the environment. Use 'help' command
to see available commands. The <EXTRA_ARGS> passed after -- are treated
as additional options passed to docker-compose. For example
'./breeze --docker-compose pull -- --ignore-pull-failures'
** General flags
-h, --help
Shows this help message.
-P, --python <PYTHON_VERSION>
Python version used for the image. This is always major/minor version.
One of [ 3.5 3.6 3.7 ]. Default is the python3 or python on the path.
-E, --env <ENVIRONMENT>
Environment to use for tests. It determines which types of tests can be run.
One of [ docker kubernetes ]. Default: docker
-B, --backend <BACKEND>
Backend to use for tests - it determines which database is used.
One of [ sqlite mysql postgres ]. Default: sqlite
-K, --kubernetes-version <KUBERNETES_VERSION>
Kubernetes version - only used in case of 'kubernetes' environment.
One of [ v1.13.0 ]. Default: v1.13.0
-M, --kubernetes-mode <KUBERNETES_MODE>
Kubernetes mode - only used in case of 'kubernetes' environment.
One of [ persistent_mode git_mode ]. Default: git_mode
-s, --skip-mounting-source-volume
Skips mounting local volume with sources - you get exactly what is in the
docker image rather than your current local sources of airflow.
-v, --verbose
Show verbose information about executed commands (enabled by default for running test)
-y, --assume-yes
Assume 'yes' answer to all questions.
-n, --assume-no
Assume 'no' answer to all questions.
-C, --toggle-suppress-cheatsheet
Toggles on/off cheatsheet displayed before starting bash shell
-A, --toggle-suppress-asciiart
Toggles on/off asciiart displayed before starting bash shell
** Dockerfile management flags
-D, --dockerhub-user
DockerHub user used to pull, push and build images. Default: apache.
-H, --dockerhub-repo
DockerHub repository used to pull, push, build images. Default: airflow.
-r, --force-build-images
Forces building of the local docker images. The images are rebuilt
automatically for the first time or when changes are detected in
package-related files, but you can force it using this flag.
-R, --force-build-images-clean
Force build images without cache. This will remove the pulled or build images
and start building images from scratch. This might take a long time.
-p, --force-pull-images
Forces pulling of images from DockerHub before building to populate cache. The
images are pulled by default only for the first time you run the
environment, later the locally build images are used as cache.
-u, --push-images
After rebuilding - uploads the images to DockerHub
It is useful in case you use your own DockerHub user to store images and you want
to build them locally. Note that you need to use 'docker login' before you upload images.
-c, --cleanup-images
Cleanup your local docker cache of the airflow docker images. This will not reclaim space in
docker cache. You need to 'docker system prune' (optionally with --all) to reclaim that space.
Internals of Airflow Breeze
===========================
Airflow Breeze is just a glorified bash script that is a "Swiss-Army-Knife" of Airflow testing. Under the
hood it uses other scripts that you can also run manually if you have problem with running the Breeze
environment. This chapter explains the inner details of Breeze.
Available Airflow Breeze environments
-------------------------------------
You can choose environment when you run Breeze with ``--env`` flag.
Running the default ``docker`` environment takes considerable amount of resources. You can run a slimmed-down
version of the environment - just the Apache Airflow container - by choosing ``bare`` environment instead.
The following environments are available:
* The ``docker`` environment (default): starts all dependencies required by full integration test-suite
(postgres, mysql, celery, etc.). This option is resource intensive so do not forget to
[Stop environment](#stopping-the-environment) when you are finished. This option is also RAM intensive
and can slow down your machine.
* The ``kubernetes`` environment: Runs airflow tests within a kubernetes cluster.
* The ``bare`` environment: runs airflow in docker without any external dependencies.
It will only work for non-dependent tests. You can only run it with sqlite backend.
Running manually static code checks
-----------------------------------
You can trigger the static checks from the host environment, without entering Docker container. You
do that by running appropriate scripts (The same is done in TravisCI)
* `<scripts/ci/ci_check_license.sh>`_ - checks if all licences are OK
* `<scripts/ci/ci_docs.sh>`_ - checks that documentation can be built without warnings
* `<scripts/ci/ci_flake8.sh>`_ - runs flake8 source code style guide enforcement tool
* `<scripts/ci/ci_lint_dockerfile.sh>`_ - runs lint checker for the Dockerfile
* `<scripts/ci/ci_mypy.sh>`_ - runs mypy type annotation consistency check
* `<scripts/ci/ci_pylint_main.sh>`_ - runs pylint static code checker for main files
* '`<scripts/ci/ci_pylint_tests.sh>`_ - runs pylint static code checker for tests
The scripts will ask to rebuild the images if needed.
You can force rebuilding of the images by deleting [.build](./build) directory. This directory keeps cached
information about the images already built and you can safely delete it if you want to start from the scratch.
After Documentation is built, the html results are available in [docs/_build/html](docs/_build/html) folder.
This folder is mounted from the host so you can access those files in your host as well.
Running manually static code checks in Docker
---------------------------------------------
If you are already in the Breeze Docker (by running ``./breeze`` command) you can also run the s
ame static checks from within container:
* Mypy: ``./scripts/ci/in_container/run_mypy.sh airflow tests``
* Pylint for main files: ``./scripts/ci/in_container/run_pylint_main.sh``
* Pylint for test files: ``./scripts/ci/in_container/run_pylint_tests.sh``
* Flake8: ``./scripts/ci/in_container/run_flake8.sh``
* Licence check: ``./scripts/ci/in_container/run_check_licence.sh``
* Documentation: ``./scripts/ci/in_container/run_docs_build.sh``
Running static code analysis for selected files
-----------------------------------------------
In all static check scripts - both in container and in the host you can also pass module/file path as
parameters of the scripts to only check selected modules or files. For example:
In container:
.. code-block::
./scripts/ci/in_container/run_pylint.sh ./airflow/example_dags/
or
.. code-block::
./scripts/ci/in_container/run_pylint.sh ./airflow/example_dags/test_utils.py
In host:
.. code-block::
./scripts/ci/ci_pylint.sh ./airflow/example_dags/
.. code-block::
./scripts/ci/ci_pylint.sh ./airflow/example_dags/test_utils.py
And similarly for other scripts.
Docker images used by Breeze
----------------------------
For all development tasks related integration tests and static code checks we are using Docker
images that are maintained in DockerHub under ``apache/airflow`` repository.
There are three images that we currently manage:
* **Slim CI** image that is used for static code checks (size around 500MB) - tag follows the pattern
of ``<BRANCH>-python<PYTHON_VERSION>-ci-slim`` (for example ``apache/airflow:master-python3.6-ci-slim``).
The image is built using the `<Dockerfile>`_ dockerfile.
* **Full CI image*** that is used for testing - containing a lot more test-related installed software
(size around 1GB) - tag follows the pattern of ``<BRANCH>-python<PYTHON_VERSION>-ci``
(for example ``apache/airflow:master-python3.6-ci``). The image is built using the
`<Dockerfile>`_ dockerfile.
* **Checklicence image** - an image that is used during licence check using Apache RAT tool. It does not
require any of the dependencies that the two CI images need so it is built using different Dockerfile
`<Dockerfile-checklicence>`_ and only contains Java + Apache RAT tool. The image is
labeled with ``checklicence`` label - for example ``apache/airflow:checklicence``. No versioning is used for
the checklicence image.
We also use a very small `<Dockerfile-context>`_ dockerfile in order to fix file permissions
for an obscure permission problem with Docker caching but it is not stored in ``apache/airflow`` registry.
Before you run tests or enter environment or run local static checks, the necessary local images should be
pulled and built from DockerHub. This happens automatically for the test environment but you need to
manually trigger it for static checks as described in `Building the images <#bulding-the-images>`_
and `Force pulling the images <#force-pulling-the-images>`_.
The static checks will fail and inform what to do if the image is not yet built.
Note that building the image first time pulls the pre-built version of images from DockerHub might take some
of time - but this wait-time will not repeat for subsequent source code changes.
However, changes to sensitive files like setup.py or Dockerfile will trigger a rebuild
that might take more time (but it is highly optimised to only rebuild what's needed)
In most cases re-building an image requires connectivity to network (for example to download new
dependencies). In case you work offline and do not want to rebuild the images when needed - you might set
``FORCE_ANSWER_TO_QUESTIONS`` variable to ``no`` as described in the
`Default behaviour for user interaction <#default-behaviour-for-user-interaction>`_ chapter.
See `Troubleshooting section <#troubleshooting>`_ for steps you can make to clean the environment.
Default behaviour for user interaction
--------------------------------------
Sometimes during the build user is asked whether to perform an action, skip it, or quit. This happens in case
of image rebuilding and image removal - they can take a lot of time and they are potentially destructive.
For automation scripts, you can export one of the three variables to control the default behaviour.
.. code-block::
export FORCE_ANSWER_TO_QUESTIONS="yes"
If ``FORCE_ANSWER_TO_QUESTIONS`` is set to ``yes``, the images will automatically rebuild when needed.
Images are deleted without asking.
.. code-block::
export FORCE_ANSWER_TO_QUESTIONS="no"
If ``FORCE_ANSWER_TO_QUESTIONS`` is set to ``no``, the old images are used even if re-building is needed.
This is useful when you work offline. Deleting images is aborted.
.. code-block::
export FORCE_ANSWER_TO_QUESTIONS="quit"
If ``FORCE_ANSWER_TO_QUESTIONS`` is set to ``quit``, the whole script is aborted. Deleting images is aborted.
If more than one variable is set, YES takes precedence over NO which take precedence over QUIT.
Running the whole suite of tests via scripts
--------------------------------------------
Running all tests with default settings (python 3.6, sqlite backend, docker environment):
.. code-block::
./scripts/ci/local_ci_run_airflow_testing.sh
Selecting python version, backend, docker environment:
.. code-block::
PYTHON_VERSION=3.5 BACKEND=postgres ENV=docker ./scripts/ci/local_ci_run_airflow_testing.sh
Running kubernetes tests:
.. code-block::
KUBERNETES_VERSION==v1.13.0 KUBERNETES_MODE=persistent_mode BACKEND=postgres ENV=kubernetes \
./scripts/ci/local_ci_run_airflow_testing.sh
* PYTHON_VERSION might be one of 3.5/3.6/3.7
* BACKEND might be one of postgres/sqlite/mysql
* ENV might be one of docker/kubernetes/bare
* KUBERNETES_VERSION - required for Kubernetes tests - currently KUBERNETES_VERSION=v1.13.0.
* KUBERNETES_MODE - mode of kubernetes, one of persistent_mode, git_mode
The available environments are described in ``
Fixing file/directory ownership
-------------------------------
On Linux there is a problem with propagating ownership of created files (known Docker problem). Basically
files and directories created in container are not owned by the host user (but by the root user in our case).
This might prevent you from switching branches for example if files owned by root user are created within
your sources. In case you are on Linux host and haa some files in your sources created by the root user,
you can fix the ownership of those files by running
.. code-block::
./scripts/ci/local_ci_fix_ownership.sh
Building the images
-------------------
You can manually trigger building of the local images using:
.. code-block::
./scripts/ci/local_ci_build.sh
The scripts that build the images are optimised to minimise the time needed to rebuild the image when
the source code of Airflow evolves. This means that if you already had the image locally downloaded and built,
the scripts will determine, the rebuild is needed in the first place. Then it will make sure that minimal
number of steps are executed to rebuild the parts of image (for example PIP dependencies) that will give
you an image consistent with the one used during Continuous Integration.
Force pulling the images
------------------------
You can also force-pull the images before building them locally so that you are sure that you download
latest images from DockerHub repository before building. This can be done with:
.. code-block::
./scripts/ci/local_ci_pull_and_build.sh
Convenience scripts
-------------------
Once you run ./breeze you can also execute various actions via generated convenience scripts
.. code-block::
Enter the environment : ./.build/cmd_run
Run command in the environment : ./.build/cmd_run "[command with args]" [bash options]
Run tests in the environment : ./.build/test_run [test-target] [nosetest options]
Run Docker compose command : ./.build/dc [help/pull/...] [docker-compose options]