.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied. See the License for the
   specific language governing permissions and limitations
   under the License.
.. image:: images/AirflowBreeze_logo.png
    :align: center
    :alt: Airflow Breeze Logo

.. contents:: :local:
About Airflow Breeze
====================
Airflow Breeze is an easy-to-use integration test environment managed via
`Docker Compose <https://docs.docker.com/compose/>`_.
The environment is available for local use and is also integrated into Airflow's Travis CI tests.
We called it *Airflow Breeze* as **It's a Breeze to develop Airflow**.
The advantages and disadvantages of using the Breeze environment vs. other ways of testing Airflow
are described in `CONTRIBUTING.rst <CONTRIBUTING.rst#integration-test-development-environment>`_.
Here is a short 10-minute video about Airflow Breeze:
.. image:: http://img.youtube.com/vi/ffKFHV6f3PQ/0.jpg
    :width: 480px
    :height: 360px
    :scale: 100 %
    :alt: Airflow Breeze Simplified Development Workflow
    :align: center
    :target: http://www.youtube.com/watch?v=ffKFHV6f3PQ
Prerequisites
=============
Docker Community Edition
------------------------
- **Version**: Install the latest stable Docker Community Edition and add it to the PATH.
- **Permissions**: Configure to run the ``docker`` commands directly and not only via root user.
Your user should be in the ``docker`` group.
See `Docker installation guide <https://docs.docker.com/install/>`_ for details.
- **Disk space**: On macOS, increase your available disk space before starting to work with
the environment. At least 128 GB of free disk space is recommended. You can also get by with a
smaller space but make sure to clean up the Docker disk space periodically.
See also `Docker for Mac - Space <https://docs.docker.com/docker-for-mac/space>`_ for details
on increasing disk space available for Docker on Mac.
- **Docker problems**: Sometimes it is not obvious that space is an issue when you run into
a problem with Docker. If you see weird behaviour, try
`cleaning up the images <#cleaning-up-the-images>`_. Also see
`pruning <https://docs.docker.com/config/pruning/>`_ instructions from Docker.
Docker Compose
--------------
- **Version**: Install the latest stable Docker Compose and add it to the PATH.
See `Docker Compose Installation Guide <https://docs.docker.com/compose/install/>`_ for details.
- **Permissions**: Configure your user to be able to run the ``docker-compose`` command.
Docker Images Used by Breeze
----------------------------
For all development tasks, related integration tests and static code checks, we use Docker
images maintained on the Docker Hub in the ``apache/airflow`` repository.
There are three images that we are currently managing:
* **Slim CI** image that is used for static code checks (size of ~500MB). Its tag follows the pattern
of ``<BRANCH>-python<PYTHON_VERSION>-ci-slim`` (for example, ``apache/airflow:master-python3.6-ci-slim``).
The image is built using the `<Dockerfile>`_ Dockerfile.
* **Full CI image** that is used for testing. It contains a lot more test-related software installed
(size of ~1GB). Its tag follows the pattern of ``<BRANCH>-python<PYTHON_VERSION>-ci``
(for example, ``apache/airflow:master-python3.6-ci``). The image is built using the
`<Dockerfile>`_ Dockerfile.
* **Checklicence image** that is used during license checks with the Apache RAT tool. It does not
require any of the dependencies that the two CI images need so it is built using a different Dockerfile
`<Dockerfile-checklicence>`_ and only contains Java + Apache RAT tool. The image is
labelled with ``checklicence``, for example: ``apache/airflow:checklicence``. No versioning is used for
the Checklicence image.
Before you run tests, enter the environment, or run local static checks, the necessary local images should be
pulled and built from Docker Hub. This happens automatically for the test environment, but you need to
trigger it manually for static checks as described in `Building the images <#building-the-images>`_
and `Pulling the latest images <#pulling-the-latest-images>`_.
The static checks will fail and tell you what to do if the image is not yet built.
Building the images for the first time pulls pre-built versions from Docker Hub, which may take some
time. For subsequent source code changes, no wait time is expected.
However, changes to sensitive files like ``setup.py`` or the ``Dockerfile`` trigger a rebuild
that may take more time, though the process is highly optimized to rebuild only what is needed.
In most cases, rebuilding an image requires network connectivity (for example, to download new
dependencies). If you work offline and do not want to rebuild the images when needed, you can set the
``FORCE_ANSWER_TO_QUESTIONS`` variable to ``no`` as described in the
`Default behaviour for user interaction <#default-behaviour-for-user-interaction>`_ section.
See the `Troubleshooting section <#troubleshooting>`_ for steps you can take to clean the environment.
Getopt and gstat
----------------
* For macOS, install GNU ``getopt`` and ``gstat`` utilities to get Airflow Breeze running.
  Run ``brew install gnu-getopt coreutils`` and then follow the instructions to make the GNU
  version of ``getopt`` the first one on the PATH. Make sure to re-login after you make the suggested changes.
  If you use bash, run this command and re-login:

  .. code-block:: bash

      echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.bash_profile
      . ~/.bash_profile

  If you use zsh, run this command and re-login:

  .. code-block:: bash

      echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.zprofile
      . ~/.zprofile
* For Linux, run ``apt install util-linux coreutils`` or an equivalent if your system is not Debian-based.
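
A quick, hedged sanity check that the GNU version is the one picked up (the path below is Homebrew's
default prefix and may differ on your machine):

.. code-block:: bash

    # On macOS, this should print a path under /usr/local/opt/gnu-getopt
    command -v getopt

    # GNU getopt prints a version string; the stock BSD getopt on macOS does not support this flag
    getopt --version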
Memory
------
A minimum of 4 GB RAM is required to run the full Breeze environment.
On macOS, 2 GB of RAM is available for your Docker containers by default, but more memory is recommended
(4 GB should be comfortable). For details see
`Docker for Mac - Advanced tab <https://docs.docker.com/v17.12/docker-for-mac/#advanced-tab>`_.
Airflow Directory Structure inside Docker
-----------------------------------------
When you are in the container, the following directories are used:
.. code-block:: text

    /opt/airflow - Contains sources of Airflow mounted from the host (AIRFLOW_SOURCES).
    /root/airflow - Contains all the "dynamic" Airflow files (AIRFLOW_HOME), such as:
        airflow.db - sqlite database in case sqlite is used;
        dags - folder with non-test dags (test dags are in /opt/airflow/tests/dags);
        logs - logs from Airflow executions;
        unittest.cfg - unit test configuration generated when entering the environment;
        webserver_config.py - webserver configuration generated when running Airflow in the container.
Note that when running in your local environment, the ``/root/airflow/logs`` folder is actually mounted
from your ``logs`` directory in the Airflow sources, so all logs created in the container are automatically
visible in the host as well. Every time you enter the container, the ``logs`` directory is
cleaned so that logs do not accumulate.
Using the Airflow Breeze Environment
=====================================
Airflow Breeze is a bash script serving as a "swiss-army-knife" of Airflow testing. Under the
hood it uses other scripts that you can also run manually if you have problems running the Breeze
environment.
The Breeze script allows you to perform the following tasks:
* Enter an interactive environment when no command flags are specified (default behaviour).
* Stop the interactive environment with ``-k``, ``--stop-environment`` command.
* Build a Docker image with ``-b``, ``--build-only`` command.
* Set up autocomplete for itself with ``-a``, ``--setup-autocomplete`` command.
* Build documentation with ``-O``, ``--build-docs`` command.
* Run static checks either for currently staged change or for all files with ``-S``, ``--static-check``
or ``-F``, ``--static-check-all-files`` commands.
* Set up a local virtualenv with ``-e``, ``--initialize-local-virtualenv`` command.
* Run a test target specified with ``-t``, ``--test-target`` command.
* Execute an arbitrary command in the test environment with ``-x``, ``--execute-command`` command.
* Execute an arbitrary docker-compose command with ``-d``, ``--docker-compose`` command.
Entering Breeze
---------------
You enter the Breeze integration test environment by running the ``./breeze`` script. You can run it with
the ``--help`` option to see the list of available flags. See `Airflow Breeze flags <#airflow-breeze-flags>`_
for details.
.. code-block:: bash

    ./breeze
The first time you run Breeze, it pulls and builds a local version of the Docker images.
It pulls the latest Airflow CI images from `Airflow DockerHub <https://hub.docker.com/r/apache/airflow>`_
and uses them to build your local Docker images. Note that the first run (per Python version) might take
up to 10 minutes on a fast connection. Subsequent runs should be much faster.
Once you enter the environment, you are dropped into the bash shell of the Airflow container and you can
run tests immediately.
You can `set up autocomplete <#setting-up-autocompletion>`_ for commands and add the
checked-out Airflow repository to your PATH to run Breeze without the ``./`` prefix and from any directory.
Stopping Breeze
---------------
After starting up, the environment runs in the background and takes precious memory.
You can always stop it via:
.. code-block:: bash

    ./breeze --stop-environment
Choosing a Breeze Environment
-----------------------------
You can use additional ``breeze`` flags to customize your environment. For example, you can specify the
Python version, the backend, and the container environment to use for testing. With Breeze, you can
recreate the same environments as we have in matrix builds in Travis CI.
For example, you can choose to run Python 3.6 tests with MySQL as the backend and in the Docker
environment as follows:

.. code-block:: bash

    ./breeze --python 3.6 --backend mysql --env docker
The choices you make are persisted in the ``./.build/`` cache directory so that the next time you use the
``breeze`` script, it reuses the previously chosen values. This way you do not have to specify
them when you run the script. You can delete the ``.build/`` directory if you want to restore the
default settings.
The defaults when you run the Breeze environment are Python 3.6, Sqlite, and Docker.
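
For example, to wipe the cached selections and start with the defaults again (note that removing
``.build`` also drops the cached image information, so the next run may rebuild images):

.. code-block:: bash

    rm -rf ./.build
    ./breeze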
Available Docker Environments
..............................
You can choose a container environment when you run Breeze with the ``--env`` flag.
Running the default ``docker`` environment takes a considerable amount of resources. You can run a
slimmed-down version of the environment - just the Apache Airflow container - by choosing the ``bare``
environment instead.
The following environments are available:
* The ``docker`` environment (default): starts all dependencies required by a full integration test suite
  (Postgres, MySQL, Celery, etc.). This option is resource intensive, so do not forget to
  `stop the environment <#stopping-breeze>`_ when you are finished. This option is also RAM intensive
  and can slow down your machine.
* The ``kubernetes`` environment: runs Airflow tests within a Kubernetes cluster.
* The ``bare`` environment: runs Airflow in Docker without any external dependencies.
  It only works for independent tests. You can only run it with the sqlite backend.
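
For example, a minimal session with the slimmed-down environment could look as follows (``bare``
requires the sqlite backend, as noted above):

.. code-block:: bash

    ./breeze --env bare --backend sqlite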
Cleaning Up the Environment
---------------------------
You may need to clean up your Docker environment occasionally. The images are quite big
(1.5GB for both images needed for static code analysis and CI tests) and, if you often rebuild/update
them, you may end up with some unused image data.
To clean up the Docker environment:
1. `Stop Breeze <#stopping-breeze>`_ with ``./breeze --stop-environment``.
2. Run the ``docker system prune`` command.
3. Run ``docker images --all`` and ``docker ps --all`` to verify that your Docker is clean.
Both commands should return an empty list of images and containers respectively.
If you run into disk space errors, consider pruning your Docker images with the ``docker system prune --all``
command. You may need to restart the Docker Engine before running this command.
In case of disk space errors on macOS, increase the disk space available for Docker. See
`Prerequisites <#prerequisites>`_ for details.
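
The steps above as a single shell sequence (the ``docker`` commands are standard Docker CLI calls):

.. code-block:: bash

    ./breeze --stop-environment
    docker system prune            # add --all to also remove unused images
    docker images --all            # verify that no stale images remain
    docker ps --all                # verify that no stale containers remain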
Building the Images
-------------------
You can manually trigger building the local images using the script:
.. code-block::

    ./scripts/ci/local_ci_build.sh
The scripts that build the images are optimized to minimize the time needed to rebuild the image when
the source code of Airflow evolves. If you already have the image downloaded and built locally,
the scripts determine whether a rebuild is needed in the first place. They then make sure that
a minimal number of steps is executed to rebuild the changed parts of the image (for example,
PIP dependencies) and give you an image consistent with the one used during Continuous Integration.
Pulling the Latest Images
-------------------------
Sometimes the image on the Docker Hub needs to be rebuilt from scratch. This is required, for example,
when there is a security update of the Python version that all the images are based on.
In this case it is usually faster to pull the latest images rather than rebuild them
from scratch.
You can use the ``--force-pull-images`` flag to force pulling the latest images from the Docker Hub.
To manually force pulling the images for static checks, use the script:

.. code-block::

    ./scripts/ci/local_ci_pull_and_build.sh

In the future, Breeze will warn you when pulling the images is recommended.
Running Arbitrary Commands in the Breeze Environment
-------------------------------------------------------
To run other commands/executables inside the Breeze Docker-based environment, use the
``-x``, ``--execute-command`` flag. To add arguments, specify them
together with the command surrounded by ``"`` or ``'``, or pass them after ``--`` as extra arguments.

.. code-block:: bash

    ./breeze --execute-command "ls -la"

.. code-block:: bash

    ./breeze --execute-command ls -- --la
Running Docker Compose Commands
-------------------------------
To run Docker Compose commands (such as ``help``, ``pull``, etc.), use the
``-d``, ``--docker-compose`` flag. To add extra arguments, specify them
after ``--``.

.. code-block:: bash

    ./breeze --docker-compose pull -- --ignore-pull-failures
Mounting Local Sources to Breeze
--------------------------------
Important sources of Airflow are mounted inside the ``airflow-testing`` container that you enter.
This means that you can continue editing your changes on the host in your favourite IDE and have them
visible in Docker immediately and ready to test without rebuilding images. You can disable mounting
by specifying the ``--skip-mounting-source-volume`` flag when running Breeze. In this case you will have
the sources embedded in the container and changes to them will not be persistent.
After you run Breeze for the first time, you will have an empty ``files`` directory in your source tree,
which is mapped to ``/files`` in your Docker container. You can put there any files you need available
inside the container; they are not removed between Docker runs.
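
For example, assuming you have a hypothetical helper script ``my_connections.sh`` you want available
in the container:

.. code-block:: bash

    # On the host: place the file in the mounted directory
    cp my_connections.sh files/

    # Inside the Breeze container: the same file appears under /files
    bash /files/my_connections.sh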
Adding/Modifying Dependencies
-----------------------------
If you need to change apt dependencies in the ``Dockerfile``, add Python packages in ``setup.py``, or
add JavaScript dependencies in ``package.json``, you can either add them temporarily for a single
Breeze session or permanently in the ``setup.py``, ``Dockerfile``, or ``package.json`` files.
Installing Dependencies for a Single Breeze Session
...................................................
You can install dependencies inside the container using ``sudo apt install``, ``pip install``, or
``npm install`` (in the ``airflow/www`` folder), respectively. This is useful if you want to test something
quickly while you are in the container. However, these changes are not retained: they disappear once you
exit the container (except for the npm dependencies if your sources are mounted to the container).
Therefore, if you want to retain a new dependency, follow the second option described below.
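
For example, a throw-away installation inside the container might look like this (the packages are
only illustrations):

.. code-block:: bash

    sudo apt install vim             # apt package - gone after you exit
    pip install ipdb                 # Python package - gone after you exit
    (cd airflow/www && npm install)  # JS dependencies - retained if sources are mounted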
Adding Dependencies Permanently
...............................
You can add dependencies to the ``Dockerfile``, ``setup.py`` or ``package.json`` and rebuild the image. This
should happen automatically if you modify any of these files.
After you exit the container and re-run ``breeze``, Breeze detects changes in dependencies,
asks you to confirm rebuilding the image, and proceeds with rebuilding if you confirm (or skips it
if you do not). After rebuilding is done, Breeze drops you into a shell. You may also provide the
``--build-only`` flag to only rebuild the images without going into the shell.
Changing apt Dependencies in the Dockerfile
....................................................
During development, changing dependencies in the ``apt-get`` commands near the top of the ``Dockerfile``
invalidates the cache for most of the image, and it takes Breeze a long time to rebuild it.
So, it is a recommended practice to add new dependencies initially closer to the end
of the ``Dockerfile``. This way dependencies are added incrementally.
Before merge, these dependencies should be moved to the appropriate ``apt-get install`` command,
which is already in the ``Dockerfile``.
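
A hypothetical sketch of this workflow (``vim`` stands in for whatever package you actually need):

.. code-block:: text

    # During development: a temporary layer near the end of the Dockerfile,
    # so that earlier layers stay cached and rebuilds are fast
    RUN apt-get update \
        && apt-get install -y --no-install-recommends vim \
        && apt-get clean

    # Before merge: fold "vim" into the existing apt-get install list
    # higher up in the Dockerfile and remove this temporary layer.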
Port Forwarding
---------------
When you run Airflow Breeze, the following ports are automatically forwarded:
* 28080 -> forwarded to Airflow webserver -> airflow-testing:8080
* 25433 -> forwarded to Postgres database -> postgres:5432
* 23306 -> forwarded to MySQL database -> mysql:3306
You can connect to these ports/databases using:
* Webserver: ``http://127.0.0.1:28080``
* Postgres: ``jdbc:postgresql://127.0.0.1:25433/airflow?user=postgres&password=airflow``
* MySQL: ``jdbc:mysql://localhost:23306/airflow?user=root``
Start the webserver manually with the ``airflow webserver`` command if you want to connect
to the webserver. You can use ``tmux`` to multiplex terminals.
For databases, you need to run ``airflow db reset`` at least once (or run some tests) after you start
Airflow Breeze so that the database and tables get created. You can then connect to the databases with
your IDE or any other database client:
.. image:: images/database_view.png
    :align: center
    :alt: Database view
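
For example, assuming you have the ``psql`` and ``mysql`` command-line clients installed on the host,
you can also connect from a terminal using the credentials from the URLs above:

.. code-block:: bash

    # Postgres backend
    psql "postgresql://postgres:airflow@127.0.0.1:25433/airflow"

    # MySQL backend (root user, no password)
    mysql --host=127.0.0.1 --port=23306 --user=root airflow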
You can change the used host port numbers by setting appropriate environment variables:
* ``WEBSERVER_HOST_PORT``
* ``POSTGRES_HOST_PORT``
* ``MYSQL_HOST_PORT``
If you set these variables, the new ports take effect the next time you enter the environment.
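
For example, to expose the webserver on a different host port:

.. code-block:: bash

    export WEBSERVER_HOST_PORT=28081
    ./breeze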
Setting Up Autocompletion
-------------------------
The ``breeze`` command comes with a built-in bash/zsh autocomplete option for its flags. When you start typing
the command, you can use <TAB> to show all the available switches and get autocompletion on typical
values of parameters that you can use.
You can set up the autocomplete option automatically by running:
.. code-block:: bash

    ./breeze --setup-autocomplete
The autocompletion works once you re-enter the shell.
Zsh autocompletion is currently limited to flags only. Bash autocompletion also completes
flag values (for example, the Python version or a static check name).
Setting Defaults for User Interaction
--------------------------------------
Sometimes during the build, you are asked whether to perform an action, skip it, or quit. This happens
when rebuilding or removing an image - actions that take a lot of time and can be potentially destructive.
For automation scripts, you can set the ``FORCE_ANSWER_TO_QUESTIONS`` variable to one of three values
to control the default interaction behaviour:
.. code-block::

    export FORCE_ANSWER_TO_QUESTIONS="yes"

If ``FORCE_ANSWER_TO_QUESTIONS`` is set to ``yes``, the images are automatically rebuilt when needed.
Images are deleted without asking.

.. code-block::

    export FORCE_ANSWER_TO_QUESTIONS="no"

If ``FORCE_ANSWER_TO_QUESTIONS`` is set to ``no``, the old images are used even if rebuilding is needed.
This is useful when you work offline. Deleting images is aborted.

.. code-block::

    export FORCE_ANSWER_TO_QUESTIONS="quit"

If ``FORCE_ANSWER_TO_QUESTIONS`` is set to ``quit``, the whole script is aborted. Deleting images is aborted.
If several of these answer mechanisms are configured at once, ``yes`` takes precedence over ``no``,
which takes precedence over ``quit``.
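
For example, a non-interactive, CI-style invocation that rebuilds images when needed without prompting
could look like this:

.. code-block:: bash

    export FORCE_ANSWER_TO_QUESTIONS="yes"
    ./breeze --build-only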
Building the Documentation
--------------------------
To build documentation in Breeze, use the ``-O``, ``--build-docs`` command:
.. code-block:: bash

    ./breeze --build-docs
The results of the build can be found in the ``docs/_build`` folder.
Errors during documentation generation often come from the docstrings of auto-api generated classes.
During the docs build, auto-api generated files are stored in the ``docs/_api`` folder. This helps you
easily identify where the documentation problems come from.
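
For example, to build the docs and then open the result locally (``index.html`` is the usual Sphinx
entry point):

.. code-block:: bash

    ./breeze --build-docs
    xdg-open docs/_build/html/index.html   # Linux; use "open" on macOS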
Testing and Debugging in Breeze
===============================
Debugging with ipdb
-------------------
You can debug any code you run in the container using the ``ipdb`` debugger if you prefer console
debugging. It is as easy as pasting this line into your code:

.. code-block:: python

    import ipdb; ipdb.set_trace()
Once you hit the line, you are dropped into an interactive ``ipdb`` debugger with colors
and autocompletion to guide your debugging. It works from the console where you started your program.
Note that when using ``nosetests``, you need to provide the ``--nocapture`` flag; otherwise nosetests
captures the stdout of your process.
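
For example, to stop at a breakpoint placed as above while running the core tests (``run-tests`` and
its ``--`` convention are described in the next section):

.. code-block:: bash

    run-tests tests.core:TestCore -- --nocapture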
Running Unit Tests in Airflow Breeze
------------------------------------
Once you enter the Airflow Breeze environment, you can simply use
``run-tests`` at will. Note that if you want to pass extra parameters to ``nose``,
you should do it after ``--``.
For example, to execute the "core" unit tests, run the following:

.. code-block:: bash

    run-tests tests.core:TestCore -- -s --logging-level=DEBUG

For a single test method, run:

.. code-block:: bash

    run-tests tests.core:TestCore.test_check_operators -- -s --logging-level=DEBUG
The tests run ``airflow db reset`` and ``airflow db init`` the first time you
launch them in a running container, so you can count on the database being initialized.
All subsequent test executions within the same container will run without database
initialization.
You can optionally add the ``--with-db-init`` flag if you want to re-initialize
the database:

.. code-block:: bash

    run-tests --with-db-init tests.core:TestCore.test_check_operators -- -s --logging-level=DEBUG
Running Tests for a Specified Target
------------------------------------
If you want to only run tests and not drop into a shell, provide the
``-t``, ``--test-target`` flag. You can add extra nosetest flags after ``--`` in the command line.

.. code-block:: bash

    ./breeze --test-target tests/hooks/test_druid_hook.py -- --logging-level=DEBUG

You can run the whole test suite with the special ``.`` test target:

.. code-block:: bash

    ./breeze --test-target .

You can also specify individual tests or a group of tests:

.. code-block:: bash

    ./breeze --test-target tests.core:TestCore
Running Static Code Checks
--------------------------
We have a number of static code checks that are run in Travis CI, but you can also run them locally
in the Docker environment. All these checks run in the Python 3.5 environment.
The first time you run the checks, it may take some time to rebuild the Docker images. But all the
subsequent runs will be much faster since the build phase will just check whether your code has changed
and rebuild as needed.
The static code checks launched in the Breeze Docker-based environment do not need a special environment
preparation and provide the same results as the similar tests launched in Travis CI.
You run the checks via the ``-S``, ``--static-check`` flags or the ``-F``, ``--static-check-all-files``
flags. The former run the selected checks only for the files changed and staged locally; the latter run
the checks on all files.
Note that it may take a lot of time to run checks for all files with pylint on macOS due to the slow
filesystem used by macOS Docker. You can pass additional check arguments after ``--`` as extra arguments,
but you cannot pass the ``--files`` flag if you select the ``--static-check-all-files`` option.
You can see the list of available static checks either via ``--help`` flag or by using the autocomplete
option. Note that the ``all`` static check runs all configured static checks. Also since pylint tests take
a lot of time, you can run a special ``all-but-pylint`` check that skips pylint checks.
Run the ``mypy`` check for the currently staged changes:

.. code-block:: bash

    ./breeze --static-check mypy

Run the ``mypy`` check for all files:

.. code-block:: bash

    ./breeze --static-check-all-files mypy

Run the ``flake8`` check for the ``tests/core.py`` file with verbose output:

.. code-block:: bash

    ./breeze --static-check flake8 -- --files tests/core.py --verbose

Run the ``mypy`` check for the ``tests/hooks/test_druid_hook.py`` file:

.. code-block:: bash

    ./breeze --static-check mypy -- --files tests/hooks/test_druid_hook.py
Run all checks for the currently staged files:

.. code-block:: bash

    ./breeze --static-check all

Run all checks for all files:

.. code-block:: bash

    ./breeze --static-check-all-files all

Run all checks except pylint for all files:

.. code-block:: bash

    ./breeze --static-check-all-files all-but-pylint

Run pylint checks for all changed files:

.. code-block:: bash

    ./breeze --static-check pylint

Run pylint checks for selected files:

.. code-block:: bash

    ./breeze --static-check pylint -- --files airflow/configuration.py

Run pylint checks for all files:

.. code-block:: bash

    ./breeze --static-check-all-files pylint
The ``license`` check is run via a separate script and a separate Docker image containing the
Apache RAT verification tool that checks for Apache-compatibility of licenses within the codebase.
It does not take pre-commit parameters as extra arguments.
.. code-block:: bash

    ./breeze --static-check-all-files licenses
Running Static Code Checks from the Host
----------------------------------------
You can trigger the static checks from the host environment, without entering the Docker container. To do
this, run the following scripts (the same is done in Travis CI):
* `<scripts/ci/ci_check_license.sh>`_ - checks the licenses.
* `<scripts/ci/ci_docs.sh>`_ - checks that documentation can be built without warnings.
* `<scripts/ci/ci_flake8.sh>`_ - runs Flake8 source code style enforcement tool.
* `<scripts/ci/ci_lint_dockerfile.sh>`_ - runs lint checker for the Dockerfile.
* `<scripts/ci/ci_mypy.sh>`_ - runs a check for mypy type annotation consistency.
* `<scripts/ci/ci_pylint_main.sh>`_ - runs pylint static code checker for main files.
* `<scripts/ci/ci_pylint_tests.sh>`_ - runs pylint static code checker for tests.
The scripts may ask you to rebuild the images, if needed.
You can force rebuilding the images by deleting the ``.build`` directory. This directory keeps cached
information about the images already built and you can safely delete it if you want to start from scratch.
After the documentation is built, the HTML results are available in the ``docs/_build/html``
folder. This folder is mounted from the host, so you can access those files there as well.
Running Static Code Checks in the Docker
------------------------------------------
If you are already in the Breeze Docker environment (by running the ``./breeze`` command),
you can also run the same static checks from the container:
* Mypy: ``./scripts/ci/in_container/run_mypy.sh airflow tests``
* Pylint for main files: ``./scripts/ci/in_container/run_pylint_main.sh``
* Pylint for test files: ``./scripts/ci/in_container/run_pylint_tests.sh``
* Flake8: ``./scripts/ci/in_container/run_flake8.sh``
* License check: ``./scripts/ci/in_container/run_check_licence.sh``
* Documentation: ``./scripts/ci/in_container/run_docs_build.sh``
Running Static Code Analysis for Selected Files
-----------------------------------------------
In all static check scripts, both in the container and host versions, you can also pass a module/file path as
parameters of the scripts to only check selected modules or files. For example:
In the Docker container:

.. code-block::

    ./scripts/ci/in_container/run_pylint.sh ./airflow/example_dags/

or

.. code-block::

    ./scripts/ci/in_container/run_pylint.sh ./airflow/example_dags/test_utils.py

On the host:

.. code-block::

    ./scripts/ci/ci_pylint.sh ./airflow/example_dags/

.. code-block::

    ./scripts/ci/ci_pylint.sh ./airflow/example_dags/test_utils.py
Running Test Suites via Scripts
--------------------------------------------
To run all tests with the default settings (Python 3.6, Sqlite backend, "docker" environment), enter:

.. code-block::

    ./scripts/ci/local_ci_run_airflow_testing.sh

To select the Python 3.5 version, Postgres backend, and the "docker" environment, specify:

.. code-block::

    PYTHON_VERSION=3.5 BACKEND=postgres ENV=docker ./scripts/ci/local_ci_run_airflow_testing.sh

To run Kubernetes tests, enter:

.. code-block::

    KUBERNETES_VERSION=v1.13.0 KUBERNETES_MODE=persistent_mode BACKEND=postgres ENV=kubernetes \
    ./scripts/ci/local_ci_run_airflow_testing.sh
* PYTHON_VERSION is one of 3.5/3.6/3.7
* BACKEND is one of postgres/sqlite/mysql
* ENV is one of docker/kubernetes/bare
* KUBERNETES_VERSION is required for Kubernetes tests. Currently, only v1.13.0 is supported.
* KUBERNETES_MODE is the Kubernetes mode: either persistent_mode or git_mode.
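
Similarly, a run of the slimmed-down "bare" environment (which, as noted earlier, only supports the
sqlite backend) could look as follows:

.. code-block::

    PYTHON_VERSION=3.6 BACKEND=sqlite ENV=bare ./scripts/ci/local_ci_run_airflow_testing.sh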
Using Your Host IDE with Breeze
===============================
Configuring local virtualenv
----------------------------
To use your host IDE (for example, IntelliJ's PyCharm/Idea), you need to set up virtual environments.
Ideally, you should have virtualenvs for all Python versions supported by Airflow (3.5, 3.6, 3.7).
You can create a virtualenv using ``virtualenvwrapper``. This allows you to easily switch between
virtualenvs using the ``workon`` command and manage your virtual environments more easily.
Typically, you can create the environment as follows:

.. code-block:: bash

    mkvirtualenv <ENV_NAME> --python=python<VERSION>
After the virtualenv is created, you need to initialize it. Simply enter the environment by
using ``workon`` and, once you are in it, run:
.. code-block:: bash

    ./breeze --initialize-local-virtualenv
Once initialization is done, select the virtualenv you initialized as a default project
virtualenv in your IDE.
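
Putting the steps together, a typical first-time setup might look like this (the environment name is
arbitrary):

.. code-block:: bash

    mkvirtualenv airflow36 --python=python3.6
    workon airflow36
    ./breeze --initialize-local-virtualenv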
Running Unit Tests via IDE
--------------------------
When the setup is done, you can use the usual **Run Test** option of the IDE with full
autocomplete and documentation support. You can also debug and click through
the sources of Airflow, which is very helpful during development. Usually you can run most
of the unit tests (those that do not have external dependencies) directly from the IDE:
.. image:: images/running_unittests.png
    :align: center
    :alt: Running unit tests
Some of the core tests use dags defined in the ``tests/dags`` folder. Those tests should have
``AIRFLOW__CORE__UNIT_TEST_MODE`` set to ``True``. You can set it in your test configuration:
.. image:: images/airflow_unit_test_mode.png
    :align: center
    :alt: Airflow Unit test mode
You cannot run all tests this way - only unit tests that do not require external dependencies
such as Postgres, MySQL, or Hadoop. Use the
`run-tests <#running-unit-tests-in-airflow-breeze>`_ command for the remaining tests. You can
still use your IDE to debug those tests as explained in the next section.
Debugging Airflow Breeze Tests in IDE
-------------------------------------
When you run example DAGs, even if you run them using unit tests within the IDE, they are run in a
separate container. This makes it a little harder to use the IDE's built-in debuggers.
Fortunately, IntelliJ/PyCharm provides an effective remote debugging feature (but only in paid versions).
See additional details on
`remote debugging <https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html>`_.
You can set up your remote debugging session as follows:
.. image:: images/setup_remote_debugging.png
    :align: center
    :alt: Setup remote debugging
Note that on macOS, you have to use the real IP address of your host rather than the default
localhost because the container runs in a virtual machine with a different IP address.
Make sure to configure source code mapping in the remote debugging configuration to map
your local sources to the ``/opt/airflow`` location of the sources within the container:
.. image:: images/source_code_mapping_ide.png
    :align: center
    :alt: Source code mapping
Breeze Command-Line Interface Reference
=======================================
Airflow Breeze Syntax
---------------------
This is the current syntax for `./breeze <./breeze>`_:
.. code-block:: text

    Usage: breeze [FLAGS] \
      [-k]|[-S <STATIC_CHECK>]|[-F <STATIC_CHECK>]|[-O]|[-e]|[-a]|[-b]|[-t <TARGET>]|[-x <COMMAND>]|[-d <COMMAND>] \
      -- <EXTRA_ARGS>
**Commands**

By default, the ``breeze`` script enters the integration test environment and drops you into a bash shell,
but you can also choose commands to run specific actions instead:
-k, --stop-environment
Stops running a Docker Compose environment. When you start the environment, the Docker
containers continue running so that startup time is shorter. But they take quite a lot of
memory and CPU. This command stops all running containers in the environment.
-O, --build-docs
Builds documentation.
-S, --static-check <STATIC_CHECK>
Runs selected static checks for currently changed files. Specify a static check that
you would like to run or use 'all' to run all checks. One of
[ all all-but-pylint check-hooks-apply check-merge-conflict check-executables-have-shebangs
check-xml detect-private-key doctoc end-of-file-fixer flake8 forbid-tabs insert-license
check-apache-license lint-dockerfile mixed-line-ending mypy pylint shellcheck].
You can pass extra arguments including options to the pre-commit framework as
<EXTRA_ARGS> passed after --. For example:
'./breeze --static-check mypy' or
'./breeze --static-check mypy -- --files tests/core.py'
You can see all the options by adding --help EXTRA_ARG:
'./breeze --static-check mypy -- --help'
-F, --static-check-all-files <STATIC_CHECK>
Runs selected static checks for all applicable files. Specify a static check that
you would like to run or use 'all' to run all checks. One of
[ all all-but-pylint check-hooks-apply check-merge-conflict check-executables-have-shebangs
check-xml detect-private-key doctoc end-of-file-fixer flake8 forbid-tabs insert-license
check-apache-license lint-dockerfile mixed-line-ending mypy pylint shellcheck].
You can pass extra arguments including options to the pre-commit framework as
<EXTRA_ARGS> passed after --. For example:
'./breeze --static-check-all-files mypy' or
'./breeze --static-check-all-files mypy -- --verbose'
You can see all the options by adding --help EXTRA_ARG:
'./breeze --static-check-all-files mypy -- --help'
-e, --initialize-local-virtualenv
Initializes a locally created virtualenv installing all dependencies of Airflow.
This local virtualenv can be used to aid autocompletion and IDE support as
well as run unit tests directly from the IDE. You need to have virtualenv
activated before running this command.
-a, --setup-autocomplete
Sets up autocompletion for breeze commands. Once you do it, you need to re-enter the bash
shell. When you type the breeze command, <TAB> will autocomplete parameters and values.
-b, --build-only
Only builds Docker images but does not enter the airflow-testing Docker container.
-t, --test-target <TARGET>
Runs the specified unit test target. You can specify multiple
targets separated with commas. The <EXTRA_ARGS> passed after -- are treated
as additional options passed to nosetest. For example:
'./breeze --test-target tests.core -- --logging-level=DEBUG'
-x, --execute-command <COMMAND>
Runs the specified command instead of entering the environment. The command is run using
'bash -c "<command with args>"'. If you need to pass arguments to your command, you need
to pass them together with the command surrounded with " or '. Alternatively, you can pass
arguments as <EXTRA_ARGS> passed after --. For example:
'./breeze --execute-command "ls -la"' or
'./breeze --execute-command ls -- --la'
-d, --docker-compose <COMMAND>
Runs the docker-compose command instead of entering the environment. Use the 'help' command
to see available commands. The <EXTRA_ARGS> passed after -- are treated
as additional options passed to docker-compose. For example:
'./breeze --docker-compose pull -- --ignore-pull-failures'
**General flags**
-h, --help
Shows this help message.
-P, --python <PYTHON_VERSION>
Specifies a Python version for the image. This is always major/minor version.
One of [ 3.5 3.6 3.7 ]. Default is the python3 or python on the path.
-E, --env <ENVIRONMENT>
Specifies an environment for tests. The environment determines which types of tests can be run.
One of [ docker kubernetes ]. Default: docker.
-B, --backend <BACKEND>
Specifies backend for tests. It determines which database is used.
One of [ sqlite mysql postgres ]. Default: sqlite.
-K, --kubernetes-version <KUBERNETES_VERSION>
Specifies Kubernetes version. The flag is applicable if the 'kubernetes' environment is used.
One of [ v1.13.0 ]. Default: v1.13.0.
-M, --kubernetes-mode <KUBERNETES_MODE>
Specifies Kubernetes mode. The flag is applicable if the 'kubernetes' environment is used.
One of [ persistent_mode git_mode ]. Default: git_mode.
-s, --skip-mounting-source-volume
Skips mounting local volume with sources. You get exactly what is in the
Docker image rather than your current local sources of Airflow.
-v, --verbose
Shows verbose information about executed commands (enabled by default for running tests).
-y, --assume-yes
Assumes 'yes' answer to all questions.
-n, --assume-no
Assumes 'no' answer to all questions.
-C, --toggle-suppress-cheatsheet
Toggles on/off the cheatsheet displayed before starting bash shell.
-A, --toggle-suppress-asciiart
Toggles on/off asciiart displayed before starting bash shell.
**Dockerfile management flags**
-D, --dockerhub-user
Specifies a Docker Hub user that pulls, pushes and builds images. Default: apache.
-H, --dockerhub-repo
Specifies a Docker Hub repository used to pull, push, and build images. Default: airflow.
-r, --force-build-images
Forces building the local Docker images. The images are rebuilt
automatically for the first time or when changes are detected in
package-related files, but you can force it using this flag.
-R, --force-build-images-clean
Forces building images without cache. This removes the pulled or built images
and starts building images from scratch. This may take time.
-p, --force-pull-images
Forces pulling images from Docker Hub before building to populate the cache. The
images are pulled by default only for the first time you run the
environment. Later the locally built images are used from the cache.
-u, --push-images
Uploads the images to Docker Hub after rebuilding.
It is useful in case you use your own Docker Hub user to store images and you want
to build them locally. Note that you need to use 'docker login' before you upload images.
-c, --cleanup-images
Cleans up your local Docker cache of the Airflow Docker images. This does not reclaim space in the
Docker cache. You need to 'docker system prune' (optionally with --all) to reclaim that space.
Convenience Scripts
-------------------
Once you run ``./breeze`` you can also execute various actions via generated convenience scripts:
.. code-block::

    Enter the environment          : ./.build/cmd_run
    Run command in the environment : ./.build/cmd_run "[command with args]" [bash options]
    Run tests in the environment   : ./.build/test_run [test-target] [nosetest options]
    Run Docker compose command     : ./.build/dc [help/pull/...] [docker-compose options]
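
For example (assuming a previous ``./breeze`` run has generated the scripts):

.. code-block:: bash

    ./.build/cmd_run "airflow version"
    ./.build/test_run tests.core:TestCore
    ./.build/dc ps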
Troubleshooting
===============
If you are having problems with the Breeze environment, try the steps below. After each step you
can check whether your problem is fixed.
1. If you are on macOS, check if you have enough disk space for Docker.
2. Stop Breeze with ``./breeze --stop-environment``.
3. Delete the ``.build`` directory and run ``./breeze --force-pull-images``.
4. `Clean up Docker images <#cleaning-up-the-images>`_.
5. Restart your Docker Engine and try again.
6. Restart your machine and try again.
7. Re-install Docker CE and try again.
If the problem is still not solved, set the VERBOSE variable to "true" (``export VERBOSE="true"``),
rerun the failed command, copy-and-paste the output from your terminal to the
`Airflow Slack <https://apache-airflow-slack.herokuapp.com/>`_ #troubleshooting channel, and
add the problem description.
Fixing File/Directory Ownership
-------------------------------
On Linux, there is a problem with propagating ownership of created files (a known Docker problem).
Basically, files and directories created in the container are not owned by the host user (but by the
root user in our case). This may prevent you from switching branches, for example, if files owned by
the root user are created within your sources. In case you are on a Linux host and have some files in
your sources created by the root user, you can fix their ownership by running this script:

.. code-block::

    ./scripts/ci/local_ci_fix_ownership.sh