[AIRFLOW-XXXX] document system tests mechanism better (#7774)

Jarek Potiuk 2020-03-20 11:00:10 +01:00, committed by GitHub
Parent 759ce2a80c
Commit 6b3b8a4268
1 changed file with 233 additions and 17 deletions


@@ -486,16 +486,58 @@ The system tests execute a specified example DAG file that runs the DAG end-to-end.
See more details about adding new system tests below.
Environment for System Tests
----------------------------
**Prerequisites:** You may need to set some variables to run system tests. If you need to
add some initialization of environment variables to Breeze, you can add a
``variables.env`` file in the ``files/airflow-breeze-config`` directory. It will be automatically
sourced when entering the Breeze environment. You can also add additional
initialization commands to this file if you want something to be executed
every time you enter Breeze.
There are several typical operations you might want to perform, such as:

* generating a file with a random value used across the whole Breeze session (this is useful if
  you want to use this random number in the names of resources that you create in your service)
* generating variables that will be used as the names of your resources
* decrypting any variables and resources that you keep encrypted in your configuration files
* installing additional packages that are needed in case you are testing with the Airflow 1.10.* series
  (see below)
An example ``variables.env`` file is shown here (this is part of the ``variables.env`` file that is
used to run Google Cloud system tests):
.. code-block:: bash

    # Build variables. This file is sourced by Breeze.
    # Also it is sourced during continuous integration build in Cloud Build
    # Auto-export all variables
    set -a

    echo
    echo "Reading variables"
    echo

    # Generate random number that will be used across your session
    RANDOM_FILE="/random.txt"

    if [[ ! -f "${RANDOM_FILE}" ]]; then
        echo "${RANDOM}" > "${RANDOM_FILE}"
    fi

    RANDOM_POSTFIX=$(cat "${RANDOM_FILE}")

    # install any packages from dist folder if they are available
    if [[ ${RUN_AIRFLOW_1_10:=} == "true" ]]; then
        pip install /dist/apache_airflow_providers_{google,postgres,mysql}*.whl || true
    fi
To execute system tests, specify the ``--systems SYSTEMS``
flag, where ``SYSTEMS`` is a comma-separated list of systems to run the system tests for
(for example, ``--systems=google,amazon``).
Forwarding Authentication from the Host
----------------------------------------------------
@@ -525,15 +567,18 @@ tests whenever an operator/hook/sensor is added/modified in a given system.
``@pytest.mark.system(SYSTEM_NAME)`` marker. The system name should follow the path defined in
the ``providers`` package (for example, the system tests from the ``tests.providers.google.cloud``
package should be marked with ``@pytest.mark.system("google.cloud")``).
* If your system tests need some credential files to be available for
authentication with external systems, make sure to keep these credentials in the
``files/airflow-breeze-config/keys`` directory. Mark your tests with
``@pytest.mark.credential_file(<FILE>)`` so that they are skipped if such a credential file is not there.
The tests should read the right credentials and authenticate on their own. The credentials are read
in Breeze from the ``/files`` directory. The local "files" folder is mounted to the "/files" folder in Breeze.
* If your system tests are long-lasting ones (i.e., require more than 20-30 minutes
to complete), mark them with the ``@pytest.mark.long_running`` marker.
Such tests are skipped by default unless you specify the ``--long-running`` flag to pytest.
* The system test itself (a Python class) does not have any logic. Such a test runs
the DAG specified by its ID. This DAG should contain the actual logic
to execute. Make sure to define the DAG in ``providers/<SYSTEM_NAME>/example_dags``. These example DAGs
@@ -543,31 +588,202 @@ tests whenever an operator/hook/sensor is added/modified in a given system.
the system class will take care of running the DAG. Note that the ``DAG_FOLDER`` should be
a subdirectory of ``tests.test_utils.AIRFLOW_MAIN_FOLDER`` + ``providers/<SYSTEM_NAME>/example_dags``.
A simple example of a system test is available in:
``tests/providers/google/cloud/operators/test_compute_system.py``.

It runs two DAGs defined in ``airflow.providers.google.cloud.example_dags.example_compute.py`` and
``airflow.providers.google.cloud.example_dags.example_compute_igm.py``.
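To make the shape of such a test concrete, here is a minimal sketch of a system test class
combining the markers described above. The ``SystemTest`` base class import, the key file name
and the DAG id below are assumptions for illustration, not the contents of the real test file:

.. code-block:: python

    """Illustrative sketch of a system test class (all names are assumptions)."""
    import pytest

    from tests.test_utils.system_tests_class import SystemTest


    @pytest.mark.system("google.cloud")               # selected via --systems=google
    @pytest.mark.credential_file("gcp_compute.json")  # skipped when the key file is missing
    class ExampleComputeSystemTest(SystemTest):
        def test_run_example_dag(self):
            # The test has no logic of its own - it only runs the example DAG by its ID
            # from the providers' example_dags folder.
            self.run_dag("example_gcp_compute", "airflow/providers/google/cloud/example_dags")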
Preparing backport packages for System Tests for Airflow 1.10.* series
----------------------------------------------------------------------
In order to run system tests with an old Airflow version, you need to prepare backport packages. This
can be done by running ``./scripts/ci/ci_prepare_backport_packages.sh <PACKAGES TO BUILD>``. For
example, the command below will build the google, postgres and mysql packages:
.. code-block:: bash

    ./scripts/ci/ci_prepare_backport_packages.sh google postgres mysql
Those packages will be prepared in the ``./dist`` folder. This folder is mapped to the ``/dist`` folder
when you enter Breeze, so it is easy to automate installing those packages for testing.
Installing backport packages for Airflow 1.10.* series
-------------------------------------------------------
The tests can be executed against the master version of Airflow, but they also work
with older versions. This is especially useful for testing operators backported
from Airflow 2.0 to the 1.10.* versions.
To run the tests for Airflow 1.10.* series, you need to run Breeze with
``--install-airflow-version=<VERSION>`` to install a different version of Airflow.
If ``current`` is specified (default), then the current version of Airflow is used.
Otherwise, the released version of Airflow is installed.
The ``--install-airflow-version=<VERSION>`` flag makes sure that the current (from sources) version of
Airflow is removed and the released version of Airflow from PyPI is installed. Note that the test sources
are not removed and they can be used to run tests (unit tests and system tests) against the
freshly installed version.
You should automate installing the backport packages in your own
``./files/airflow-breeze-config/variables.env`` file. You should make it depend on the
``RUN_AIRFLOW_1_10`` variable being equal to ``"true"`` so that the
installation of backport packages is only performed when you install Airflow 1.10.*.
The backport packages are available in the ``/dist`` directory if they were prepared as described
in the previous chapter.
Typically, the command in your ``variables.env`` file will be similar to:
.. code-block:: bash

    # install any packages from dist folder if they are available
    if [[ ${RUN_AIRFLOW_1_10:=} == "true" ]]; then
        pip install /dist/apache_airflow_providers_{google,postgres,mysql}*.whl || true
    fi
The command above will automatically install the backported google, postgres and mysql packages if they
were prepared before entering Breeze.
Running system tests for backported packages in Airflow 1.10.* series
---------------------------------------------------------------------
Once you have installed the 1.10.* Airflow version with ``--install-airflow-version`` and prepared and
installed the required packages via ``variables.env``, it should be as easy as running
``pytest --systems=<SYSTEM_NAME> TEST_NAME``. Note that the default timeout for running
system tests is set to 8 minutes, and some system tests might take much longer to run, so you might
want to add ``-o faulthandler_timeout=2400`` (2400 s = 40 minutes, for example) to your
pytest command.
Typical system test session
---------------------------
Here is the typical session that you go through to run system tests:
1. Prepare backport packages
.. code-block:: bash

    ./scripts/ci/ci_prepare_backport_packages.sh google postgres mysql
2. Enter Breeze, installing Airflow 1.10.*, forwarding credentials and installing the
backport packages (you need the appropriate line in ``./files/airflow-breeze-config/variables.env``):
.. code-block:: bash

    ./breeze --install-airflow-version 1.10.9 --python 3.6 --db-reset --forward-credentials restart
This will:

* install Airflow 1.10.9
* restart the whole environment (i.e. recreate the metadata database from scratch)
* run Breeze with the Python 3.6 version
* reset the Airflow database
* forward your local credentials to Breeze
3. Run the tests:
.. code-block:: bash

    pytest -o faulthandler_timeout=2400 --systems=google tests/providers/google/cloud/operators/test_compute_system.py
Iteration with System Tests if your resources are slow to create
----------------------------------------------------------------
When you want to iterate on system tests, you might want to create the slow resources first.

If you need to set up some external resources for your tests (for example compute instances in Google Cloud),
you should set them up and tear them down in the setUp/tearDown methods of your tests.
Since those resources might be slow to create, you might want to add some helpers that
set them up and tear them down separately via manual operations. This way you can iterate on
the tests without waiting for setUp and tearDown with every test.
In this case you should build in a mechanism to skip setUp and tearDown in case you manually
created the resources. A somewhat complex example of that can be found in
``tests.providers.google.cloud.operators.test_cloud_sql_system.py``, and the helper is
available in ``tests.providers.google.cloud.operators.test_cloud_sql_system_helper.py``.
The helper can be run with ``--action create`` to create the Cloud SQL instances, which are very slow
to create and set up, so that you can iterate on running the system tests without
losing the time needed to create them every time. A temporary file is created to prevent the tests from
setting up and tearing down the instances when they run.
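A minimal sketch of such a skip mechanism, assuming a simple marker file written by the helper,
might look like this (the file path and the method names here are hypothetical; the real tests
in the files mentioned above are more elaborate):

.. code-block:: python

    import os
    from unittest import TestCase

    # Hypothetical marker file that the helper writes after ``--action create``.
    MARKER_FILE = "/tmp/cloudsql_resources_created"


    class ExampleCloudSqlSystemTest(TestCase):
        def setUp(self):
            super().setUp()
            if os.path.exists(MARKER_FILE):
                return  # resources were created manually - skip the slow setup
            self._create_instances()

        def tearDown(self):
            if not os.path.exists(MARKER_FILE):
                self._delete_instances()
            super().tearDown()

        def _create_instances(self):
            """Hypothetical slow resource creation (e.g. Cloud SQL instances)."""

        def _delete_instances(self):
            """Hypothetical teardown of the resources."""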
The Cloud SQL example also shows how you can use the random number generated at the entry of Breeze if you
have it in your ``variables.env`` (see the previous chapter). In the case of Cloud SQL, you cannot reuse the
same instance name for a week, so we generate a random number that is used across the whole session
and store it in the ``/random.txt`` file so that the names are unique during tests.
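For example, a test helper can read that session-wide value and use it to build unique resource
names (a sketch; the ``testsql-`` prefix is an assumption):

.. code-block:: python

    from pathlib import Path

    # /random.txt is written once per Breeze session by variables.env (see above).
    RANDOM_POSTFIX = Path("/random.txt").read_text().strip()

    # Hypothetical naming scheme: make instance names unique per session.
    INSTANCE_NAME = f"testsql-{RANDOM_POSTFIX}"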
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Important !!!!!!!!!!!!!!!!!!!!!!!!!!!!
Do not forget to delete manually created resources before leaving the
Breeze session. They are usually expensive to run.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Important !!!!!!!!!!!!!!!!!!!!!!!!!!!!
Note that in case you have to update your backported operators or system tests (they are part of
the backport packages) you need to rebuild the packages outside of Breeze and
``pip uninstall``/``pip install`` those packages in order to get them installed. This is not needed
if you run system tests with the ``current`` Airflow version, so it is better to iterate with the
system tests with the ``current`` version, fix all problems there, and only afterwards run
the tests with Airflow 1.10.*.
Typical session then looks as follows:
1. Prepare backport packages
.. code-block:: bash

    ./scripts/ci/ci_prepare_backport_packages.sh google postgres mysql
2. Enter Breeze, installing Airflow 1.10.*, forwarding credentials and installing the
backport packages (you need the appropriate line in ``./files/airflow-breeze-config/variables.env``):
.. code-block:: bash

    ./breeze --install-airflow-version 1.10.9 --python 3.6 --db-reset --forward-credentials restart
3. Run the create action in the helper (to create the resources that are slow to create):

.. code-block:: bash

    python tests/providers/google/cloud/operators/test_cloud_sql_system_helper.py --action create
4. Run the tests:
.. code-block:: bash

    pytest -o faulthandler_timeout=2400 --systems=google tests/providers/google/cloud/operators/test_compute_system.py
5. In case you are running backport package tests, you need to rebuild and reinstall a package
every time you change the operators/hooks or example_dags. The example below shows reinstallation
of the google package:
In the host:
.. code-block:: bash

    ./scripts/ci/ci_prepare_backport_packages.sh google
In the container:
.. code-block:: bash

    pip uninstall apache-airflow-providers-google
    pip install /dist/apache_airflow_providers_google-*.whl
Points 4 and 5 can be repeated multiple times without leaving the container.
6. Run the delete action in the helper:

.. code-block:: bash

    python tests/providers/google/cloud/operators/test_cloud_sql_system_helper.py --action delete
This works best for system tests: all the system tests should work for at least the latest released 1.10.x
Airflow version. Some of the unit and integration tests might also work in the same
fashion, but it is not necessary or expected.
Local and Remote Debugging in IDE
=================================