.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.

.. contents:: :local:

Airflow docker images
=====================

Airflow has two images (built from Dockerfiles):

  * Production image (Dockerfile) - used to build your own production-ready Airflow installation.
    You can read more about building and using the production image in the
    `Production Deployments <https://airflow.apache.org/docs/apache-airflow/stable/production-deployment.html>`_ document.
    The image is built using `Dockerfile <Dockerfile>`_.

  * CI image (Dockerfile.ci) - used for running tests and local development. The image is built using
    `Dockerfile.ci <Dockerfile.ci>`_.

Image naming conventions
========================

The images are named as follows:

``apache/airflow:<BRANCH_OR_TAG>-python<PYTHON_MAJOR_MINOR_VERSION>[-ci][-manifest]``

where:

* ``BRANCH_OR_TAG`` - branch or tag used when creating the image. Examples: ``master``,
  ``v2-0-test``, ``v1-10-test``, ``2.0.0``. The ``master``, ``v1-10-test`` and ``v2-0-test`` labels are
  built from branches so they change over time. The ``1.10.*`` and ``2.*`` labels are built from git tags
  and they are "fixed" once built.
* ``PYTHON_MAJOR_MINOR_VERSION`` - version of python used to build the image. Examples: ``3.6``, ``3.7``,
  ``3.8``
* The ``-ci`` suffix is added for CI images
* The ``-manifest`` suffix is added for manifest images (see below for explanation of manifest images)

We also store (to increase speed of local build/pulls) python images that were used to build
the CI images. Each CI image, when built, uses the current python version of the base images. Those
python images are regularly updated (with bugfixes/security fixes), so for example python3.8 from
last week might be a different image than python3.8 today. Therefore whenever we push a CI image
to the airflow repository, we also push the python image that was used to build it. This image is stored
as ``apache/airflow:python<PYTHON_MAJOR_MINOR_VERSION>-<BRANCH_OR_TAG>``.

Since those are simply snapshots of the existing python images, DockerHub does not create a separate
copy of those images - all layers are mounted from the original python images and those are merely
labels pointing to those.

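The naming rules in this section can be sketched as two small shell helpers. Both function names
are hypothetical and are not part of Breeze; they only assemble the strings described above.

.. code-block:: bash

  # Hypothetical helpers: compose image names following the conventions above.

  # Airflow image: apache/airflow:<BRANCH_OR_TAG>-python<VERSION>[-ci][-manifest]
  airflow_image_name() {
      local branch_or_tag="$1"   # e.g. master, v2-0-test, 2.0.0
      local python_version="$2"  # e.g. 3.6, 3.7, 3.8
      local suffix="${3:-}"      # optional, e.g. -ci or -ci-manifest
      echo "apache/airflow:${branch_or_tag}-python${python_version}${suffix}"
  }

  # Python base image snapshot: apache/airflow:python<VERSION>-<BRANCH_OR_TAG>
  python_base_image_name() {
      local branch_or_tag="$1"
      local python_version="$2"
      echo "apache/airflow:python${python_version}-${branch_or_tag}"
  }

  airflow_image_name master 3.6
  airflow_image_name v2-0-test 3.8 -ci
  python_base_image_name master 3.6

Keeping the suffix as a separate optional argument means the same helper covers production, CI and
manifest names.
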
Building docker images
======================

The easiest way to build those images is to use `<BREEZE.rst>`_.

Note! Breeze by default builds the production image from local sources. You can change its behaviour by
providing the ``--install-airflow-version`` parameter, where you can specify the
tag/branch used to download the Airflow package from the GitHub repository. You can
also change the repository itself by adding ``--dockerhub-user`` and ``--dockerhub-repo`` flag values.

You can build the CI image using this command:

.. code-block:: bash

  ./breeze build-image

You can build the production image using this command:

.. code-block:: bash

  ./breeze build-image --production-image

By adding the ``--python <PYTHON_MAJOR_MINOR_VERSION>`` parameter you can build the
image version for the chosen python version.

The images are built with default extras - different extras for the CI and production image - and you
can change the extras via the ``--extras`` parameter and add new ones with ``--additional-extras``.
You can see the default extras used via ``./breeze flags``.

For example if you want to build the python 3.7 version of the production image with
"all" extras installed you should run this command:

.. code-block:: bash

  ./breeze build-image --python 3.7 --extras "all" --production-image

If you just want to add new extras you can add them like this:

.. code-block:: bash

  ./breeze build-image --python 3.7 --additional-extras "all" --production-image

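If you need the same image for several Python versions, the ``--python`` parameter can be driven
from a loop. The sketch below is a dry run under the assumption that you want the three versions
listed earlier in this document; ``print_build_commands`` is a hypothetical name and the function
only prints the commands.

.. code-block:: bash

  # Dry-run sketch: print one Breeze command per Python version.
  python_versions=(3.6 3.7 3.8)

  print_build_commands() {
      local python_version
      for python_version in "${python_versions[@]}"; do
          # Remove "echo" to actually run the build.
          echo ./breeze build-image --python "${python_version}" --production-image
      done
  }

  print_build_commands
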
The command that builds the CI image is optimized to minimize the time needed to rebuild the image when
the source code of Airflow evolves. This means that if you already have the image locally downloaded and
built, the scripts will determine whether the rebuild is needed in the first place. Then the scripts will
make sure that a minimal number of steps are executed to rebuild parts of the image (for example,
PIP dependencies) and will give you an image consistent with the one used during Continuous Integration.

The command that builds the production image is optimised for the size of the image.

In Breeze by default, Airflow is installed using local sources of Apache Airflow.

You can also build production images from PIP packages by providing the ``--install-airflow-version``
parameter to Breeze:

.. code-block:: bash

  ./breeze build-image --python 3.7 --additional-extras=presto \
      --production-image --install-airflow-version=2.0.0

.. note::

   In November 2020, a new version of PIP (20.3) was released with a new, 2020 resolver. This resolver
   might work with Apache Airflow as of 20.3.3, but it might lead to errors in installation. It might
   depend on your choice of extras. In order to install Airflow you might need to either downgrade
   pip to version 20.2.4 (``pip install --upgrade pip==20.2.4``) or, in case you use Pip 20.3,
   you need to add the option ``--use-deprecated legacy-resolver`` to your pip install command.

   While ``pip 20.3.3`` solved most of the ``teething`` problems of 20.3, this note will remain here until we
   set ``pip 20.3`` as the official version in our CI pipeline where we are testing the installation as well.
   Due to those constraints, only ``pip`` installation is currently officially supported.

   While there are some successes with using other tools like `poetry <https://python-poetry.org/>`_ or
   `pip-tools <https://pypi.org/project/pip-tools/>`_, they do not share the same workflow as
   ``pip`` - especially when it comes to constraint vs. requirements management.
   Installing via ``Poetry`` or ``pip-tools`` is not currently supported.

   If you wish to install airflow using those tools you should use the constraint files and convert
   them to the appropriate format and workflow that your tool requires.

Building with ``--install-airflow-version`` installs Airflow using a command similar to:

.. code-block:: bash

  pip install \
    "apache-airflow[async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,http,ldap,google,microsoft.azure,mysql,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv]==2.0.0" \
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.0/constraints-3.6.txt"

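The constraint URL follows a regular pattern, so it can be derived from the constraints reference
and the Python version. A minimal sketch, assuming the URL layout shown in this section and that a
constraints file exists for the chosen Python version; ``constraints_url`` is a hypothetical helper,
not a Breeze or pip feature.

.. code-block:: bash

  # Hypothetical helper: build the constraints URL used with "pip install --constraint".
  constraints_url() {
      local reference="$1"       # e.g. constraints-2.0.0 or constraints-master
      local python_version="$2"  # e.g. 3.6
      echo "https://raw.githubusercontent.com/apache/airflow/${reference}/constraints-${python_version}.txt"
  }

  AIRFLOW_VERSION="2.0.0"
  PYTHON_VERSION="3.7"

  # Print the command instead of running it; remove "echo" to install.
  echo pip install "apache-airflow==${AIRFLOW_VERSION}" \
      --constraint "$(constraints_url "constraints-${AIRFLOW_VERSION}" "${PYTHON_VERSION}")"
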
You can also build production images from a specific Git version by providing the ``--install-airflow-reference``
parameter to Breeze (this time constraints are taken from the ``constraints-master`` branch which is the
HEAD of development for constraints). The installation then uses a command similar to:

.. code-block:: bash

  pip install "https://github.com/apache/airflow/archive/<tag>.tar.gz#egg=apache-airflow" \
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"

You can also skip installing airflow by providing the ``--install-airflow-version none`` parameter to Breeze:

.. code-block:: bash

  ./breeze build-image --python 3.7 --additional-extras=presto \
      --production-image --install-airflow-version=none --install-from-local-files-when-building

In this case you usually install airflow and all other packages from files placed in the
``docker-context-files`` folder.

Using cache during builds
=========================

The default mechanism used in Breeze for building CI images uses images pulled from DockerHub or
GitHub Image Registry. This is done to speed up local builds and CI builds - instead of 15 minutes
for a rebuild of CI images, it usually takes less than 3 minutes when cache is used. For CI builds this is
usually the best strategy - to use the default "pull" cache. This is the default strategy when
`<BREEZE.rst>`_ builds are performed.

For the Production Image - which is far smaller and faster to build - it's better to use the local build cache (the
standard mechanism that docker uses). This is the default strategy for production images when
`<BREEZE.rst>`_ builds are performed. The first time you run it, it will take considerably longer than
if you use the pull mechanism, but then when you make small, incremental changes to local sources,
the Dockerfile and scripts, further rebuilds with the local build cache will be considerably faster.

You can also disable build cache altogether. This is the strategy used by the scheduled builds in CI - they
will always rebuild all the images from scratch.

You can change the strategy by providing one of the ``--build-cache-local``, ``--build-cache-pulled`` or
even ``--build-cache-disabled`` flags when you run Breeze commands. For example:

.. code-block:: bash

  ./breeze build-image --python 3.7 --build-cache-local

This will build the CI image using the local build cache (note that it will take quite a long time the first
time you run it).

.. code-block:: bash

  ./breeze build-image --python 3.7 --production-image --build-cache-pulled

This will build the production image with pulled images as cache.

.. code-block:: bash

  ./breeze build-image --python 3.7 --production-image --build-cache-disabled

This will build the production image from scratch.

You can also choose the caching strategy by setting the ``DOCKER_CACHE`` variable to "local", "pulled" or
"disabled" and exporting it.

.. code-block:: bash

  export DOCKER_CACHE="local"

or

.. code-block:: bash

  export DOCKER_CACHE="disabled"

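The three ``DOCKER_CACHE`` values line up with the three Breeze flags named above. The sketch below
shows that mapping, assuming the variable and the flags select the same strategies; ``cache_flag`` is
a hypothetical helper and the final command is only printed.

.. code-block:: bash

  # Hypothetical helper: map a DOCKER_CACHE value to the matching Breeze flag.
  cache_flag() {
      case "${1}" in
          local)    echo "--build-cache-local" ;;
          pulled)   echo "--build-cache-pulled" ;;
          disabled) echo "--build-cache-disabled" ;;
          *)        echo "Unknown cache strategy: ${1}" >&2; return 1 ;;
      esac
  }

  export DOCKER_CACHE="pulled"

  # Print the command instead of running it; remove "echo" to build.
  echo ./breeze build-image --python 3.7 "$(cache_flag "${DOCKER_CACHE}")"

Rejecting unknown values early avoids silently falling back to a strategy you did not ask for.
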
Choosing image registry
=======================

By default images are pulled from and pushed to the DockerHub registry when you use Breeze's push-image
or build commands.

Our images are named like this:

.. code-block:: bash

  apache/airflow:<BRANCH_OR_TAG>-pythonX.Y         - for production images
  apache/airflow:<BRANCH_OR_TAG>-pythonX.Y-ci      - for CI images
  apache/airflow:<BRANCH_OR_TAG>-pythonX.Y-build   - for production build stage
  apache/airflow:pythonX.Y-<BRANCH_OR_TAG>         - for python base image used for both CI and PROD image

For example:

.. code-block:: bash

  apache/airflow:master-python3.6                - production "latest" image from current master
  apache/airflow:master-python3.6-ci             - CI "latest" image from current master
  apache/airflow:v2-0-test-python3.6-ci          - CI "latest" image from current v2-0-test branch
  apache/airflow:2.0.0-python3.6                 - production image for 2.0.0 release
  apache/airflow:python3.6-master                - base python image for the master branch

You can see DockerHub images at `<https://hub.docker.com/r/apache/airflow>`_

Using GitHub registries as build cache
--------------------------------------

By default the DockerHub registry is used when you push or pull such images.
However for CI builds we keep the images in the GitHub registry as well - this way we can easily push
the images automatically after merge requests and use such images for Pull Requests
as cache - which makes CI builds much faster (images are available in cache
right after the merged request in master finishes its build). The difference is visible especially if
significant changes are done in the Dockerfile.ci.

The images are named differently (in the Docker definition of image names, the registry URL is part of the
image name if DockerHub is not used as registry). Also GitHub has its own structure for registries:
each project has its own registry naming convention that should be followed. The names of
images for the GitHub registry are different as they must follow the limitations of the registry used.

We are still using GitHub Packages as registry, but we are in the process of testing and switching
to GitHub Container Registry, and the naming conventions are slightly different. GitHub Packages
required all packages to have the "organization/repository/" URL prefix ("apache/airflow/"),
whereas in GitHub Container Registry all images are in "organization", not in "repository", and they are all
in the organization-wide "apache/" namespace rather than in the "apache/airflow/" one.
We are adding "airflow-" as prefix for image names of all Airflow images instead.
The images are linked to the repository via the ``org.opencontainers.image.source`` label in the image.

Naming convention for GitHub Packages
-------------------------------------

Images built as "Run ID snapshot":

.. code-block:: bash

  docker.pkg.github.com/apache/airflow/<BRANCH>-pythonX.Y-ci-v2:<RUNID>                - for CI images
  docker.pkg.github.com/apache/airflow/<BRANCH>-pythonX.Y-v2:<RUNID>                   - for production images
  docker.pkg.github.com/apache/airflow/<BRANCH>-pythonX.Y-build-v2:<RUNID>             - for production build stage
  docker.pkg.github.com/apache/airflow/pythonX.Y-<BRANCH>-v2:X.Y-slim-buster-<RUN_ID>  - for base python images

Latest images (pushed when master merge succeeds):

.. code-block:: bash

  docker.pkg.github.com/apache/airflow/<BRANCH>-pythonX.Y-ci-v2:latest         - for CI images
  docker.pkg.github.com/apache/airflow/<BRANCH>-pythonX.Y-v2:latest            - for production images
  docker.pkg.github.com/apache/airflow/<BRANCH>-pythonX.Y-build-v2:latest      - for production build stage
  docker.pkg.github.com/apache/airflow/python-<BRANCH>-v1:X.Y-slim-buster      - for base python images

Naming convention for GitHub Container Registry
-----------------------------------------------

Images built as "Run ID snapshot":

.. code-block:: bash

  ghcr.io/apache/airflow-<BRANCH>-pythonX.Y-ci-v2:<RUNID>                - for CI images
  ghcr.io/apache/airflow-<BRANCH>-pythonX.Y-v2:<RUNID>                   - for production images
  ghcr.io/apache/airflow-<BRANCH>-pythonX.Y-build-v2:<RUNID>             - for production build stage
  ghcr.io/apache/airflow-pythonX.Y-<BRANCH>-v2:X.Y-slim-buster-<RUN_ID>  - for base python images

Latest images (pushed when master merge succeeds):

.. code-block:: bash

  ghcr.io/apache/airflow-<BRANCH>-pythonX.Y-ci-v2:latest         - for CI images
  ghcr.io/apache/airflow-<BRANCH>-pythonX.Y-v2:latest            - for production images
  ghcr.io/apache/airflow-<BRANCH>-pythonX.Y-build-v2:latest      - for production build stage
  ghcr.io/apache/airflow-python-<BRANCH>-v2:X.Y-slim-buster      - for base python images

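The difference between the two GitHub registries comes down to the prefix placed in front of the
image name. A minimal sketch of that rule, based only on the prefixes described in this section;
``github_image_name`` is a hypothetical helper and is not part of Breeze.

.. code-block:: bash

  # Hypothetical helper: prefix an image name for one of the two GitHub registries.
  github_image_name() {
      local registry="$1"       # docker.pkg.github.com or ghcr.io
      local image="$2"          # e.g. master-python3.6-ci-v2
      local tag="${3:-latest}"  # a run id or "latest"
      case "${registry}" in
          # GitHub Packages: "organization/repository/" prefix
          docker.pkg.github.com) echo "${registry}/apache/airflow/${image}:${tag}" ;;
          # GitHub Container Registry: organization namespace plus "airflow-" prefix
          ghcr.io)               echo "${registry}/apache/airflow-${image}:${tag}" ;;
          *)                     echo "Unsupported registry: ${registry}" >&2; return 1 ;;
      esac
  }

  github_image_name docker.pkg.github.com master-python3.6-ci-v2
  github_image_name ghcr.io master-python3.6-ci-v2
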
Note that we never push or pull "release" images to the GitHub registry. It is only used for CI builds.

You can see all the current GitHub images at `<https://github.com/apache/airflow/packages>`_

In order to interact with the GitHub images you need to add the ``--use-github-registry`` flag to the pull/push
commands in Breeze. This way the images will be pulled/pushed from/to GitHub rather than from/to
DockerHub. Images are built locally as ``apache/airflow`` images but then they are tagged with the right
GitHub tags for you. You can also specify the ``--github-registry`` option and choose which of the
GitHub registries is used (``docker.pkg.github.com`` chooses GitHub Packages and ``ghcr.io`` chooses
GitHub Container Registry).

You can read more about the CI configuration and how CI builds are using DockerHub/GitHub images
in `<CI.rst>`_.

Note that you need to be a committer and have the right to push to DockerHub and GitHub, and you need to
be logged in. Only committers can push images directly. You need to log in with your
Personal Access Token with "packages" scope to be able to push to those repositories or pull from them
in case of GitHub Packages.

GitHub Packages:

.. code-block:: bash

  docker login docker.pkg.github.com

GitHub Container Registry:

.. code-block:: bash

  docker login ghcr.io

Technical details of Airflow images
===================================

The CI image is used by Breeze as the shell image but it is also used during CI builds.
The image is a single-segment image that contains an Airflow installation with "all" dependencies installed.
It is optimised for rebuild speed. It installs PIP dependencies from the current branch first -
so that any changes in setup.py do not trigger reinstalling of all dependencies.
There is a second step of installation that re-installs the dependencies
from the latest sources so that we are sure that the latest dependencies are installed.

The production image is a multi-segment image. The first segment "airflow-build-image" contains all the
build essentials and related dependencies that allow installing airflow locally. By default the image is
built from a released version of Airflow from GitHub, but by providing some extra arguments you can also
build it from local sources. This is particularly useful in the CI environment where we are using the image
to run Kubernetes tests. See below for the list of arguments that should be provided to build
the production image from the local sources.

The image is primarily optimised for the size of the final image, but also for speed of rebuilds - the
'airflow-build-image' segment uses the same technique as the CI builds for pre-installing PIP dependencies.
It first pre-installs them from the right GitHub branch and only after that the final airflow installation is
done from either local sources or a remote location (PIP or GitHub repository).

Customizing the image
.....................

Customizing the image is an alternative way of adding your own dependencies to the image.

The easiest way to build the image is to use the ``breeze`` script, but you can also build such a customized
image by running an appropriately crafted docker build in which you specify all the ``build-args``
that you need to add to customize it. You can read about all the args and ways you can build the image
in the `<#ci-image-build-arguments>`_ chapter below.

Here just a few examples are presented which should give you a general understanding of what you can customize.

This builds the production image for Python 3.7 with additional airflow extras from the 2.0.0 PyPI package and
additional python, apt dev and runtime dependencies.

.. code-block:: bash

  docker build . -f Dockerfile \
    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \
    --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \
    --build-arg AIRFLOW_VERSION="2.0.0" \
    --build-arg AIRFLOW_INSTALL_VERSION="==2.0.0" \
    --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \
    --build-arg AIRFLOW_SOURCES_FROM="empty" \
    --build-arg AIRFLOW_SOURCES_TO="/empty" \
    --build-arg ADDITIONAL_AIRFLOW_EXTRAS="jdbc" \
    --build-arg ADDITIONAL_PYTHON_DEPS="pandas" \
    --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" \
    --build-arg ADDITIONAL_RUNTIME_APT_DEPS="default-jre-headless" \
    --tag my-image

The same image can be built using ``breeze`` (it supports auto-completion of the options):

.. code-block:: bash

  ./breeze build-image \
      --production-image --python 3.7 --install-airflow-version=2.0.0 \
      --additional-extras=jdbc --additional-python-deps="pandas" \
      --additional-dev-apt-deps="gcc g++" --additional-runtime-apt-deps="default-jre-headless"

You can build the default production image with the standard ``docker build`` command, but it will only build
the default version of the image and will not use the DockerHub versions of images as cache.

You can customize more aspects of the image - such as additional commands executed before apt dependencies
are installed, or adding extra sources to install your dependencies from. You can see all the arguments
described below, but here is an example of a rather complex command to customize the image,
based on the example in `this comment <https://github.com/apache/airflow/issues/8605#issuecomment-690065621>`_:

.. code-block:: bash

  docker build . -f Dockerfile \
    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \
    --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \
    --build-arg AIRFLOW_VERSION="2.0.0" \
    --build-arg AIRFLOW_INSTALL_VERSION="==2.0.0" \
    --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \
    --build-arg AIRFLOW_SOURCES_FROM="empty" \
    --build-arg AIRFLOW_SOURCES_TO="/empty" \
    --build-arg ADDITIONAL_AIRFLOW_EXTRAS="slack" \
    --build-arg ADDITIONAL_PYTHON_DEPS="apache-airflow-backport-providers-odbc \
        azure-storage-blob \
        sshtunnel \
        google-api-python-client \
        oauth2client \
        beautifulsoup4 \
        dateparser \
        rocketchat_API \
        typeform" \
    --build-arg ADDITIONAL_DEV_APT_DEPS="msodbcsql17 unixodbc-dev g++" \
    --build-arg ADDITIONAL_DEV_APT_COMMAND="curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add --no-tty - && curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list" \
    --build-arg ADDITIONAL_DEV_ENV_VARS="ACCEPT_EULA=Y" \
    --build-arg ADDITIONAL_RUNTIME_APT_COMMAND="curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add --no-tty - && curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list" \
    --build-arg ADDITIONAL_RUNTIME_APT_DEPS="msodbcsql17 unixodbc git procps vim" \
    --build-arg ADDITIONAL_RUNTIME_ENV_VARS="ACCEPT_EULA=Y" \
    --tag my-image

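Long ``docker build`` commands like the ones above are easy to break with a single missing
line-continuation backslash. One way to avoid that is to collect the arguments in a bash array.
The sketch below is a dry run using a few values from the first example in this section; the
``build_args`` variable name is an assumption made for illustration.

.. code-block:: bash

  # Collect build arguments in an array; no trailing backslashes are needed.
  build_args=(
      --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster"
      --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7
      --build-arg AIRFLOW_VERSION="2.0.0"
      --build-arg ADDITIONAL_AIRFLOW_EXTRAS="jdbc"
      --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++"
  )

  # Print the command instead of running it; remove "echo" to run the build.
  echo docker build . -f Dockerfile "${build_args[@]}" --tag my-image

Quoting ``"${build_args[@]}"`` keeps values with spaces, such as ``gcc g++``, as single arguments.
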
CI image build arguments
........................

The following build arguments (``--build-arg`` in the docker build command) can be used for CI images:

+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| Build argument | Default value | Description |
|
|
|
|
|
+==========================================+==========================================+==========================================+
|
|
|
|
|
| ``PYTHON_BASE_IMAGE`` | ``python:3.6-slim-buster`` | Base python image |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
2020-12-26 13:20:47 +03:00
|
|
|
|
| ``AIRFLOW_VERSION`` | ``2.0.0`` | version of Airflow |
|
2020-04-02 20:52:11 +03:00
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``PYTHON_MAJOR_MINOR_VERSION`` | ``3.6`` | major/minor version of Python (should |
|
|
|
|
|
| | | match base image) |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``DEPENDENCIES_EPOCH_NUMBER`` | ``2`` | increasing this number will reinstall |
|
|
|
|
|
| | | all apt dependencies |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``PIP_NO_CACHE_DIR`` | ``true`` | if true, then no pip cache will be |
|
|
|
|
|
| | | stored |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``HOME`` | ``/root`` | Home directory of the root user (CI |
|
|
|
|
|
| | | image has root user as default) |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``AIRFLOW_HOME`` | ``/root/airflow`` | Airflow’s HOME (that’s where logs and |
|
|
|
|
|
| | | sqlite databases are stored) |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``AIRFLOW_SOURCES`` | ``/opt/airflow`` | Mounted sources of Airflow |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``CASS_DRIVER_NO_CYTHON`` | ``1`` | if set to 1 no CYTHON compilation is |
|
|
|
|
|
| | | done for cassandra driver (much faster) |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``AIRFLOW_REPO`` | ``apache/airflow`` | the repository from which PIP |
|
2020-06-16 13:36:46 +03:00
|
|
|
|
| | | dependencies are pre-installed |
|
2020-04-02 20:52:11 +03:00
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``AIRFLOW_BRANCH`` | ``master`` | the branch from which PIP dependencies |
|
2020-06-16 13:36:46 +03:00
|
|
|
|
| | | are pre-installed |
|
2020-04-02 20:52:11 +03:00
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``AIRFLOW_CI_BUILD_EPOCH`` | ``1`` | increasing this value will reinstall PIP |
|
|
|
|
|
| | | dependencies from the repository from |
|
|
|
|
|
| | | scratch |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
2020-10-10 13:58:09 +03:00
|
|
|
|
| ``AIRFLOW_CONSTRAINTS_LOCATION`` | | If not empty, it will override the |
|
|
|
|
|
| | | source of the constraints with the |
|
|
|
|
|
| | | specified URL or file. Note that the |
|
|
|
|
|
| | | file has to be in docker context so |
|
|
|
|
|
| | | it's best to place such file in |
|
|
|
|
|
| | | one of the folders included in |
|
2020-11-21 21:21:43 +03:00
|
|
|
|
| | | .dockerignore, for example in the |
|
|
|
|
|
| | | 'docker-context-files'. Note that the |
|
|
|
|
|
| | | location does not work for the first |
|
|
|
|
|
| | | stage of installation when the |
|
|
|
|
|
| | | ``AIRFLOW_PRE_CACHED_PIP_PACKAGES`` is |
|
|
|
|
|
| | | set to true. Default location from |
|
|
|
|
|
| | | GitHub is used in this case. |
|
2020-10-17 12:16:28 +03:00
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``AIRFLOW_CONSTRAINTS_REFERENCE`` | ``constraints-master`` | reference (branch or tag) from GitHub |
|
|
|
|
|
| | | repository from which constraints are |
|
|
|
|
|
| | | used. By default it is set to |
|
|
|
|
|
| | | ``constraints-master`` but can be |
|
2020-12-26 13:20:47 +03:00
|
|
|
|
| | | ``constraints-2-0`` for 2.0.* versions |
|
2020-10-17 12:16:28 +03:00
|
|
|
|
| | | ``constraints-1-10`` for 1.10.* versions |
|
|
|
|
|
| | | or it could point to specific version |
|
2020-12-26 13:20:47 +03:00
|
|
|
|
| | | for example ``constraints-2.0.0`` |
|
2020-10-17 12:16:28 +03:00
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``INSTALL_PROVIDERS_FROM_SOURCES`` | ``true`` | If set to false and image is built from |
|
|
|
|
|
| | | sources, all provider packages are not |
|
|
|
|
|
| | | installed. By default when building from |
|
|
|
|
|
| | | sources, all provider packages are also |
|
|
|
|
|
| | | installed together with the core airflow |
|
|
|
|
|
| | | package. It has no effect when |
|
|
|
|
|
| | | installing from PyPI or GitHub repo. |
|
2020-10-10 13:58:09 +03:00
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
2021-01-02 13:16:51 +03:00
|
|
|
|
| ``INSTALL_FROM_DOCKER_CONTEXT_FILES`` | ``false`` | If set to true, Airflow, providers and |
|
|
|
|
|
| | | all dependencies are installed from |
|
|
|
|
|
| | | locally built/downloaded |
|
|
|
|
|
| | | .whl and .tar.gz files placed in the |
|
|
|
|
|
| | | ``docker-context-files``. In certain |
|
|
|
|
|
| | | corporate environments, this is required |
|
|
|
|
|
| | | to install airflow from such pre-vetted |
|
|
|
|
|
| | | packages rather than from PyPI. For this |
|
|
|
|
|
| | | to work, also set ``INSTALL_FROM_PYPI`` |
| | | to false. |
|
|
|
|
|
| | | Note that packages starting with |
|
|
|
|
|
| | | ``apache?airflow`` glob are treated |
|
|
|
|
|
| | | differently than other packages. All |
|
|
|
|
|
| | | ``apache?airflow`` packages are |
|
|
|
|
|
| | | installed with dependencies limited by |
|
|
|
|
|
| | | airflow constraints. All other packages |
|
|
|
|
|
| | | are installed without dependencies |
|
|
|
|
|
| | | 'as-is'. If you wish to install airflow |
|
|
|
|
|
| | | via 'pip download' with all dependencies |
|
|
|
|
|
| | | downloaded, you have to rename the |
|
|
|
|
|
| | | apache airflow and provider packages to |
|
|
|
|
|
| | | not start with ``apache?airflow`` glob. |
|
2020-10-10 13:58:09 +03:00
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
2020-04-02 20:52:11 +03:00
|
|
|
|
| ``AIRFLOW_EXTRAS`` | ``all`` | extras to install |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
2021-01-08 22:11:35 +03:00
|
|
|
|
| ``UPGRADE_TO_NEWER_DEPENDENCIES`` | ``false`` | If set to true, the dependencies are |
|
|
|
|
|
| | | upgraded to newer versions matching |
|
|
|
|
|
| | | setup.py before installation. |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``CONTINUE_ON_PIP_CHECK_FAILURE`` | ``false`` | By default the image will fail if pip |
|
|
|
|
|
| | | check fails for it. This is good for |
|
|
|
|
|
| | | interactive building but on CI the |
|
|
|
|
|
| | | image should be built regardless - we |
|
|
|
|
|
| | | have a separate step to verify image. |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
2020-12-07 01:36:33 +03:00
|
|
|
|
| ``INSTALL_FROM_PYPI`` | ``true`` | If set to true, Airflow is installed |
|
2021-01-24 00:07:22 +03:00
|
|
|
|
| | | from PyPI. If you want to install |
|
2020-10-15 16:19:18 +03:00
|
|
|
|
| | | Airflow from externally provided binary |
|
|
|
|
|
| | | package you can set it to false, place |
|
|
|
|
|
| | | the package in ``docker-context-files`` |
|
2020-12-07 01:36:33 +03:00
|
|
|
|
| | | and set |
|
|
|
|
|
| | | ``INSTALL_FROM_DOCKER_CONTEXT_FILES`` to |
|
|
|
|
|
| | | true. For this you have to also set the |
|
2020-10-15 16:19:18 +03:00
|
|
|
|
| | | ``AIRFLOW_PRE_CACHED_PIP_PACKAGES`` flag |
|
2020-12-07 01:36:33 +03:00
|
|
|
|
| | | to false |
|
2020-10-15 16:19:18 +03:00
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
2020-09-27 19:00:03 +03:00
|
|
|
|
| ``AIRFLOW_PRE_CACHED_PIP_PACKAGES`` | ``true`` | Pre-caches Airflow PIP packages from the |
| | | Apache Airflow GitHub repository. This |
| | | speeds up repeated image builds and CI |
| | | builds, but in some corporate |
| | | environments downloading anything from |
| | | public repositories might be forbidden. |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
2020-06-16 13:36:46 +03:00
|
|
|
|
| ``ADDITIONAL_AIRFLOW_EXTRAS`` | | additional extras to install |
|
2020-06-02 10:27:09 +03:00
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
2020-06-16 13:36:46 +03:00
|
|
|
|
| ``ADDITIONAL_PYTHON_DEPS`` | | additional python dependencies to |
|
2020-04-02 20:52:11 +03:00
|
|
|
|
| | | install |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
2020-09-29 16:30:00 +03:00
|
|
|
|
| ``DEV_APT_COMMAND`` | (see Dockerfile) | Dev apt command executed before dev deps |
|
|
|
|
|
| | | are installed in the first part of image |
|
2020-06-10 00:05:43 +03:00
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
2020-09-29 16:30:00 +03:00
|
|
|
|
| ``ADDITIONAL_DEV_APT_COMMAND`` | | Additional Dev apt command executed |
|
|
|
|
|
| | | before dev dep are installed |
|
|
|
|
|
| | | in the first part of the image |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``DEV_APT_DEPS`` | (see Dockerfile) | Dev APT dependencies installed |
|
|
|
|
|
| | | in the first part of the image |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``ADDITIONAL_DEV_APT_DEPS`` | | Additional apt dev dependencies |
|
|
|
|
|
| | | installed in the first part of the image |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``ADDITIONAL_DEV_APT_ENV`` | | Additional env variables defined |
|
|
|
|
|
| | | when installing dev deps |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``RUNTIME_APT_COMMAND`` | (see Dockerfile) | Runtime apt command executed before deps |
|
|
|
|
|
| | | are installed in first part of the image |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``ADDITIONAL_RUNTIME_APT_COMMAND`` | | Additional Runtime apt command executed |
|
|
|
|
|
| | | before runtime dep are installed |
|
|
|
|
|
| | | in the second part of the image |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``RUNTIME_APT_DEPS`` | (see Dockerfile) | Runtime APT dependencies installed |
|
|
|
|
|
| | | in the second part of the image |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``ADDITIONAL_RUNTIME_APT_DEPS`` | | Additional apt runtime dependencies |
|
|
|
|
|
| | | installed in second part of the image |
|
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
|
|
|
|
|
| ``ADDITIONAL_RUNTIME_APT_ENV`` | | Additional env variables defined |
|
|
|
|
|
| | | when installing runtime deps |
|
2020-06-10 00:05:43 +03:00
|
|
|
|
+------------------------------------------+------------------------------------------+------------------------------------------+
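
The ``AIRFLOW_CONSTRAINTS_REFERENCE`` values listed in the table can be chosen mechanically from an Airflow version. The following is only a minimal sketch: the function name ``constraints_reference`` is hypothetical, and falling back to ``constraints-master`` for any other version is an assumption based on the documented default.

```shell
# Hypothetical helper: maps an Airflow version to a constraints reference.
# Only the 2.0.* and 1.10.* mappings come from the table above; the
# fallback to the default "constraints-master" is an assumption.
constraints_reference() {
    case "$1" in
        2.0.*)  echo "constraints-2-0" ;;
        1.10.*) echo "constraints-1-10" ;;
        *)      echo "constraints-master" ;;
    esac
}

constraints_reference "2.0.1"
```

The result could then be passed to the build as ``--build-arg AIRFLOW_CONSTRAINTS_REFERENCE=...``.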
|
2020-04-02 20:52:11 +03:00
|
|
|
|
|
|
|
|
|
Here are some examples of how CI images can be built manually. The CI image is always built from local sources.
|
|
|
|
|
|
|
|
|
|
This builds the CI image for Python 3.7 with the default extras ("all").

.. code-block:: bash

  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This builds the CI image for Python 3.6 with the "gcp" extra only.

.. code-block:: bash

  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.6-slim-buster" \
    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.6 --build-arg AIRFLOW_EXTRAS=gcp
|
|
|
|
|
|
|
|
|
|
|
2020-06-02 10:27:09 +03:00
|
|
|
|
This builds the CI image for Python 3.6 with the "apache-beam" extra added.

.. code-block:: bash

  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.6-slim-buster" \
    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.6 --build-arg ADDITIONAL_AIRFLOW_EXTRAS="apache-beam"
|
|
|
|
|
|
|
|
|
|
This builds the CI image for Python 3.6 with the "mssql" additional Python package added.

.. code-block:: bash

  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.6-slim-buster" \
    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.6 --build-arg ADDITIONAL_PYTHON_DEPS="mssql"
|
|
|
|
|
|
2020-06-10 00:05:43 +03:00
|
|
|
|
This builds the CI image for Python 3.6 with "gcc" and "g++" added as additional apt dev dependencies.

.. code-block:: bash

  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.6-slim-buster" \
    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.6 --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++"
|
2020-06-10 00:05:43 +03:00
|
|
|
|
|
|
|
|
|
This builds the CI image for Python 3.6 with the "jdbc" extra and "default-jre-headless" added as an additional apt runtime dependency.

.. code-block:: bash

  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.6-slim-buster" \
    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.6 --build-arg AIRFLOW_EXTRAS=jdbc \
    --build-arg ADDITIONAL_RUNTIME_APT_DEPS="default-jre-headless"
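
Because ``PYTHON_BASE_IMAGE`` and ``PYTHON_MAJOR_MINOR_VERSION`` have to agree, it can help to derive both from a single value. The following is a minimal sketch, not part of the Airflow tooling: the helper name ``ci_build_command`` is hypothetical, and it only prints the command so that it can be reviewed before it is run.

```shell
# Hypothetical helper: prints (does not run) a CI image build command.
# $1 is the Python major.minor version, $2 the extras (default: "all").
ci_build_command() {
    python_version="$1"
    extras="${2:-all}"
    printf 'docker build . -f Dockerfile.ci'
    # The base image tag and the version argument come from the same value,
    # so they cannot drift apart.
    printf ' --build-arg PYTHON_BASE_IMAGE="python:%s-slim-buster"' "$python_version"
    printf ' --build-arg PYTHON_MAJOR_MINOR_VERSION=%s' "$python_version"
    printf ' --build-arg AIRFLOW_EXTRAS=%s\n' "$extras"
}

ci_build_command 3.6 gcp
```

Once the printed command looks right, it can be pasted into a shell in the root of the Airflow sources.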
|
|
|
|
|
|
2020-04-02 20:52:11 +03:00
|
|
|
|
Production images
-----------------
|
2020-04-02 20:52:11 +03:00
|
|
|
|
|
2020-09-21 20:31:42 +03:00
|
|
|
|
You can find details about using, building, extending and customising the production images in the
`latest documentation <https://airflow.apache.org/docs/apache-airflow/stable/production-deployment.html>`_.
|
2020-04-02 20:52:11 +03:00
|
|
|
|
|
2020-06-10 00:05:43 +03:00
|
|
|
|
|
2020-04-02 20:52:11 +03:00
|
|
|
|
Image manifests
---------------
|
|
|
|
|
|
|
|
|
|
Together with the main CI images we also build and push image manifests. A manifest is a very small image
that contains only a randomly generated file, created at the 'crucial' part of the CI image build.
It lets us determine very quickly whether the image in the Docker registry has changed significantly
since the last time. Unfortunately the Docker registry (specifically the DockerHub registry) has no anonymous
way of querying image details via an API, so you would otherwise need to download the whole image to inspect it.
We work around this by building, every time we build the image, a very small manifest image
containing a randomly generated UUID and pushing it to the registry together with the main CI image.
The tag of the manifest image is the tag of the image it refers to with the ``-manifest`` suffix added.
For example, the manifest image for ``apache/airflow:master-python3.6-ci`` is named
``apache/airflow:master-python3.6-ci-manifest``.
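
The naming rule can be illustrated with a one-line shell function. This is only a sketch; the name ``manifest_tag`` is hypothetical and not part of Breeze.

```shell
# Hypothetical helper: derives the manifest image name from a CI image name
# by appending the "-manifest" suffix described above.
manifest_tag() {
    printf '%s-manifest\n' "$1"
}

manifest_tag "apache/airflow:master-python3.6-ci"
```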
|
|
|
|
|
|
2020-12-26 13:20:47 +03:00
|
|
|
|
The manifest image is pulled quickly (it is really, really small) when important files change, and the
randomly generated UUID it contains is compared with the one in our local image. If they differ,
the user should rebase to the latest master and rebuild the image by pulling it from
the repo, as this will likely be faster than rebuilding the image locally.
|
|
|
|
|
|
|
|
|
|
The random UUID is generated right after the pre-cached pip install is run, so a changed UUID usually means that
significant changes have been made to apt packages or even that the base Python image has changed.
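
The UUID comparison can be sketched in shell as follows. This is an illustration under stated assumptions, not the actual Breeze implementation: the function name ``image_is_outdated`` is hypothetical, and it assumes both UUIDs have already been written to two plain files.

```shell
# Hypothetical helper: succeeds (exit code 0) when the UUID stored for the
# local image differs from the one taken from the pulled manifest image.
# $1 and $2 are paths to files holding the two UUIDs (assumed layout).
image_is_outdated() {
    local_uuid_file="$1"
    remote_uuid_file="$2"
    # cmp -s prints nothing and exits non-zero when the files differ
    # or when one of them cannot be read.
    if cmp -s "$local_uuid_file" "$remote_uuid_file"; then
        return 1    # same UUID: the local image is up to date
    fi
    return 0        # different UUID: pulling is likely faster than rebuilding
}
```

A caller could use it as ``if image_is_outdated local.uuid remote.uuid; then echo "pull the image"; fi``.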
|
|
|
|
|
|
2020-04-02 20:52:11 +03:00
|
|
|
|
Pulling the Latest Images
-------------------------
|
|
|
|
|
|
|
|
|
|
Sometimes the image needs to be rebuilt from scratch. This is required, for example,
when there is a security update of the Python version that all the images are based on and a new version
of the image is pushed to the repository. In this case it is usually faster to pull the latest
images rather than rebuild them from scratch.
|
|
|
|
|
|
|
|
|
|
Use the ``--force-pull-images`` flag to force pulling the latest images from Docker Hub.
|
|
|
|
|
|
|
|
|
|
For the production image:

.. code-block:: bash

  ./breeze build-image --force-pull-images --production-image
|
|
|
|
|
|
|
|
|
|
For the CI image, Breeze automatically force-pulls when it determines that your image is very outdated;
however, you can also force it with the same flag:

.. code-block:: bash

  ./breeze build-image --force-pull-images
|
|
|
|
|
|
2020-06-16 13:36:46 +03:00
|
|
|
|
Embedded image scripts
======================
|
|
|
|
|
|
|
|
|
|
Both images have a set of scripts that can be used in the image. Those are:

* /entrypoint - entrypoint script used when entering the image
* /clean-logs - script for periodic log cleaning
|
|
|
|
|
|
|
|
|
|
Running the CI image
====================
|
|
|
|
|
|
|
|
|
|
The entrypoint in the CI image contains all the initialisation needed for tests to be immediately executed.
|
2020-08-21 18:21:57 +03:00
|
|
|
|
It is copied from ``scripts/in_container/entrypoint_ci.sh``.
|
2020-06-16 13:36:46 +03:00
|
|
|
|
|
|
|
|
|
The default behaviour is that you are dropped into a bash shell. However, if the ``RUN_TESTS`` variable is
set to "true", the tests passed as arguments are executed.
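
The branching described above can be sketched as follows. This is not the real ``entrypoint_ci.sh``; the function name ``entrypoint_action`` is hypothetical and it only reports what would happen instead of starting a shell or running tests.

```shell
# Hypothetical sketch of the documented behaviour: the decision depends only
# on the RUN_TESTS variable, and the arguments are the tests to run.
entrypoint_action() {
    if [ "${RUN_TESTS:-false}" = "true" ]; then
        echo "run tests: $*"
    else
        echo "start bash shell"
    fi
}
```

Calling ``entrypoint_action tests/core`` with ``RUN_TESTS`` unset reports that a shell would be started, which matches the default behaviour.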
|
|
|
|
|
|
|
|
|
|
The entrypoint performs the following operations:

* Checks if the environment is ready to test (including database and all integrations). It waits
  until all the components are ready to work

* Installs an older version of Airflow (if an older version of Airflow is requested to be installed
  via the ``INSTALL_AIRFLOW_VERSION`` variable)

* Sets up Kerberos if Kerberos integration is enabled (generates and configures Kerberos token)

* Sets up ssh keys for ssh tests and restarts the SSH server

* Sets all variables and configurations needed for unit tests to run

* Reads additional variables set in ``files/airflow-breeze-config/variables.env`` by sourcing that file

* In case of a CI run, sets parallelism to 2 to avoid running an excessive number of processes

* In case of a CI run, sets default parameters for pytest

* In case of running integration/long_running/quarantined tests, sets the right pytest flags

* Sets the default "tests" target in case the target is not explicitly set as an additional argument

* Runs system tests if the ``RUN_SYSTEM_TESTS`` flag is specified, otherwise runs regular unit and integration tests
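
Two of the steps above, sourcing the extra variables file and falling back to the default "tests" target, can be sketched in shell. This is an illustration only and not the real entrypoint: the function names and the ``VARIABLES_FILE`` variable are hypothetical, and only the file path is taken from the list above.

```shell
# Path of the optional variables file (taken from the list above); the
# VARIABLES_FILE override is an assumption made for this sketch.
VARIABLES_FILE="${VARIABLES_FILE:-files/airflow-breeze-config/variables.env}"

# Hypothetical helper: sources the variables file only when it exists,
# so a missing file is not treated as an error.
load_extra_variables() {
    if [ -f "$VARIABLES_FILE" ]; then
        . "$VARIABLES_FILE"
    fi
}

# Hypothetical helper: prints the test targets, falling back to the
# default "tests" target when no arguments are given.
resolve_test_targets() {
    if [ "$#" -eq 0 ]; then
        echo "tests"
    else
        echo "$*"
    fi
}
```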
|
|
|
|
|
|
|
|
|
|
|
2020-09-21 20:31:42 +03:00
|
|
|
|
Using, customising, and extending the production image
======================================================
|
2020-06-16 13:36:46 +03:00
|
|
|
|
|
2020-12-07 03:05:21 +03:00
|
|
|
|
You can read more about using, customising, and extending the production image in the
`documentation <https://airflow.apache.org/docs/apache-airflow/stable/production-deployment.html>`_.
|