Better compatibility/diagnostics for arbitrary UID in docker image (#15162)
The PROD image of airflow is OpenShift compatible and it can be
run with either 'airflow' user (UID=50000) or with any other
user with (GID=0).
This change adds umask 0002 to make sure that whenever the image
is extended and new directories get created, the directories are
group-writeable for GID=0. This is added in the default
entrypoint.
The entrypoint will fail if it is not run as airflow user or if
other, arbitrary user is used with GID != 0.
Fixes: #15107
(cherry picked from commit ce91872ecc
)
This commit is contained in:
Родитель
1f67edd127
Коммит
a7e80b194f
|
@ -487,7 +487,7 @@ WORKDIR ${AIRFLOW_HOME}
|
|||
|
||||
EXPOSE 8080
|
||||
|
||||
RUN usermod -g 0 airflow
|
||||
RUN usermod -g 0 airflow -G ${AIRFLOW_GID}
|
||||
|
||||
USER ${AIRFLOW_UID}
|
||||
|
||||
|
|
|
@ -21,7 +21,7 @@
|
|||
|
||||
# User and group of airflow user
|
||||
uid: 50000
|
||||
gid: 50000
|
||||
gid: 0
|
||||
|
||||
# Airflow home directory
|
||||
# Used for mount paths
|
||||
|
|
|
@ -51,10 +51,10 @@ Those are the most common arguments that you use when you want to build a custom
|
|||
+------------------------------------------+------------------------------------------+------------------------------------------+
|
||||
| ``AIRFLOW_UID`` | ``50000`` | Airflow user UID. |
|
||||
+------------------------------------------+------------------------------------------+------------------------------------------+
|
||||
| ``AIRFLOW_GID`` | ``50000`` | Airflow group GID. Note that most files |
|
||||
| | | created on behalf of airflow user belong |
|
||||
| | | to the ``root`` group (0) to keep |
|
||||
| | | OpenShift Guidelines compatibility. |
|
||||
| ``AIRFLOW_GID`` | ``50000`` | Airflow group GID. Note that writable |
|
||||
| | | files/dirs, created on behalf of airflow |
|
||||
| | | user are set to the ``root`` group (0) |
|
||||
| | | to allow arbitrary UID to run the image. |
|
||||
+------------------------------------------+------------------------------------------+------------------------------------------+
|
||||
| ``AIRFLOW_CONSTRAINTS_REFERENCE`` | | Reference (branch or tag) from GitHub |
|
||||
| | | where constraints file is taken from |
|
||||
|
|
|
@ -89,6 +89,11 @@ You should be aware, about a few things:
|
|||
PIP packages are installed to ``~/.local`` folder as if the ``--user`` flag was specified when running PIP.
|
||||
Note also that using ``--no-cache-dir`` is a good idea that can help to make your image smaller.
|
||||
|
||||
.. note::
|
||||
Only as of ``2.0.1`` image the ``--user`` flag is turned on by default by setting ``PIP_USER`` environment
|
||||
variable to ``true``. This can be disabled by un-setting the variable or by setting it to ``false``. In the
|
||||
2.0.0 image you had to add the ``--user`` flag as ``pip install --user`` command.
|
||||
|
||||
* If your apt, or PyPI dependencies require some of the ``build-essential`` or other packages that need
|
||||
to compile your python dependencies, then your best choice is to follow the "Customize the image" route,
|
||||
because you can build a highly-optimized (for size) image this way. However it requires to checkout sources
|
||||
|
@ -103,10 +108,22 @@ You should be aware, about a few things:
|
|||
a command ``docker build . --tag my-image:my-tag`` (where ``my-image`` is the name you want to name it
|
||||
and ``my-tag`` is the tag you want to tag the image with.
|
||||
|
||||
* If your way of extending image requires to create writable directories, you MUST remember about adding
|
||||
``umask 0002`` step in your RUN command. This is necessary in order to accommodate our approach for
|
||||
running the image with an arbitrary user. Such user will always run with ``GID=0`` -
|
||||
the entrypoint will prevent non-root GIDs. You can read more about it in
|
||||
:ref:`arbitrary docker user <arbitrary-docker-user>` documentation for the entrypoint. The
|
||||
``umask 0002`` is set as default when you enter the image, so any directories you create by default
|
||||
in runtime, will have ``GID=0`` and will be group-writable.
|
||||
|
||||
.. note::
|
||||
As of 2.0.1 image the ``--user`` flag is turned on by default by setting ``PIP_USER`` environment variable
|
||||
to ``true``. This can be disabled by un-setting the variable or by setting it to ``false``. In the
|
||||
2.0.0 image you had to add the ``--user`` flag as ``pip install --user`` command.
|
||||
Only as of ``2.0.2`` the default group of ``airflow`` user is ``root``. Previously it was ``airflow``,
|
||||
so if you are building your images based on an earlier image, you need to manually change the default
|
||||
group for airflow user:
|
||||
|
||||
.. code-block:: docker
|
||||
|
||||
RUN usermod -g 0 airflow
|
||||
|
||||
Examples of image extending
|
||||
---------------------------
|
||||
|
@ -131,6 +148,18 @@ The following example adds ``lxml`` python package from PyPI to the image.
|
|||
:start-after: [START Dockerfile]
|
||||
:end-before: [END Dockerfile]
|
||||
|
||||
A ``umask`` requiring example
|
||||
.............................
|
||||
|
||||
The following example adds a new directory that is supposed to be writable for any arbitrary user
|
||||
running the container.
|
||||
|
||||
.. exampleinclude:: docker-examples/extending/writable-directory/Dockerfile
|
||||
:language: Dockerfile
|
||||
:start-after: [START Dockerfile]
|
||||
:end-before: [END Dockerfile]
|
||||
|
||||
|
||||
A ``build-essential`` requiring package example
|
||||
...............................................
|
||||
|
||||
|
|
|
@ -0,0 +1,21 @@
|
|||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# This is an example Dockerfile. It is not intended for PRODUCTION use
|
||||
# [START Dockerfile]
|
||||
FROM apache/airflow:2.0.1
|
||||
RUN umask 0002; \
|
||||
mkdir -p ~/writeable-directory
|
||||
# [END Dockerfile]
|
|
@ -87,13 +87,49 @@ The image entrypoint works as follows:
|
|||
command to execute and result of this evaluation is used as ``AIRFLOW__CELERY__BROKER_URL``. The
|
||||
``_CMD`` variable takes precedence over the ``AIRFLOW__CELERY__BROKER_URL`` variable.
|
||||
|
||||
Creating system user
|
||||
--------------------
|
||||
.. _arbitrary-docker-user:
|
||||
|
||||
Allowing arbitrary user to run the container
|
||||
--------------------------------------------
|
||||
|
||||
Airflow image is Open-Shift compatible, which means that you can start it with random user ID and the
|
||||
group id ``0`` (``root``). If you want to run the image with user different than Airflow, you MUST set
|
||||
GID of the user to ``0``. In case you try to use different group, the entrypoint exits with error.
|
||||
|
||||
In order to accommodate a number of external libraries and projects, Airflow will automatically create
|
||||
such an arbitrary user in (`/etc/passwd`) and make it's home directory point to ``/home/airflow``.
|
||||
Many of 3rd-party libraries and packages require home directory of the user to be present, because they
|
||||
need to write some cache information there, so such a dynamic creation of a user is necessary.
|
||||
|
||||
Such arbitrary user has to be able to write to certain directories that needs write access, and since
|
||||
it is not advised to allow write access to "other" for security reasons, the OpenShift
|
||||
guidelines introduced the concept of making all such folders have the ``0`` (``root``) group id (GID).
|
||||
All the directories that need write access in the Airflow production image have GID set to 0 (and
|
||||
they are writable for the group). We are following that concept and all the directories that need
|
||||
write access follow that.
|
||||
|
||||
The GID=0 is set as default for the ``airflow`` user, so any directories it creates have GID set to 0
|
||||
by default. The entrypoint sets ``umask`` to be ``0002`` - this means that any directories created by
|
||||
the user have also "group write" access for group ``0`` - they will be writable by other users with
|
||||
``root`` group. Also whenever any "arbitrary" user creates a folder (for example in a mounted volume), that
|
||||
folder will have a "group write" access and ``GID=0``, so that execution with another, arbitrary user
|
||||
will still continue to work, even if such directory is mounted by another arbitrary user later.
|
||||
|
||||
The ``umask`` setting however only works for runtime of the container - it is not used during building of
|
||||
the image. If you would like to extend the image and add your own packages, you should remember to add
|
||||
``umask 0002`` in front of your docker command - this way the directories created by any installation
|
||||
that need group access will also be writable for the group. This can be done for example this way:
|
||||
|
||||
.. code-block:: docker
|
||||
|
||||
RUN umask 0002; \
|
||||
do_something; \
|
||||
do_otherthing;
|
||||
|
||||
|
||||
Airflow image is Open-Shift compatible, which means that you can start it with random user ID and group id 0.
|
||||
Airflow will automatically create such a user and make it's home directory point to ``/home/airflow``.
|
||||
You can read more about it in the "Support arbitrary user ids" chapter in the
|
||||
`Openshift best practices <https://docs.openshift.com/container-platform/4.1/openshift_images/create-images.html#images-create-guide-openshift_create-images>`_.
|
||||
`Openshift best practices <https://docs.openshift.com/container-platform/4.7/openshift_images/create-images.html#images-create-guide-openshift_create-images>`_.
|
||||
|
||||
|
||||
Waits for Airflow DB connection
|
||||
-------------------------------
|
||||
|
|
|
@ -15,7 +15,6 @@
|
|||
# KIND, either express or implied. See the License for the
|
||||
# specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
# Might be empty
|
||||
AIRFLOW_COMMAND="${1}"
|
||||
|
||||
|
@ -244,6 +243,47 @@ function exec_to_bash_or_python_command_if_specified() {
|
|||
fi
|
||||
}
|
||||
|
||||
function check_uid_gid() {
|
||||
if [[ $(id -g) == "0" ]]; then
|
||||
return
|
||||
fi
|
||||
if [[ $(id -u) == "50000" ]]; then
|
||||
>&2 echo
|
||||
>&2 echo "WARNING! You should run the image with GID (Group ID) set to 0"
|
||||
>&2 echo " even if you use 'airflow' user (UID=50000)"
|
||||
>&2 echo
|
||||
>&2 echo " You started the image with UID=$(id -u) and GID=$(id -g)"
|
||||
>&2 echo
|
||||
>&2 echo " This is to make sure you can run the image with an arbitrary UID in the future."
|
||||
>&2 echo
|
||||
>&2 echo " See more about it in the Airflow's docker image documentation"
|
||||
>&2 echo " http://airflow.apache.org/docs/docker-stack/entrypoint"
|
||||
>&2 echo
|
||||
# We still allow the image to run with `airflow` user.
|
||||
return
|
||||
else
|
||||
>&2 echo
|
||||
>&2 echo "ERROR! You should run the image with GID=0"
|
||||
>&2 echo
|
||||
>&2 echo " You started the image with UID=$(id -u) and GID=$(id -g)"
|
||||
>&2 echo
|
||||
>&2 echo "The image should always be run with GID (Group ID) set to 0 regardless of the UID used."
|
||||
>&2 echo " This is to make sure you can run the image with an arbitrary UID."
|
||||
>&2 echo
|
||||
>&2 echo " See more about it in the Airflow's docker image documentation"
|
||||
>&2 echo " http://airflow.apache.org/docs/docker-stack/entrypoint"
|
||||
# This will not work so we fail hard
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
check_uid_gid
|
||||
|
||||
# Set umask to 0002 to make all the directories created by the current user group-writeable
|
||||
# This allows the same directories to be writeable for any arbitrary user the image will be
|
||||
# run with, when the directory is created on a mounted volume and when that volume is later
|
||||
# reused with a different UID (but with GID=0)
|
||||
umask 0002
|
||||
|
||||
CONNECTION_CHECK_MAX_COUNT=${CONNECTION_CHECK_MAX_COUNT:=20}
|
||||
readonly CONNECTION_CHECK_MAX_COUNT
|
||||
|
|
Загрузка…
Ссылка в новой задаче