Commit graph

127 commits

Author SHA1 Message Date
Jarek Potiuk 88199eefcc
Fix problem with wrong constraint name in v2-0-stable branch (#15494)
While cherry-picking the docker image change to v2-0-test, the
value of the arg was wrongly renamed (similarly to other parameters) to
`constraints-2.0` where it should remain as `constraints`.
This is the name of the constraint file to use, and its value might
be either `constraints-no-providers`, `constraints`, or
`constraints-source-providers`.

This change restores proper default of the arg.
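
As an illustration of the three values named above, a build script could validate the arg before using it. This is only a sketch: the function name and the `<name>-<python version>.txt` file naming are assumptions made for the example and are not stated in this commit.

```python
# The three constraint names listed in the commit message.
ALLOWED_CONSTRAINT_NAMES = (
    "constraints",
    "constraints-no-providers",
    "constraints-source-providers",
)


def constraints_file_name(constraints_name: str, python_version: str) -> str:
    """Return a constraints file name, rejecting unknown constraint names.

    The '<name>-<python version>.txt' layout is an assumption for this sketch.
    """
    if constraints_name not in ALLOWED_CONSTRAINT_NAMES:
        raise ValueError(f"unknown constraints name: {constraints_name!r}")
    return f"{constraints_name}-{python_version}.txt"
```

With such a check, a wrongly renamed value like `constraints-2.0` fails immediately instead of pointing at a file that does not exist.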

Fixes: #15493
2021-04-22 23:31:17 +02:00
Jarek Potiuk a7e80b194f Better compatibility/diagnostics for arbitrary UID in docker image (#15162)
The PROD image of airflow is OpenShift compatible and it can be
run with either 'airflow' user (UID=50000) or with any other
user with (GID=0).

This change adds umask 0002 to make sure that whenever the image
is extended and new directories get created, the directories are
group-writeable for GID=0. This is added in the default
entrypoint.

The entrypoint will fail if it is not run as airflow user or if
other, arbitrary user is used with GID != 0.
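
The two rules above (umask 0002 and the user/group check) can be sketched as follows. The function names are hypothetical and the real entrypoint is a shell script that is not shown here; the sketch only demonstrates the effect of the umask and the stated condition.

```python
import os
import stat
import tempfile

AIRFLOW_UID = 50000  # UID of the 'airflow' user, as stated in the message above


def entrypoint_user_allowed(uid: int, gid: int) -> bool:
    """Hypothetical check: the airflow user, or any user whose group is root (GID=0)."""
    return uid == AIRFLOW_UID or gid == 0


def mode_of_new_directory(umask: int = 0o002) -> int:
    """Create a directory under the given umask and return its permission bits."""
    previous = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as parent:
            path = os.path.join(parent, "new_dir")
            os.mkdir(path)  # the requested mode 0o777 is reduced by the umask
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(previous)  # always restore the caller's umask
```

With umask 0002 the new directory keeps the group-write bit, while the common default of 0022 would clear it.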

Fixes: #15107
(cherry picked from commit ce91872ecc)
2021-04-15 14:00:32 +01:00
Jarek Potiuk 588c1a1fd0 Adds dill exclusion to Dockerfiles to accommodate upcoming beam fix (#15048)
* Upgrades moto to newer version (~=2.0)

According to https://github.com/spulec/moto/issues/3535#issuecomment-808706939
1.3.17 version of moto with a fix to be compatible with mock> 4.0.3 is
not going to be released because of breaking changes. Therefore we need
to migrate to newer version of moto.

At the same time we can get rid of the old botocore limitation, which
was added apparently to handle some test errors. We are relying fully
on what boto3 depends on.

Upgrading dependencies also revealed that mysql tests need to
be fixed because the upgraded versions of dependencies cause some test
failures (those turned out to be badly written tests).

* Adds dill exclusion to Dockerfiles to accommodate upcoming beam fix

With the upcoming apache-beam change where mock library will be
removed from install dependencies, we will be able to remove
`apache-beam` exclusion in our CI scripts. This will be a final
step of cleaning dependencies so that we have a truly
golden set of constraints that will allow to install airflow
and all community managed providers (we managed to fix all those
dependency issues for all packages but apache-beam).

The fix https://github.com/apache/beam/pull/14328 when merged
and Apache Beam is released will allow us to migrate to the new
version and get rid of the CI exclusion for beam.

Closes: #14994
(cherry picked from commit ec962b01b7)
2021-04-15 14:00:31 +01:00
Jarek Potiuk edbf49c645 Prepare ad-hoc release of the four previously excluded providers (#14655)
Documentation update for the four previously excluded providers that
got extra fixes/bumping to the latest version of the libraries.

* apache.beam
* apache.druid
* microsoft.azure
* snowflake

(cherry picked from commit b753c7fa60)
2021-04-15 14:00:31 +01:00
Jarek Potiuk 9d9f68e562 Much easier to use and better documented Docker image (#14911)
Previously you had to specify AIRFLOW_VERSION_REFERENCE and
AIRFLOW_CONSTRAINTS_REFERENCE to point to the right version
of Airflow. Now those values are auto-detected if not specified
(but you can still override them)

This change made it possible to simplify and restructure the Dockerfile
documentation - following the recent change in separating out
the docker-stack, production image building documentation has
been improved to reflect those simplifications. It should be
much easier to grasp by the novice users now - very clear
distinction and separation is made between the two types of
building your own images - customizing or extending - and it
is now much easier to follow examples and find out how to
build your own image. The criteria on which approach to
choose were put first and forefront.

Examples have been reviewed, fixed and put in a logical
sequence. From the most basic ones to the most advanced,
with a clear indication of where the basic approach ends and where
the "power-user" one starts. The examples were also separated
out to separate files and included from there - also the
example Docker images and build commands are executable
and tested automatically in CI, so they are guaranteed
to work.

Finally, the build arguments were split into sections - from most
basic to most advanced and each section links to appropriate
example section, showing how to use those parameters.

Fixes: #14848
Fixes: #14255
2021-04-15 14:00:31 +01:00
Jarek Potiuk 89a2eb0412 Fixes default group of Airflow user. (#14944)
The production image did not have root group set as default for
the airflow user. This was not a big problem unless you extended
the image - in which case you had to change the group manually
when copying the images in order to keep the image OpenShift
compatible (i.e. runnable with any user and root group).

This PR fixes it by changing default group of airflow user
to root, which also works when you extend the image.

```
Connected.
airflow@53f70b1e3675:/opt/airflow$ ls
dags  logs
airflow@53f70b1e3675:/opt/airflow$ cd dags/
airflow@53f70b1e3675:/opt/airflow/dags$ ls -l
total 4
-rw-r--r-- 1 airflow root 1648 Mar 22 23:16 test_dag.py
airflow@53f70b1e3675:/opt/airflow/dags$
```
2021-04-15 14:00:31 +01:00
Ash Berlin-Taylor 72bec72ff4 Prepare to switch master branch for main. (#14688)
There are many more references to "master" (even in our own repo) than
this, but this commit is the first step in that process.

It makes CI run on the main branch (once it exists) and re-words a few
cases where we can easily avoid referring to master.

This doesn't yet re-name the `constraints-master` or `master-*` images -
that will be done in a future PR.

(We won't be able to entirely eliminate "master" from our repo as we
refer to a lot of other GitHub repos that we can't change.)

(cherry picked from commit 0dea083fcb)
2021-04-15 14:00:30 +01:00
Jarek Potiuk 178dde3aa9 By default PIP will install all packages in .local folder (#14125)
In order to optimize the Docker image, we use the ~/.local
folder copied from the build image (this gives huge optimisations
regarding the docker image size). So far we instructed the users
to add --user flag manually when installing any packages when they
extend the images, however this has proven to be problematic as
users rarely read the whole documentation and simply try what they
know.

This PR attempts to fix it. `PIP_USER` variable is set to `true`
in the final image, which means that the installation by default
will use ~/.local folder as target. This can be disabled by
unsetting the variable or setting it to `false`.

Also since pylint version has been released to 2.7.0, it fixes
a few pylint versions so that we can update to the latest constraints.

(cherry picked from commit ca35bd7f7f)
2021-03-03 15:05:56 +01:00
Kamil Breguła ec829673e2 Update hadolint from v1.18.0 to v1.22.1 (#14509)
* Update hadolint from v1.18.0 to v1.22.1

* fixup! Update hadolint from v1.18.0 to v1.22.1

* fixup! fixup! Update hadolint from v1.18.0 to v1.22.1

Co-authored-by: Kamil Breguła <kamilbregula@apache.org>
(cherry picked from commit cc7260a9e8)
2021-03-03 11:04:31 +01:00
Jarek Potiuk 803c5eba6f Implements generation of separate constraints for core and providers (#14227)
There are two types of constraints now:

* default constraints that contain all dependencies of airflow,
  all the provider packages released at the time of the release
  of that version, as well as all transitive dependencies. Following
  those constraints, you can be sure Airflow's installation is
  repeatable

* no-providers constraints - containing only the dependencies needed
  for core airflow installation. This allows installing/upgrading
  airflow without also forcing the providers to be installed at
  specific versions.

This allows for flexible management of Airflow and Provider
packages separately. Documentation about it has been added.

Also the provider 'extras' for apache airflow do not keep direct
dependencies to the packages needed by the provider. Those
dependencies are now transitive only - so 'provider' extras only
depend on 'apache-airflow-providers-EXTRA' package and all
the dependencies are transitive. This will help in the future
to avoid conflicts when installing newer providers using extras.
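
A rough sketch of the extra-to-package relationship described above. The helper name is hypothetical, and the rule of replacing dots with hyphens is inferred from package names such as `apache-airflow-providers-amazon`; it only applies to extras that correspond to a provider.

```python
PROVIDER_PACKAGE_PREFIX = "apache-airflow-providers-"


def provider_package_for_extra(extra: str) -> str:
    """Map a provider extra such as 'apache.beam' to its provider package name.

    Illustrative only: not every airflow extra is a provider extra.
    """
    return PROVIDER_PACKAGE_PREFIX + extra.replace(".", "-")
```

Under this scheme the extra itself carries a single dependency, the provider package, and everything the provider needs arrives transitively.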

(cherry picked from commit d524cec99d)
2021-03-03 11:04:25 +01:00
Jarek Potiuk 35c7a45085 Disable progress bar for PIP installation (#14126)
When the image is prepared, PIP installation produces progress
bars which are annoying - especially in the CI environment.

This PR adds argument to control progress bar and sets it to "off"
for CI builds.

(cherry picked from commit 9b7852e047)
2021-03-03 00:58:56 +01:00
Jarek Potiuk 94ae38f8a9 Restores flexible installation version, fixes manual tag build process. (#14107)
Revert "Fix Commands to install Airflow in docker/install_airflow.sh (#14099)"

This reverts commit 68758b8260.

Also fixes the docker build script that was the reason for original
attempt to fix it.

(cherry picked from commit 212d5cd315)
2021-02-09 12:12:20 +01:00
Jarek Potiuk d15bef1a8e Adds capability of switching to GitHub Container Registry (#13726)
* Adds capability of switching to GitHub Container Registry

Currently we are using GitHub Packages to cache images for the
build. GitHub Packages are "legacy" storage of binary artifacts
for GitHub and as of September 2020 they introduced GitHub
Container Registry as a more stable, easier to manage replacement
for container storage. It includes complete self-management
of the images including permission management, public access,
retention management and many more.

More about it here:

https://github.blog/2020-09-01-introducing-github-container-registry/

Recently we started to experience unstable behaviour of the
GitHub Packages ('unknown blob' and manifest v1 vs. v2 when
pushing images to it). So together with ASF we proposed to
enable GitHub Container Registry and it happened as of
January 2021.

More about it in https://issues.apache.org/jira/browse/INFRA-20959

We are currently in the testing phase, especially when it
comes to management of permissions - the model of permission
management is not the same for Container Registry as it was
for GitHub Packages (it was per-repository in GitHub Packages,
but it is organization-wide in the Container Registry).

This PR introduces an option to use GitHub Container Registry
rather than GitHub Packages. It is implemented in both - CI
level and Breeze level allowing to seamlessly switch between
those two solutions:

In Breeze (which we use to test pushing/pulling the images)
--github-registry option was added with `ghcr.io` (GitHub Container
Registry) or `docker.pkg.github.com` (GitHub Packages).

In CI the same can be achieved by setting GITHUB_REGISTRY value
(same values possible as for --github-registry Breeze parameter)
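
The registry switch described above could be modelled like this. It is a sketch under assumptions: the function name is hypothetical and the fallback to `docker.pkg.github.com` when the variable is unset is assumed for the example, not taken from the commit.

```python
from typing import Mapping

# The two registry hosts named in the commit message.
GITHUB_REGISTRIES = ("ghcr.io", "docker.pkg.github.com")


def select_github_registry(
    env: Mapping[str, str], default: str = "docker.pkg.github.com"
) -> str:
    """Pick the registry from GITHUB_REGISTRY, rejecting unsupported values."""
    registry = env.get("GITHUB_REGISTRY", default)
    if registry not in GITHUB_REGISTRIES:
        raise ValueError(f"unsupported GITHUB_REGISTRY: {registry!r}")
    return registry
```

Passing `os.environ` as `env` would mirror the CI usage, while a plain dict makes the choice easy to test.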

* fixup! Adds capability of switching to GitHub Container Registry

(cherry picked from commit 2c6c7fdb23)
2021-01-21 20:01:27 +00:00
Jarek Potiuk 58d99dcee3 Remove chmod +x for installation script for docker build. (#13772)
We've introduced chmod a+x for installation scripts in Dockerfiles,
but this turned out to be a bad idea. This was to accommodate
building on Azure DevOps, which has a filesystem that does not
keep the executable bit. But the side effect of it is that the
layer of the script is invalidated when the permission is changed
to +x on Linux. The problem is that the script has locally (on
checkout) different permissions depending on the umask setting.

Therefore changing permissions for the image to a+x is not best.

Instead we are running the scripts with bash directly, which does
not require changing of executable bit.
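
To illustrate the point, a script without the executable bit still runs when it is passed to bash as an argument. The helper below is a hypothetical sketch and assumes `bash` is available on the PATH.

```python
import os
import subprocess
import tempfile


def run_without_executable_bit(script_body: str) -> str:
    """Write a script with mode 0644 and run it through bash; return its stdout."""
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "install.sh")
        with open(script, "w") as handle:
            handle.write(script_body)
        os.chmod(script, 0o644)  # deliberately no executable bit
        completed = subprocess.run(
            ["bash", script],  # bash reads the file, so +x is not needed
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout
```

Because the file mode never changes, the file looks the same regardless of the umask used at checkout.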

(cherry picked from commit 18d9320c26)
2021-01-21 19:56:35 +00:00
Jarek Potiuk 01bd703fd2 Disables provider's manager warning for source-installed prod image. (#13729)
When production image is built for development purpose, by default
it installs all providers from sources, but not all dependencies
are installed for all providers. Many providers require more
dependencies and when you try to import those packages via
provider's manager, they fail to import and print warnings.

Those warnings are now turned into debug messages, in case
AIRFLOW_INSTALLATION_METHOD=".", which is set when
production image is built locally from sources. This is helpful
especially when you use locally built production image to
run K8S tests - otherwise the logs are flooded with
warnings.

This problem does not happen in CI, because there by default
production image is built from locally prepared packages
and it does not contain sources from providers that are not
installed via packages.

(cherry picked from commit f74da5025d)
2021-01-21 19:54:25 +00:00
Jarek Potiuk 96b281d8b2 Switches to latest version of snowflake connector (#13654)
This should allow us to release a new version of snowflake
provider that is not interacting with other providers via
monkeypatching of SSL classes.

Fixes #12881

(cherry picked from commit 6e90dfc38b)
2021-01-21 19:47:11 +00:00
Jarek Potiuk 3b5e8efb34 Removes provider-imposed requirements from setup.cfg (#13409)
This change removes the provider-imposed requirements from the
airflow's setup.cfg to additional configuration in the
breeze/CI scripts. This does not change the constraint approach
when installing airflow, the constraints to those versions
remain as they were, but airflow package does not have to
have the limits in 'install_requires' section which makes
it much more "standalone".

We can add more requirements there as needed or remove
them when provider's dependencies change.

Also thanks to using --upgrade-to-newer-dependencies flag in
Breeze, the instructions on what to do when there is
a problem with conflicting dependencies are much simpler.

You no longer need to set the label in the PR
to test what an upgrade to newer dependencies will look like;
you can test it yourself locally.

This is a final step of making airflow package fully
independent from the provider's dependencies.

(cherry picked from commit f49f36b6a0)
2021-01-21 19:20:43 +00:00
Jarek Potiuk 9b0ea24ad6 Install airflow and providers together from context files (#13441)
Airflow and provider packages need to be installed together to
make sure that constraints are taken into account and that airflow
does not get reinstalled from PyPI when eager upgrade runs.

(cherry picked from commit bc6f5ea088)
2021-01-21 18:52:33 +00:00
Jarek Potiuk 6707bbe74c Add extras when installing prod image from packages (#13432)
In the latest change #13422, the change in the way production images are
prepared removed extras from the installed airflow, which caused
the production image verification check to fail.

This change restores extras when airflow is installed from packages

(cherry picked from commit 3a731108f5)
2021-01-21 18:51:15 +00:00
Jarek Potiuk 9bfc783449 Removes pip download when installing from local packages (#13422)
This PR improves building production image from local packages,
in preparation for moving provider requirements out of setup.cfg.

Previously `pip download` step was executed in the CI scripts
in order to download all the packages that were needed. However
this had two problems:

1) PIP download was executed outside of Dockerfile in CI scripts
   which means that any change to requirements there could not
   be executed in 'workflow_run' event - because main branch version
   of CI scripts is used there. We want to add extra requirements
   when installing airflow so in order to be able to change
   it, those requirements should be added in Dockerfile.
   This will be done in the follow-up #13409 PR.

2) Packages downloaded with PIP download have a "file" version
   rather than regular == version when you run pip freeze/check.
   This looks weird and while you can figure out the version
   from file name, when you `pip install` them, they look
   much more normal. The airflow package and provider package
   will still get the "file" form but this is ok because we are
   building those packages from sources and they are not yet
   available in PyPI.

Example:

  adal==1.2.5
  aiohttp==3.7.3
  alembic==1.4.3
  amqp==2.6.1
  apache-airflow @ file:///docker-context-files/apache_airflow-2.1.0.dev0-py3-none-any.whl
  apache-airflow-providers-amazon @ file:///docker-context-files/apache_airflow_providers_amazon-1.0.0-py3-none-any.whl
  apache-airflow-providers-celery @ file:///docker-context-files/apache_airflow_providers_celery-1.0.0-py3-none-any.whl
  ...

With this PR, we do not `pip download` all packages, but instead
we prepare airflow + providers packages as .whl files and
install them from there (all the dependencies are installed
from PyPI)
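
The two forms that appear in the `pip freeze` output above (`name==version` and `name @ file:///...`) can be told apart with a small parser. This is a simplified sketch for these two forms only; the type and function names are made up for the example.

```python
from typing import NamedTuple, Optional


class FrozenRequirement(NamedTuple):
    name: str
    version: Optional[str]   # set for 'name==version' lines
    location: Optional[str]  # set for 'name @ file:///...' lines


def parse_freeze_line(line: str) -> FrozenRequirement:
    """Parse one line of `pip freeze` output in either of the two forms."""
    line = line.strip()
    if " @ " in line:
        name, location = line.split(" @ ", 1)
        return FrozenRequirement(name.strip(), None, location.strip())
    name, separator, version = line.partition("==")
    if not separator:
        raise ValueError(f"unrecognised freeze line: {line!r}")
    return FrozenRequirement(name.strip(), version.strip(), None)
```

A check built on this could confirm that only the airflow and provider packages carry a `location`.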

(cherry picked from commit e436883583)
2021-01-21 18:47:14 +00:00
Jarek Potiuk 643d878bc8 Production image can also be upgraded to newer dependencies (#13345)
Previously UPGRADE_TO_LATEST_CONSTRAINTS variable controlled
whether the CI image uses latest dependencies rather than
fixed constraints. This PR brings it also to PROD image.

The name of the ARG is changed to UPGRADE_TO_NEWER_DEPENDENCIES
as this corresponds better with the intention.
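
One way to picture the switch is as a choice between two sets of pip arguments. The sketch below is an assumption about how such a flag could be applied, not the Dockerfile's actual logic; `--upgrade`, `--upgrade-strategy eager` and `--constraint` are existing pip options, while the function and parameter names are hypothetical.

```python
from typing import List, Sequence


def pip_install_args(
    packages: Sequence[str],
    constraints_location: str,
    upgrade_to_newer_dependencies: bool,
) -> List[str]:
    """Build a pip command line for either an eager upgrade or a constrained install."""
    args = ["pip", "install"]
    if upgrade_to_newer_dependencies:
        # Ignore the fixed constraints and pull the newest compatible versions.
        args += ["--upgrade", "--upgrade-strategy", "eager"]
    else:
        # Reproducible install pinned by the constraints file.
        args += ["--constraint", constraints_location]
    return args + list(packages)
```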

(cherry picked from commit 82fa048c12)
2021-01-21 18:36:19 +00:00
Jarek Potiuk 047a19790a Adds missing LDAP "extra" dependencies to ldap provider. (#13308)
It seems that for quite some time (1.10.4) the "ldap" extra
missed python-ldap dependency.

https://issues.apache.org/jira/browse/AIRFLOW-5261

Also LDAP seems to be popular enough to be added as default
extra in the production image.

Fixes #13306

(cherry picked from commit d23ac9b235)
2021-01-21 18:34:29 +00:00
Jarek Potiuk dc843192e6 Rename PIP_VERSION to AIRFLOW_PIP_VERSION (#13320)
Some older versions of PIP (including the one in dockerhub!) treat
all env variables starting with PIP_ as a way to pass
options. Setting PIP_VERSION to 20.2.4 and exporting it causes
error "ValueError: invalid truth value '20.2.4'" because it
does not have --version option and it treats it as --verbose

¯\_(ツ)_/¯

You can read more about it here:

https://github.com/pypa/pip/issues/4528

This PR renames the variable to avoid this side effect.
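
The naming clash can be sketched as follows. pip documents that options can be supplied through `PIP_<OPTION_NAME>` environment variables; the two helpers here are simplified illustrations rather than pip's own code, and how pip then resolves the derived name (the message says it ended up treated as `--verbose`) is not modelled.

```python
from typing import Optional


def option_name_for_env_var(name: str) -> Optional[str]:
    """Return the option name a PIP_* variable maps to, or None for other variables."""
    prefix = "PIP_"
    if not name.startswith(prefix):
        return None
    return name[len(prefix):].lower().replace("_", "-")


def parse_flag_value(value: str) -> bool:
    """Interpret a string as a boolean flag value, rejecting anything else."""
    lowered = value.lower()
    if lowered in ("1", "true", "yes", "on", "y", "t"):
        return True
    if lowered in ("0", "false", "no", "off", "n", "f"):
        return False
    raise ValueError(f"invalid truth value {value!r}")
```

`PIP_VERSION` is picked up as an option while `AIRFLOW_PIP_VERSION` is ignored, and a value like `20.2.4` cannot be read as a flag, which matches the error quoted above.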

(cherry picked from commit 8fed541192)
2021-01-21 18:29:21 +00:00
John Bampton ca20ba079b Fix spelling (#13130)
(cherry picked from commit 8529cb1c7d)
2021-01-21 17:45:49 +00:00
Jarek Potiuk cc87caa0ce Update default versions v2-0-test in the 2.0 branch (#12962)
2020-12-14 18:30:35 +00:00
Jarek Potiuk abf2a4264b
Install airflow and providers from dist and verifies them (#13033)
* Install airflow and providers from dist and verifies them

This check is there to prevent problems similar to those reported
in #13027 and fixed in #13031.

Previously we always built airflow from wheels, only providers were
installed from sdist packages and tested. In this version both
airflow and providers are installed using the same package format
(sdist or wheel).

* Update scripts/in_container/entrypoint_ci.sh

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-12-12 19:38:30 +01:00
Jarek Potiuk db027735a7
Changes release image preparation to use PyPI packages (#12990)
* Changes release image preparation to use PyPI packages

Since we released all the provider packages to PyPI now in
RC version, we can now change the mechanism to prepare the
production to use released packages in case of tagged builds.

The "branch" production images are still prepared using the
CI images and .whl packages built from sources, but the
release packages are built from officially released PyPI
packages.

Also some corrections and updates were made to the release process:

* the constraint tags when RC candidate is sent should contain
  rcn suffix.

* there was missing step about pushing the release tag once the
  release is out

* pushing tag to GitHub should be done after the PyPI packages
  are uploaded, so that automated image building in DockerHub
  can use those packages.

* added a note that in case we will release some provider
  packages that depend on the just released airflow version
  they should be released after airflow is in PyPI but before
  the tag is pushed to GitHub (also to allow the image to be
  built automatically from the released packages)

Fixes: #12970

* Update dev/README_RELEASE_AIRFLOW.md

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
2020-12-12 12:01:58 +01:00
jao6693 2ec03cd926
Update Dockerfile (#12987)
Fix permission issue in Azure DevOps when running the script install_mysql.sh, which prevents the build from succeeding

/bin/bash: ./scripts/docker/install_mysql.sh: Permission denied
The command '/bin/bash -o pipefail -e -u -x -c ./scripts/docker/install_mysql.sh dev' returned a non-zero code: 126
##[error]The command '/bin/bash -o pipefail -e -u -x -c ./scripts/docker/install_mysql.sh dev' returned a non-zero code: 126
##[error]The process '/usr/bin/docker' failed with exit code 126
2020-12-10 21:01:49 +01:00
Ash Berlin-Taylor 63ea88d1b1
Apply labels to Docker images in a single instruction (#12931)
* Apply labels to Docker images in a single instruction

While looking at the build logs for something else I noticed this
oddity at the end of the CI logs:

```
Tue, 08 Dec 2020 21:20:19 GMT Step 125/135 : LABEL org.apache.airflow.distro="debian"
...
Tue, 08 Dec 2020 21:21:14 GMT Step 133/135 : LABEL org.apache.airflow.commitSha=${COMMIT_SHA}
Tue, 08 Dec 2020 21:21:14 GMT  ---> Running in 1241a5f6cdb7
Tue, 08 Dec 2020 21:21:21 GMT Removing intermediate container 1241a5f6cdb7
```

Applying all the labels took 1m2s! Hopefully applying these in a single
layer/command should speed things up.

A less extreme example still took 43s

```
Tue, 08 Dec 2020 20:44:40 GMT Step 125/135 : LABEL org.apache.airflow.distro="debian"
...
Tue, 08 Dec 2020 20:45:18 GMT Step 133/135 : LABEL org.apache.airflow.commitSha=${COMMIT_SHA}
Tue, 08 Dec 2020 20:45:18 GMT  ---> Running in dc601207dbcb
Tue, 08 Dec 2020 20:45:23 GMT Removing intermediate container dc601207dbcb
Tue, 08 Dec 2020 20:45:23 GMT  ---> 5aae5dd0f702
```
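
A single `LABEL` instruction can carry every key/value pair, so only one layer is created. The generator below is a hypothetical sketch of how such an instruction could be rendered; the two label keys come from the logs above.

```python
from typing import Mapping


def single_label_instruction(labels: Mapping[str, str]) -> str:
    """Render all labels as one Dockerfile LABEL instruction with line continuations."""
    pairs = []
    for key, value in labels.items():
        # Escape backslashes and double quotes so the value stays inside its quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        pairs.append(f'{key}="{escaped}"')
    return "LABEL " + " \\\n      ".join(pairs)
```

Printing the result for the distro and commitSha labels gives one instruction spread over two lines instead of two separate build steps.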

* Update Dockerfile
2020-12-09 06:19:38 +01:00
Jarek Potiuk ed1825c026
Production images on CI are now built from packages (#12685)
So far, the production images of Airflow were using sources
when they were built on CI. This PR changes that, to build
airflow + providers packages first and install them
rather than use sources as installation mechanism.

Part of #12261
2020-12-06 23:36:33 +01:00
Jarek Potiuk 0451d84ea2
Pins PIP to 20.2.4 in our Dockerfiles (#12738)
Until we make sure that the new resolver in PIP 20.3 works
we should pin PIP to 20.2.4.

This is hopefully a temporary measure.

Part of #12737
2020-12-01 17:39:55 +01:00
Kaxil Naik c457c975b8
Use AIRFLOW_CONSTRAINTS_LOCATION when passed during docker build (#12604)
Previously, even though this was passed during docker build it was
ignored. This commit fixes it
2020-11-25 08:43:47 +01:00
Jarek Potiuk 37548f09ac
Fixes unneeded docker-context-files added in CI (#12534)
We do not need to add docker-context-files in CI before we run
first "cache" PIP installation. Adding it might cause the effect
that the cache will always be invalidated in case someone has
a file added there before building and pushing the image.

This PR fixes the problem by adding docker-context files later
in the Dockerfile and changing the constraints location
used in the "cache" step to always use the github constraints in
this case.

Closes #12509
2020-11-21 19:21:43 +01:00
Jarek Potiuk 53e5d8f1f2
The .pypirc file is read from docker-context-files (#11779)
If you used context from git repo, the .pypirc file was missing and
COPY in Dockerfile is not conditional.

This change copies the .pypirc conditionally from the
docker-context-files folder instead.
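
The conditional copy can be sketched in a few lines. The function name and the directory arguments are hypothetical; the real change lives in the Dockerfile, which is not shown here.

```python
import shutil
import tempfile
from pathlib import Path


def copy_pypirc_if_present(context_dir: str, target_dir: str) -> bool:
    """Copy .pypirc from the context directory only when it exists."""
    source = Path(context_dir) / ".pypirc"
    if not source.is_file():
        return False  # nothing to copy, and no error either
    shutil.copy(source, Path(target_dir) / ".pypirc")
    return True


# Usage: an empty context directory is simply skipped.
with tempfile.TemporaryDirectory() as context, tempfile.TemporaryDirectory() as target:
    print(copy_pypirc_if_present(context, target))
```

Unlike an unconditional `COPY`, a missing file is not an error here.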

Also it was needlessly copied in the main image where it is not
needed and it was even dangerous to do so.
2020-10-23 17:55:15 +02:00
Jarek Potiuk eba1d91b35
Fixes ROVIDERS -> PROVIDERS typo in Dockerfile (#11738)
There was a typo in the original file when review was made in
the #11529 but apparently this typo was still left in one place
and as the result, providers have not been installed in the
master Dockerfile.

Fixes #11695
2020-10-22 11:02:14 +02:00
John Bampton 172820db4d
Fix case of GitHub (#11398)
2020-10-21 14:32:41 +02:00
Jarek Potiuk 925f7619e1
Behaviour to install all airflow providers added (#11529)
In Airflow 2.0 we decided to split Airflow into separate providers.
This means that when you prepare core airflow package, providers
are not installed by default. This is not very convenient for
local development though and for docker images built from sources,
where you would like to install all providers by default.

A new INSTALL_ALL_AIRFLOW_PROVIDERS environment variable controls
this behaviour now. If it is set to "true", all packages including
provider packages are installed. If missing or set to false, only
the core provider package is installed.

For Breeze, the default is set to "true", as for those cases you
want to install all providers in your environment. Similarly if you
build the production image from sources. However when you build
image using github tag or pip package, you should specify
appropriate extras to install the required provider packages.

Note that if you install Airflow via 'pip install .' from sources
in local virtualenv, provider packages are not going to be
installed unless you set INSTALL_ALL_AIRFLOW_PROVIDERS to "true".
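
The rule stated above (only the value "true" enables installing all providers, while a missing variable or "false" does not) could be read from the environment like this. The function name is hypothetical, and the exact string comparison is an assumption, since the commit does not say how other values are treated.

```python
from typing import Mapping


def install_all_providers(env: Mapping[str, str]) -> bool:
    """Return True only when INSTALL_ALL_AIRFLOW_PROVIDERS is set to "true"."""
    return env.get("INSTALL_ALL_AIRFLOW_PROVIDERS", "false") == "true"
```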

Fixes #11489
2020-10-17 11:16:28 +02:00
Jarek Potiuk e7dc964619
Adds capability of installing wheel packages in CI image (#11527)
The production image had the capability of installing packages from
wheels (for security teams/air-gapped systems). This capability
might also be useful when building CI image especially when
we are installing separately core and providers packages and
we do not yet have provider packages available in PyPI.

This is an intermediate step to implement #11490
2020-10-15 15:19:18 +02:00
Jarek Potiuk 45d33dbd43
Add capability of customising PyPI sources (#11385)
* Add capability of customising PyPI sources

This change adds capability of customising installation of PyPI
modules via custom .pypirc file. This might allow to install
dependencies from in-house, vetted registry of PyPI
2020-10-11 06:19:57 +02:00
Jarek Potiuk 04973904c3
Constraints and PIP packages can be installed from local sources (#11382)
* Constraints and PIP packages can be installed from local sources

This is the final part of implementing #11171 based on feedback
from enterprise customers we worked with. They want to have
a capability of building the image using binary wheel packages
that are locally available and the official Dockerfile. This means
that besides the official APT sources the Dockerfile build should
not need GitHub, nor any other external files pulled from outside
including PIP repository.

This change also includes documentation on how to prepare set of
such binaries ready for inspection and review by security teams
in Enterprise environment. Such sets of "known-working-binary-whl"
files can then be separately committed, tracked and scrutinized
in an artifact repository of such an Enterprise.

Fixes: #11171

* Update docs/production-deployment.rst
2020-10-10 12:58:09 +02:00
Jarek Potiuk ebd7150862
More customizable build process for Docker images (#11176)
* Allows more customizations for image building.

This is the third (and not last) part of making the Production
image more corporate-environment friendly. It's been prepared
at the request of one of the big Airflow users (a company) that
has rather strict security requirements when it comes to
preparing and building images. They are committed to
synchronizing with the progress of Apache Airflow 2.0 development
and making the image customizable so that they can build it using
only sources controlled by them internally was one of the important
requirements for them.

This change adds the possibility of customizing various steps in
the build process:

* adding custom scripts to be run before installation of both
  build image and runtime image. This allows for example to
  add installing custom GPG keys, and adding custom sources.

* customizing the way NodeJS and Yarn are installed in the
  build image segment - as they might rely on their own way
  of installation.

* adding extra packages to be installed during both build and
  dev segment build steps. This is crucial to achieve the same
  size optimizations as the original image.

* defining additional environment variables (for example
  environment variables that indicate acceptance of the EULAs
  in case of installing proprietary packages that require
  EULA acceptance) - both in the build image and runtime image
  (again the goal is to keep the image optimized for size)

The image build process remains the same when no customization
options are specified, but having those options increases
flexibility of the image build process in corporate environments.

This is part of #11171.

This change also fixes some of the issues opened and raised by
other users of the Dockerfile.

Fixes: #10730
Fixes: #10555
Fixes: #10856

Input from those issues has been taken into account when this
change was designed so that the cases described in those issues
could be implemented. Example from one of the issues landed as
an example way of building highly customized Airflow Image
using those customization options.

Depends on #11174

* Update IMAGES.rst

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-09-29 15:30:00 +02:00
Jarek Potiuk 044b441257
Conditional MySQL Client installation (#11174)
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing MySQL Client on Debian requires reaching out
to Oracle deb repositories which might not be approved by security
teams when you build the images. Also not everyone needs MySQL
client or might want to install their own MySQL client or MariaDB
client - from their own repositories.

This change makes the installation step separated out to
script (with prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.) but in "Final" segment of the image only runtime libraries
are needed.

Part of #11171

Depends on #11173.
2020-09-27 18:56:58 +02:00
Jarek Potiuk f16354bc02
Optionally disables PIP cache from GitHub during the build (#11173)
This is the first step of implementing the corporate-environment
friendly way of building images, since in a corporate
environment it might not be possible to install the packages
using the GitHub cache initially.

Part of #11171
2020-09-27 18:00:03 +02:00
Jarek Potiuk 4a46f4368b
Allows to build production images for 1.10.2 and 1.10.1 Airflow (#10983)
Airflow below 1.10.2 required SLUGIFY_USES_TEXT_UNIDECODE env
variable to be set to yes.

Our production Dockerfile and Breeze support building images
for any version of airflow >= 1.10.1 but it failed on
1.10.2 and 1.10.1 because this variable was not set.

You can now set the variable when building image manually
and Breeze does it automatically if image is 1.10.1 or 1.10.2
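
The automatic behaviour described for Breeze can be sketched as a version check. The function name is hypothetical; the variable name, its value "yes" and the two affected versions are taken from the message above.

```python
from typing import Dict

# Versions that failed to build without the variable, per the commit message.
VERSIONS_NEEDING_SLUGIFY_FLAG = frozenset({"1.10.1", "1.10.2"})


def extra_build_env(airflow_version: str) -> Dict[str, str]:
    """Return the extra environment variables needed to build the given version."""
    if airflow_version in VERSIONS_NEEDING_SLUGIFY_FLAG:
        return {"SLUGIFY_USES_TEXT_UNIDECODE": "yes"}
    return {}
```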

Fixes #10974
2020-09-17 14:25:34 +02:00
Jarek Potiuk d9920faa80
The entrypoints in Docker Image should be owned by Airflow (#10853)
Since we are running the airflow image as airflow user, the
entrypoint and clear-logs scripts should also be set as airflow.

This had no impact if you actually run this as root user or
when your group was root (which was recommended).
2020-09-12 10:54:25 +02:00
Jarek Potiuk 018ae0ed95
The PIP version is not pinned to 19.0.2 any more (#10542)
Fixes #10516
2020-08-25 15:45:59 +02:00
Jarek Potiuk 1cf1af664f
Do not override in_container scripts when building the image (#10442)
After #10368, we've changed the way we build the images
on CI. We are overriding the ci scripts that we use
to build the image with the scripts taken from master
to not give rogue PR authors the possibility to run
something with the write credentials.

We should not override the in_container scripts, however
because they become part of the image, so we should use
those that came with the PR. That's why we have to move
the "in_container" scripts out of the "ci" folder and
only override the "ci" folder with the one from
master. We've made sure that those scripts in ci
are self-contained and they do not need to reach outside of
that folder.

Also the static checks are done with local files mounted
on CI because we want to check all the files - not only
those that are embedded in the container.
2020-08-21 17:21:57 +02:00
Jarek Potiuk e17985382c
Kubernetes image is extended rather than customized (#10399)
The EMBEDDED dags were only really useful for testing
but they required customising the built production image
(run with extra --build-arg flag). This is not needed
as it is better to extend the image instead with FROM
and add dags afterwards. This way you do not have
to rebuild the image while iterating on it.
2020-08-19 14:19:05 +02:00
Jarek Potiuk 306a6660fd
Docker images are now consistently labelled and a bit smaller (#10387)
Extracted from #10368
2020-08-19 02:03:22 +02:00
Jarek Potiuk de9eaeb434
Constraint files are now maintained automatically (#9889)
* Constraint files are now maintained automatically

* No need to generate requirements when setup.py changes
* requirements are kept in separate orphan branches not in main repo
* merges to master verify if latest requirements are working and
  push tested requirements to orphaned branches
* we keep history of requirement changes and can label them
  individually for each version (by constraint-1.10.n tag name)
* consistently changed all references to be 'constraints' not
  'requirements'
2020-07-20 14:36:03 +02:00