Commit graph

53 commits

Author SHA1 Message Date
jao6693 d84faa36a0
Update Dockerfile.ci (#12988)
Fix a permission issue in Azure DevOps when running the script install_mysql.sh, which prevents the build from succeeding

/bin/bash: ./scripts/docker/install_mysql.sh: Permission denied
The command '/bin/bash -o pipefail -e -u -x -c ./scripts/docker/install_mysql.sh dev' returned a non-zero code: 126
##[error]The command '/bin/bash -o pipefail -e -u -x -c ./scripts/docker/install_mysql.sh dev' returned a non-zero code: 126
##[error]The process '/usr/bin/docker' failed with exit code 126
2020-12-10 21:00:41 +01:00
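Exit code 126 in the log above is the shell's status for a command that was found but could not be executed, which usually points to a missing execute bit. The sketch below reproduces that situation in a temporary directory and then clears it by adding the execute bit. The script name `install_demo.sh` and the helpers `run_via_bash` and `demo_permission_fix` are made up for illustration, and this is not necessarily the change the commit itself made. It assumes a Linux host with `/bin/bash` and a temporary directory that allows execution.

```python
import os
import stat
import subprocess
import tempfile
from typing import Tuple


def run_via_bash(script_path: str) -> int:
    """Invoke the script through `bash -c`, similar to a Dockerfile RUN step."""
    result = subprocess.run(
        ["/bin/bash", "-c", script_path],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode


def demo_permission_fix() -> Tuple[int, int]:
    """Return the exit codes seen before and after adding the execute bit."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical stand-in for scripts/docker/install_mysql.sh
        script = os.path.join(tmp_dir, "install_demo.sh")
        with open(script, "w") as handle:
            handle.write("#!/bin/bash\necho installed\n")

        os.chmod(script, 0o644)  # readable, but no execute bit for anyone
        before = run_via_bash(script)

        # Equivalent of `chmod a+x install_demo.sh`
        mode = os.stat(script).st_mode
        os.chmod(script, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        after = run_via_bash(script)
    return before, after


if __name__ == "__main__":
    print(demo_permission_fix())
```

In a Dockerfile the same effect is usually achieved by making the script executable before the `RUN` step that calls it, or by committing the file with the execute bit set.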
Ash Berlin-Taylor 63ea88d1b1
Apply labels to Docker images in a single instruction (#12931)
* Apply labels to Docker images in a single instruction

While looking at the build logs for something else I noticed this
oddity at the end of the CI logs:

```
Tue, 08 Dec 2020 21:20:19 GMT Step 125/135 : LABEL org.apache.airflow.distro="debian"
...
Tue, 08 Dec 2020 21:21:14 GMT Step 133/135 : LABEL org.apache.airflow.commitSha=${COMMIT_SHA}
Tue, 08 Dec 2020 21:21:14 GMT  ---> Running in 1241a5f6cdb7
Tue, 08 Dec 2020 21:21:21 GMT Removing intermediate container 1241a5f6cdb7
```

Applying all the labels took 1m2s! Hopefully applying these in a single
layer/command should speed things up.

A less extreme example still took 43s

```
Tue, 08 Dec 2020 20:44:40 GMT Step 125/135 : LABEL org.apache.airflow.distro="debian"
...
Tue, 08 Dec 2020 20:45:18 GMT Step 133/135 : LABEL org.apache.airflow.commitSha=${COMMIT_SHA}
Tue, 08 Dec 2020 20:45:18 GMT  ---> Running in dc601207dbcb
Tue, 08 Dec 2020 20:45:23 GMT Removing intermediate container dc601207dbcb
Tue, 08 Dec 2020 20:45:23 GMT  ---> 5aae5dd0f702
```

* Update Dockerfile
2020-12-09 06:19:38 +01:00
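A Dockerfile `LABEL` instruction accepts several `key="value"` pairs, so many labels can be set in one build step instead of one step per label. The sketch below renders such a single instruction from a dictionary. The helper name `render_label_instruction` is hypothetical, the two label keys are taken from the log above, and values are assumed to contain no double quotes.

```python
from typing import Dict


def render_label_instruction(labels: Dict[str, str]) -> str:
    """Render one multi-line LABEL instruction instead of one per label.

    Assumes label values contain no double quotes (no escaping is done).
    """
    pairs = [f'{key}="{value}"' for key, value in labels.items()]
    # A trailing backslash continues the instruction on the next line.
    return "LABEL " + " \\\n      ".join(pairs)


example_labels = {
    "org.apache.airflow.distro": "debian",
    "org.apache.airflow.commitSha": "${COMMIT_SHA}",
}

if __name__ == "__main__":
    print(render_label_instruction(example_labels))
```

Because every Dockerfile instruction is a separate build step with its own intermediate container, collapsing the labels into one instruction removes the per-label overhead visible in the timings above.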
Jarek Potiuk ed1825c026
Production images on CI are now built from packages (#12685)
So far, the production images of Airflow were using sources
when they were built on CI. This PR changes that, to build
airflow + providers packages first and install them
rather than use sources as installation mechanism.

Part of #12261
2020-12-06 23:36:33 +01:00
Ash Berlin-Taylor 2936c13a44
Get airflow version from importlib.metadata rather than hard-coding (#12786)
One less thing to change, and one less pre-commit step needed :)
2020-12-04 16:42:25 +00:00
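As an illustration of the idea in the commit above, the version of an installed distribution can be read from its metadata with `importlib.metadata` (in the standard library since Python 3.8). This is a minimal sketch, not the code from the commit; the helper name `installed_version` and the `"unknown"` fallback are assumptions.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str, default: str = "unknown") -> str:
    """Look up the version recorded in the installed distribution's metadata."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return default


if __name__ == "__main__":
    print(installed_version("apache-airflow"))
```

The lookup uses the distribution name as published on PyPI (for example `apache-airflow`), which can differ from the import name.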
Jarek Potiuk 0451d84ea2
Pins PIP to 20.2.4 in our Dockerfiles (#12738)
Until we make sure that the new resolver in PIP 20.3 works
we should pin PIP to 20.2.4.

This is hopefully a temporary measure.

Part of #12737
2020-12-01 17:39:55 +01:00
Jarek Potiuk fa8af2d165
Enable PIP check for both CI and PROD image (#12664)
This PR enables PIP check after constraints have been updated
to be stable and 'pip check' compliant in #12636
2020-11-27 21:33:50 +01:00
Kaxil Naik c457c975b8
Use AIRFLOW_CONSTRAINTS_LOCATION when passed during docker build (#12604)
Previously, even though this was passed during docker build it was
ignored. This commit fixes it
2020-11-25 08:43:47 +01:00
Jarek Potiuk 37548f09ac
Fixes unneeded docker-context-files added in CI (#12534)
We do not need to add docker-context-files in CI before we run
first "cache" PIP installation. Adding it might cause the effect
that the cache will always be invalidated in case someone has
a file added there before building and pushing the image.

This PR fixes the problem by adding docker-context files later
in the Dockerfile and changing the constraints location
used in the "cache" step to always use the github constraints in
this case.

Closes #12509
2020-11-21 19:21:43 +01:00
Jarek Potiuk 167b9b9889
Simplifies check whether the CI image should be rebuilt (#12181)
Rather than counting changed layers in the image (which was
enigmatic, difficult and prone to some magic number) we now rely
on a random file generated while building the image.

We are using the docker image caching mechanism here. The random
file will be regenerated only when the previous layer (which
installs Airflow dependencies for the first time) gets
rebuilt. For us this is the indication that building
the image will take quite some time. This layer should be
relatively static - even if setup.py changes, the CI image is
designed in such a way that the first-time installation of Airflow
dependencies is not invalidated.

This should lead to faster and less frequent rebuilds for people
using Breeze and static checks.
2020-11-13 22:21:39 +01:00
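The mechanism described above can be pictured with a toy model: a random marker is produced right after the dependency layer and changes only when that layer is rebuilt, so comparing the local marker with the one from the latest published image tells whether a long rebuild is coming. This is a rough sketch and not the Breeze implementation; `ToyLayerCache`, `marker_for` and `needs_long_rebuild` are hypothetical names, and a dictionary stands in for Docker's layer cache.

```python
import hashlib
import uuid
from typing import Dict


class ToyLayerCache:
    """Very rough stand-in for Docker's layer cache (illustration only)."""

    def __init__(self) -> None:
        self._markers: Dict[str, str] = {}

    def marker_for(self, dependency_layer_inputs: str) -> str:
        """Return the random marker generated right after the dependency layer.

        A new marker is generated only when the inputs of the dependency
        layer change, which mimics a cache miss on that layer.
        """
        key = hashlib.sha256(dependency_layer_inputs.encode("utf-8")).hexdigest()
        if key not in self._markers:
            self._markers[key] = uuid.uuid4().hex
        return self._markers[key]


def needs_long_rebuild(local_marker: str, remote_marker: str) -> bool:
    """Different markers mean the dependency layer was rebuilt upstream."""
    return local_marker != remote_marker


if __name__ == "__main__":
    cache = ToyLayerCache()
    local = cache.marker_for("dependency layer v1")
    remote = cache.marker_for("dependency layer v2")
    print(needs_long_rebuild(local, remote))
```

The appeal of this design is that the check is a plain string comparison, with no need to count or inspect image layers.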
Jarek Potiuk 9b7e7603c4
Docker context files should be available earlier (#12219)
If you want to override constraints with local version,
the docker-context-files should be earlier in the Dockerfile
2020-11-11 11:00:16 +01:00
Daniel Imberman 0d1ad6648e
Add Python Helm testing framework (#11693)
* Helm Python Testing

* helm change

* add back args
2020-10-27 18:29:47 -07:00
Ash Berlin-Taylor d5bfffca81
Use packaging.version, not semver module for version comparisons (#11854)
Semver module doesn't like python version specifiers such as `0.0.2a1`
-- since packaging module is already a dep from setup tools, and is what
the python ecosystem uses to do version handling it makes sense to use
it.
2020-10-26 15:13:09 +00:00
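To illustrate the point made in the commit above, `packaging.version.Version` parses pre-release specifiers such as `0.0.2a1` and orders them before the final release. This is a small sketch that assumes the third-party `packaging` distribution is installed; it is not the code changed by the commit.

```python
from packaging.version import Version

# A pre-release specifier of the kind mentioned in the commit message.
pre_release = Version("0.0.2a1")
final_release = Version("0.0.2")

# Pre-releases sort before the corresponding final release.
pre_release_sorts_first = pre_release < final_release

# Comparing raw strings gets multi-digit components wrong;
# Version compares the numeric components instead.
string_order_says_older = "1.10.0" < "1.9.0"
version_order_says_newer = Version("1.10.0") > Version("1.9.0")

if __name__ == "__main__":
    print(pre_release.is_prerelease, pre_release_sorts_first)
    print(string_order_says_older, version_order_says_newer)
```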
Jarek Potiuk 925f7619e1
Behaviour to install all airflow providers added (#11529)
In Airflow 2.0 we decided to split Airflow into separate providers.
This means that when you prepare the core airflow package, providers
are not installed by default. This is not very convenient for
local development though and for docker images built from sources,
where you would like to install all providers by default.

A new INSTALL_ALL_AIRFLOW_PROVIDERS environment variable controls
this behaviour now. If it is set to "true", all packages including
provider packages are installed. If missing or set to false, only
the core provider package is installed.

For Breeze, the default is set to "true", as for those cases you
want to install all providers in your environment. Similarly if you
build the production image from sources. However when you build
image using github tag or pip package, you should specify
appropriate extras to install the required provider packages.

Note that if you install Airflow via 'pip install .' from sources
in local virtualenv, provider packages are not going to be
installed unless you set INSTALL_ALL_AIRFLOW_PROVIDERS to "true".

Fixes #11489
2020-10-17 11:16:28 +02:00
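The switch described above can be sketched as a small helper that reads the environment variable. Only the variable name `INSTALL_ALL_AIRFLOW_PROVIDERS` and the "true" / missing / "false" behaviour come from the commit message; the helper name `install_all_providers`, the case-insensitive comparison, and treating any other value as false are assumptions made for this illustration.

```python
import os
from typing import Mapping, Optional


def install_all_providers(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when INSTALL_ALL_AIRFLOW_PROVIDERS is set to "true".

    A missing variable or "false" means only the core package is installed.
    Case-insensitive matching is an assumption of this sketch.
    """
    source = os.environ if env is None else env
    value = source.get("INSTALL_ALL_AIRFLOW_PROVIDERS", "false")
    return value.strip().lower() == "true"


if __name__ == "__main__":
    print(install_all_providers())
```

Passing an explicit mapping instead of relying on `os.environ` keeps the helper easy to exercise without touching the real environment.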
Jarek Potiuk e7dc964619
Adds capability of installing wheel packages in CI image (#11527)
The production image had the capability of installing images from
wheels (for security teams/air-gapped systems). This capability
might also be useful when building CI image especially when
we are installing separately core and providers packages and
we do not yet have provider packages available in PyPI.

This is an intermediate step to implement #11490
2020-10-15 15:19:18 +02:00
Jarek Potiuk 16e7129719
Added support for provider packages for Airflow 2.0 (#11487)
* Separate changes/readmes for backport and regular providers

We have now separate release notes for backport provider
packages and regular provider packages.

They have different versioning - backport provider
packages with CALVER, regular provider packages with
semver.

* Added support for provider packages for Airflow 2.0

This change consists of the following changes:

* adds provider package support for 2.0
* adds generation of package readme and change notes
* versions are for now hard-coded to 0.0.1 for first release
* adds automated tests for installation of the packages
* rename backport package readmes/changes to BACKPORT_*
* adds regular package readmes/changes
* updates documentation on generating the provider packages
* adds CI tests for the packages
* maintains backport packages generation with --backports flag

Fixes #11421
Fixes #11424
2020-10-13 16:33:00 +01:00
Kaxil Naik 7f674c685d
Use only-if-needed upgrade strategy for PRs (#11363)
Currently, upgrading dependencies in setup.py still runs with previous versions of the package for the PR which fails.

This will change to upgrade only the package that is required for the PRs
2020-10-09 09:57:51 +02:00
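pip exposes the two behaviours through `--upgrade-strategy`: `eager` upgrades all dependencies regardless of whether the installed version already satisfies the requirement, while `only-if-needed` upgrades a dependency only when the installed version no longer satisfies it. The sketch below just builds the command line; the helper name `pip_upgrade_command` and the `"."` install target are assumptions, not the exact invocation used in the Dockerfile.

```python
import sys
from typing import List


def pip_upgrade_command(requirement: str, eager: bool = False) -> List[str]:
    """Build a `pip install --upgrade` command with the chosen strategy."""
    strategy = "eager" if eager else "only-if-needed"
    return [
        sys.executable, "-m", "pip", "install",
        "--upgrade", "--upgrade-strategy", strategy,
        requirement,
    ]


if __name__ == "__main__":
    # "." (the current source tree) is only an example target.
    print(" ".join(pip_upgrade_command(".")))
    print(" ".join(pip_upgrade_command(".", eager=True)))
```

The resulting list can be passed to `subprocess.run` as is.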
Jarek Potiuk e89d384688
The bats script for CI image is now placed in the docker folder (#11262)
The script was previously placed in scripts/ci which caused
a bit of a problem in 1-10-test branch where PRs were using
scripts/ci from the v1-10-test HEAD but they were missing
the ci script from the PR.

The scripts "ci" are parts of the host scripts that are
always taken from master when the image is built, but
all the other stuff should be taken from "docker"
folder - which will be taken from the PR.
2020-10-04 08:30:11 +02:00
Jarek Potiuk ebd7150862
More customizable build process for Docker images (#11176)
* Allows more customizations for image building.

This is the third (and not last) part of making the Production
image more corporate-environment friendly. It's been prepared
at the request of one of the big Airflow users (a company) that
has rather strict security requirements when it comes to
preparing and building images. They are committed to
synchronizing with the progress of Apache Airflow 2.0 development
and making the image customizable so that they can build it using
only sources controlled by them internally was one of the important
requirements for them.

This change adds the possibility of customizing various steps in
the build process:

* adding custom scripts to be run before installation of both
  build image and runtime image. This allows for example to
  add installing custom GPG keys, and adding custom sources.

* customizing the way NodeJS and Yarn are installed in the
  build image segment - as they might rely on their own way
  of installation.

* adding extra packages to be installed during both build and
  dev segment build steps. This is crucial to achieve the same
  size optimizations as the original image.

* defining additional environment variables (for example
  environment variables that indicate acceptance of the EULAs
  in case of installing proprietary packages that require
  EULA acceptance) - both in the build image and runtime image
  (again the goal is to keep the image optimized for size)

The image build process remains the same when no customization
options are specified, but having those options increases
flexibility of the image build process in corporate environments.

This is part of #11171.

This change also fixes some of the issues opened and raised by
other users of the Dockerfile.

Fixes: #10730
Fixes: #10555
Fixes: #10856

Input from those issues has been taken into account when this
change was designed so that the cases described in those issues
could be implemented. An example from one of the issues landed as
an example way of building a highly customized Airflow image
using those customization options.

Depends on #11174

* Update IMAGES.rst

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-09-29 15:30:00 +02:00
Omair Khan 68e0eb6976
in_container bats pre-commit hook and updated bats-tests hook (#11179)
2020-09-29 11:59:06 +02:00
Kaxil Naik 2ec12474ff
Fix typos in Dockerfile.ci (#11187)
Fixed some spellings
2020-09-29 07:41:05 +02:00
Jarek Potiuk 044b441257
Conditional MySQL Client installation (#11174)
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing MySQL Client on Debian requires reaching out
to oracle deb repositories which might not be approved by security
teams when you build the images. Also not everyone needs MySQL
client or might want to install their own MySQL client or MariaDB
client - from their own repositories.

This change separates the installation step out into a
script (with prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.) but in "Final" segment of the image only runtime libraries
are needed.

Part of #11171

Depends on #11173.
2020-09-27 18:56:58 +02:00
Jarek Potiuk f16354bc02
Optionally disables PIP cache from GitHub during the build (#11173)
This is the first step of implementing the corporate-environment
friendly way of building images, where in the corporate
environment it might not be possible to install the packages
using the GitHub cache initially.

Part of #11171
2020-09-27 18:00:03 +02:00
Jarek Potiuk 52fdb62314
Requirements might get upgraded without setup.py change (#10784)
I noticed that when there is no setup.py changes, the constraints
are not upgraded automatically. This is because of the docker
caching strategy used - it simply does not even know that the
upgrade of pip should happen.

I believe it is really good (from a security and incremental updates
POV) to attempt to upgrade at every successful merge (note that
the upgrade will not be committed if any of the tests fail and this
is only happening on every merge to master or scheduled run).

This way we will have more frequent but smaller constraint changes.

Depends on #10828
2020-09-22 16:22:53 +02:00
Jarek Potiuk 018ae0ed95
The PIP version is not pinned to 19.0.2 any more (#10542)
Fixes #10516
2020-08-25 15:45:59 +02:00
Jarek Potiuk 1cf1af664f
Do not override in_container scripts when building the image (#10442)
After #10368, we've changed the way we build the images
on CI. We are overriding the ci scripts that we use
to build the image with the scripts taken from master
to not give rogue PR authors the possibility to run
something with the write credentials.

We should not override the in_container scripts, however,
because they become part of the image, so we should use
those that came with the PR. That's why we have to move
the "in_container" scripts out of the "ci" folder and
only override the "ci" folder with the one from
master. We've made sure that those scripts in ci
are self-contained and they do not need to reach outside of
that folder.

Also the static checks are done with local files mounted
on CI because we want to check all the files - not only
those that are embedded in the container.
2020-08-21 17:21:57 +02:00
Jarek Potiuk 306a6660fd
Docker images are now consistently labelled and a bit smaller (#10387)
Extracted from #10368
2020-08-19 02:03:22 +02:00
David Cavaletto f6734b3b85
Enable Sphinx spellcheck for doc generation (#10280)
2020-08-12 21:30:37 +01:00
Jarek Potiuk de9eaeb434
Constraint files are now maintained automatically (#9889)
* Constraint files are now maintained automatically

* No need to generate requirements when setup.py changes
* requirements are kept in separate orphan branches not in main repo
* merges to master verify if latest requirements are working and
  push tested requirements to orphaned branches
* we keep history of requirement changes and can label them
  individually for each version (by constraint-1.10.n tag name)
* consistently changed all references to be 'constraints' not
  'requirements'
2020-07-20 14:36:03 +02:00
Jarek Potiuk 5805a36368
Fix SqlAlchemy-Flask failure with python 3.8.4 (#9821)
2020-07-14 21:28:42 +01:00
Jarek Potiuk ca88151887
Fix in-breeze CLI tools to work also on Linux (#9376)
Instead of creating the links in the image (which did not work)
the links are created now at the entry to the breeze image.
The wrappers were not installed via Dockerfile and the ownership
fixing did not work on Linux
2020-06-19 08:58:32 +02:00
Jarek Potiuk 4fefaf78a2
Fixed crashing webserver after /tmp is mounted from the host (#9378)
The bug was introduced in f17a02d330

Gunicorn uses a lot of os.fchmod in /tmp directory and it can create some
excessive blocking in os.fchmod
https://docs.gunicorn.org/en/stable/faq.html#how-do-i-avoid-gunicorn-excessively-blocking-in-os-fchmod

We want to switch to using /dev/shm in the prod image (shared memory) to make
the blocking go away and to make it independent of the docker filesystem used
(osxfs has problems with os.fchmod and user permissions as well).

Use case / motivation

Avoiding contention might be useful in the production image.

This can be done with:

GUNICORN_CMD_ARGS="--worker-tmp-dir /dev/shm"
2020-06-18 16:58:08 +02:00
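Gunicorn reads extra command-line arguments from the `GUNICORN_CMD_ARGS` environment variable, so the setting quoted above can be injected without changing how the server is started. The sketch below prepares such an environment for a child process; the helper name `gunicorn_env` is hypothetical, and appending to an already set value is a choice made for this illustration.

```python
import os
import shlex
from typing import Dict, Mapping, Optional


def gunicorn_env(
    base_env: Optional[Mapping[str, str]] = None,
    worker_tmp_dir: str = "/dev/shm",
) -> Dict[str, str]:
    """Return a copy of the environment with --worker-tmp-dir added."""
    env = dict(os.environ if base_env is None else base_env)
    extra = f"--worker-tmp-dir {shlex.quote(worker_tmp_dir)}"
    # Keep any arguments that were already configured.
    existing = env.get("GUNICORN_CMD_ARGS", "").strip()
    env["GUNICORN_CMD_ARGS"] = f"{existing} {extra}".strip()
    return env


if __name__ == "__main__":
    print(gunicorn_env({})["GUNICORN_CMD_ARGS"])
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the webserver process.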
Kamil Breguła 6a9c436f6f
Move out metastore_browser from airflow.contrib (#9341)
2020-06-17 21:52:23 +02:00
Jarek Potiuk 7c12a9d4e0
Improve production image iteration speed (#9162)
For a long time the way the entrypoint worked in ci scripts
was wrong. It was convoluted and little short of black
magic. It did not allow passing multiple test targets and
required separate execute command scripts in Breeze.

This is all now straightened out and both production and
CI image are always using the right entrypoint by default
and we can simply pass parameters to the image as usual without
escaping strings.

This also allowed us to remove some breeze commands and
change the names of several flags in Breeze to make them more
meaningful.

Both CI and PROD image have now embedded scripts for log
cleaning.

History of image releases is added for 1.10.10-*
alpha quality images.
2020-06-16 12:36:46 +02:00
Jarek Potiuk 696e74594f
Fix broken CI image optimisation (#9313)
The commit 5918efc86a broke
optimisation of the CI image - using the Apache Airflow
master branch as a base package installation source from PyPI.

This commit restores it including removal of the
obsolete CI_OPTIMISED arg - as now we have a separate
production and CI image and CI image is by default
CI_OPTIMISED
2020-06-16 00:38:55 +01:00
Kamil Breguła f17a02d330
Add generic CLI tool wrapper (#9223)
* Add generic CLI tool wrapper

* Pass working directory to container

* Share namespaces between all containers

* Fix permissions hack

* Unify code style

Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>

* Detect standalone execution by checking symbolic link

* User friendly error message when env var is missing

* Display error to stderr

* Display errors on stderr

* Fix permission hack

* Fix condition in if

* Fix missing env-file

* TEST: Install airflow without copying sources

* Update scripts/ci/in_container/run_prepare_backport_readme.sh

Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
2020-06-11 18:50:31 +02:00
zikun 82c8343ab6
Support additional apt dependencies (#9189)
* Add ADDITONAL_DEV_DEPS and ADDITONAL_RUNTIME_DEPS

* Add examples for additional apt dev and runtime dependencies

* Update comment

* Fix typo
2020-06-09 23:05:43 +02:00
Jarek Potiuk de9d3401f9
Improved cloud tool available in the trimmed down CI container (#9167)
* Improved cloud tool available in the trimmed down CI container

The tools now have shebangs which make them available for
python tools. Also /opt/airflow is now mounted from the
host Airflow sources which makes it possible for the tools to
copy files directly to/from the sources of Airflow.

It also contains one small change for Linux users: files created
by docker gcloud are owned by the root user, so when you exit the
tool the ownership of the directories mounted from the host is
changed back to the host user
2020-06-09 09:33:16 +02:00
James Timmins 5918efc86a
Add 3.8 to the test matrices (#8836)
2020-06-05 18:39:28 +01:00
Jarek Potiuk a39e9a3520
Replaces cloud-provider CLIs in CI image with scripts running containers (#9129)
The clis are replaced with scripts that will pull and run
docker images when they are needed.

Added Azure CLI as well.

Closes: #8946 #8947 #8785
2020-06-04 19:12:09 +02:00
Jarek Potiuk 46fee77156
Use static binary linked docker client in CI image (#9126)
2020-06-04 15:24:51 +02:00
Jarek Potiuk ff5dcccbbd
Kubernetes Cluster is started on host not in the container (#8265)
Tests requiring Kubernetes Cluster are now moved out of
the regular CI tests and moved to "kubernetes_tests" folder
so that they can be run entirely on host without having
the CI image built at all. They use production image
to run the tests on KinD cluster and we add tooling
to start/stop/deploy the application to the KinD cluster
automatically - for both CI testing and local development.

This is a pre-requisite to convert the tests to use the
official Helm Chart and Docker images of
Apache Airflow.

It closes #8782
2020-06-03 20:58:38 +02:00
James Timmins 10796cb7ce
Remove Hive/Hadoop/Java dependency from unit tests (#9029)
2020-06-03 12:49:27 +01:00
Jarek Potiuk 738667082d
Additional python extras and deps can be set in breeze (#9035)
Closes #8604
Closes #8866
2020-05-27 17:09:11 +02:00
Ash Berlin-Taylor 47413d98f0
Remove singularity from CI images (#8945)
The singularity operator tests _have always_ used mocking, so we were
adding 700MB to our docker image for nothing.

Fixes #8774
2020-05-21 12:12:03 +01:00
Ash Berlin-Taylor 8476c1e387
Hive/Hadoop minicluster needs JDK8 and JAVA_HOME to work (#8938)
Debian Buster only ships with a JDK11, and Hive/Hadoop fails in odd,
hard to debug ways (complains about metastore not being initialized,
possibly related to the class loader issues.)

Until we rip Hive out from the CI (replacing it with Hadoop in a separate
integration, only on for some builds) we'll have to stick with JRE8

Our previous approach of installing openjdk-8 from Sid/Unstable started
failing as Debian Sid has a new (and conflicting) version of GCC/libc.
The adoptopenjdk package archive is designed for Buster so should be
more resilient
2020-05-21 07:19:49 +02:00
Ash Berlin-Taylor fef00e5a06
Use Debian's provided JRE from Buster (#8919)
Installing the JDK (not even the JRE) from Sid is starting to break on
Buster as the versions of packages conflict:

> The following packages have unmet dependencies:
> libgcc-8-dev : Depends: gcc-8-base (= 8.4.0-4) but 8.3.0-6 is to be installed
>                Depends: libmpx2 (>= 8.4.0-4) but 8.3.0-6 is to be installed

This changes our CI docker images to:

1. Not install something from Sid (unstable, packages change/get
   updated) when we are using Buster (stable, only security fixes).
2. Install the JRE, not the JDK. We don't need to compile Java code.
2020-05-20 14:18:59 +01:00
Jarek Potiuk 2121f494c3
Avoid failure on transient requirements in CI image (#8892)
When you build from scratch and some transient requirements
fail, the initial step of installation might fail.

We are now using latest valid constraints from the DEFAULT_BRANCH
branch to avoid it.
2020-05-17 22:41:48 +02:00
Felix Uellendall 2878f17630
Relax Flask-Appbuilder version to ~=2.3.4 (#8857)
"Bump jQuery to 3.5" was reverted. And so we can upgrade and remove email_validator dependency
See also: https://github.com/dpgaspar/Flask-AppBuilder/blob/master/CHANGELOG.rst#improvements-and-bug-fixes-on-234
2020-05-13 19:42:51 +01:00
Jarek Potiuk d15839de0c
Latest debian-buster release broke image build (#8758)
2020-05-07 08:25:50 +02:00
Jarek Potiuk 45c8983306
Less aggressive eager upgrade of requirements (#8267)
With this change requirements are only eagerly upgraded when
generating requirements when setup.py changes. They are also
eagerly upgraded when you run ./breeze generate-requirements
locally. Still the cron job will use the eager update mechanism
when building the docker image which means that CRON jobs will
still detect cases where upgrade of requirements causes failure
either at the installation time or during tests.
2020-04-13 18:50:46 +02:00