Commit graph

89 commits

Author SHA1 Message Date
Jarek Potiuk 45d33dbd43
Add capability of customising PyPI sources (#11385)
* Add capability of customising PyPI sources

This change adds the capability of customising installation of PyPI
modules via a custom .pypirc file. This might allow installing
dependencies from an in-house, vetted PyPI registry.
2020-10-11 06:19:57 +02:00
Jarek Potiuk 04973904c3
Constraints and PIP packages can be installed from local sources (#11382)
* Constraints and PIP packages can be installed from local sources

This is the final part of implementing #11171 based on feedback
from enterprise customers we worked with. They want to have
a capability of building the image using binary wheel packages
that are locally available and the official Dockerfile. This means
that besides the official APT sources the Dockerfile build should
not need GitHub, nor any other external files pulled from outside,
including the PIP repository.

This change also includes documentation on how to prepare a set of
such binaries ready for inspection and review by security teams
in an Enterprise environment. Such sets of "known-working-binary-whl"
files can then be separately committed, tracked and scrutinized
in an artifact repository of such an Enterprise.

Fixes: #11171

* Update docs/production-deployment.rst
2020-10-10 12:58:09 +02:00
Jarek Potiuk ebd7150862
More customizable build process for Docker images (#11176)
* Allows more customizations for image building.

This is the third (and not last) part of making the Production
image more corporate-environment friendly. It's been prepared
at the request of one of the big Airflow users (a company) that
has rather strict security requirements when it comes to
preparing and building images. They are committed to
synchronizing with the progress of Apache Airflow 2.0 development,
and making the image customizable so that they can build it using
only sources controlled by them internally was one of the important
requirements for them.

This change adds the possibility of customizing various steps in
the build process:

* adding custom scripts to be run before installation of both
  build image and runtime image. This allows, for example,
  installing custom GPG keys and adding custom sources.

* customizing the way NodeJS and Yarn are installed in the
  build image segment - as they might rely on their own way
  of installation.

* adding extra packages to be installed during both build and
  dev segment build steps. This is crucial to achieve the same
  size optimizations as the original image.

* defining additional environment variables (for example
  environment variables that indicate acceptance of the EULAs
  in case of installing proprietary packages that require
  EULA acceptance) - both in the build image and runtime image
  (again the goal is to keep the image optimized for size)

The image build process remains the same when no customization
options are specified, but having those options increases
flexibility of the image build process in corporate environments.

This is part of #11171.

This change also fixes some of the issues opened and raised by
other users of the Dockerfile.

Fixes: #10730
Fixes: #10555
Fixes: #10856

Input from those issues has been taken into account when this
change was designed so that the cases described in those issues
could be implemented. An example from one of the issues landed as
an example of building a highly customized Airflow image
using those customization options.

Depends on #11174

* Update IMAGES.rst

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-09-29 15:30:00 +02:00
Jarek Potiuk 044b441257
Conditional MySQL Client installation (#11174)
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing MySQL Client on Debian requires reaching out
to Oracle deb repositories, which might not be approved by security
teams when you build the images. Also, not everyone needs the MySQL
client; some might want to install their own MySQL or MariaDB
client from their own repositories.

This change separates the installation step out into a
script (with a prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.) but in "Final" segment of the image only runtime libraries
are needed.

Part of #11171

Depends on #11173.
2020-09-27 18:56:58 +02:00
Jarek Potiuk f16354bc02
Optionally disables PIP cache from GitHub during the build (#11173)
This is the first step of implementing the corporate-environment
friendly way of building images, since in a corporate
environment it might not be possible to install the packages
using the GitHub cache initially.

Part of #11171
2020-09-27 18:00:03 +02:00
Jarek Potiuk 4a46f4368b
Allows to build production images for 1.10.2 and 1.10.1 Airflow (#10983)
Airflow below 1.10.2 required SLUGIFY_USES_TEXT_UNIDECODE env
variable to be set to yes.

Our production Dockerfile and Breeze support building images
for any version of Airflow >= 1.10.1, but it failed on
1.10.2 and 1.10.1 because this variable was not set.

You can now set the variable when building the image manually,
and Breeze does it automatically if the image is 1.10.1 or 1.10.2.

Fixes #10974
2020-09-17 14:25:34 +02:00
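The version check described in #10983 can be sketched roughly as follows. This is a minimal illustration in Python, not Breeze's actual code: the function name `extra_build_env` is hypothetical, and the set of affected versions is taken directly from the commit message.

```python
from typing import Dict

# Versions that, per the commit message, need the variable to be set.
VERSIONS_NEEDING_SLUGIFY = ("1.10.1", "1.10.2")


def extra_build_env(airflow_version: str) -> Dict[str, str]:
    """Return extra environment variables for building the given version.

    Hypothetical helper: it only mirrors the rule stated in the commit
    message and knows nothing about other Airflow versions.
    """
    if airflow_version in VERSIONS_NEEDING_SLUGIFY:
        return {"SLUGIFY_USES_TEXT_UNIDECODE": "yes"}
    return {}
```

When building manually, the same variable would have to be passed by hand; the sketch only shows the automatic decision.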
Jarek Potiuk d9920faa80
The entrypoints in Docker Image should be owned by Airflow (#10853)
Since we are running the airflow image as the airflow user, the
entrypoint and clear-logs scripts should also be owned by airflow.

This had no impact if you actually ran the image as the root user or
when your group was root (which was recommended).
2020-09-12 10:54:25 +02:00
Jarek Potiuk 018ae0ed95
The PIP version is not pinned to 19.0.2 any more (#10542)
Fixes #10516
2020-08-25 15:45:59 +02:00
Jarek Potiuk 1cf1af664f
Do not override in_container scripts when building the image (#10442)
After #10368, we've changed the way we build the images
on CI. We are overriding the ci scripts that we use
to build the image with the scripts taken from master
so as not to give rogue PR authors the possibility to run
something with the write credentials.

We should not override the in_container scripts, however,
because they become part of the image, so we should use
those that came with the PR. That's why we have to move
the "in_container" scripts out of the "ci" folder and
only override the "ci" folder with the one from
master. We've made sure that those scripts in ci
are self-contained and that they do not need to reach outside of
that folder.

Also the static checks are done with local files mounted
on CI because we want to check all the files - not only
those that are embedded in the container.
2020-08-21 17:21:57 +02:00
Jarek Potiuk e17985382c
Kubernetes image is extended rather than customized (#10399)
The EMBEDDED dags were only really useful for testing,
but they required customising the built production image
(run with an extra --build-arg flag). This is not needed,
as it is better to extend the image instead with FROM
and add dags afterwards. This way you do not have
to rebuild the image while iterating on it.
2020-08-19 14:19:05 +02:00
Jarek Potiuk 306a6660fd
Docker images are now consistently labelled and a bit smaller (#10387)
Extracted from #10368
2020-08-19 02:03:22 +02:00
Jarek Potiuk de9eaeb434
Constraint files are now maintained automatically (#9889)
* Constraint files are now maintained automatically

* No need to generate requirements when setup.py changes
* requirements are kept in separate orphan branches not in main repo
* merges to master verify if latest requirements are working and
  push tested requirements to orphaned branches
* we keep history of requirement changes and can label them
  individually for each version (by constraint-1.10.n tag name)
* consistently changed all references to be 'constraints' not
  'requirements'
2020-07-20 14:36:03 +02:00
Jarek Potiuk 593a0ddaae
Remove package.json and yarn.lock from the prod image (#9814)
Closes #9810
2020-07-14 16:34:21 +02:00
Jarek Potiuk 8f6b8378aa
The group of embedded DAGs should be root to be OpenShift compatible (#9794) 2020-07-13 20:47:55 +02:00
Jarek Potiuk 8bd15ef634
Switches to Helm Chart for Kubernetes tests (#9468)
The Kubernetes tests are now run using Helm chart
rather than the custom templates we used to have.

The Helm Chart uses the locally built production image
so the tests are testing not only Airflow but also
the Helm Chart and the Production image - all at the
same time. Later on we will add more tests
covering more functionalities of both Helm Chart
and Production Image. This is the first step to
get all of those bundled together and become
testable.

This change also introduces a 'shell' sub-command
for Breeze's kind-cluster command and
EMBEDDED_DAGS build args for production image -
both of them useful to run the Kubernetes tests
more easily - without building two images
and with an easy-to-iterate-over-tests
shell command - which works without any
other development environment.

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Daniel Imberman <daniel@astronomer.io>
2020-07-01 14:50:30 +02:00
Jarek Potiuk cf510a30fb
Make Production Dockerfile OpenShift-compatible (#9545)
OpenShift (and other Kubernetes platforms) often use the approach
that they start containers with a random user and the root group. This is
described in https://docs.openshift.com/container-platform/3.7/creating_images/guidelines.html

All the files created by the "airflow" user now belong to the
'root' group and the root group has the same access to those
files as the Airflow user.

Additionally, the random user automatically gets an
/etc/passwd entry added, with the name 'default'. The name of the user
can be set by setting the USER_NAME variable when starting the
container.

Closes #9248
Closes #8706
2020-06-27 14:29:55 +02:00
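The /etc/passwd handling described in #9545 might look roughly like the sketch below. It is an illustration in Python, not the image's actual entrypoint logic: the function name, the home directory, the comment field and the shell field are hypothetical. Only the 'default' name, the USER_NAME override and the root group (GID 0) come from the commit message.

```python
from typing import Optional


def passwd_entry_for(uid: int, passwd_text: str,
                     user_name: str = "default",
                     home: str = "/home/airflow") -> Optional[str]:
    """Return the /etc/passwd line to append for `uid`, or None.

    None means the UID already has an entry, so nothing needs adding.
    `home` and the shell are assumptions made for this sketch.
    """
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # The third field of a passwd line is the numeric UID.
        if len(fields) >= 3 and fields[2] == str(uid):
            return None
    # Group 0 is root, matching the "random user, root group" approach.
    return f"{user_name}:x:{uid}:0:{user_name} user:{home}:/sbin/nologin"
```

A caller would pass the value of USER_NAME (when set) as `user_name` and append the returned line to /etc/passwd.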
Jarek Potiuk 2cf167b047
Gunicorn works better if temporary folder uses tmpfs (#9534)
This is discussed in the documentation of gunicorn.
You can find more information here: https://docs.gunicorn.org/en/stable/faq.html#how-do-i-avoid-gunicorn-excessively-blocking-in-os-fchmod

Since we are using docker, we always have shared memory
available (at least 64MB).

Closes #9379
2020-06-26 16:41:21 +02:00
Jarek Potiuk 7c12a9d4e0
Improve production image iteration speed (#9162)
For a long time the way the entrypoint worked in ci scripts
was wrong. It was convoluted and little short of black
magic. It did not allow passing multiple test targets and
required separate execute command scripts in Breeze.

This is all now straightened out and both production and
CI image are always using the right entrypoint by default
and we can simply pass parameters to the image as usual without
escaping strings.

This also allowed removing some Breeze commands and
changing the names of several flags in Breeze to make them more
meaningful.

Both CI and PROD images now have embedded scripts for log
cleaning.

History of image releases is added for 1.10.10-*
alpha quality images.
2020-06-16 12:36:46 +02:00
zikun 82c8343ab6
Support additional apt dependencies (#9189)
* Add ADDITONAL_DEV_DEPS and ADDITONAL_RUNTIME_DEPS

* Add examples for additional apt dev and runtime dependencies

* Update comment

* Fix typo
2020-06-09 23:05:43 +02:00
Jarek Potiuk 738667082d
Additional python extras and deps can be set in breeze (#9035)
Closes #8604
Closes #8866
2020-05-27 17:09:11 +02:00
Fabian 5a7a3d13ee
Add ADDITIONAL_AIRFLOW_EXTRAS (#9032)
* Add build-arg ADDITIONAL_AIRFLOW_EXTRAS

* Add ADDITIONAL_AIRFLOW_EXTRAS example and description
2020-05-27 12:58:59 +02:00
Fabian 6fc555d0bc Add ADDITIONAL_PYTHON_DEPS (#9031)
* add build-arg ADDITIONAL_PYTHON_DEPS

* Add ADDITIONAL_PYTHON_DEPS example and description

Co-authored-by: Fabian Witt <fabian.witt@redheads.de>
2020-05-27 11:52:26 +02:00
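The build args added in #9031 and #9032 are passed with docker's `--build-arg` flag. A minimal sketch of assembling such a command in Python follows; the helper name, the tag and the example values are hypothetical, while the two build-arg names come from the commit titles.

```python
from typing import Dict, List


def docker_build_command(build_args: Dict[str, str], tag: str,
                         context: str = ".") -> List[str]:
    """Assemble a `docker build` argument list with the given build args.

    Hypothetical helper for illustration; it only builds the list and
    does not run docker.
    """
    command = ["docker", "build"]
    for name, value in build_args.items():
        command += ["--build-arg", f"{name}={value}"]
    command += ["--tag", tag, context]
    return command


# Example values are made up for this sketch.
command = docker_build_command(
    {"ADDITIONAL_PYTHON_DEPS": "pandas",
     "ADDITIONAL_AIRFLOW_EXTRAS": "jdbc"},
    tag="my-airflow:test",
)
```

The resulting list can be handed to `subprocess.run(command, check=True)` without shell quoting concerns.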
Jarek Potiuk 064cb67ae5
Pin Hadolint to version released 2020.04.20 (#8485) 2020-04-21 13:33:11 +02:00
Kaxil Naik 6c5fba2570
Remove duplicate dependency ('curl') from Dockerfile (#8412) 2020-04-17 16:09:20 +01:00
Hao Liang bc230a9711
Fix subcommand error when running production image without argument (#8415)
Co-authored-by: Liang Hao <liahao@tesla.com>
2020-04-17 14:47:23 +02:00
Daniel Imberman baa61c9c84
Add migration waiting script and log cleaner (#8219)
* Add migration waiting script and log cleaner

This PR creates a "migration spinner" that allows the webserver to wait for all database migrations to complete before starting up. It is a necessary component before we can merge the Helm chart.

* Update airflow/cli/cli_parser.py

Co-Authored-By: Tomek Urbaszek <turbaszek@gmail.com>

Co-authored-by: Tomek Urbaszek <tomasz.urbaszek@polidea.com>
Co-authored-by: Tomek Urbaszek <turbaszek@gmail.com>
2020-04-16 00:12:41 -07:00
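The "migration spinner" idea from #8219 amounts to polling until the database schema is up to date. The sketch below shows that pattern in Python under stated assumptions: `is_up_to_date` is a hypothetical callable supplied by the caller, and the timeout handling is illustrative rather than the behaviour of the actual Airflow command.

```python
import time
from typing import Callable


def wait_for_migrations(is_up_to_date: Callable[[], bool],
                        timeout: float = 60.0,
                        interval: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> float:
    """Block until `is_up_to_date()` returns True; return seconds waited.

    Raises TimeoutError once `timeout` seconds of waiting have elapsed.
    `sleep` is injectable so the loop can be exercised without delays.
    """
    waited = 0.0
    while not is_up_to_date():
        if waited >= timeout:
            raise TimeoutError(
                f"migrations still pending after {waited:.0f} seconds")
        sleep(interval)
        waited += interval
    return waited
```

A webserver start-up script would call this before launching the server, with `is_up_to_date` comparing the applied and expected schema revisions.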
Felix Uellendall cf6c254ebd
Expose Airflow Webserver Port in Production Docker Image (#8228) 2020-04-15 13:05:02 +02:00
Jarek Potiuk 4e8a979d03
Docker image build now includes the released 1.10.10 version (#8234)
It also installs properly on Mac, and it auto-detects
whether yarn prod is needed, based on the presence of a proper
package.json in either www or www_rbac, which makes it simpler
for remote installations.
2020-04-10 15:07:15 +02:00
Jarek Potiuk 07fd0d71c8
Add Production Docker image support (#7832) 2020-04-02 18:52:11 +01:00
Jarek Potiuk 210de87d6d
Move Dockerfile to Dockerfile.ci (#7829) 2020-03-23 08:56:26 +01:00
Kamil Breguła 8465d66f05
Remove airflow.bin package (#7808) 2020-03-22 22:01:06 +01:00
Jarek Potiuk 8c5638832f
[AIRFLOW-7067] Pinned version of Apache Airflow (#7730) 2020-03-22 13:34:48 +01:00
Jarek Potiuk cabd684b46
[AIRFLOW-7097] Install gcloud beta components in CI image (#7772) 2020-03-21 15:45:10 +01:00
Jarek Potiuk dced43bee9
[AIRFLOW-6946] Switch to MySQL 5.7 in 2.0 as base (#7570)
Switch to MySQL 5.7 in tests.

Fixes the utf8mb4 encoding issue where utf8mb4 encoding
produces keys too long for MySQL to handle in the XCom table.

You can optionally specify a separate option to set
encoding differently for the columns that are part of the
index - dag_id, task_id and key.
2020-03-14 22:24:03 +01:00
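The "too long keys" problem in #7570 is a matter of arithmetic: utf8mb4 reserves up to 4 bytes per character, while MySQL's older 3-byte utf8 reserves up to 3. The sketch below works through it in Python. The column lengths and the 3072-byte index limit are assumptions made for illustration, not values stated in the commit.

```python
from typing import Sequence

# Assumed InnoDB index key limit in bytes (illustrative, see lead-in).
ASSUMED_INDEX_LIMIT_BYTES = 3072


def index_key_bytes(column_lengths: Sequence[int], bytes_per_char: int) -> int:
    """Worst-case size in bytes of an index over the given string columns."""
    return sum(column_lengths) * bytes_per_char


# Hypothetical character lengths for dag_id, task_id and key.
columns = [250, 250, 512]

utf8mb4_size = index_key_bytes(columns, bytes_per_char=4)  # 4-byte encoding
utf8_size = index_key_bytes(columns, bytes_per_char=3)     # 3-byte encoding

fits_utf8mb4 = utf8mb4_size <= ASSUMED_INDEX_LIMIT_BYTES
fits_utf8 = utf8_size <= ASSUMED_INDEX_LIMIT_BYTES
```

Under these assumptions the index fits with the 3-byte encoding but not with utf8mb4, which is why setting a different encoding for only the indexed columns is a workable option.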
Ash Berlin-Taylor ef71ac6a22
[AIRFLOW-7029] Use separate docker image for running license check (#7678)
Each stage of the CI tests needs to pull our `ci` image. By removing
java from it we can save 1-2 minutes from each test stage. This is part
of that work.
2020-03-13 18:54:22 +00:00
Jarek Potiuk cad20c28da
[AIRFLOW-5842] Switch to Debian buster image as a base (#7647) 2020-03-07 20:20:05 +01:00
Kamil Breguła 609707eddf
[AIRFLOW-6967] Add tests to avoid performance regression in DagFileProcessor (#7602) 2020-03-07 19:13:00 +01:00
Vanessasaurus 0bb687990b
[AIRFLOW-4030] second attempt to add singularity to airflow (#7191)
* adding singularity operator and tests

Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>

* removing encoding pragmas and fixing up dockerfile to pass linting

Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>

* make workdir in /tmp because AIRFLOW_SOURCES not defined yet

Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>

* curl needs to follow redirects with -L

Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>

* moving files to where they are supposed to be, more changes to mock, no clue

Signed-off-by: vsoch <vsochat@stanford.edu>

* removing trailing whitespace, moving example_dag for singularity, adding licenses to empty init files

Signed-off-by: vsoch <vsochat@stanford.edu>

* ran isort on example dags file

Signed-off-by: vsoch <vsochat@stanford.edu>

* adding missing init in example_dags folder for singularity

Signed-off-by: vsoch <vsochat@stanford.edu>

* removing code from __init__.py files for singularity operator to fix documentation generation

Signed-off-by: vsoch <vsochat@stanford.edu>

* forgot to update link to singularity in operators and hooks ref

Signed-off-by: vsoch <vsochat@stanford.edu>

* command must have been provided on init of singularity operator instance

Signed-off-by: vsoch <vsochat@stanford.edu>

* I guess I'm required to have a task_id?

Signed-off-by: vsoch <vsochat@stanford.edu>

* try adding working_dir to singularity operator type definitions

Signed-off-by: vsoch <vsochat@stanford.edu>

* disable too many arguments for pylint of singularity operator init

Signed-off-by: vsoch <vsochat@stanford.edu>

* move pylint disable up to line 64 - doesnt catch at end of statement like other examples

Signed-off-by: vsoch <vsochat@stanford.edu>

* two spaces before inline comment

Signed-off-by: vsoch <vsochat@stanford.edu>

* I dont see task_id as a param for other providers, removing for singularity operator

Signed-off-by: vsoch <vsochat@stanford.edu>

* adding debug print

Signed-off-by: vsoch <vsochat@stanford.edu>

* allow for return of just image and/or lines

Signed-off-by: vsoch <vsochat@stanford.edu>

* dont understand how mock works, but the image should exist after its pulled....

Signed-off-by: vsoch <vsochat@stanford.edu>

* try removing shutil, the client should handle pull folder instead

Signed-off-by: vsoch <vsochat@stanford.edu>

* try changing pull-file to same uri that is expected to be pulled

Signed-off-by: vsoch <vsochat@stanford.edu>

* import of AirflowException moved to exceptions

Signed-off-by: vsoch <vsochat@stanford.edu>

* DAG module was moved to airflow.models

Signed-off-by: vsoch <vsochat@stanford.edu>

* ensure pull is called with pull_folder

Signed-off-by: vsoch <vsochat@stanford.edu>
2020-02-23 10:49:47 +01:00
Kamil Breguła 175a160463
[AIRFLOW-6828] Stop using the zope library (#7448) 2020-02-19 13:08:48 +01:00
Ash Berlin-Taylor cec9249f90
[AIRFLOW-6818] Prevent Docker cache-busting on when editing www templates (#7432)
There are two parts to this PR:

1. Only copying www/webpack.config.js and www/static/ before running the
   asset pipeline
2. Making sure that _all_ files (not just the critical ones) have the
   same permissions.
2020-02-16 17:58:46 +01:00
Jarek Potiuk 627365ab74 Revert "[AIRFLOW-XXXX] Prevent Docker cache-busting on when editing www templates (#7427)"
This reverts commit 3eb30ed12c.
2020-02-16 10:10:37 +01:00
Ash Berlin-Taylor 3eb30ed12c
[AIRFLOW-XXXX] Prevent Docker cache-busting on when editing www templates (#7427)
There are two parts to this PR:

1. Only copying www/webpack.config.js and www/static/ before running the
   asset pipeline
2. Making sure that _all_ files (not just the critical ones) have the
   same permissions.

The goal of both of these is to make sure that the docker build cache for the "expensive"
operations (installing NPM modules, running asset pipeline, installing python modules)
isn't run when it isn't necessary.
2020-02-15 20:24:35 +00:00
Jarek Potiuk 945b988cc2
[AIRFLOW-6662] install dumb init (#7300)
* Revert "[AIRFLOW-6662] Switch to --init docker flag for signal propagation (#7278)"

This reverts commit d1bf343ffe.

* [AIRFLOW-6662] return back the dumb-init - installed by apt

We had stability problems with tests with the --init flag, so we are
going back to dumb-init
2020-02-02 11:13:04 +01:00
Jarek Potiuk d7d2794d05
[AIRFLOW-6701] Rat is downloaded from stable backup/mirrors (#7323)
Also curl options now use the long format and include --fail
to protect against some temporary errors (5xx). Also the RAT download
now uses two possible download sources and falls back to the
second if the first is not available.
2020-02-02 11:11:38 +01:00
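The two-source download with fallback described in #7323 can be sketched in Python as below. This is an illustration of the pattern only: the commit itself uses curl, and the function names here are hypothetical. Like `curl --fail`, `urllib.request.urlopen` raises an error on HTTP 4xx/5xx responses instead of returning the error page.

```python
import urllib.request
from typing import Callable, Sequence


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download `url`; HTTP errors surface as OSError subclasses."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_fallback(urls: Sequence[str],
                           fetch: Callable[[str], bytes] = fetch_url) -> bytes:
    """Try each source in order and return the first successful download."""
    last_error = None
    for url in urls:
        try:
            return fetch(url)
        except OSError as err:  # covers URLError, HTTPError and timeouts
            last_error = err
    raise RuntimeError(
        f"all download sources failed: {list(urls)}") from last_error
```

Passing the primary URL first and the backup mirror second reproduces the "fall back to the second if the first is not available" behaviour.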
Jarek Potiuk d1bf343ffe
[AIRFLOW-6662] Switch to --init docker flag for signal propagation (#7278)
We are now using native --init flag of docker run and init: parameter
of docker compose to pass signals and reap child processes
2020-01-29 14:07:34 +01:00
dstandish 2a819b11fb [AIRFLOW-6296] add OdbcHook & deprecation warning for pymssql (#6850) 2020-01-19 07:54:56 +01:00
Jarek Potiuk 73403cc8f4
[AIRFLOW-5704] Improve Kind Kubernetes scripts for local testing (#6516)
* Fixed problem that Kubernetes tests were testing latest master
  rather than what came from the local sources.
* Kind (Kubernetes in Docker) is run in the same Docker as Breeze env
* Moved Kubernetes scripts to 'in_container' dir where they belong now
* Kubernetes cluster is reused until it is stopped
* Kubernetes image is built from image in docker already + mounted sources
* Kubectl version name is corrected in the Dockerfile
* KUBERNETES_VERSION can now be used to select Kubernetes version
* Running kubernetes scripts is now easy in Breeze
* We can start/recreate/stop cluster using --<ACTION>-kind-cluster
* Instructions on how to run Kubernetes tests are updated
* The old "bare" environment is replaced by --no-deps switch
2020-01-11 16:25:19 +01:00
Kamil Breguła 9fce4eca4e [AIRFLOW-6470] Avoid pipe to file when do curl (#7063) 2020-01-05 19:00:02 +01:00
Kamil Breguła 5ae2f968e5 [AIRFLOW-6462] Limit exported variables in Dockerfile/Breeze (#7057) 2020-01-05 17:11:34 +01:00
Kamil Breguła 4c0cbe5843 [AIRFLOW-6465] Add bash autocomplete for airflow in Breeze (#7060) 2020-01-05 10:22:07 +01:00