* Add capability of customising PyPI sources
This change adds the capability of customising the installation of PyPI
modules via a custom .pypirc file. This makes it possible to install
dependencies from an in-house, vetted PyPI registry.
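As a hedged sketch of what pointing installs at an in-house index can look like (the file path and registry URL are illustrative assumptions; pip reads index settings from a pip.conf-style file):

```shell
# Write a pip configuration pointing at a hypothetical in-house index
mkdir -p /tmp/pip-conf
cat > /tmp/pip-conf/pip.conf <<'EOF'
[global]
index-url = https://pypi.example.internal/simple
EOF
# Verify pip picks the custom config up via PIP_CONFIG_FILE
PIP_CONFIG_FILE=/tmp/pip-conf/pip.conf pip config list
```

The same file can be baked into the image during the build so that every `pip install` in the Dockerfile goes through the vetted registry.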
* Constraints and PIP packages can be installed from local sources
This is the final part of implementing #11171, based on feedback
from enterprise customers we worked with. They want the capability
of building the image from the official Dockerfile using binary
wheel packages that are available locally. This means that, besides
the official APT sources, the Dockerfile build should not need
GitHub or any other external files pulled from outside,
including the PIP repository.
This change also includes documentation on how to prepare a set of
such binaries, ready for inspection and review by security teams
in an enterprise environment. Such sets of "known-working-binary-whl"
files can then be separately committed, tracked and scrutinized
in the enterprise's artifact repository.
Fixes: #11171
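The offline installation pattern the vetted wheel set relies on can be sketched as follows (package name and paths are illustrative; a trivial wheel is built locally so the example is self-contained):

```shell
set -euo pipefail
# Build a trivial local package into a wheel, simulating a vetted binary set
mkdir -p /tmp/pkg /tmp/wheelhouse
cat > /tmp/pkg/setup.py <<'EOF'
from setuptools import setup
setup(name="demo-pkg", version="0.1", py_modules=["demo_mod"])
EOF
echo 'VALUE = 42' > /tmp/pkg/demo_mod.py
pip wheel --no-deps --no-build-isolation --wheel-dir /tmp/wheelhouse /tmp/pkg
# --no-index guarantees nothing is pulled from PyPI; only the local folder is used
pip install --no-index --find-links=/tmp/wheelhouse demo-pkg
```

In a locked-down build, the `wheelhouse` folder would contain the reviewed `.whl` files tracked in the artifact repository.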
* Update docs/production-deployment.rst
* Allows more customizations for image building.
This is the third (and not the last) part of making the production
image more corporate-environment friendly. It was prepared at
the request of one of the big Airflow users (a company) that
has rather strict security requirements when it comes to
preparing and building images. They are committed to keeping up
with the progress of Apache Airflow 2.0 development, and making
the image customizable so that they can build it using only
internally controlled sources was one of their important
requirements.
This change adds the possibility of customizing various steps in
the build process:
* adding custom scripts to be run before installation, in both
the build image and the runtime image. This allows, for example,
installing custom GPG keys and adding custom sources.
* customizing the way NodeJS and Yarn are installed in the
build image segment - as some environments rely on their own
way of installing them.
* adding extra packages to be installed during both the build
and dev segment build steps. This is crucial to achieve the same
size optimizations as the original image.
* defining additional environment variables (for example,
variables that indicate acceptance of EULAs when installing
proprietary packages that require it) - in both the build
image and the runtime image
(again, the goal is to keep the image optimized for size)
The image build process remains the same when no customization
options are specified, but having those options increases
flexibility of the image build process in corporate environments.
This is part of #11171.
This change also fixes some of the issues opened and raised by
other users of the Dockerfile.
Fixes: #10730
Fixes: #10555
Fixes: #10856
Input from those issues was taken into account when this change
was designed, so that the cases described in those issues can be
implemented. An example from one of those issues landed as a
demonstration of building a highly customized Airflow image
using these customization options.
Depends on #11174
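A hedged sketch of what such build hooks can look like in a Dockerfile (the `ARG` names are illustrative, not the exact names used by the official Dockerfile):

```dockerfile
# Hooks a build can expose so that extra packages and environment variables
# are injected at build time without editing the Dockerfile itself
ARG ADDITIONAL_PYTHON_DEPS=""
ARG ADDITIONAL_ENV_VAR=""
ENV ADDITIONAL_ENV_VAR=${ADDITIONAL_ENV_VAR}
RUN if [ -n "${ADDITIONAL_PYTHON_DEPS}" ]; then \
        pip install --no-cache-dir ${ADDITIONAL_PYTHON_DEPS}; \
    fi
```

Because every hook defaults to empty, builds that pass no `--build-arg` flags produce exactly the same image as before.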
* Update IMAGES.rst
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
This is the second step of making the production Docker image more
corporate-environment friendly, by making MySQL client installation
optional. Installing the MySQL client on Debian requires reaching
out to Oracle deb repositories, which might not be approved by
security teams when you build the images. Also, not everyone needs
the MySQL client, and some may want to install their own MySQL or
MariaDB client from their own repositories.
This change separates the installation step out into a script
(with a prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.), while in the "Final" segment of the image only runtime
libraries are needed.
Part of #11171
Depends on #11173.
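The prod/dev split described above can be sketched as a small helper (package names are illustrative; Debian's actual package names may differ by release):

```shell
# "dev" builds need the client headers for compiling bindings;
# "prod" images only need the runtime client
mysql_client_packages() {
  case "$1" in
    dev)  echo "default-libmysqlclient-dev default-mysql-client" ;;
    prod) echo "default-mysql-client" ;;
    *)    echo "Usage: mysql_client_packages dev|prod" >&2; return 1 ;;
  esac
}
mysql_client_packages prod
```

Each image segment would then call `apt-get install` with the list for its mode.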
This is first step of implementing the corporate-environment
friendly way of building images, where in the corporate
environment, this might not be possible to install the packages
using the GitHub cache initially.
Part of #11171
Airflow below 1.10.2 required the SLUGIFY_USES_TEXT_UNIDECODE
environment variable to be set to yes.
Our production Dockerfile and Breeze support building images
for any version of Airflow >= 1.10.1, but the build failed on
1.10.1 and 1.10.2 because this variable was not set.
You can now set the variable when building the image manually,
and Breeze does it automatically if the image is 1.10.1 or 1.10.2.
Fixes: #10974
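A hedged sketch of the version check Breeze performs (variable names are illustrative):

```shell
# Only the two affected versions need the variable set to "yes"
AIRFLOW_VERSION="1.10.2"
case "${AIRFLOW_VERSION}" in
  1.10.1|1.10.2) export SLUGIFY_USES_TEXT_UNIDECODE="yes" ;;
esac
echo "SLUGIFY_USES_TEXT_UNIDECODE=${SLUGIFY_USES_TEXT_UNIDECODE:-}"
```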
Since we run the Airflow image as the airflow user, the
entrypoint and clear-logs scripts should also be owned by the
airflow user. This had no impact if you actually ran as the root
user or when your group was root (which was recommended).
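In Dockerfile terms, the ownership fix amounts to something like this (script names and destination are illustrative):

```dockerfile
# Give the scripts to the airflow user and the root group, so both the
# airflow user and root-group users can run them
COPY --chown=airflow:root entrypoint.sh clean-logs.sh /
```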
After #10368, we changed the way we build the images
on CI. We override the CI scripts that we use
to build the image with the scripts taken from master,
to not give rogue PR authors the possibility of running
something with the write credentials.
We should not override the in_container scripts, however,
because they become part of the image, so we should use
those that came with the PR. That's why we had to move
the "in_container" scripts out of the "ci" folder and
only override the "ci" folder with the one from
master. We've made sure that the scripts in ci
are self-contained and do not need to reach outside of
that folder.
Also, the static checks are done with local files mounted
on CI, because we want to check all the files - not only
those that are embedded in the container.
The EMBEDDED dags were only really useful for testing,
but they required customising the built production image
(running with an extra --build-arg flag). This is not needed,
as it is better to extend the image with FROM
and add DAGs afterwards. This way you do not have
to rebuild the image while iterating on it.
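The extension pattern is a two-line Dockerfile (the tag and DAG paths are illustrative):

```dockerfile
# Extend the official image instead of rebuilding it with --build-arg,
# then bake the DAGs in afterwards
FROM apache/airflow:1.10.12
COPY dags/ /opt/airflow/dags/
```

Iterating on DAGs then only rebuilds this thin final layer, not the whole base image.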
* Constraint files are now maintained automatically
* No need to generate requirements when setup.py changes
* requirements are kept in separate orphan branches not in main repo
* merges to master verify that the latest requirements are working
and push the tested requirements to the orphaned branches
* we keep a history of requirement changes and can label them
individually for each version (by constraint-1.10.n tag name)
* consistently changed all references to be 'constraints' not
'requirements'
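Installing against a tested constraint set then follows this shape (the URL pattern and versions are illustrative; the command is echoed rather than executed so it can be inspected offline):

```shell
# Build the constraint URL from the Airflow and Python versions
AIRFLOW_VERSION="1.10.12"
PYTHON_VERSION="3.7"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
echo "pip install apache-airflow==${AIRFLOW_VERSION} --constraint ${CONSTRAINT_URL}"
```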
The Kubernetes tests are now run using the Helm chart
rather than the custom templates we used to have.
The Helm chart uses the locally built production image,
so the tests cover not only Airflow but also the
Helm chart and the production image - all at the
same time. Later on we will add more tests
covering more functionality of both the Helm chart
and the production image. This is the first step to
get all of those bundled together and
testable.
This change also introduces a 'shell' sub-command
for Breeze's kind-cluster command and
an EMBEDDED_DAGS build arg for the production image -
both useful for running the Kubernetes tests
more easily: without building two images,
and with a shell command that makes iterating
over tests easy and works without any
other development environment.
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Daniel Imberman <daniel@astronomer.io>
OpenShift (and other Kubernetes platforms) often use the approach
of starting containers with a random user and the root group. This is
described in https://docs.openshift.com/container-platform/3.7/creating_images/guidelines.html
All the files created by the "airflow" user now belong to the
'root' group, and the root group has the same access to those
files as the airflow user.
Additionally, the random user automatically gets an
/etc/passwd entry, named 'default'. The name of the user
can be set via the USER_NAME variable when starting the
container.
Closes: #9248
Closes: #8706
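The passwd-entry logic can be sketched like this (home directory and comment field are illustrative; gid 0 is the point - the root group owns whatever the arbitrary UID creates):

```shell
# Emit an /etc/passwd line for an arbitrary UID; the entrypoint would append
# this when the container starts with a UID that has no passwd entry
passwd_entry() {
  echo "${USER_NAME:-default}:x:$1:0:airflow user:/home/airflow:/bin/bash"
}
passwd_entry 100123
```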
For a long time, the way the entrypoint worked in CI scripts
was wrong. It was convoluted and little short of black
magic. It did not allow passing multiple test targets and
required separate execute-command scripts in Breeze.
This is all now straightened out: both the production and
CI images always use the right entrypoint by default,
and we can simply pass parameters to the image as usual, without
escaping strings.
This also allowed removing some Breeze commands and
renaming several Breeze flags to make them more
meaningful.
Both the CI and PROD images now have embedded scripts for log
cleaning.
A history of image releases is added for the 1.10.10-*
alpha-quality images.
* Add migration waiting script and log cleaner
This PR creates a "migration spinner" that allows the webserver to wait for all database migrations to complete before starting up. It is a necessary component before we can merge the Helm chart.
* Update airflow/cli/cli_parser.py
Co-Authored-By: Tomek Urbaszek <turbaszek@gmail.com>
Co-authored-by: Tomek Urbaszek <tomasz.urbaszek@polidea.com>
Co-authored-by: Tomek Urbaszek <turbaszek@gmail.com>
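The migration spinner above boils down to a retry loop; a hedged sketch (the wrapped check command is illustrative - here `true` stands in for the real migration check):

```shell
# Retry a command every second until it succeeds or the deadline passes
wait_for() {
  local deadline=$(( $(date +%s) + $1 )); shift
  until "$@"; do
    [ "$(date +%s)" -ge "${deadline}" ] && return 1
    sleep 1
  done
}
# In the real entrypoint, the wrapped command would be the migration check
wait_for 5 true && echo "migrations complete"
```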
It also installs properly on Mac, and it auto-detects
whether 'yarn prod' is needed, based on the presence of the
proper package.json in either www or www_rbac, which makes
remote installations simpler.
Switch to MySQL 5.7 in tests.
This fixes the utf8mb4 encoding issue, where utf8mb4 encoding
produces keys that are too long for MySQL to handle in the XCom
table. You can optionally specify a separate option to set the
encoding differently for the columns that are part of the
index: dag_id, task_id and key.
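As a hedged sketch of such a per-column override in airflow.cfg (the option name is an assumption modeled on Airflow's later `sql_engine_collation_for_ids` setting; verify against your version):

```ini
[core]
# Use a shorter collation only for the indexed id columns (dag_id, task_id,
# key) so utf8mb4 rows elsewhere still fit within MySQL's index key limit
sql_engine_collation_for_ids = utf8mb3_bin
```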
Each stage of the CI tests needs to pull our `ci` image. By removing
Java from it we can save 1-2 minutes from each test stage. This is
part of that work.
* adding singularity operator and tests
Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>
* removing encoding pragmas and fixing up dockerfile to pass linting
Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>
* make workdir in /tmp because AIRFLOW_SOURCES not defined yet
Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>
* curl needs to follow redirects with -L
Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>
* moving files to where they are supposed to be, more changes to mock, no clue
Signed-off-by: vsoch <vsochat@stanford.edu>
* removing trailing whitespace, moving example_dag for singularity, adding licenses to empty init files
Signed-off-by: vsoch <vsochat@stanford.edu>
* ran isort on example dags file
Signed-off-by: vsoch <vsochat@stanford.edu>
* adding missing init in example_dags folder for singularity
Signed-off-by: vsoch <vsochat@stanford.edu>
* removing code from __init__.py files for singularity operator to fix documentation generation
Signed-off-by: vsoch <vsochat@stanford.edu>
* forgot to update link to singularity in operators and hooks ref
Signed-off-by: vsoch <vsochat@stanford.edu>
* command must have been provided on init of singularity operator instance
Signed-off-by: vsoch <vsochat@stanford.edu>
* I guess I'm required to have a task_id?
Signed-off-by: vsoch <vsochat@stanford.edu>
* try adding working_dir to singularity operator type definitions
Signed-off-by: vsoch <vsochat@stanford.edu>
* disable too many arguments for pylint of singularity operator init
Signed-off-by: vsoch <vsochat@stanford.edu>
* move pylint disable up to line 64 - it doesn't catch at the end of the statement like other examples
Signed-off-by: vsoch <vsochat@stanford.edu>
* two spaces before inline comment
Signed-off-by: vsoch <vsochat@stanford.edu>
* I don't see task_id as a param for other providers, removing for singularity operator
Signed-off-by: vsoch <vsochat@stanford.edu>
* adding debug print
Signed-off-by: vsoch <vsochat@stanford.edu>
* allow for return of just image and/or lines
Signed-off-by: vsoch <vsochat@stanford.edu>
* don't understand how mock works, but the image should exist after it's pulled....
Signed-off-by: vsoch <vsochat@stanford.edu>
* try removing shutil, the client should handle pull folder instead
Signed-off-by: vsoch <vsochat@stanford.edu>
* try changing pull-file to same uri that is expected to be pulled
Signed-off-by: vsoch <vsochat@stanford.edu>
* import of AirflowException moved to exceptions
Signed-off-by: vsoch <vsochat@stanford.edu>
* DAG module was moved to airflow.models
Signed-off-by: vsoch <vsochat@stanford.edu>
* ensure pull is called with pull_folder
Signed-off-by: vsoch <vsochat@stanford.edu>
There are two parts to this PR:
1. Only copying www/webpack.config.js and www/static/ before running the
asset pipeline
2. Making sure that _all_ files (not just the critical ones) have the
same permissions.
The goal of both of these is to make sure that the "expensive"
operations (installing NPM modules, running the asset pipeline,
installing Python modules) are not re-run when the docker build
cache can be reused.
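The layer-ordering idea can be sketched in Dockerfile form (paths are illustrative, not the exact layout of the real Dockerfile):

```dockerfile
# Copy only the inputs each expensive step needs, so unrelated source
# changes do not invalidate the cached npm / asset-pipeline layers
COPY airflow/www/package.json airflow/www/yarn.lock /opt/airflow/airflow/www/
RUN yarn --cwd /opt/airflow/airflow/www install --frozen-lockfile
COPY airflow/www/webpack.config.js /opt/airflow/airflow/www/
COPY airflow/www/static/ /opt/airflow/airflow/www/static/
RUN yarn --cwd /opt/airflow/airflow/www run prod
```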
* Revert "[AIRFLOW-6662] Switch to --init docker flag for signal propagation (#7278)"
This reverts commit d1bf343ffe.
* [AIRFLOW-6662] return back dumb-init - installed by apt
We had stability problems in tests with the --init flag, so we are
going back to dumb-init.
Also, curl options now use the long format and include --fail
to protect against some temporary errors (5xx). The RAT download
now uses two possible sources and falls back to the
second if the first is not available.
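The fallback pattern can be sketched as follows (URLs are illustrative; `file://` URLs stand in for the real mirrors so the example is self-contained, and --fail makes curl return non-zero on HTTP errors so the second source gets a chance):

```shell
# Download with long-format curl options; try a second source on failure
download() {
  curl --fail --location --silent --show-error --output "$2" "$1"
}
echo "rat-jar-bytes" > /tmp/mirror-copy
download "file:///tmp/does-not-exist" /tmp/rat.jar 2>/dev/null \
  || download "file:///tmp/mirror-copy" /tmp/rat.jar
cat /tmp/rat.jar
```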
* Fixed a problem where the Kubernetes tests were testing the latest
master rather than what came from the local sources.
* Kind (Kubernetes in Docker) runs in the same Docker as the Breeze env
* Moved the Kubernetes scripts to the 'in_container' dir, where they belong now
* The Kubernetes cluster is reused until it is stopped
* The Kubernetes image is built from the image already in docker + mounted sources
* The kubectl version name is corrected in the Dockerfile
* KUBERNETES_VERSION can now be used to select the Kubernetes version
* Running the Kubernetes scripts is now easy in Breeze
* We can start/recreate/stop the cluster using --<ACTION>-kind-cluster
* The instructions on how to run the Kubernetes tests are updated
* The old "bare" environment is replaced by the --no-deps switch