K9s is a fantastic tool that helps to debug a running k8s
instance. It is a terminal-based windowed CLI that makes you
several times more productive compared to using plain kubectl
commands. We've integrated k9s (it runs as a docker container
and is downloaded on demand). We've also separated out the
KUBECONFIG of the integrated kind cluster so that it does not
interfere with any kubernetes configuration you might already have.
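A containerized k9s session against that dedicated kubeconfig could
be started by hand along these lines - a minimal sketch, assuming an
illustrative kubeconfig path and the upstream k9s image, not
necessarily the exact wiring of the integration::

    docker run --rm -it --network host \
        -v "${HOME}/.kube/kind-airflow-config:/root/.kube/config" \
        derailed/k9s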
Along with that, the "surroundings" of the kubernetes tests were
simplified and improved so that the k9s integration can be used
effectively. Instead of kubectl port forwarding (which caused a
multitude of problems) we now use kind's portMapping feature plus
a custom NodePort Service: host port 8080 is mapped to NodePort
30007, which in turn forwards to port 8080 of the webserver. This
way we do not have to establish an external kubectl port forward,
which is error-prone and needs separate management - everything is
brought up when Airflow gets deployed to the kind cluster and shut
down when the kind cluster is stopped.
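A hedged sketch of the two pieces (cluster name, service name,
namespace and labels are illustrative; the port numbers are the
ones described above)::

    # 1. kind cluster config: map host port 8080 to node port 30007.
    cat > /tmp/kind-cluster.yaml <<EOF
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
      - role: control-plane
        extraPortMappings:
          - containerPort: 30007
            hostPort: 8080
    EOF
    kind create cluster --name airflow --config /tmp/kind-cluster.yaml

    # 2. NodePort Service: forward node port 30007 to port 8080
    #    of the webserver.
    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: airflow-webserver-node-port
      namespace: airflow
    spec:
      type: NodePort
      selector:
        component: webserver
      ports:
        - port: 8080
          targetPort: 8080
          nodePort: 30007
    EOF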
Yet another problem fixed was the killing of postgres by one of the
kubernetes tests ('test_integration_run_dag_with_scheduler_failure').
Instead of killing just the scheduler, it killed all pods - including
the Postgres one (matched by the name pattern 'airflow-postgres.*').
That caused various problems, as the database could be left in a
strange state. I changed the test to do what it claimed to be doing -
killing only the scheduler during the test. This seemed to improve
the stability of the tests immensely in my local setup.
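A hedged sketch of the difference (namespace and label are
illustrative - the actual test talks to the kubernetes API rather
than shelling out to kubectl)::

    # Before (conceptually): every pod whose name matched 'airflow.*'
    # was killed, which also caught the 'airflow-postgres-...' pod:
    kubectl get pods -n airflow -o name | grep 'airflow' \
        | xargs kubectl delete -n airflow

    # After: only the scheduler pod is killed:
    kubectl delete pod -n airflow -l component=scheduler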
* Providers in extras are properly configured and verified
This fixes #12255 - where we published the beta2 release with some
extras pulling non-existent providers.
The exact list of providers that had problems:
Wrongly named extras/providers:
* apache.presto: the extra was badly named -> renamed to 'presto'
* spark: wrongly pointed to the 'spark' provider instead of 'apache.spark'
* yandexcloud: the name remains, but we've also added a 'yandex' extra to correspond 1-to-1 with the 'yandex' provider
Extras that were wrongly marked as having providers when they had
none (see the sketch after the list):
* dask
* rabbitmq
* sentry
* statsd
* tableau
* virtualenv
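A hypothetical sketch of the kind of check performed (the real check
is the Python pre-commit script below; the provider names here are
just examples)::

    # Every provider referenced by an extra must have a matching
    # source directory, e.g. 'apache.spark' -> airflow/providers/apache/spark:
    for provider in presto apache.spark yandex; do
        provider_dir="airflow/providers/$(echo "${provider}" | tr . /)"
        if [[ ! -d "${provider_dir}" ]]; then
            echo "Extra references a non-existent provider: ${provider}"
            exit 1
        fi
    done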
* Update scripts/ci/pre_commit/pre_commit_check_extras_have_providers.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Core example DAGs should not depend on any non-core dependency,
such as provider packages.
closes: #12247
Co-authored-by: Xiaodong DENG <xd.deng.r@gmail.com>
The change in #10806 made airflow work with implicit packages
when "airflow" gets imported. This is a good change, however
it has some unforeseen consequences. The 'provider_packages'
scripts copy all the provider code for backports in order
to refactor it into the empty "airflow" directory in the
provider_packages folder. The #10806 change turned that
empty folder into an 'airflow' package, because it was in the
same directory as the provider_packages scripts.
Moving the scripts to the dev folder solves this problem.
It seems that postgres is really stable when it comes to upgrades,
so we take the assumption that if we test 9.6 and 13 and they
work, all the versions in between will also work.
This PR changes Postgres 10 to 13 in tests and updates the
documentation with all the versions in between.
When we prepare pre-release versions, they are not intended to be
converted to final release versions, so there is no need to
artificially replace the version number for them.
For release candidates, on the other hand, we should internally use
the "final" version, because those packages might simply be renamed
to the final "production" versions.
Fixes #11585
So far Breeze used in-container storage for persisting data (mysql,
redis, postgres). This means that the data was only kept as long as
the containers were running. If you stopped Breeze via the `stop`
command, the data was always deleted.
This changes the behaviour - each of the Breeze containers now has
a named volume where data is kept. Those volumes are still deleted
by default when Breeze is stopped, but you can choose to preserve
them by adding ``--preserve-volumes`` when you run the ``stop`` or
``restart`` command.
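For example::

    ./breeze stop --preserve-volumes      # stop, but keep mysql/postgres/redis data
    ./breeze restart --preserve-volumes   # restart without wiping the named volumes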
Fixes: #11625
In Airflow 2.0 we decided to split Airflow into separate providers.
This means that when you prepare the core airflow package, providers
are not installed by default. This is not very convenient for
local development though, nor for docker images built from sources,
where you would like to install all providers by default.
A new INSTALL_ALL_AIRFLOW_PROVIDERS environment variable controls
this behaviour now. If it is set to "true", all packages including
provider packages are installed. If missing or set to "false", only
the core airflow package is installed.
For Breeze, the default is set to "true", as in that case you
want to install all providers in your environment. The same applies
when you build the production image from sources. However, when you
build the image using a GitHub tag or a pip package, you should
specify the appropriate extras to install the required provider
packages.
Note that if you install Airflow via 'pip install .' from sources
in a local virtualenv, provider packages are not going to be
installed unless you set INSTALL_ALL_AIRFLOW_PROVIDERS to "true".
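For example, in a local virtualenv::

    # Core airflow plus all provider packages from sources:
    INSTALL_ALL_AIRFLOW_PROVIDERS="true" pip install .

    # Core only (the default when the variable is missing or "false"):
    pip install .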
Fixes #11489
The production image had the capability of installing packages from
wheels (for security teams/air-gapped systems). This capability
might also be useful when building the CI image, especially when
we install the core and provider packages separately and
do not yet have the provider packages available in PyPI.
This is an intermediate step towards implementing #11490.
In preparation for adding provider packages to the 2.0 line we
are renaming backport packages to provider packages.
We want to implement this in stages - first rename the
packages, then split out the backport/2.0 providers as part of
the #11421 issue.
We seem to have a problem with running all tests at once - most
likely due to some resource problems in our CI - therefore it makes
sense to split the tests into more batches. This is not yet the full
implementation of selective tests, but it is a step in this
direction, splitting the tests into Core/Providers/API/CLI batches.
The full selective tests approach will be implemented as part of
the #10507 issue.
This split is possible thanks to #10422, which moved image building
to a separate workflow - this way each image is built only once
and uploaded to a shared registry, from which it is quickly
downloaded rather than being built by all the jobs separately. This
way we can have many more jobs, as there is very little per-job
overhead before the tests start running.
* Constraints and PIP packages can be installed from local sources
This is the final part of implementing #11171, based on feedback
from enterprise customers we worked with. They want to have
the capability of building the image using the official Dockerfile
and binary wheel packages that are available locally. This means
that besides the official APT sources, the Dockerfile build should
not need GitHub, nor any other external files pulled from outside,
including the PIP repository.
This change also includes documentation on how to prepare a set of
such binaries ready for inspection and review by security teams
in an enterprise environment. Such sets of "known-working-binary-whl"
files can then be separately committed, tracked and scrutinized
in the enterprise's artifact repository.
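A hedged sketch of such a build (the directory and build-arg names
illustrate the mechanism and are not necessarily the exact ones
added here)::

    # Put the reviewed wheel files into the build context first:
    cp dist/*.whl docker-context-files/
    docker build . \
        --build-arg INSTALL_FROM_DOCKER_CONTEXT_FILES="true" \
        --tag my-registry/airflow:2.0-audited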
Fixes: #11171
* Update docs/production-deployment.rst
If this flag is specified, wheel packages placed in the dist
folder will be installed after Airflow itself is installed. This is
useful for testing backport packages, as well as - in the future -
for testing provider packages for 2.0.
When installing Airflow 1.10 via Breeze we now enable the RBAC UI
by default, but it can be disabled with the --no-rbac-ui flag.
This is useful for testing different variants of 1.10 release
candidates in connection with the 'start-airflow' command.
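For example::

    ./breeze start-airflow --no-rbac-ui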
* Allow more customizations for image building.
This is the third (and not the last) part of making the Production
image more corporate-environment friendly. It was prepared at
the request of one of the big Airflow users (a company) that
has rather strict security requirements when it comes to
preparing and building images. They are committed to
keeping in sync with the progress of Apache Airflow 2.0 development,
and making the image customizable so that they can build it using
only sources controlled by them internally was one of their
important requirements.
This change adds the possibility of customizing various steps in
the build process (see the sketch after the list):
* adding custom scripts to be run before installation in both the
build image and the runtime image. This allows, for example,
installing custom GPG keys or adding custom package sources.
* customizing the way NodeJS and Yarn are installed in the
build image segment - as a company might rely on its own way
of installing them.
* adding extra packages to be installed during both the build and
dev segment build steps. This is crucial to achieve the same
size optimizations as the original image.
* defining additional environment variables (for example,
variables that indicate acceptance of EULAs when installing
proprietary packages that require it) - both in the build image
and the runtime image (again, the goal is to keep the image
optimized for size).
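A hedged sketch of what such a customized build could look like
(the build-arg names are illustrative, and ACCEPT_EULA stands for a
hypothetical proprietary package requiring EULA acceptance)::

    docker build . \
        --build-arg ADDITIONAL_DEV_APT_DEPS="gnupg libsasl2-dev" \
        --build-arg ADDITIONAL_RUNTIME_APT_DEPS="libsasl2-modules" \
        --build-arg ADDITIONAL_DEV_APT_ENV="ACCEPT_EULA=Y" \
        --build-arg ADDITIONAL_PYTHON_DEPS="pyodbc"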
The image build process remains the same when no customization
options are specified, but having those options increases the
flexibility of the image build process in corporate environments.
This is part of #11171.
This change also fixes some of the issues opened and raised by
other users of the Dockerfile.
Fixes: #10730
Fixes: #10555
Fixes: #10856
Input from those issues was taken into account when this change
was designed, so that the cases described in those issues can be
implemented. An example from one of the issues landed as a sample
way of building a highly customized Airflow image using those
customization options.
Depends on #11174
* Update IMAGES.rst
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Breeze tags the image based on the default python version, the
branch and the type of the image, but you might want to add your
own tag in the same command - especially in automated cases, such
as building the image via CI scripts, or security teams that tag
the image based on external factors (build time, person etc.).
This is part of #11171, which makes the image easier to build in
corporate environments.
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing the MySQL client on Debian requires reaching
out to Oracle deb repositories, which might not be approved by
security teams when you build the images. Also, not everyone needs
the MySQL client, or they might want to install their own MySQL or
MariaDB client - from their own repositories.
This change separates the installation step out into a
script (with a prod/dev installation option). The prod/dev
separation is needed because MySQL needs to be installed with dev
libraries in the "Build" segment of the image (requiring build
essentials etc.), while in the "Final" segment of the image only
the runtime libraries are needed.
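A hypothetical sketch of the prod/dev split inside such a script
(the script and package names are illustrative)::

    #!/usr/bin/env bash
    # install_mysql.sh <dev|prod>
    set -euo pipefail
    if [[ "${1}" == "dev" ]]; then
        # Build segment: headers needed to compile the client library.
        apt-get install -y --no-install-recommends libmysqlclient-dev mysql-client
    else
        # Final segment: runtime libraries only, to keep the image small.
        apt-get install -y --no-install-recommends mysql-client
    fi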
Part of #11171
Depends on #11173.
This is the first step of implementing the corporate-environment
friendly way of building images, where in the corporate
environment it might not be possible to install the packages
using the GitHub cache initially.
Part of #11171
GitHub Actions allow using the `fromJson` method to read arrays,
or even more complex json objects, into the CI workflow yaml files.
This, combined with set-output commands, allows us to read the
list of allowed versions, as well as the default ones, from the
environment variables configured in
./scripts/ci/libraries/initialization.sh.
This means that we have one place in which the versions are
configured. We also need to do it in "breeze-complete", as this is
a standalone script that should not source anything; we added
BATS tests to verify that the versions in breeze-complete
correspond with those defined in initialization.sh.
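A hedged sketch of the mechanism (variable, step and output names
are illustrative)::

    # scripts/ci/libraries/initialization.sh - the single place
    # where the version lists live:
    CURRENT_PYTHON_MAJOR_MINOR_VERSIONS='["3.6","3.7","3.8"]'

    # A CI setup step exposes the list as a job output:
    echo "::set-output name=python-versions::${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS}"

    # The workflow then reads it back into a matrix:
    #   matrix:
    #     python-version: ${{ fromJson(needs.build-info.outputs.python-versions) }}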
Also, we no longer limit the tests in regular PRs - we run
all combinations of the available versions. Our tests run quite a
bit faster now, so we should be able to run more complete
matrices. We can still exclude individual values from the matrices
if this turns out to be too much.
MySQL 8 is disabled in Breeze for now. I plan a separate follow-up
PR where we will run MySQL 8 tests (they were not run so far).
Relative and absolute imports are functionally equivalent; the only
practical difference is that relative imports are shorter.
But they also make it less obvious what exactly is imported, and
make such imports harder to find with simple tools (such as grep).
Thus we have decided that the Airflow house style is to use absolute
imports only.
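For example, with absolute imports every usage is one grep away,
while relative imports require resolving what '.' means in each
matching file::

    # Finds every import of BaseOperator across the code base:
    grep -rnF "from airflow.models.baseoperator import" airflow/

    # Relative imports only turn up as dots that still need resolving:
    grep -rEn "^from \.+[A-Za-z0-9_.]* import" airflow/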