This commit unifies the mechanism for rendering tabular data output.
This gives users the possibility to either display a tabular
representation of the data or render it as a valid JSON or YAML payload.
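A minimal sketch of the idea behind the unified rendering, with hypothetical function and option names (not the actual CLI code), assuming PyYAML is available:
```
import json

import yaml  # assumes PyYAML is available


def render(rows, output="table"):
    """Render a list of dicts as a plain table, or as a JSON/YAML payload."""
    if output == "json":
        return json.dumps(rows, indent=2)
    if output == "yaml":
        return yaml.safe_dump(rows)
    # fall back to a simple fixed-width table
    headers = list(rows[0]) if rows else []
    widths = [max(len(h), *(len(str(row[h])) for row in rows)) for h in headers]
    lines = [" | ".join(h.ljust(w) for h, w in zip(headers, widths))]
    for row in rows:
        lines.append(" | ".join(str(row[h]).ljust(w) for h, w in zip(headers, widths)))
    return "\n".join(lines)


print(render([{"conn_id": "my_db", "conn_type": "postgres"}], output="json"))
```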
Closes: #12699
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Previously the output of installing the remaining packages when testing
provider imports was only shown on error. However, it is useful
to know what's going on, even if it clutters the log.
Note that this installation is only needed until we include
apache-beam in the installed packages on CI.
Related to #12703
This PR always shows the output.
* Adds support for Hook discovery from providers
This PR extends provider discovery with a mechanism for
retrieving the mapping from connection type to hook.
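A rough, hypothetical illustration of such a mapping (the real logic lives in Airflow's providers manager and the provider.yaml schema differs; the dict shapes below are made up for the example):
```
from typing import Dict, Iterable

# Made-up provider-info dicts, only to illustrate the shape of the mapping;
# the real schema is defined by each provider.yaml / provider package.
PROVIDERS_INFO = [
    {"hooks": [{"conn_type": "postgres",
                "hook-class-name": "airflow.providers.postgres.hooks.postgres.PostgresHook"}]},
    {"hooks": [{"conn_type": "sqlite",
                "hook-class-name": "airflow.providers.sqlite.hooks.sqlite.SqliteHook"}]},
]


def connection_type_to_hook(providers: Iterable[dict]) -> Dict[str, str]:
    """Build a mapping from connection type to the hook class that handles it."""
    mapping: Dict[str, str] = {}
    for provider in providers:
        for hook in provider.get("hooks", []):
            mapping[hook["conn_type"]] = hook["hook-class-name"]
    return mapping


print(connection_type_to_hook(PROVIDERS_INFO)["postgres"])
```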
Fixes #12456
* fixup! Adds support for Hook discovery from providers
* fixup! fixup! Adds support for Hook discovery from providers
This change upgrades setup.py and setup.cfg to provide a non-conflicting,
`pip check`-valid set of constraints for the CI image.
Fixes #10854
Co-authored-by: Tomek Urbaszek <turbaszek@apache.org>
Co-authored-by: Tomek Urbaszek <turbaszek@apache.org>
This PR implements discovering and reading provider information from
packages (using entry_points) and - if found - from local
provider.yaml files for the built-in Airflow providers,
when they are found in the airflow.providers packages.
The provider.yaml files - if found - take precedence over the
package-provided ones.
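A simplified sketch of that discovery flow (the entry point group name, the callable convention, and the 'package-name' key are assumptions made for the example, not the exact implementation):
```
import importlib.metadata
from pathlib import Path

import yaml  # assumes PyYAML is available


def discover_providers(airflow_sources: Path) -> dict:
    """Collect provider info from entry points, then let local provider.yaml
    files for built-in providers override the package-provided info."""
    providers = {}
    try:
        entry_points = importlib.metadata.entry_points(group="apache_airflow_provider")
    except TypeError:  # older Pythons: entry_points() returns a dict of groups
        entry_points = importlib.metadata.entry_points().get("apache_airflow_provider", [])
    for entry_point in entry_points:
        info = entry_point.load()()  # assumed: entry point resolves to a callable returning a dict
        providers[info["package-name"]] = info
    for provider_yaml in airflow_sources.glob("airflow/providers/**/provider.yaml"):
        info = yaml.safe_load(provider_yaml.read_text())
        providers[info["package-name"]] = info  # local files win
    return providers
```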
Add displaying provider information in CLI
Closes: #12470
The images that are built on GitHub can be used to reproduce
the test errors in CI - they should then be mounted without
local sources. However, in some cases - when you are dealing with
dependencies, for example - it is useful to be able to mount the
sources.
This PR makes it possible.
You can now set a label on a PR that will force upgrading to the latest
dependencies in that PR. If a committer sets an
"upgrade to latest dependencies" label, it will cause the PR
to upgrade all dependencies to the latest versions
matching the setup.py + setup.cfg configuration.
Due to a bug in Breeze initialization code, we were always running
against Postgres 9.6 and MySQL 5.7, even when the matrix selected
something else.
(We were overwriting the POSTGRES_VERSION and MYSQL_VERSION environment
variables in initialization code)
* Fix Connection.description migration for MySQL8
Because MySQL8 tests were not being executed (fixed in #12591), the
description column added to the connection table was not compatible
with MySQL8 using the utf8mb4 character set.
This change adds a migration and fixes the previous migration
to make it compatible.
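For illustration only, the kind of Alembic migration this refers to might look roughly like the sketch below (the table/column types are assumptions, not the exact revision shipped with Airflow):
```
import sqlalchemy as sa
from alembic import op


def upgrade():
    # Illustrative: move the description column to a type that is safe under
    # MySQL 8 with the utf8mb4 character set.
    op.alter_column(
        "connection",
        "description",
        existing_type=sa.String(length=5000),
        type_=sa.Text(length=5000),
        existing_nullable=True,
    )


def downgrade():
    op.alter_column(
        "connection",
        "description",
        existing_type=sa.Text(length=5000),
        type_=sa.String(length=5000),
        existing_nullable=True,
    )
```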
* Fixes inconsistent setting of encoding on MySQL 5.7/8
We missed this when we added support
for different MySQL versions in #7717 and removed the default
character set setting for the database server.
This change forces the default on the database server to be
utf8mb4 - regardless of whether MySQL 5.7 or MySQL 8 is used.
Utf8mb4 is the default for MySQL 8, but latin1 is the default for MySQL 5.7.
The suspected root cause of the problem is described in
https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html:
the mysql client falls back to the default collation if
the MySQL 8 client is used with a 5.7 database, but this should be
no problem if the default DB character set is forced to be
utf8mb4.
This PR restores forcing the server-side encoding.
From Airflow 2.0, the `max_threads` config under the `[scheduler]` section has been renamed to `parsing_processes`.
This aligns the name with the actual code, where the Scheduler launches the number of processes defined by
`[scheduler] parsing_processes` to parse DAG files, calculate the next DagRun date for each DAG,
serialize them, and store them in the DB.
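For reference, the renamed option can be read through Airflow's configuration API like this (the fallback value here is just an illustration):
```
from airflow.configuration import conf

# [scheduler] parsing_processes replaces the pre-2.0 [scheduler] max_threads option
parsing_processes = conf.getint("scheduler", "parsing_processes", fallback=2)
print(parsing_processes)
```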
We do not need to add docker-context-files in CI before we run the
first "cache" PIP installation. Adding them might cause the cache
to always be invalidated in case someone has
added a file there before building and pushing the image.
This PR fixes the problem by adding docker-context-files later
in the Dockerfile and changing the constraints location
used in the "cache" step to always use the GitHub constraints in
this case.
Closes #12509
I've moved all the values that are "static" -- any form of dynamic or
interpolated values is left in setup.py.
If a value is passed both as a kwarg to setup() and in setup.cfg, the kwarg
wins out.
The ./build/bin content only depends on the versions of the tools used
(helm/kind/kubectl) and it does not depend on setup.py or
setup.cfg.
* Make the KubernetesPodOperator backwards compatible
This PR significantly reduces the pain of upgrading to Airflow 2.0
for users of the KubernetesPodOperator. Users will be able to
continue using the airflow.kubernetes custom classes.
* spellcheck
* spelling
* clean up unnecessary files in 1.10
* clean up unnecessary files in 1.10
* clean up unnecessary files in 1.10
When building a tagged image on DockerHub, the build has been
failing because it was trying to pull a cached version of the prod image;
the tagged image should be built from scratch, so the cache should
be disabled.
Fixes #12263
For Kubernetes tests, all tests can be executed with the same Python
version - the default one - no matter which PYTHON_MAJOR_MINOR version is
used. This is because we are testing Airflow deployed
via the production image. Thanks to that we can fix the Python version
to the default and avoid any Python version problems (this is
especially important for cherry-picking to 1.10, where we have
Python 2.7 and 3.5).
If we do not remove the cidfile, the subsequent write to it does
not change the content. The errors had been masked by the
stderr redirection, so they were invisible.
Rather than counting changed layers in the image (which was
enigmatic, difficult and prone to magic numbers) we now rely
on a random file generated while building the image.
We are using the docker image caching mechanism here. The random
file will be regenerated only when the previous layer (which is
about installing Airflow dependencies for the first time) gets
rebuilt. And for us this is the indication that building
the image will take quite some time. This layer should be
relatively static - even if setup.py changes, the CI image is
designed in such a way that the first-time installation of Airflow
dependencies is not invalidated.
This should lead to faster and less frequent rebuilds for people
using Breeze and static checks.
* K8s yaml templates not rendered by k8sexecutor
There is a bug in the yaml template rendering caused by the logic that
only generates yaml templates when the current executor is the
k8sexecutor. This is a problem as the templates are generated by the
task pod, which is itself running a LocalExecutor. This change also generates a
"base" template if the taskInstance has not run yet.
* fix tests
* fix taskinstance test
* fix taskinstance
* fix pod generator tests
* fix podgen
* Update tests/kubernetes/test_pod_generator.py
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
* @ashb comment
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
When you are building CI images locally, you now use the CI
base images from apache:airflow/python* to maintain
consistency and avoid frequent rebuilds. But when you built
prod images, you would accidentally override it with the
python base image available in the python repo, which might be
different (newer and not yet tested in CI). This PR
changes it to use the same base image, which is now
tagged in Apache Airflow's DockerHub repository.
K9s is a fantastic tool that helps to debug a running k8s
instance. It is a terminal-based, windowed CLI that makes you
several times more productive compared to using kubectl
commands. We've integrated k9s (it runs as a docker container
and is downloaded on demand). We've also separated out the KUBECONFIG
of the integrated kind cluster so that it does not mess with
any kubernetes configuration you might already have.
Also, together with that, the "surroundings" of the kubernetes
tests were simplified and improved so that the k9s integration
can be utilized well. Instead of kubectl port forwarding (which
caused a multitude of problems) we now utilize kind's
portMapping feature plus a custom NodePort resource that maps
port 8080 to NodePort 30007, which in turn maps to port 8080
of the Webserver. This way we do not have to establish
an external kubectl port forward, which is error-prone and requires
management - everything is brought up when Airflow gets
deployed to the Kind cluster and shut down when the Kind
cluster is stopped.
Yet another problem fixed was the killing of postgres by one of the
kubernetes tests ('test_integration_run_dag_with_scheduler_failure').
Instead of just killing the scheduler, it killed all pods - including
the Postgres one (it matched 'airflow-postgres.*'). That caused
various problems, as the database could be left in a strange state.
I changed the test to do what it claimed to be doing - killing only the
scheduler during the test. This seemed to improve the stability
of the tests immensely in my local setup.
* Providers in extras are properly configured and verified
This fixes #12255 - where we published the beta2 release with some
extras pulling non-existing providers.
The exact list of providers that had problems:
Wrongly named extras/providers:
* apache.presto: it was badly named -> renamed to 'presto'
* spark (badly pointing to spark instead of apache.spark)
* yandexcloud (the name remains, but we've also added a 'yandex' extra to correspond 1-1 with the 'yandex' provider)
Extras that were wrongly marked as having providers, while they had
none:
* dask
* rabbitmq
* sentry
* statsd
* tableau
* virtualenv
* Update scripts/ci/pre_commit/pre_commit_check_extras_have_providers.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
* Update scripts/ci/pre_commit/pre_commit_check_extras_have_providers.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
There was a problem that even if we pulled the right image
from the Airflow repository, we did not tag it properly.
Also added protection for people who have not yet pulled
the Python image from airflow at all, to force a pull the first time.
When extras are specified while airflow is installed, they now trigger
installation of the dependent provider packages. Each extra has a set
of provider packages that it needs, and they will be installed
automatically if the extra is specified.
For now we do not add any version specification until we agree on the
process in #11425; then we should be able to implement an
automated way of getting information about cross-package
version dependencies.
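A hypothetical sketch of how such a mapping can be folded into setup.py extras (names and package lists below are illustrative, not Airflow's actual setup.py):
```
# Illustrative only: which provider packages each extra pulls in (unpinned for
# now, pending the process discussed in #11425).
EXTRAS_PROVIDERS = {
    "amazon": ["apache-airflow-providers-amazon"],
    "google": ["apache-airflow-providers-google"],
    "postgres": ["apache-airflow-providers-postgres"],
}

EXTRAS_REQUIREMENTS = {
    "postgres": ["psycopg2-binary>=2.7.4"],
}


def add_providers_to_extras(extras: dict, providers: dict) -> dict:
    """Extend each extra with the provider packages it needs."""
    result = {}
    for extra in set(extras) | set(providers):
        result[extra] = list(extras.get(extra, [])) + list(providers.get(extra, []))
    return result


EXTRAS = add_providers_to_extras(EXTRAS_REQUIREMENTS, EXTRAS_PROVIDERS)
```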
Fixes: #11464
The "tmp" directory is mounted from the host (from tmp folder
in the source airflow directory). This is needed to get some
of our docker-in-docker tools (such as gcloud/aws/java) and
get them working on demand. Thanks to that we do not have
to increase the size of CI image unnecessarily.
Those tools were introduced and made to work in #9376
However this causes some of the standard tools (such as apt-get)
to not work inside the container unless the mounted /tmp
folder has write permission for groups/other.
This PR fixes it.
The change in #10806 made airflow work with implicit packages
when "airflow" is imported. This is a good change; however,
it has some unforeseen consequences. The 'provider_packages'
scripts copy all the provider code for backports in order
to refactor it into the empty "airflow" directory in the
provider_packages folder. The #10806 change turned that
empty folder into an 'airflow' package because it was in the
same directory as the provider_packages scripts.
Moving the scripts to dev solves this problem.
This is a fix for a problem introduced in #10806. The change
turned provider packages into namespace packages - which made
them ignored by the find_packages function from setuptools - thus
the production image built automatically and used by Kubernetes
tests did not have the provider packages installed.
This PR fixes it and adds future protection during CI tests of the
production image to make sure that provider packages are
actually installed.
Fixes #12150
When a new Python version is released (bugfix releases), we rebuild the CI image
and replace it with the new one; however, releases of the python
image and the CI image are often hours or even days apart (we only
release the CI image when tests pass in master with the new python
image). We already use a better approach for GitHub - we simply
push the new python image to our registry together with the CI
image, and the CI jobs always pull them from our registry,
knowing that the two - the python and CI images - are in sync.
This PR introduces the same approach. We push not only the CI image
but also the corresponding Python image to our registry. This has
no ill effect - DockerHub handles it automatically and reuses
the layers of the image directly from the Python one, so it is
merely a label stored in our registry that points to the
exact Python image that was used by the last pushed CI image.
There are a few more variables that (if not defined) prevent
using the CI image directly without breeze or the
CI scripts.
With this change you can run:
`docker run -it apache/airflow:master-python3.6-ci`
and enter the image without errors.
Those variables are defined in the GitHub environment, so when they
were recently added it was not obvious that they would cause failures
when running kubernetes tests locally.
This PR fixes that.
According to the manual https://manpages.debian.org/stretch/apt/apt-key.8.en.html, after Debian 9, instead of using "apt-key add", a keyring should be placed directly in the /etc/apt/trusted.gpg.d/ directory with a descriptive name and either "gpg" or "asc" as the file extension. Also added better redirection on the apt-key list command.
This test was bundled in with the existing needs-api tests, but then
performed its _own_ checks on whether it should run. This changes that to
have selective_ci_checks.sh do this check.
Additionally, CI_SOURCE_REPO was often wrong -- at least for me, as I
don't open PRs from ashb/airflow, and this led to a confusing message:
> https://github.com/ashb/airflow.git Branch my_branch does not exist
But all we were using this for was to find the "parent" commit, and
there is an easier way we can do that: HEAD^1 with a fetch depth of 2
passed to the checkout option.
So I've removed that calculation and its uses.
If we need to bring it back, we should use the output from the
`potiuk/get-workflow-origin` action -- that gets the correct value.
The `exit` and `quit` functions are actually `site.Quitter` objects and are loaded, at interpreter start-up, from site.py. However, if the interpreter is started with the `-S` flag, or a custom site.py is used, then exit and quit may not be present. It is recommended to use `sys.exit()`, which is built into the interpreter and is guaranteed to be present.
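A short example of the recommended pattern:
```
import sys


def main() -> int:
    # Do the actual work here and return a process exit code.
    return 0


if __name__ == "__main__":
    # sys.exit() is always available, unlike the site-provided exit()/quit()
    # helpers, which may be missing when the interpreter runs with -S.
    sys.exit(main())
```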
The change in #12050, which aimed at automating Docker image
building in DockerHub, had the undesired effect of overriding the
production image tag with the CI one.
This is fixed by this PR.
DockerHub uses `hooks/build` to build the image and it passes the
DOCKER_TAG variable when the script is called.
This PR makes DOCKER_TAG provide the default value for the tag
that is otherwise calculated from sources (taking the default branch and
python version). Since it is only set in the DockerHub build, it
should be safe.
Fixes #11937
There was a problem where documentation-only changes triggered
selective checks without a docs build (they resulted in
basic-checks-only and no images being built).
This occurred for example in #12025.
This PR fixes it by adding image-build and docs-build as two
separate outputs.
For example, this allows some providers to be installed in site packages
(`/usr/local/python3.7/...`) and others to be installed in the user folder
(`~/.local/lib/python3.7/...`) and both be importable.
If we didn't have code in `airflow/__init__.py` this would be much
easier to achieve (simply deleting the top-level init file would be
enough) - but sadly we can't take that route.
From the docs of pkgutil: https://docs.python.org/3/library/pkgutil.html#module-pkgutil
> This will add to the package’s __path__ all subdirectories of
> directories on sys.path named after the package. This is useful if one
> wants to distribute different parts of a single logical package as
> multiple directories.
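The usual way to apply that pkgutil mechanism is a pkgutil-style namespace declaration in the sub-package's __init__.py, roughly like this (shown as a general illustration, not necessarily the exact file contents used by this change):
```
# airflow/providers/__init__.py - declare a pkgutil-style namespace package so
# parts of airflow.providers can live in different sys.path locations.
__path__ = __import__("pkgutil").extend_path(__path__, __name__)
```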
Tested as follows:
```
$ pip install /wheels/apache_airflow-2.0.0.dev0-py3-none-any.whl
$ ls -ald $(python -c 'import os; print(os.path.dirname(__import__("airflow").__file__))')/providers
ls: cannot access '/usr/local/lib/python3.7/site-packages/airflow/providers': No such file or directory
$ pip install --constraint <(echo 'apache-airflow==2.0.0.dev0') apache-airflow-backport-providers-redis
$ pip install --user --constraint <(echo 'apache-airflow==2.0.0.dev0') apache-airflow-backport-providers-imap
$ python -c 'import airflow.providers.imap, airflow.providers.redis; print(airflow.providers.imap.__file__); print(airflow.providers.redis.__file__)'
/root/.local/lib/python3.7/site-packages/airflow/providers/imap/__init__.py
/usr/local/lib/python3.7/site-packages/airflow/providers/redis/__init__.py
```
* Migrate from helm-unittest to python unittest
* fixup! Migrate from helm-unittest to python unittest
* fixup! fixup! Migrate from helm-unittest to python unittest
Some tests (testing the structure and importability of the
examples) should always be run even if the core part was not modified.
That's why we move them to the "always" directory.
This PR implements an optimisation - only running the
default values of the build matrix when the PR does not have the
"okay to test" label.
The "okay to test" label is set when the PR gets approved
and it was not approved before; a comment is also generated
urging the committer to rebase the PR to run the full set of tests.
Additionally, an (in-progress) check is added that marks the PR as
not yet ready to be merged. Only after re-running it will the PR
become truly ready to be merged.
With this change we attempt to better diagnose some occasional
network docker-compose issues that have been plaguing us after
we solved or worked around other CI-related issues. Sometimes
the docker-compose jobs fail on checking if the container is
up and running with either of these two errors:
* 'forward host lookup failed: Unknown host'
* 'DNS fwd/rev mismatch'
Usually this happens in the rabbitMQ and openldap containers.
Both indicate a problem with the DNS of the docker engine, or maybe
some remnants of a previous docker run that do not allow us
to start those containers.
This change introduces a few improvements:
* added --volumes to the `docker system prune` command, which might
clean up some anonymous volumes left by the containers between
runs
* removed the docker-compose down --remove-orphans command
after failure, as we are currently always doing it a
few lines earlier (before the test). This change means
that our mechanism of logging container logs after failure
will likely give us more information in case the root
cause is the rabbitmq or openldap container failing to start
* increased the number of tries to 5 in case of failed containers.
The Presto DB integration is checked several times, which also means
that it is added several times to DISABLED_INTEGRATIONS in case
it is not enabled. This commit fixes that.
Postgres seems to be really stable when it comes to upgrades,
so we make the assumption that if we test 9.6 and 13, and they
work, all the versions in between will also work.
This PR changes Postgres 10 to 13 in tests and updates the documentation
with all the versions in between.
If you used the build context from the git repo, the .pypirc file was
missing, because COPY in the Dockerfile is not conditional.
This change copies the .pypirc conditionally from the
docker-context-files folder instead.
Also, it was needlessly copied into the main image where it is not
needed - and it was even dangerous to do so.
The packages lacked setup.py and they could not be installed.
This change automatically generates setup.py for the packages and
adds it to them.
Fixes: #11546
We've implemented the capability of running the tests in smaller
chunks and selectively running only some of them, but this
capability has been disabled by mistake by the default setting of
TEST_TYPE to "All" and by not removing it when TEST_TYPES is set
to the sets of tests that should be run.
This should speed up many of our tests and also hopefully
lower the chance of EXIT 137 errors.
The security scans take a long time, especially for python code
- about 18 minutes now. This PR reduces strain on
GitHub Actions by only running the scans in pull requests
when python or javascript code, respectively, has changed.
* Images are not built if the change does not touch code or docs.
* When we have no need for CI images, we run stripped-down
pre-commit checks which skip the long checks and only run for
changed files.
* If none of the CLI/Providers/Kubernetes/WWW files changed,
the relevant tests are skipped, unless some of the core files
changed as well.
* The selective checks logic is explained and documented.
This is the second attempt at the problem, with a better
strategy to get the list of files from the incoming PR.
The strategy now works better in a number of cases:
* when the PR comes from the same repo
* when the PR comes from the pull_repo
* when the PR contains more than one commit
* when the PR is based on an older master and GitHub creates a
merge commit
When we prepare pre-release versions, they are not intended to be
converted to final release versions, so there is no need to replace
the version number for them artificially.
For release candidates, on the other hand, we should internally use the
"final" version because those packages might simply be renamed to the
final "production" versions.
Fixes #11585
We do not dump airflow logs on success any more, but we dump them
and all the container logs in case of failure, so that we can
better investigate cases like #11543 - that includes enabling
full deadlock information dumping in our mysql database.
In case of very simple changes, there might be no merge commits
generated by GitHub. In such cases we should take the commit SHA
instead as the base of change calculation for selective tests.
It seems the trap with several steps and || true does not really
work the way I wanted, and when kill is run but the process is
already gone, we got an error in the script.
Looks like this approach with a sub-process kill will do it.
* Improves stability of K8S tests by caching binaries and retrying downloads
The K8S tests on CI are controlled from the host, not from
inside the CI container image. Therefore they need a virtualenv
to run the tests, as well as some tools such as helm, kubectl
and kind. While those tools can be downloaded and installed
on demand, from time to time the download fails intermittently.
This change introduces the following improvements:
* the commands to download and set up kind, helm and kubectl are
repeated up to 4 times in case they fail
* the "bin" directory where those binaries are downloaded is
cached between runs; only the same combination of
tool versions shares the same cache.
This way both cases - regular re-runs of the same jobs and
upgrades of the tools - will be much more stable.
* Images are not built if the change does not touch code or docs.
* When we have no need for CI images, we run stripped-down
pre-commit checks which skip the long checks and only run for
changed files.
* If none of the CLI/Providers/Kubernetes/WWW files changed,
the relevant tests are skipped, unless some of the core files
changed as well.
* The selective checks logic is explained and documented.
So far Breeze kept its data (mysql, redis,
postgres) inside the containers. This means that the data was kept only
as long as the containers were running. If you stopped Breeze via the
`stop` command, the data was always deleted.
This changes the behaviour - each of the Breeze containers now has
a named volume where data is kept. Those volumes are still deleted
by default when Breeze is stopped, but you can choose to preserve
them by adding ``--preserve-volumes`` when you run the ``stop`` or
``restart`` command.
Fixes: #11625
In Airflow 2.0 we decided to split Airflow into separate providers.
This means that when you prepare the core airflow package, providers
are not installed by default. This is not very convenient for
local development, though, nor for docker images built from sources,
where you would like to install all providers by default.
A new INSTALL_ALL_AIRFLOW_PROVIDERS environment variable controls
this behaviour now. If it is set to "true", all packages including
provider packages are installed. If missing or set to "false", only
the core airflow package is installed.
For Breeze, the default is set to "true", as in those cases you
want to install all providers in your environment. The same applies
when you build the production image from sources. However, when you
build the image using a github tag or a pip package, you should specify
the appropriate extras to install the required provider packages.
Note that if you install Airflow via 'pip install .' from sources
in a local virtualenv, provider packages are not going to be
installed unless you set INSTALL_ALL_AIRFLOW_PROVIDERS to "true".
Fixes #11489
The scripts were using docker compose, but they
can be plain docker run commands. Also, they no longer need to be
run by breeze directly in the CI image, because I've added traps
to run the commands at the exit of all "in_container" scripts.
The production image had the capability of installing packages from
wheels (for security teams/air-gapped systems). This capability
might also be useful when building the CI image, especially when
we install the core and provider packages separately and
do not yet have provider packages available in PyPI.
This is an intermediate step towards implementing #11490.
It seems that port forwarding during kubernetes tests started to behave
erratically - kubectl port forward sometimes hangs
indefinitely rather than connecting or failing.
We change the strategy a bit and try to allocate
increasing port numbers in case something like that happens.
It seems that splitting the tests into many small jobs has a bad
effect - since we only have a queue size of 180 for the whole Apache
organisation, we are competing with other projects for the jobs,
and with the jobs being so short we get starved much more than if
we had long jobs. Therefore we are re-combining the test types into
single jobs per Python version/database version and running all the
tests sequentially on those machines.
* Separate changes/readmes for backport and regular providers
We now have separate release notes for backport provider
packages and regular provider packages.
They have different versioning - backport provider
packages use CALVER, regular provider packages use
semver.
* Added support for provider packages for Airflow 2.0
This change consists of the following:
* adds provider package support for 2.0
* adds generation of package readmes and change notes
* versions are hard-coded to 0.0.1 for the first release for now
* adds automated tests for installation of the packages
* renames backport package readmes/changes to BACKPORT_*
* adds regular package readmes/changes
* updates documentation on generating the provider packages
* adds CI tests for the packages
* maintains backport package generation with the --backports flag
Fixes #11421 Fixes #11424
In preparation for adding provider packages to the 2.0 line, we
are renaming backport packages to provider packages.
We want to implement this in stages - first rename the
packages, then split out backport/2.0 providers as part of
the #11421 issue.
This is the final step of implementing #10507 - selective tests.
Depending on the files changed by the incoming commit, only a subset of
the tests is executed. The conditions below are evaluated in the
sequence specified (a rough sketch of the flow is shown after the list):
* In case of "push" and "schedule" type of events, all tests
are executed.
* If no important files or folders changed - no tests are executed.
This is the typical case for doc-only changes.
* If any of the environment files (Dockerfile/setup.py etc.) changed,
all tests are executed.
* If no "core/other" files are changed, only the relevant types
of tests are executed:
* API - if any of the API files/tests changed
* CLI - if any of the CLI files/tests changed
* WWW - if any of the WWW files/tests changed
* Providers - if any of the Providers files/tests changed
* Integration, Heisentests, Quarantined, Postgres and MySQL
runs are always run, unless all tests are skipped as in the
case of doc-only changes.
* If "Kubernetes" related files/tests are changed, the
"Kubernetes" tests with Kind are run. Note that those tests
are run separately using Host environment and those tests
are stored in "kubernetes_tests" folder.
* If some of the core/other files changed, all tests are run. This
is calculated by subtracting the file counts calculated
above from the total count of important files.
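The real logic lives in the CI shell scripts (selective_ci_checks.sh); the Python sketch below only illustrates the decision flow described above, with made-up path patterns:
```
from typing import Iterable, Set

# Made-up path patterns, only to illustrate the flow.
ENVIRONMENT_FILES = {"Dockerfile", "Dockerfile.ci", "setup.py", "setup.cfg"}
AREAS = {
    "API": ("airflow/api", "tests/api"),
    "CLI": ("airflow/cli", "tests/cli"),
    "WWW": ("airflow/www", "tests/www"),
    "Providers": ("airflow/providers", "tests/providers"),
}
ALWAYS = {"Integration", "Heisentests", "Quarantined", "Postgres", "MySQL"}


def selected_test_types(changed_files: Iterable[str]) -> Set[str]:
    """Illustrative selection of test types based on changed paths."""
    changed = list(changed_files)
    if any(f in ENVIRONMENT_FILES for f in changed):
        return {"All"}
    important = [f for f in changed if f.startswith(("airflow/", "tests/", "chart/"))]
    if not important:
        return set()  # doc-only change: skip all tests
    selected = set(ALWAYS)
    remaining = list(important)
    for test_type, prefixes in AREAS.items():
        matched = [f for f in remaining if f.startswith(prefixes)]
        if matched:
            selected.add(test_type)
            remaining = [f for f in remaining if f not in matched]
    if remaining:
        return {"All"}  # core/other files changed as well: run everything
    return selected
```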
Fixes: #10507
The constraints generation script was broken by recent changes
in the naming of the constraints URL variables and by moving generation
of the link to the Dockerfile.
This change restores the script's behaviour.
We seem to have a problem with running all tests at once - most
likely due to some resource problems in our CI - therefore it makes
sense to split the tests into more batches. This is not yet the full
implementation of selective tests, but it goes in that direction
by splitting into Core/Providers/API/CLI tests. The full selective
tests approach will be implemented as part of the #10507 issue.
This split is possible thanks to #10422, which moved image building
to a separate workflow - this way each image is only built once
and it is uploaded to a shared registry, from which it is quickly
downloaded rather than built by all the jobs separately - this
way we can have many more jobs, as there is very little per-job
overhead before the tests start running.
We have recently started to experience intermittent "unknown_blob"
errors with the GitHub Docker registry. We might eventually need
to migrate to GCR (which is eventually going to replace the
Docker Registry for GitHub).
A ticket has been opened with the Apache Infrastructure team to enable
access to GCR and to make some statements about Access
Rights management for GCR: https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20959
Also a ticket has been raised with GitHub Support about it
(https://support.github.com/ticket/personal/0/861667), as we
cannot delete our public images in the Docker registry.
But until this happens, the workaround might help us
to handle the situations where we get intermittent errors
while pushing to the registry. This seems to be a common
error when an NGINX proxy is used to proxy the GitHub Registry, so
it is likely that retrying will work around the issue.
* Constraints and PIP packages can be installed from local sources
This is the final part of implementing #11171, based on feedback
from enterprise customers we worked with. They want to have
the capability of building the image using binary wheel packages
that are locally available and the official Dockerfile. This means
that, besides the official APT sources, the Dockerfile build should
not need GitHub, nor any other external files pulled from outside,
including the PIP repository.
This change also includes documentation on how to prepare a set of
such binaries ready for inspection and review by security teams
in an Enterprise environment. Such sets of "known-working-binary-whl"
files can then be separately committed, tracked and scrutinized
in an artifact repository of such an Enterprise.
Fixes: #11171
* Update docs/production-deployment.rst
This PR needs to be merged first in order to handle #11385,
which requires .pypirc to be created before the Dockerfile gets built.
This means that the script change needs to be merged to master
first, in this PR.
If this flag is specified, it will look for wheel packages placed in the
dist folder and it will install the wheels from there after installing
Airflow. This is useful for testing backport packages as well as, in the
future, for testing provider packages for 2.0.
We have started to get "unknown blob" kinds of errors more often when
pushing the images to the GitHub Registry. While this is clearly a
GitHub issue, its frequency of occurrence and unclear message
make it a good candidate for writing an additional message with
instructions for the users, especially since they now have
an easy way to get to that information via status checks and
links leading to the log file when this problem happens during the
image building process.
This way users will know that they should simply rebase or
amend/force-push their change to fix it.
When installing airflow 1.10 via breeze we now enable rbac
by default, but we can disable it with the --no-rbac-ui flag.
This is useful for testing different variants of 1.10 when testing
release candidates in connection with the 'start-airflow'
command.
The script was previously placed in scripts/ci, which caused
a bit of a problem in the v1-10-test branch, where PRs were using
scripts/ci from the v1-10-test HEAD but were missing
the ci script from the PR.
The "ci" scripts are part of the host scripts that are
always taken from master when the image is built, but
all the other stuff should be taken from the "docker"
folder - which is taken from the PR.
* Allows more customizations for image building.
This is the third (and not last) part of making the Production
image more corporate-environment friendly. It's been prepared
at the request of one of the big Airflow users (a company) that
has rather strict security requirements when it comes to
preparing and building images. They are committed to
synchronizing with the progress of Apache Airflow 2.0 development,
and making the image customizable so that they can build it using
only sources controlled by them internally was one of their
important requirements.
This change adds the possibility of customizing various steps in
the build process:
* adding custom scripts to be run before installation in both the
build image and the runtime image. This allows, for example,
installing custom GPG keys and adding custom sources.
* customizing the way NodeJS and Yarn are installed in the
build image segment - as companies might rely on their own way
of installing them.
* adding extra packages to be installed during both the build and
dev segment build steps. This is crucial to achieve the same
size optimizations as the original image.
* defining additional environment variables (for example
environment variables that indicate acceptance of the EULAs
in case of installing proprietary packages that require
EULA acceptance) - both in the build image and the runtime image
(again, the goal is to keep the image optimized for size).
The image build process remains the same when no customization
options are specified, but having those options increases the
flexibility of the image build process in corporate environments.
This is part of #11171.
This change also fixes some of the issues opened and raised by
other users of the Dockerfile.
Fixes: #10730 Fixes: #10555 Fixes: #10856
Input from those issues has been taken into account when this
change was designed, so that the cases described in those issues
could be implemented. An example from one of the issues landed as
an example way of building a highly customized Airflow image
using those customization options.
Depends on #11174
* Update IMAGES.rst
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Breeze tags the image based on the default python version,
branch and type of the image, but you might want to tag the image
in the same command - especially in automated cases of building
the image via CI scripts, or for security teams that tag the image
based on external factors (build time, person etc.).
This is part of #11171, which makes the image easier to build in
corporate environments.
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing the MySQL client on Debian requires reaching out
to Oracle deb repositories, which might not be approved by security
teams when you build the images. Also, not everyone needs the MySQL
client, or they might want to install their own MySQL or MariaDB
client - from their own repositories.
This change separates the installation step out into a
script (with a prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.), but in the "Final" segment of the image only runtime libraries
are needed.
Part of #11171
Depends on #11173.
This is the first step of implementing the corporate-environment-friendly
way of building images, where - in a corporate
environment - it might not be possible to install the packages
using the GitHub cache initially.
Part of #11171
GitHub Actions allows using the `fromJson` method to read arrays
or even more complex json objects into the CI workflow yaml files.
This, combined with set::output commands, allows reading the
list of allowed versions as well as the default ones from the
environment variables configured in
./scripts/ci/libraries/initialization.sh.
This means that we can have one place in which versions are
configured. We also need to do it in "breeze-complete", as this is
a standalone script that should not source anything; we added
BATS tests to verify that the versions in breeze-complete
correspond with those defined in initialization.sh.
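The actual CI code is bash, but the mechanism can be sketched like this: the version lists are serialized as JSON on `::set-output` workflow commands, and a later workflow step expands them with `fromJson()` into its matrix (the values below are illustrative):
```
import json

# Illustrative version lists - in reality they come from the environment
# variables configured in ./scripts/ci/libraries/initialization.sh.
python_versions = ["3.6", "3.7", "3.8"]
postgres_versions = ["9.6", "13"]

# Emit GitHub Actions workflow commands; a later job can then use, e.g.,
# strategy.matrix.python-version: ${{ fromJson(needs.build-info.outputs.python-versions) }}
print(f"::set-output name=python-versions::{json.dumps(python_versions)}")
print(f"::set-output name=postgres-versions::{json.dumps(postgres_versions)}")
```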
Also, we no longer limit tests in regular PRs - we run
all combinations of available versions. Our tests run quite a
bit faster now, so we should be able to run more complete
matrices. We can still exclude individual values of the matrices
if this is too much.
MySQL 8 is disabled from breeze for now. I plan a separate follow-up
PR where we will run MySQL 8 tests (they have not been run so far).
* Simplify Airflow on Kubernetes Story
Removes thousands of lines of code that essentially amount to us
re-creating the Kubernetes API. Will offer a faster, simpler
KubernetesExecutor for 2.0.
* Fix podgen tests
* fix documentation
* simplify validate function
* @mik-laj comments
* spellcheck
* spellcheck
* Update airflow/executors/kubernetes_executor.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Airflow below 1.10.2 required the SLUGIFY_USES_TEXT_UNIDECODE env
variable to be set to yes.
Our production Dockerfile and Breeze support building images
for any version of airflow >= 1.10.1, but they failed on
1.10.2 and 1.10.1 because this variable was not set.
You can now set the variable when building the image manually,
and Breeze does it automatically if the image is 1.10.1 or 1.10.2.
Fixes #10974
Changed `Is` to `Passed`
Before:
```
ERROR: Allowed backend: [ sqlite mysql postgres ]. Is: 'dpostgres'.
Switch to supported value with --backend flag.
```
After:
```
ERROR: Allowed backend: [ sqlite mysql postgres ]. Passed: 'dpostgres'.
Switch to supported value with --backend flag.
```
While testing the v1-10-test backport for Breeze, the
--github-repository flag did not work. It turned out that
the lowercase variable was not re-set when the flag was
provided to Breeze.
This change causes the lowercasing to run just before the value
is used, to make sure that the GITHUB_REPOSITORY value
is used after it has been overwritten.
When we ported the new CI mechanism to v1-10-test, it turned out
that we had to correct the retrieval of DEFAULT_BRANCH
and DEFAULT_CONSTRAINTS_BRANCH.
Since we are building the images using the "master" scripts, we need to
make sure the branches are retrieved from the _initialization.sh of the
incoming PR, not from the one in the master branch.
Additionally, the Python 2.7 and 3.5 builds have to be merged to
master and excluded when the build is run targeting the master branch.
* Modify helm chart to use pod_template_file
Since we are deprecating most k8sexecutor arguments
we should use the pod_template_file when launching airflow
using the KubernetesExecutor
* fix tests
* one more nit
* fix dag command
* fix pylint
The change from #10769 accidentally switched integration tests
into far longer unit test runs (we effectively ran the tests
twice and did not run the integration tests).
This fixes the problem by removing the readonly status from
INTEGRATIONS and only setting it after the integrations are
set.
Until pre-commit implements exporting all configured
checks, we need to keep the list manually updated.
We check both the pre-commit list in breeze-complete and the
descriptions in STATIC_CODE_CHECKS.rst.
We've observed the tests for the last couple of weeks and it seems
most of the tests marked with the "quarantine" marker are succeeding
in a stable way (https://github.com/apache/airflow/issues/10118).
The removed tests have a success ratio of > 95% (20 runs without
problems), and this was verified a week ago as well,
so it seems they are rather stable.
There are literally a few that are either failing or causing
the Quarantined builds to hang. I manually reviewed the
master tests that failed over the last few weeks and added the
tests that are causing the builds to hang.
It seems that stability has improved - which might be caused
by some temporary problems present when we marked the quarantined
builds, or by a too "generous" way of marking tests as quarantined, or
maybe the improvement comes from #10368, as the docker engine
and machines used to run the builds in GitHub experience far
less load (image builds are executed in separate builds), so
it might be that resource usage has decreased. Another reason
might be GitHub Actions stability improvements.
Or simply those tests are more stable when run in isolation.
We might still add failing tests back as soon as we see them behave
in a flaky way.
The remaining quarantined tests that need to be fixed:
* test_local_run (often hangs the build)
* test_retry_handling_job
* test_clear_multiple_external_task_marker
* test_should_force_kill_process
* test_change_state_for_tis_without_dagrun
* test_cli_webserver_background
We also move some of those tests to the "heisentests" category.
Those tests run fine in isolation but fail
the builds when run with all other tests:
* TestImpersonation tests
We might find that those heisentests can be fixed, but for
now we are going to run them in isolation.
Also, since those quarantined tests are failing more often,
the "num runs" to track for them has been decreased to 10,
to keep track of the last 10 runs only.
When rebuilding the image during a commit, the kill command failed to
find the spinner job to kill (this is just a preventive measure)
and failed the rebuild step in pre-commit.
This is now fixed.
The constants were initialised after the readonly status was set
for them in the test script.
This was mainly about default values for those constants (which has
already been handled by _script_init.sh), but more importantly
the INTEGRATIONS were not properly initialized, which caused some
integration tests to be skipped.
The docker(), helm(), and kubectl() functions replace the real tools
to get verbose behaviour (we can print the exact command being
executed for them). But when 'set +e' was set before the command
was called - indicating that errors in those functions should be
ignored - this did not happen. The functions ran 'set -e' just
before returning the non-zero value, effectively exiting the
script right after. This made the first-time experience not
good.
The fix also fixes the behaviour of stdout and stderr for those
functions - previously they were joined to be able to be
printed to OUTPUT_FILE, but this lost the stderr/stdout
distinction. Now both stdout and stderr are printed to the
output file, but they are also redirected to stdout/stderr
respectively, so that 2>/dev/null works as expected.
While fixing it, it turned out that one of the remove_images
methods was not used any more - it was merged with the breeze one.
The hadolint check only checked the "main dir" Dockerfile,
but we have more of them now. All of them are now checked.
The following problems are fixed:
* DL3000 Use absolute WORKDIR
* DL4000 MAINTAINER is deprecated
* DL4006 Set the SHELL option -o pipefail before RUN with a pipe in it.
* SC2046 Quote this to prevent word splitting.
The following problems are ignored:
* DL3018 Pin versions in apk add. Instead of `apk add <package>` use `apk add
<package>=<version>`
If the pod restarts before the sleep time is over, the trim command will not run. I think it's better if we reorder the commands to execute the delete first and then go to sleep. At the moment the sleep is every 15 mins, but people will just increase the EVERY line if they want a longer sleep time, and then they can encounter this bug.
BATS has additional assert libraries that are much more
straightforward and nicer for writing tests for bash scripts.
There is no dockerfile from BATS that contains those, so we
had to build our own (but it follows the same structure
as #9652 - we keep our dev docker image
sources inside our repository and the generated docker images
in the "apache/airflow:<tool>-CALVER-TOOLVER" format).
We have more BATS unit tests to add - following #10576 -
and this change will be of great help.
Inspired by the Google Shell Guide, where they mention
separating package names with ::, I realized that this was
one of the missing pieces in our bash scripts.
While we already had packages (in the libraries folders),
it has been difficult to tell which function is where.
With the introduction of packages - equal to the library file name -
we are *almost* at the level of a structured language, and
it's easier to find the functions if you are looking for them.
Way easier, in fact.
Part of #10576
(cherry picked from commit cc551ba793)
(cherry picked from commit 2bba276f0f06a5981bdd7e4f0e7e5ca2fe84f063)
* Implement Google Shell Conventions for breeze script … (#10651)
Part of #10576
First (and the biggest) of the series of commits to introduce
Google Shell Conventions in our bash scripts.
This concerns the biggest and most complex script - breeze - so it is
rather huge, but it is difficult to split it into
smaller pieces.
The rules implemented (from the conventions):
* constants and exported variables are CAPITALIZED, while
local/temporary variables are lowercase
* following the shell guide, once all the variables are set to their
final values (either from exported variables, calculation or --switches),
a single function makes all the variables read-only. That
helped to clean up a lot of places where the same function was called
several times, or where variables were defined in a few places. Now the
behavior should be rather consistent and we should easily catch such
duplications
* function headers (following the guide) explain the arguments,
variables expected, and variables modified in the functions
* setting the variables as read-only also helped to clean up the "ifs"
where we often had ":=}" in variables and != "" or == "". Those are
replaced with `=}` and the tests are replaced with `-n` and `-z` - also
following the shell guide (readonly helped to detect and clean all
such cases). This should also be much more robust in the future.
* reorganized the initialization of those constants and variables - simplified
a few places where initialization was overlapping. It should be much more
straightforward and clean now
* a number of internal breeze function variables are "local" - this
helps avoid accidental variable overwriting and keeps stuff localized
* the trap_add function is separated out to help in cases where we had
several traps handling the same signals.
(cherry picked from commit 46c8d6714c)
(cherry picked from commit c822fd7b4bf2a9c5a9bb3c6e783cbea9dac37246)
* fixup! Implement Google Shell Conventions for breeze script … (#10651)
* Revert "Add packages to function names in bash (#10670)"
This reverts commit cc551ba793.
* Revert "Implement Google Shell Conventions for breeze script … (#10651)"
This reverts commit 46c8d6714c.
Inspired by the Google Shell Guide, where they mention
separating package names with ::, I realized that this was
one of the missing pieces in our bash scripts.
While we already had packages (in the libraries folders),
it has been difficult to tell which function is where.
With the introduction of packages - equal to the library file name -
we are *almost* at the level of a structured language, and
it's easier to find the functions if you are looking for them.
Way easier, in fact.
Part of #10576
Part of #10576
First (and the biggest) of the series of commits to introduce
Google Shell Conventions in our bash scripts.
This concerns the biggest and most complex script - breeze - so it is
rather huge, but it is difficult to split it into
smaller pieces.
The rules implemented (from the conventions):
* constants and exported variables are CAPITALIZED, while
local/temporary variables are lowercase
* following the shell guide, once all the variables are set to their
final values (either from exported variables, calculation or --switches),
a single function makes all the variables read-only. That
helped to clean up a lot of places where the same function was called
several times, or where variables were defined in a few places. Now the
behavior should be rather consistent and we should easily catch such
duplications
* function headers (following the guide) explain the arguments,
variables expected, and variables modified in the functions
* setting the variables as read-only also helped to clean up the "ifs"
where we often had ":=}" in variables and != "" or == "". Those are
replaced with `=}` and the tests are replaced with `-n` and `-z` - also
following the shell guide (readonly helped to detect and clean all
such cases). This should also be much more robust in the future.
* reorganized the initialization of those constants and variables - simplified
a few places where initialization was overlapping. It should be much more
straightforward and clean now
* a number of internal breeze function variables are "local" - this
helps avoid accidental variable overwriting and keeps stuff localized
* the trap_add function is separated out to help in cases where we had
several traps handling the same signals.
We can now build all the images from Airflow sources in
a reproducible fashion, and our users can use the helm chart
based on the images built from the official images plus the code in
the Airflow codebase.
We also have a consistent versioning scheme based on a
calver version for releasing the images, coupled with
the version of the original package.
Part of #9401
We have already fixed a lot of problems that were marked
with those; also, IntelliJ has gotten a bit smarter at not
detecting false positives, as well as understanding more
pylint annotations. Wherever the problem remained,
we replaced it with # noqa comments - as those are
also well understood by IntelliJ.