Commit graph

1023 commits

Author SHA1 Message Date
Tomek Urbaszek cba8d62553
Refactor list rendering in commands (#12704)
This commit unifies the mechanism for rendering tabular output.
This gives users the possibility to either display a tabular
representation of the data or render it as a valid JSON or YAML payload.

Closes: #12699

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-12-02 10:20:16 +01:00
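The commit above unifies how tabular CLI output is rendered. As a rough, stdlib-only sketch of that idea (not Airflow's actual implementation), a single hypothetical `render` helper can turn the same rows into either a plain-text table or a JSON payload. YAML is left out here because it would need a third-party library such as PyYAML.

```python
import json


def render(rows, output="table"):
    """Render a list of dicts as a plain-text table or as JSON (hypothetical helper)."""
    if output == "json":
        return json.dumps(rows)
    if output != "table":
        raise ValueError(f"unsupported output format: {output}")
    if not rows:
        return ""
    headers = list(rows[0])
    # Each column is as wide as its longest header or cell.
    widths = [max(len(h), *(len(str(r[h])) for r in rows)) for h in headers]
    lines = [" | ".join(h.ljust(w) for h, w in zip(headers, widths))]
    lines.append("-+-".join("-" * w for w in widths))
    for row in rows:
        lines.append(" | ".join(str(row[h]).ljust(w) for h, w in zip(headers, widths)))
    return "\n".join(lines)


rows = [{"dag_id": "example_bash", "paused": True}]
print(render(rows))
print(render(rows, output="json"))
```

Keeping one entry point with an `output` switch means every command formats its data the same way, which is the point of the refactor.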
Jarek Potiuk a02e0f746f
User-friendly output of Breeze and CI scripts (#12735) 2020-12-01 17:44:05 +01:00
Kamil Breguła 027fd743d6
Fix static checks - #12715 (#12729) 2020-12-01 11:10:54 +00:00
Jarek Potiuk ebc8fcf199
Improve verification of images with PIP check (#12718)
Verification of images with PIP is done in separate jobs and
they provide useful information to committers and contributors
when the pip check fails.
2020-12-01 09:51:24 +01:00
Jarek Potiuk e4cb0ef192
Output of installing remaining packages is shown also on success (#12723)
Previously the output of installing remaining packages when testing
provider imports was only shown on error. However, it is useful
to know what's going on even if it clutters the log.

Note that this installation is only needed until we include
apache-beam in the installed packages on CI.

Related to #12703

This PR always shows the output.
2020-12-01 09:51:05 +01:00
Kamil Breguła bd90136aaf
Move operator guides to provider documentation packages (#12681) 2020-11-30 08:48:24 +01:00
Jarek Potiuk 2037303eef
Adds support for Connection/Hook discovery from providers (#12466)
* Adds support for Hook discovery from providers

This PR extends providers discovery with the mechanism
of retrieving mapping of connections from type to hook.

Fixes #12456

* fixup! Adds support for Hook discovery from providers

* fixup! fixup! Adds support for Hook discovery from providers
2020-11-29 15:31:49 +01:00
Jarek Potiuk e4ab453a37
Setup.cfg change triggers full build (#12684)
Since we moved part of the setup.py specification to
setup.cfg, we should trigger full build when only that file
changes.
2020-11-28 12:39:46 +01:00
Kamil Breguła 08bc62b64d
Validate JSON schema files with JSON Schema (#12682) 2020-11-28 12:12:54 +01:00
Jarek Potiuk 1c500ee62c
Temporarily disable PROD image check until Azure Blob is fixed (#12679)
This PR temporarily disables the PIP check result for the production
image until the switch of Azure Blob to v12 is fixed.
2020-11-28 10:45:14 +01:00
Jarek Potiuk 3b138d2d60
Remove "@" references from constraints generation (#12671)
Likely fixes: #12665
2020-11-28 06:04:45 +01:00
Jarek Potiuk fa8af2d165
Enable PIP check for both CI and PROD image (#12664)
This PR enables PIP check after constraints have been updated
to be stable and 'pip check' compliant in #12636
2020-11-27 21:33:50 +01:00
Jarek Potiuk 6b3c6add9e
Update setup.py to get non-conflicting set of dependencies (#12636)
This change upgrades setup.py and setup.cfg to provide non-conflicting
`pip check` valid set of constraints for CI image.

Fixes #10854

Co-authored-by: Tomek Urbaszek <turbaszek@apache.org>
2020-11-27 20:06:44 +01:00
Jarek Potiuk 41a699a7bd
Implement reading provider information from packages/sources (#12512)
This PR implements discovering and reading provider information from
packages (using entry_points) and, if found, from local
provider yaml files for the built-in Airflow providers,
when they are found in the airflow.provider packages.
The provider.yaml files, if found, take precedence over the
package-provided ones.

Add display of provider information to the CLI

Closes: #12470
2020-11-27 18:42:32 +01:00
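The precedence rule described above (local provider.yaml files override package-provided information) can be illustrated with a small sketch. The function name, the sample data and the shape of the metadata are hypothetical; real discovery would read entry points and YAML files rather than take ready-made dicts.

```python
from typing import Dict, Tuple

ProviderInfo = Dict[str, str]


def discover_providers(
    from_entry_points: Dict[str, ProviderInfo],
    from_yaml_files: Dict[str, ProviderInfo],
) -> Dict[str, Tuple[str, ProviderInfo]]:
    """Combine both sources, recording where each provider's info came from."""
    providers: Dict[str, Tuple[str, ProviderInfo]] = {}
    for name, info in from_entry_points.items():
        providers[name] = ("package", info)
    # Applied second, so a local provider.yaml replaces package-provided info.
    for name, info in from_yaml_files.items():
        providers[name] = ("provider.yaml", info)
    return providers


# Illustrative data only.
packages = {
    "apache-airflow-providers-http": {"description": "from package"},
    "apache-airflow-providers-ftp": {"description": "from package"},
}
local_yaml = {"apache-airflow-providers-http": {"description": "from provider.yaml"}}
print(discover_providers(packages, local_yaml))
```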
Tomek Urbaszek 456a1c5dc9
Restructure the extras in setup.py and describe them (#12548)
Closes: #12544

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-11-27 15:34:47 +01:00
Jarek Potiuk c0843930bf
Allows mounting local sources for github run-id images (#12650)
The images that are built on GitHub can be used to reproduce
the test errors in CI; they should then be mounted without
local sources. However, in some cases, when you are dealing with
dependencies for example, it is useful to be able to mount the
sources.

This PR makes it possible.
2020-11-27 12:15:03 +01:00
Jarek Potiuk 8b9d52f0cc
Adds possibility of forcing upgrade constraint by setting a label (#12635)
You can now set a label on PR that will force upgrading to latest
dependencies in your PR. If a committer sets an
"upgrade to latest dependencies" label, it will cause the PR
to upgrade all dependencies to latest versions of dependencies
matching setup.py + setup.cfg configuration.
2020-11-26 11:02:33 +01:00
Ash Berlin-Taylor 54adda50c6
Actually run against the version of the DB we select in the matrix. (#12591)
Due to a bug in Breeze initialization code, we were always running
against Postgres 9.6 and MySQL 5.7, even when the matrix selected
something else.

(We were overwriting the POSTGRES_VERSION and MYSQL_VERSION environment
variables in initialization code)
2020-11-25 21:17:10 +01:00
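The bug above came from initialization code unconditionally overwriting POSTGRES_VERSION and MYSQL_VERSION. A Python analogy of the fix (Breeze itself is Bash, where `: "${POSTGRES_VERSION:=9.6}"` plays the same role) is to fill in defaults only when no value was selected. The function name is hypothetical, and a plain dict stands in for the process environment.

```python
def init_db_versions(env):
    """Fill in default database versions without clobbering selected ones."""
    # Buggy pattern described in the commit: an unconditional assignment such as
    # env["POSTGRES_VERSION"] = "9.6" discards the value chosen by the matrix.
    env.setdefault("POSTGRES_VERSION", "9.6")
    env.setdefault("MYSQL_VERSION", "5.7")
    return env


# The matrix selected Postgres 13; the default must not override it.
print(init_db_versions({"POSTGRES_VERSION": "13"}))
```

The same call works on `os.environ`, which also supports `setdefault`.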
Jarek Potiuk 6d6ca14675
Fixes inconsistent behaviour of utf8mb4 encoding on Mysql 5.7/8 (#12614)
* Fix Connection.description migration for MySQL8

Because MySQL8 tests were not being executed (fixed in #12591), the
description column added to the connection table was not compatible with
MySQL8 with the utf8mb4 character set.

This change adds migration and fixes the previous migration
to make it compatible.

* Fixes inconsistent setting of encoding on Mysql 5.7/8

We missed this when we added support
for different MySQL versions in #7717 and removed the default
character set setting for the database server.

This change forces the default on the database server to be
utf8mb4, regardless of whether MySQL 5.7 or MySQL8 is used.
Utf8mb4 is the default for MySQL8 but latin1 is the default for MySQL 5.7.

There was a suspected root cause of the problem:

https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html
where the mysql client falls back to the default collation if
the MySQL 8 client is used with a 5.7 database, but this should be
no problem if the default DB character set is forced to be
utf8mb4.

This PR restores forcing the server-side encoding.
2020-11-25 14:37:53 +01:00
Kaxil Naik 486134426b
Rename `[scheduler] max_threads` to `[scheduler] parsing_processes` (#12605)
From Airflow 2.0, `max_threads` config under `[scheduler]` section has been renamed to `parsing_processes`.

This aligns the name with the actual code, where the Scheduler launches the number of processes defined by
`[scheduler] parsing_processes` to parse DAG files, calculate the next DagRun date for each DAG,
serialize them and store them in the DB.
2020-11-25 09:33:19 +00:00
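For readers migrating configuration, the rename can be handled with a fallback lookup. The sketch below assumes a hypothetical `DEPRECATED_OPTIONS` mapping and `get_option` helper built on the standard `configparser` module; it is not Airflow's own configuration code.

```python
import configparser
import warnings

# Hypothetical mapping used for illustration: new option -> deprecated option.
DEPRECATED_OPTIONS = {
    ("scheduler", "parsing_processes"): ("scheduler", "max_threads"),
}


def get_option(cfg, section, key, fallback=None):
    """Read an option, falling back to its deprecated name with a warning."""
    if cfg.has_option(section, key):
        return cfg.get(section, key)
    old = DEPRECATED_OPTIONS.get((section, key))
    if old is not None and cfg.has_option(*old):
        warnings.warn(
            f"[{old[0]}] {old[1]} is deprecated; use [{section}] {key}",
            DeprecationWarning,
        )
        return cfg.get(*old)
    return fallback


# An old-style config that still uses max_threads.
cfg = configparser.ConfigParser()
cfg.read_string("[scheduler]\nmax_threads = 4\n")
print(get_option(cfg, "scheduler", "parsing_processes"))
```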
Jarek Potiuk 58e21ed949
Adds missing licence headers (#12593) 2020-11-25 00:58:01 +01:00
Tobiasz Kędzierski 3fa51f94d7
Add check for duplicates in provider.yaml files (#12578) 2020-11-24 05:25:29 +01:00
Tomek Urbaszek 919e1d8bd1
Fix sed command on MacOS (#12549) 2020-11-22 16:51:26 +01:00
Jed Cunningham 9eb92e7343
Support installing providers with no dependencies via extras (#12497) 2020-11-22 08:19:08 +01:00
Jarek Potiuk 37548f09ac
Fixes unneeded docker-context-files added in CI (#12534)
We do not need to add docker-context-files in CI before we run
first "cache" PIP installation. Adding it might cause the effect
that the cache will always be invalidated in case someone has
a file added there before building and pushing the image.

This PR fixes the problem by adding docker-context files later
in the Dockerfile and changing the constraints location
used in the "cache" step to always use the github constraints in
this case.

Closes #12509
2020-11-21 19:21:43 +01:00
Kamil Breguła c34ef853c8
Separate out documentation building per provider (#12444)
* POC

* fixup! POC
2020-11-20 15:35:56 +01:00
John Bampton 4b59ce827e
Fix case of GitHub in comment (#12474)
github -> GitHub
2020-11-19 11:01:10 +00:00
Ash Berlin-Taylor f034d4b78c
Move setup properties out of setup.py in to setup.cfg (#12417)
I've moved all the ones that are "static"; any form of dynamic or
interpolated values are left in setup.py.

If a value is passed as a kwarg to setup and in setup.cfg, the kwarg
wins out.

The ./build/bin content only depends on the version of tools used
(helm/kind/kubectl) and it does not depend on setup.py or
setup.cfg.
2020-11-18 13:23:03 +00:00
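The precedence described above (a keyword argument passed to `setup()` wins over the same key in setup.cfg) can be modelled with `configparser` and a dict merge. This is only a model of the rule: setuptools parses setup.cfg itself, and the helper name and sample values are illustrative.

```python
import configparser

# Illustrative setup.cfg content with only "static" metadata.
SETUP_CFG = """
[metadata]
name = apache-airflow
description = example description
"""


def effective_metadata(cfg_text, **setup_kwargs):
    """Hypothetical model: setup.cfg values first, then setup() kwargs on top."""
    cfg = configparser.ConfigParser()
    cfg.read_string(cfg_text)
    merged = dict(cfg["metadata"])
    merged.update(setup_kwargs)  # a kwarg passed to setup() wins over setup.cfg
    return merged


# "version" is dynamic, so it stays in setup.py; "name" is overridden on purpose.
meta = effective_metadata(SETUP_CFG, version="2.0.0.dev0", name="overridden")
print(meta)
```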
Kaxil Naik f4851f7d75
Fix Entrypoint and _CMD config variables (#12411)
closes https://github.com/apache/airflow/issues/8705

Co-Authored-By: Noël Bardelot <11333203+NBardelot@users.noreply.github.com>
2020-11-18 00:47:13 +00:00
Daniel Imberman cab86d80d4
Make K8sPodOperator backwards compatible (#12384)
* Make the KubernetesPodOperator backwards compatible

This PR significantly reduces the pain of upgrading to Airflow 2.0
for users of the KubernetesPodOperator. Users will be allowed to
continue using the airflow.kubernetes custom classes.

* spellcheck

* spelling

* clean up unnecessary files in 1.10

* clean up unnecessary files in 1.10

* clean up unnecessary files in 1.10
2020-11-17 13:47:18 -08:00
Jarek Potiuk dc31ca4dc6
The messages about remote image check are only shown with -v (#12402)
The messages might be confusing and should only be shown when
verbose is turned on.
2020-11-17 20:32:00 +01:00
Jarek Potiuk 2c0920fba5
Adds mechanism for provider package discovery. (#12383)
This is a simple mechanism that will allow us to dynamically
discover and register all provider packages in the Airflow core.

Closes: #11422
2020-11-17 18:48:57 +01:00
Kamil Breguła 2cda2f2a0a
Add missing pre-commit definition - provider-yamls (#12393) 2020-11-17 15:44:46 +01:00
Kaxil Naik 3e994abc1c
Fix typo in check_environment.sh (#12395)
`Databsae` -> `Database`
2020-11-17 12:04:03 +00:00
Jarek Potiuk bfbbb247a8
Add extra info when starting extra actions in Breeze (#12377) 2020-11-16 02:26:57 +01:00
Jarek Potiuk 0038660fdd
Fixes pull error on building tagged image (#12378)
When building a tagged image on DockerHub, the build has been
failing as it was trying to pull a cached version of the prod image,
but the tagged image should be built from scratch, so the cache should
be disabled.

Fixes #12263
2020-11-16 02:26:36 +01:00
Jarek Potiuk cbd6daf5e6
All kubernetes tests use the same host python version (#12374)
For Kubernetes tests, all tests can be executed in the same Python
version (the default one), no matter which PYTHON_MAJOR_MINOR is
used. This is because we are testing Airflow, which is deployed
via the production image. Thanks to that we can fix the Python version
to the default one and avoid any Python version problems (this is
especially important for cherry-picking to 1.10, where we have
Python 2.7 and 3.5).
2020-11-15 14:20:22 +01:00
Jarek Potiuk cd88af8692
Removes the cidfile before generation (#12372)
If we do not remove the cidfile, the subsequent write to it does
not change the content. The errors have been masked by the
stderr redirection, so the error was invisible.
2020-11-15 01:29:57 +01:00
Kamil Breguła 6889a333cf
Improvements for operators and hooks ref docs (#12366) 2020-11-15 00:50:30 +01:00
Kaxil Naik c9d2b3c5d0
Remove unused import (#12371) 2020-11-14 23:37:52 +00:00
Jarek Potiuk 167b9b9889
Simplifies check whether the CI image should be rebuilt (#12181)
Rather than counting changed layers in the image (which was
enigmatic, difficult and prone to some magic number) we rely now
on random file generated while building the image.

We are using the Docker image caching mechanism here. The random
file will be regenerated only when the previous layer (which
installs Airflow dependencies for the first time) gets
rebuilt. For us, this indicates that building
the image will take quite some time. This layer should be
relatively static: even if setup.py changes, the CI image is
designed in such a way that the first-time installation of Airflow
dependencies is not invalidated.

This should lead to faster and less frequent rebuilds for people
using Breeze and static checks.
2020-11-13 22:21:39 +01:00
Daniel Imberman 4e362c1347
K8s yaml templates not rendered by k8sexecutor (#12303)
* K8s yaml templates not rendered by k8sexecutor

There is a bug in the yaml template rendering caused by the logic that
yaml templates are only generated when the current executor is the
k8sexecutor. This is a problem as the templates are generated by the
task pod, which is itself running a LocalExecutor. Also generates a
"base" template if this taskInstance has not run yet.

* fix tests

* fix taskinstance test

* fix taskinstance

* fix pod generator tests

* fix podgen

* Update tests/kubernetes/test_pod_generator.py

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>

* @ashb comment

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
2020-11-13 12:06:29 -08:00
Kamil Breguła 7825e8f590
Docs installation improvements (#12304)
* Improvements for installation docs
2020-11-13 09:38:54 +01:00
Jarek Potiuk af19b126e9
Deploy was not working from Breeze (#12319)
The get_cluster_name function was called twice, resulting in a read-only
error after rebasing/fixing the CI failure in #12163.

This PR fixes it.
2020-11-12 19:53:35 +01:00
Jarek Potiuk 3c2c29187a
Python base image is shared between CI and PROD image (#12280)
When you are building CI images locally you use the CI
base images from apache:airflow/python* now to maintain
consistency and avoid often rebuilds. But when you build
prod images, you would accidentally override it with the
python base image available in python repo which might be
different (newer and not yet tested in CI). This PR
changes it to use the same base image which is now
tagged in Apache Airflow's dockerhub repository.
2020-11-12 12:31:14 +01:00
Jarek Potiuk 21999dd56e
Added k9s as integrated tool to help with kubernetes testing (#12163)
K9s is a fantastic tool that helps to debug a running k8s
instance. It is a terminal-based windowed CLI that makes you
several times more productive compared to using kubectl
commands. We've integrated k9s (it is run as a docker container
and downloaded on demand). We've also separated out KUBECONFIG
of the integrated kind cluster so that it does not mess with
kubernetes configuration you might already have.

Also - together with that the "surrounding" of the kubernetes
tests were simplified and improved so that the k9s integration
can be utilized well. Instead of kubectl port forwarding (which
caused multitude of problems) we are now utilizing kind's
portMapping feature + custom NodePort resource that maps
port 8080 to 30007 NodePort which in turn maps it to 8080
port of the Webserver. This way we do not have to establish
an external kubectl port forward which is prone to error and
management - everything is brought up when Airflow gets
deployed to the Kind Cluster and shuts down when the Kind
cluster is stopped.

Yet another problem fixed was killing of postgres by one of the
kubernetes tests ('test_integration_run_dag_with_scheduler_failure').
Instead of just killing the scheduler it killed all pods - including
the Postgres one (it was named 'airflow-postgres.*'). That caused
various problems, as the database could be left in a strange state.
I changed the test to do what it claimed to be doing, killing only the
scheduler during the test. This seemed to improve the stability
of tests immensely in my local setup.
2020-11-11 17:15:02 +01:00
Jarek Potiuk 348510f86b
Providers in extras are properly configured and verified (#12265)
* Providers in extras are properly configured and verified

This fixes #12255 - where we published beta2 release with some
extras pulling non-existing providers.

The exact list of providers that had problems:

Wrongly named extras/providers:

* apache.presto: it was badly named -> renamed to 'presto'
* spark (badly pointing to spark instead of apache.spark)
* yandexcloud (the name remains there but we've also added a 'yandex' extra to correspond 1-1 with the 'yandex' provider)

Extras that were wrongly marked as having providers, where they had
none:

* dask
* rabbitmq
* sentry
* statsd
* tableau
* virtualenv

* Update scripts/ci/pre_commit/pre_commit_check_extras_have_providers.py

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

* Update scripts/ci/pre_commit/pre_commit_check_extras_have_providers.py

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-11-11 17:13:57 +01:00
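A pre-commit check like the one referenced above boils down to verifying that every provider named by an extra actually exists. The sketch below uses hypothetical, abbreviated data rather than Airflow's real extras list, and does not reproduce the actual pre_commit_check_extras_have_providers.py script.

```python
# Illustrative data only: a few extras and the providers each one claims to pull in.
EXTRAS_PROVIDERS = {
    "presto": ["presto"],
    "apache.spark": ["apache.spark"],
    "spark": ["spark"],  # wrong: there is no "spark" provider
    "dask": ["dask"],  # wrong: dask has no provider package
    "yandex": ["yandex"],
}
EXISTING_PROVIDERS = {"presto", "apache.spark", "yandex"}


def find_broken_extras(extras_providers, existing):
    """Return {extra: [missing providers]} for extras that point nowhere."""
    broken = {}
    for extra, providers in sorted(extras_providers.items()):
        missing = [p for p in providers if p not in existing]
        if missing:
            broken[extra] = missing
    return broken


print(find_broken_extras(EXTRAS_PROVIDERS, EXISTING_PROVIDERS))
```

A real check would exit non-zero when the result is non-empty so that the pre-commit hook fails.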
John Bampton e0b7cae51e
Fix spelling (#12266) 2020-11-10 23:19:43 +01:00
Jarek Potiuk 09febee4c1
Fixes continuous image rebuilding with Breeze (#12256)
There was a problem that even if we pulled the right image
from the Airflow repository, we did not tag it properly.

Also added protection for people who have not yet pulled
the Python image from Airflow at all, to force a pull the first time.
2020-11-10 17:34:52 +01:00
John Bampton 502ba309ea
Enable Markdownlint rule - MD022/blanks-around-headings (#12225)
https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md022---headings-should-be-surrounded-by-blank-lines
2020-11-10 10:36:45 +01:00
Kaxil Naik 08d67add52
Beautify Output of setup-installation pre-commit (#12218) 2020-11-10 00:54:47 +00:00
Jarek Potiuk ea27f90d29
Adds automated installation of dependent packages (#11526)
When extras are specified while Airflow is installed, this triggers
installation of dependent packages. Each extra has a set of provider
packages that are needed by the extra, and they will be installed
automatically if this extra is specified.

For now we do not add any version specification until we agree on the
process in #11425; then we should be able to implement an
automated way of getting information about cross-package
version dependencies.

Fixes: #11464
2020-11-09 22:01:19 +00:00
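The extra-to-provider expansion described above can be sketched as a lookup that turns the selected extras into unversioned provider package names (matching the note that no version specification is added yet). The mapping is a hypothetical excerpt, and the `apache-airflow-providers-<id>` naming with dots replaced by dashes is an assumption made for illustration.

```python
def provider_package_name(provider_id):
    # Assumed naming convention: dots become dashes under a common prefix.
    return "apache-airflow-providers-" + provider_id.replace(".", "-")


# Hypothetical excerpt of an extras -> provider ids mapping.
EXTRAS_PROVIDERS = {
    "amazon": ["amazon"],
    "spark": ["apache.spark"],
    "gcp": ["google"],
    "statsd": [],  # an extra with no provider package
}


def packages_for_extras(extras):
    """List provider packages to install for the given extras, without duplicates."""
    packages = []
    for extra in extras:
        for provider_id in EXTRAS_PROVIDERS.get(extra, []):
            name = provider_package_name(provider_id)
            if name not in packages:
                packages.append(name)
    return packages


print(packages_for_extras(["spark", "statsd", "amazon"]))
```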
Jarek Potiuk a42bbe21c8
Fix permissions of mounted /tmp directory for Breeze (#12157)
The "tmp" directory is mounted from the host (from tmp folder
in the source airflow directory). This is needed to get some
of our docker-in-docker tools (such as gcloud/aws/java) and
get them working on demand. Thanks to that we do not have
to increase the size of CI image unnecessarily.

Those tools were introduced and made to work in #9376

However this causes some of the standard tools (such as apt-get)
to not work inside the container unless the mounted /tmp
folder has write permission for groups/other.

This PR fixes it.
2020-11-09 22:01:58 +01:00
Jarek Potiuk d8abee6908
Remove popd which is a remnant from past (#12211) 2020-11-09 21:58:37 +01:00
Jarek Potiuk b2a28d1590
Moves provider packages scripts to dev (#12082)
The change #10806 made Airflow work with implicit packages
when "airflow" got imported. This is a good change, however
it has some unforeseen consequences. The 'provider_packages'
script copies all the providers code for backports in order
to refactor it in the empty "airflow" directory in the
provider_packages folder. The #10806 change turned that
empty folder into an 'airflow' package because it was in the
same directory as the provider_packages scripts.

Moving the scripts to dev solves this problem.
2020-11-09 13:27:10 +01:00
Jarek Potiuk eaac361f3b
Provider packages are installed by default in production image (#12154)
This is a fix to a problem introduced in #10806. The change
turned provider packages into namespace packages, which made
them ignored by the find_packages function from setuptools; thus the
production image built automatically and used by Kubernetes
tests did not have the provider packages installed.

This PR fixes it and adds future protection during CI tests of
production image to make sure that provider packages are
actually installed.

Fixes #12150
2020-11-09 13:26:24 +01:00
Jarek Potiuk 75bdfaeb9b
Uses always the same Python base image as used for CI image (#12177)
When a new Python version is released (bugfixes), we rebuild the CI image
and replace it with the new one; however, releasing the Python
image and the CI image is often hours or even days apart (we only
release the CI image when tests pass in master with the new Python
image). We already use a better approach for GitHub: we simply
push the new Python image to our registry together with the CI
image, and the CI jobs always pull them from our registry,
knowing that the two (Python and CI image) are in sync.

This PR introduces the same approach. We not only push CI image
but also the corresponding Python image to our registry. This has
no ill effect - DockerHub handles it automatically and reuses
the layers of the image directly from the Python one so it is
merely a label that is stored in our registry that points to the
exact Python image that was used by the last pushed CI image.
2020-11-08 11:20:31 +01:00
Jarek Potiuk 5c60157819
Fixes "--force-clean-images" flag in Breeze (#12156)
The flag was broken - bad cache parameter value was passed.

This PR fixes it.
2020-11-07 13:51:01 +01:00
Jarek Potiuk c7f3410451
Fixes undefined variables (#12155)
There are a few more variables that (if not defined) prevent
you from using the CI image directly without Breeze or the
CI scripts.

With this change you can run:
`docker run -it apache/airflow:master-python3.6-ci`

and enter the image without errors.
2020-11-07 12:18:14 +01:00
Ash Berlin-Taylor 128c9918b5
Update to new helm stable repo (#12137)
Switch out deprecated helm repo for new stable repo.

- https://www.cncf.io/blog/2020/11/05/helm-chart-repository-deprecation-update/
- https://helm.sh/docs/faq/#i-am-getting-a-warning-about-unable-to-get-an-update-from-the-stable-chart-repository
2020-11-06 16:05:18 +00:00
Jarek Potiuk 5351f0d996
Work properly if some variables are not defined (#12135)
Those variables are defined in the GitHub environment, so when they
were recently added it was not obvious that they would fail when
running Kubernetes tests locally.

This PR fixes that.
2020-11-06 16:56:43 +01:00
Marcus Levine cb070e928b
Refactor Elasticsearch provider to support 1.10.x (#11509) 2020-11-05 23:20:36 +01:00
Daniel Imberman 054de0703a
Add Kubernetes files to selective checks (#12114)
* Add Kubernetes files to selective checks

There are multiple kubernetes-related files that require
running the k8s integration tests. This PR adds those to the
run_selective_tests

* Update scripts/ci/selective_ci_checks.sh

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

* Update scripts/ci/selective_ci_checks.sh

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>

* Update scripts/ci/selective_ci_checks.sh

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>

* Update scripts/ci/selective_ci_checks.sh

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>

* Update scripts/ci/selective_ci_checks.sh

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2020-11-05 11:54:49 -08:00
J. Daniel Medeiros ded3dbbff0
Update install_mysql.sh (#12101)
According to the manual https://manpages.debian.org/stretch/apt/apt-key.8.en.html, after Debian 9, instead of using "apt-key add", a keyring should be placed directly in the /etc/apt/trusted.gpg.d/ directory with a descriptive name and either "gpg" or "asc" as the file extension. Also added better redirection on the apt-key list command.
2020-11-05 17:32:39 +01:00
Ash Berlin-Taylor 79836bb92c
Convert OpenAPI client generation tests to use selective checks (#12092)
This test was bundled in with the existing needs-api tests, but then
performed its _own_ checks on whether it should run. This changes that to
have selective_ci_checks.sh do this check.

Additionally, CI_SOURCE_REPO was often wrong (at least for me, as I
don't open PRs from ashb/airflow), and this led to a confusing message:

> https://github.com/ashb/airflow.git Branch my_branch does not exist

But all we were using this for was to find the "parent" commit, and
there is an easier way to do that: HEAD^1, with a fetch depth of 2 set
in the checkout options.

So I've removed calculating that and the places where it is used.

If we need to bring it back, we should use the output from the
`potiuk/get-workflow-origin` action, which gets the correct value.
2020-11-04 19:12:49 +00:00
Kaxil Naik bec9f3b29f
Use sys.exit() instead of exit() (#12084)
The `exit` and `quit` functions are actually `site.Quitter` objects and are loaded, at interpreter start up, from site.py. However, if the interpreter is started with the `-S` flag, or a custom site.py is used then exit and quit may not be present. It is recommended to use `sys.exit()` which is built into the interpreter and is guaranteed to be present.
2020-11-04 11:50:52 +00:00
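A minimal illustration of why `sys.exit()` is the safer choice: it lives in the `sys` module, so it works even when `site` is not imported (for example under `python -S`), and it simply raises `SystemExit`, which callers and tests can catch. The `main` function is hypothetical.

```python
import sys


def main(argv):
    """Hypothetical CLI entry point."""
    if not argv:
        print("usage: tool ARG", file=sys.stderr)
        # Raises SystemExit(2); available regardless of how site.py was loaded.
        sys.exit(2)
    return 0


try:
    main([])
except SystemExit as exc:
    print("exit code:", exc.code)
```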
Jarek Potiuk d971c1c0e1
Fixes problem with building a PROD image (#12080)
The change #12050 that aimed at automation of Docker images
building in DockerHub had an undesired effect of overriding the
production image tag with the CI one.

This is fixed by this PR.
2020-11-04 09:31:00 +01:00
Kaxil Naik 4e8f9cc8d0
Enable Black - Python Auto Formatter (#9550) 2020-11-03 23:51:54 +00:00
Kaxil Naik 8c42cf1b00
Use PyUpgrade to use Python 3.6 features (#11447)
Use features like `f-strings` instead of format across the code-base.
More details: https://github.com/asottile/pyupgrade
2020-11-03 21:53:59 +00:00
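As a small example of the kind of rewrite involved, pyupgrade run with `--py36-plus` turns simple `str.format` calls into f-strings, and both forms produce the same string. The variable names and values are illustrative.

```python
name = "apache-airflow"
version = "2.0.0"

# Before: positional str.format call.
old_style = "{} version {}".format(name, version)
# After: the equivalent f-string (Python 3.6+).
new_style = f"{name} version {version}"

print(old_style == new_style, new_style)
```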
Ash Berlin-Taylor 8000ab7375
If we build a new image, we should run more than basic checks (#12070)
This led to cases such as #11699 where, despite there being changes
and an image being built, the pre-commit tests were not being run.
2020-11-03 17:42:01 +00:00
Jarek Potiuk 5c199fbddf
Uses DOCKER_TAG when building image in DockerHub (#12050)
DockerHub uses `hooks/build` to build the image and it passes
DOCKER_TAG variable when the script is called.

This PR makes DOCKER_TAG provide the default value for the tag
that is calculated from sources (taking the default branch and
Python version). Since it is only set in the DockerHub build, it
should be safe.

Fixes #11937
2020-11-02 22:00:51 +01:00
Jarek Potiuk adbf764ade
Fixes documentation-only selective checks (#12038)
There was a problem that documentation-only changes triggered
selective checks without a docs build (they resulted in
basic-checks-only and no images being built).

This occurred for example in #12025

This PR fixes it by adding image-build and docs-build as two
separate outputs.
2020-11-02 15:16:24 +01:00
SZN 2354bd2be3
Checks if all the libraries in setup.py are listed in installation.rst file (#12023) 2020-11-02 14:17:41 +01:00
Ash Berlin-Taylor 0314a3a218
Allow airflow.providers to be installed in multiple python folders (#10806)
For example, this allows some providers to be installed in site packages
(`/usr/local/python3.7/...`) and others to be installed in the user folder
(`~/.local/lib/python3.7/...`) and both be importable.

If we didn't have code in `airflow/__init__.py` this would be much
easier to achieve (simply deleting the top-level init file would be
enough), but sadly we can't take that route.

From the docs of pkgutil: https://docs.python.org/3/library/pkgutil.html#module-pkgutil

> This will add to the package’s __path__ all subdirectories of
> directories on sys.path named after the package. This is useful if one
> wants to distribute different parts of a single logical package as
> multiple directories.

Tested as follows:

```
$ pip install /wheels/apache_airflow-2.0.0.dev0-py3-none-any.whl

$ ls -ald $(python -c 'import os; print(os.path.dirname(__import__("airflow").__file__))')/providers
ls: cannot access '/usr/local/lib/python3.7/site-packages/airflow/providers': No such file or directory

$ pip install --constraint <(echo 'apache-airflow==2.0.0.dev0') apache-airflow-backport-providers-redis
$ pip install --user --constraint <(echo 'apache-airflow==2.0.0.dev0') apache-airflow-backport-providers-imap

$ python -c 'import airflow.providers.imap, airflow.providers.redis; print(airflow.providers.imap.__file__); print(airflow.providers.redis.__file__)'
/root/.local/lib/python3.7/site-packages/airflow/providers/imap/__init__.py
/usr/local/lib/python3.7/site-packages/airflow/providers/redis/__init__.py
```
2020-11-01 15:01:28 +00:00
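The `pkgutil.extend_path` behaviour quoted above can be reproduced without installing anything. The sketch below builds two temporary directories that each contain a hypothetical `demo_ns` package (standing in for `airflow.providers`) whose `__init__.py` calls `extend_path`, puts both on `sys.path`, and imports one submodule from each location.

```python
import importlib
import os
import sys
import tempfile

# Same idiom as in the pkgutil docs: extend __path__ with matching dirs on sys.path.
INIT = "from pkgutil import extend_path\n__path__ = extend_path(__path__, __name__)\n"


def make_portion(root, module, value):
    """Create root/demo_ns/__init__.py and root/demo_ns/<module>.py."""
    pkg = os.path.join(root, "demo_ns")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(INIT)
    with open(os.path.join(pkg, module + ".py"), "w") as f:
        f.write(f"VALUE = {value!r}\n")


site_dir = tempfile.mkdtemp()  # plays the role of site-packages
user_dir = tempfile.mkdtemp()  # plays the role of ~/.local/lib/...
make_portion(site_dir, "redis", "from site-packages")
make_portion(user_dir, "imap", "from user folder")
sys.path[:0] = [site_dir, user_dir]
importlib.invalidate_caches()

redis_mod = importlib.import_module("demo_ns.redis")
imap_mod = importlib.import_module("demo_ns.imap")  # found via the extended __path__
print(redis_mod.VALUE, "|", imap_mod.__file__)
```

Without the `extend_path` line, `demo_ns.imap` would fail to import, because only the first `demo_ns` directory on `sys.path` would be searched.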
Kamil Breguła 9322f3e46c
Migrate from helm-unittest to python unittest (#11827)
* Migrate from helm-unittest to python unittest

* fixup! Migrate from helm-unittest to python unittest

* fixup! fixup! Migrate from helm-unittest to python unittest
2020-10-30 07:48:22 -07:00
Jarek Potiuk 2124453421
Moves tests that should be always executed to 'always' directory (#11948)
Some tests (testing the structure and importability of
examples) should always be run even if the core part was not modified.

That's why we move it to "always" directory.
2020-10-30 12:53:25 +01:00
Jarek Potiuk 37eaac3c5d
The PRs which are not approved run subset of tests (#11828)
This PR implements an optimisation: only the
default values of the build matrix are run when the PR does not have
the "okay to test" label.

The "okay to test" label is set when the PR gets approved
after not having been approved before; a comment is then also generated
urging the committer to rebase the PR to run the full set of tests.

Additionally, a check is added (in progress) that marks the PR as
not yet ready to be merged. Only after re-running it will it
become truly ready to be merged.
2020-10-29 08:07:02 +01:00
Daniel Imberman 0d1ad6648e
Add Python Helm testing framework (#11693)
* Helm Python Testing

* helm change

* add back args
2020-10-27 18:29:47 -07:00
Jarek Potiuk 923cc09830
Fixes the doc pattern in selective checks (#11834)
The pattern contained $ which effectively stopped docs from
being run on doc-only change :(
2020-10-27 10:20:01 +01:00
Kaxil Naik 0f41ba9e77
Use correct name for PostgreSQL (#11869)
`PostgresSQL` -> `PostgreSQL`
2020-10-27 01:47:47 +00:00
Jarek Potiuk 2f4a3d48a8
Occasional docker-compose errors will be easier to diagnose (#11835)
With this change we attempt to better diagnose some occasional
network docker-compose issues that have been plaguing us after
we solved or worked around other CI-related issues. Sometimes
the docker-compose jobs fail on checking if the container is
up and running with either of these two errors:

 * `forward host lookup failed: Unknown host`
 * `DNS fwd/rev mismatch`

Usually this happens in the RabbitMQ and OpenLDAP containers.

Both indicate a problem with the DNS of the Docker engine, or maybe
some remnants of a previous docker run that do not allow us
to start those containers.

This change introduces a few improvements:

* added --volumes to the `docker system prune` command, which might
  clean up some anonymous volumes left by the containers between
  runs

* removed the docker-compose down --remove-orphans --down command
  after failure, as we already always run it
  a few lines earlier (before the test). With this change,
  our mechanism of logging container logs after failure
  will likely give us more information in case the root
  cause is the RabbitMQ or OpenLDAP container failing to start

* increased the number of tries to 5 in case of failed containers.
2020-10-26 17:21:21 +01:00
Jarek Potiuk 03905158bb
Local Executor is used by default for MySQL/Postgres breeze (#11792) 2020-10-26 12:16:13 +01:00
Jarek Potiuk 00bec1b09e
Fix the script that builds source for backports (#11846)
First time preparing backports after converting scripts to
also support regular providers. Some small bugs were found
and fixed.
2020-10-26 10:18:16 +01:00
Jarek Potiuk f5410f2486
Removes duplicates from DISABLED_INTEGRATIONS variable (#11831)
Presto DB is checked several times but it also means that
it is added several times to DISABLED_INTEGRATIONS in case
it is not enabled. This commit fixes it.
2020-10-25 11:30:23 +01:00
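The fix amounts to adding an integration to the list only if it is not already there. Breeze does this in Bash; the Python sketch below shows the same order-preserving, idempotent append with a hypothetical helper and illustrative integration names.

```python
def disable_integration(disabled, integration):
    """Append an integration once; repeated checks do not add duplicates."""
    if integration not in disabled:
        disabled.append(integration)
    return disabled


disabled = []
# Presto is checked several times, as described in the commit message.
for integration in ["presto", "kerberos", "presto", "presto"]:
    disable_integration(disabled, integration)

# Space-separated value in the style of the DISABLED_INTEGRATIONS variable.
DISABLED_INTEGRATIONS = " ".join(disabled)
print(DISABLED_INTEGRATIONS)
```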
Kaxil Naik f93175d5a7
Fix typo in scripts/in_container/entrypoint_ci.sh (#11824)
* Fix typo in scripts/in_container/entrypoint_ci.sh

* Update bats_tests.sh
2020-10-24 16:58:01 +02:00
Jarek Potiuk 8d94214575
Switch postgres from 10 to 13 (#11785)
It seems that Postgres is really stable when it comes to upgrades,
so we assume that if we test 9.6 and 13 and they
work, all the versions in between will also work.

This PR changes Postgres 10 to 13 in tests and updates the documentation
with all the versions in between.
2020-10-24 14:39:01 +02:00
John Bampton aacf6025f6
Fix spelling (#11457) 2020-10-24 12:23:52 +02:00
Kaxil Naik 0218bcf838
Use LocalExecutor by default with tmux + Breeze (#11791)
* Use LocalExecutor by default with tmux + Breeze

* Update run_tmux.sh

* Update run_tmux.sh
2020-10-24 08:07:19 +02:00
Jarek Potiuk 53e5d8f1f2
The .pypirc file is read from docker-context-files (#11779)
If you used the context from a git repo, the .pypirc file was missing,
and the COPY in the Dockerfile is not conditional.

This change copies the .pypirc conditionally from the
docker-context-files folder instead.

It was also needlessly copied into the main image, where it is not
needed and where copying it was even dangerous.
2020-10-23 17:55:15 +02:00
Jarek Potiuk 4d04bb663c
The .tar.gz provider packages are installable now. (#11630)
The packages lacked setup.py, so they could not be installed.

This change automatically generates setup.py for the packages and
adds it to each package.

Fixes: #11546
2020-10-23 16:47:47 +02:00
Jarek Potiuk 0647888c15
Enables splitting tests into smaller chunks (#11659)
We've implemented the capability of running the tests in smaller
chunks and selectively running only some of them, but this
capability had been disabled by mistake: TEST_TYPE defaulted to
"All" and was not removed when TEST_TYPES was set to the sets of
tests that should be run.

This should speed up many of our tests and also hopefully
lower the chance of EXIT 137 errors.
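The intended precedence can be sketched in Python, assuming TEST_TYPES is a space-separated list; `resolve_test_types` is a hypothetical name, and the actual logic lives in the bash CI scripts.

```python
from typing import List, Mapping


def resolve_test_types(env: Mapping[str, str]) -> List[str]:
    """Return the test chunks to run.

    An explicit TEST_TYPES list wins; the default TEST_TYPE="All" is only
    used when no chunks were requested.
    """
    chunks = env.get("TEST_TYPES", "").split()
    if chunks:
        return chunks
    return [env.get("TEST_TYPE", "All")]


if __name__ == "__main__":
    print(resolve_test_types({"TEST_TYPE": "All", "TEST_TYPES": "Core Providers"}))
    # prints ['Core', 'Providers']
```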
2020-10-22 23:25:00 +02:00
John Bampton 172820db4d
Fix case of GitHub (#11398) 2020-10-21 14:32:41 +02:00
Kamil Breguła ffc9aebeb2
Better file extension for Helm template (#11702)
* Better file extension for Helm template

* fixup! Better file extension for Helm template
2020-10-21 12:17:19 +02:00
Jarek Potiuk c568c8886a
Fixing problem with missing output in pre-commits in some cases (#11684)
Dumping logs from the container should only be done in CI.

Problem was introduced in #11614
2020-10-20 13:54:12 +02:00
Kamil Breguła 1543923c19
Add Kerberos Auth for PrestoHook (#10488) 2020-10-20 13:43:18 +02:00
Jarek Potiuk e3a0839e21
Security scans are also selective now (#11674)
The security scans take a long time, especially for Python code
(about 18 minutes now). This PR reduces the strain on
GitHub Actions by running the Python or JavaScript scan in pull
requests only when Python or JavaScript code, respectively, changed.
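The decision can be sketched as a mapping from changed files to scans. This is a simplified Python illustration that classifies files by extension only; `scans_to_run` is hypothetical, and the real selective checks may use different patterns.

```python
from typing import Iterable, Set


def scans_to_run(changed_files: Iterable[str]) -> Set[str]:
    """Return which security scans a PR needs, based on changed file extensions."""
    scans: Set[str] = set()
    for path in changed_files:
        if path.endswith(".py"):
            scans.add("python")
        elif path.endswith(".js"):
            scans.add("javascript")
    return scans


if __name__ == "__main__":
    print(sorted(scans_to_run(["docs/index.rst", "airflow/models/dag.py"])))
    # prints ['python']
```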
2020-10-20 12:19:16 +02:00
Jarek Potiuk 9a90ebeabe
Bats tests should be much faster now for pre-commits. (#11662)
For pre-commit runs, only the tests corresponding to
changed .sh files and changed .bats files should be run.
2020-10-20 09:21:28 +02:00
Jarek Potiuk 03730891cb
Brings back fixup to CI optimisation (#11671)
The fixup was lost during the rebase. This one restores it.
2020-10-20 07:25:08 +02:00
Jarek Potiuk dd1c07b20d
Optimizes CI builds heavily with selective checks (#11656)
* Images are not built if the change does not touch code or docs.
* If we have no need for CI images, we run stripped-down
  pre-commit checks, which skip the long checks and only run for
  changed files
* If none of the CLI/Providers/Kubernetes/WWW files changed
  the relevant tests are skipped, unless some of the core files
  changed as well.
* The selective checks logic is explained and documented.

This is the second attempt at the problem, with a better
strategy for getting the list of files from the incoming PR.

The strategy now works better in a number of cases:
* when the PR comes from the same repo
* when the PR comes from the pull_repo
* when the PR contains more than one commit
* when the PR is based on an older master and GitHub creates
  a merge commit
2020-10-20 06:27:20 +02:00
Jarek Potiuk ae06ad01a2
Fixes versioning for pre-release provider packages (#11586)
When we prepare pre-release versions, they are not intended to be
converted to final release versions, so there is no need to replace
their version numbers artificially.

For release candidates, on the other hand, we should internally use the
"final" version, because those packages might simply be renamed to the
final "production" versions.

Fixes #11585
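The rule can be sketched in Python, assuming PEP 440-style suffixes such as `rc1` for release candidates and `b1` for other pre-releases; `internal_version` is a hypothetical helper, not the function used by the packaging scripts.

```python
import re


def internal_version(version: str) -> str:
    """Return the version to use internally for a provider package.

    Release candidates ("1.0.0rc1") use the final version ("1.0.0") so the
    package can later be renamed to the final release; other pre-releases
    (e.g. "1.0.0b1") keep their version unchanged.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)rc\d+", version)
    return match.group(1) if match else version


if __name__ == "__main__":
    for v in ("1.0.0rc1", "1.0.0b1", "1.0.0"):
        print(v, "->", internal_version(v))
```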
2020-10-19 12:32:07 +02:00
alxdembo 73b0991e48
Sourcing the profile file should be sufficient to update the PATH, re-login is not required. (#11588) 2020-10-19 10:57:10 +01:00
Jarek Potiuk 68f647a4e1
Dumps more logs in case of CI failure (#11614)
We no longer dump Airflow logs on success, but we dump them
and all the container logs in case of failure, so that we can
better investigate cases like #11543. This includes enabling
full deadlock information dumping in our MySQL database.
2020-10-19 08:59:43 +02:00
Kaxil Naik d93b6e53eb
Revert "Optimizes CI builds heavily with selective checks (#11541)" (#11648)
This reverts commit 9237338f75.
2020-10-19 01:56:38 +01:00
Kaxil Naik 8ed2229298
Revert "Fixed an error introduced in selective checks (#11640)" (#11647)
This reverts commit 6fbb235f25.
2020-10-19 01:53:08 +01:00
Kaxil Naik c06addf276
Revert "Fixes selective tests in case of missing merge commits (#11641)" (#11646)
This reverts commit 4fcc71c2ff.
2020-10-19 01:52:27 +01:00
Jarek Potiuk 4fcc71c2ff
Fixes selective tests in case of missing merge commits (#11641)
In case of very simple changes, there might be no merge commits
generated by GitHub. In such cases we should take the commit SHA
instead as the base of change calculation for selective tests.
2020-10-19 01:05:50 +01:00
Jarek Potiuk 6fbb235f25
Fixed an error introduced in selective checks (#11640)
A few remnants of an earlier version of the script caused occasional
errors. The error was introduced in #11541.
2020-10-19 00:18:29 +02:00
Jarek Potiuk bf79578ed0
Fix random kills during pre-commit image building (#11535)
It seems the trap with several steps and || true did not
work the way I wanted: when kill was run but the process was
already gone, the script failed with an error.

This approach, which kills the sub-process, looks like it does the job.
2020-10-18 23:50:50 +02:00
Jarek Potiuk 4655409982
Improves stability of K8S tests by caching binaries and repeats (#11634)
* Improves stability of K8S tests by caching binaries and repeats

The K8S tests on CI are controlled from the host, not from
inside the CI container image. Therefore they need a virtualenv
to run the tests, as well as some tools such as helm, kubectl
and kind. While those tools can be downloaded and installed
on demand, the download fails intermittently from time to time.

This change introduces the following improvements:

* the commands to download and set up kind, helm and kubectl are
  repeated up to 4 times in case they fail

* the "bin" directory where those binaries are downloaded is
  cached between runs. Only runs with the same combination of
  tool versions share the same cache.

This way both regular re-runs of the same jobs and upgrades
of the tools will be much more stable.
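The caching rule, one cache per combination of tool versions, can be sketched as a cache-key function. This is an assumption-laden Python illustration: the key prefix and the hashing are hypothetical, the version numbers in the usage line are only examples, and the actual workflow relies on the GitHub Actions cache.

```python
import hashlib
from typing import Mapping


def bin_cache_key(tool_versions: Mapping[str, str]) -> str:
    """Build a cache key that changes whenever any tool version changes."""
    # Sorting makes the key independent of the order the tools are listed in.
    canonical = ",".join(f"{tool}={ver}" for tool, ver in sorted(tool_versions.items()))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"k8s-bin-{digest}"


if __name__ == "__main__":
    print(bin_cache_key({"kind": "1.0", "helm": "2.0", "kubectl": "3.0"}))
```

Upgrading any single tool yields a new key, so a stale binary is never reused, while unchanged combinations keep hitting the same cache.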
2020-10-18 22:45:00 +02:00
Jarek Potiuk 9237338f75
Optimizes CI builds heavily with selective checks (#11541)
* Images are not built if the change does not touch code or docs.
* If we have no need for CI images, we run stripped-down
  pre-commit checks, which skip the long checks and only run for
  changed files
* If none of the CLI/Providers/Kubernetes/WWW files changed
  the relevant tests are skipped, unless some of the core files
  changed as well.
* The selective checks logic is explained and documented.
2020-10-18 20:47:21 +02:00
Jarek Potiuk 66ced72fca
Name and optionally preserve data volumes in Breeze (#11628)
So far Breeze kept its data (MySQL, Redis, Postgres) inside the
containers. This means that the data was kept only as long as the
containers were running. If you stopped Breeze via the `stop` command,
the data was always deleted.

This changes the behaviour - each of the Breeze containers has
a named volume where data is kept. Those volumes are also deleted
by default when Breeze is stopped, but you can choose to preserve
them by adding ``--preserve-volumes`` when you run ``stop`` or
``restart`` command.

Fixes: #11625
2020-10-18 16:39:44 +02:00
Tomek Urbaszek e74b861fd8
Expose flower and redis ports in breeze (#11624) 2020-10-18 11:46:22 +02:00
Jarek Potiuk 925f7619e1
Behaviour to install all airflow providers added (#11529)
In Airflow 2.0 we decided to split Airflow into separate providers.
This means that when you prepare the core Airflow package, providers
are not installed by default. This is not very convenient for
local development, though, or for Docker images built from sources,
where you would like to install all providers by default.

A new INSTALL_ALL_AIRFLOW_PROVIDERS environment variable now controls
this behaviour. If it is set to "true", all packages, including
provider packages, are installed. If it is missing or set to "false",
only the core Airflow package is installed.

For Breeze, the default is set to "true", as in those cases you
want to install all providers in your environment. The same applies
when you build the production image from sources. However, when you
build the image using a GitHub tag or a pip package, you should specify
the appropriate extras to install the required provider packages.

Note that if you install Airflow via 'pip install .' from sources
in local virtualenv, provider packages are not going to be
installed unless you set INSTALL_ALL_AIRFLOW_PROVIDERS to "true".

Fixes #11489
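How a build script might interpret the variable can be sketched in Python. Treating any value other than "true" (case-insensitive) as false is an assumption, and `install_all_providers` is a hypothetical name; the actual check in Airflow's build scripts may differ.

```python
import os
from typing import Mapping, Optional


def install_all_providers(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when INSTALL_ALL_AIRFLOW_PROVIDERS asks for all providers."""
    env = os.environ if env is None else env
    return env.get("INSTALL_ALL_AIRFLOW_PROVIDERS", "false").strip().lower() == "true"


if __name__ == "__main__":
    print(install_all_providers({"INSTALL_ALL_AIRFLOW_PROVIDERS": "true"}))  # True
    print(install_all_providers({}))  # False: missing means core only
```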
2020-10-17 11:16:28 +02:00
Jarek Potiuk 6733f2d7b9
The scripts fixing ownership and cleaning tmp use docker run (#11569)
The scripts were using docker-compose, but they can be plain
docker run commands. Also, Breeze no longer needs to run them
directly in the CI image, because I've added traps that run the
commands on exit of all "in_container" scripts.
2020-10-16 10:50:59 +02:00
Jarek Potiuk e7dc964619
Adds capability of installing wheel packages in CI image (#11527)
The production image had the capability of installing packages from
wheels (for security teams/air-gapped systems). This capability
might also be useful when building the CI image, especially when
we install the core and provider packages separately and
the provider packages are not yet available in PyPI.

This is an intermediate step to implement #11490
2020-10-15 15:19:18 +02:00
Jarek Potiuk 3447b55ba5
More stable kubernetes port forwarding (#11538)
Port forwarding during the Kubernetes tests seems to have started
behaving erratically: kubectl port-forward sometimes hangs
indefinitely rather than connecting or failing.
We change the strategy a bit and try to allocate
increasing port numbers in case that happens.
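Trying increasing port numbers can be sketched in Python with the standard `socket` module. This only checks local availability by binding; detecting a hung `kubectl port-forward` is not shown, and `first_free_port` is a hypothetical helper.

```python
import socket


def first_free_port(start: int, attempts: int = 10, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + attempts) that can be bound."""
    for port in range(start, min(start + attempts, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is taken (e.g. by a stuck forward); try the next one
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")


if __name__ == "__main__":
    print(first_free_port(8080))
```

The probe socket is closed before the port is returned, so another process could still grab it before kubectl binds; a real script would keep retrying in that case.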
2020-10-15 11:05:58 +02:00
Kaxil Naik e9f7bdd25f
Fix typo in scripts/ci/libraries/_initialization.sh (#11517)
`initialized` -> `initialize`
2020-10-14 08:24:32 +02:00
Jarek Potiuk 4297abab26
Combine back multiple test types into single jobs (#11504)
It seems that splitting the tests into many small jobs has a bad
effect: since we only have a queue size of 180 for the whole Apache
organisation, we compete with other projects for the jobs,
and with the jobs being so short we got starved much more than if
we had long jobs. Therefore we are re-combining the test types into
single jobs per Python version/database version and running all the
tests sequentially on those machines.
2020-10-13 20:51:08 +02:00
Jarek Potiuk 16e7129719
Added support for provider packages for Airflow 2.0 (#11487)
* Separate changes/readmes for backport and regular providers

We have now separate release notes for backport provider
packages and regular provider packages.

They have different versioning - backport provider
packages with CALVER, regular provider packages with
semver.

* Added support for provider packages for Airflow 2.0

This change consists of the following changes:

* adds provider package support for 2.0
* adds generation of package readme and change notes
* versions are for now hard-coded to 0.0.1 for first release
* adds automated tests for installation of the packages
* rename backport package readmes/changes to BACKPORT_*
* adds regular package readmes/changes
* updates documentation on generating the provider packages
* adds CI tests for the packages
* maintains backport packages generation with --backports flag

Fixes #11421
Fixes #11424
2020-10-13 16:33:00 +01:00
Jarek Potiuk 32f2a45819
Rename backport packages to provider packages (#11459)
In preparation for adding provider packages to 2.0 line we
are renaming backport packages to provider packages.

We want to implement this in stages - first to rename the
packages, then split-out backport/2.0 providers as part of
the #11421 issue.
2020-10-12 16:29:48 +02:00
Jarek Potiuk 369bbf0427
Selective tests - depends on files changed in the commit. (#11417)
This is the final step of implementing #10507 - selective tests.
Depending on the files changed by the incoming commit, only a subset of
the tests is executed. The conditions below are evaluated in the
order in which they are listed:

* In case of "push" and "schedule" type of events, all tests
  are executed.

* If no important files and folders changed - no tests are executed.
  This is a typical case for doc-only changes.

* If any of the environment files (Dockerfile/setup.py etc.) changed,
  all tests are executed.

* If no "core/other" files are changed, only the relevant types
  of tests are executed:

  * API - if any of the API files/tests changed
  * CLI - if any of the CLI files/tests changed
  * WWW - if any of the WWW files/tests changed
  * Providers - if any of the Providers files/tests changed

* Integration, Heisentests, Quarantined, Postgres and MySQL
  runs are always executed unless all tests are skipped, as in
  the case of doc-only changes.

* If "Kubernetes" related files/tests are changed, the
  "Kubernetes" tests with Kind are run. Note that those tests
  are run separately using Host environment and those tests
  are stored in "kubernetes_tests" folder.

* If some of the core/other files change, all tests are run. This
  is calculated by subtracting all the file counts calculated
  above from the total count of important files.

Fixes: #10507
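The evaluation order above can be sketched in Python. This is a simplified illustration, not the bash implementation from this change: the path prefixes, the environment file names and `select_test_types` are all hypothetical, and "All" stands for running every test type.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical path patterns; the real CI scripts define their own lists.
ENVIRONMENT_FILES: Set[str] = {"Dockerfile", "setup.py"}
IMPORTANT_PREFIXES: Tuple[str, ...] = (
    "airflow/", "tests/", "kubernetes_tests/", "chart/", "Dockerfile", "setup.py",
)
TEST_TYPE_PREFIXES: Dict[str, Tuple[str, ...]] = {
    "API": ("airflow/api/", "tests/api/"),
    "CLI": ("airflow/cli/", "tests/cli/"),
    "WWW": ("airflow/www/", "tests/www/"),
    "Providers": ("airflow/providers/", "tests/providers/"),
}
KUBERNETES_PREFIXES: Tuple[str, ...] = ("kubernetes_tests/", "chart/")
ALWAYS_RUN: List[str] = ["Integration", "Heisentests", "Quarantined", "Postgres", "MySQL"]


def select_test_types(event: str, changed_files: Iterable[str]) -> List[str]:
    """Return the test types to run, applying the rules above in order."""
    if event in ("push", "schedule"):
        return ["All"]
    important = {f for f in changed_files if f.startswith(IMPORTANT_PREFIXES)}
    if not important:
        return []  # e.g. a doc-only change: nothing to test
    if important & ENVIRONMENT_FILES:
        return ["All"]
    selected: List[str] = []
    counted: Set[str] = set()
    for test_type, prefixes in TEST_TYPE_PREFIXES.items():
        hits = {f for f in important if f.startswith(prefixes)}
        if hits:
            selected.append(test_type)
            counted |= hits
    kubernetes_hits = {f for f in important if f.startswith(KUBERNETES_PREFIXES)}
    counted |= kubernetes_hits
    # "Core/other" files are the important files not attributed to any group.
    if important - counted:
        selected = ["All"]
    else:
        selected += ALWAYS_RUN
    if kubernetes_hits:
        selected.append("Kubernetes")  # run separately with Kind from the host
    return selected


if __name__ == "__main__":
    print(select_test_types("pull_request", ["airflow/cli/cli_parser.py"]))
```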
2020-10-12 00:28:11 +02:00
Jarek Potiuk ce2f19d30d
Fix constraints generation script (#11412)
The constraints generation script was broken by recent changes
in the naming of constraints URL variables and by moving the generation
of the link to the Dockerfile.

This change restores the script's behaviour.
2020-10-11 17:49:19 +02:00
Jarek Potiuk 5bc5994c2c
Split tests to more sub-types (#11402)
We seem to have a problem with running all tests at once - most
likely due to some resource problems in our CI, therefore it makes
sense to split the tests into more batches. This is not yet full
implementation of selective tests but it is going in this direction
by splitting to Core/Providers/API/CLI tests. The full selective
tests approach will be implemented as part of #10507 issue.

This split is possible thanks to #10422 which moved building image
to a separate workflow - this way each image is only built once
and it is uploaded to a shared registry, from which it is quickly
downloaded rather than built by every job separately. This
way we can have many more jobs, as there is very little per-job
overhead before the tests start running.
2020-10-11 07:40:31 -07:00
Jarek Potiuk f9dddd5d3c
Workarounds "unknown blob" issue by introducing retries (#11411)
We have recently started to experience intermittent "unknown_blob"
errors with the GitHub Docker registry. We might eventually need
to migrate to GCR (which is eventually going to replace the
Docker Registry for GitHub).

A ticket has been opened with Apache Infrastructure to enable
access to GCR and to provide some statements about access
rights management for GCR: https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20959
A ticket has also been raised with GitHub Support about it:
https://support.github.com/ticket/personal/0/861667 as we
cannot delete our public images in the Docker registry.

Until this happens, the workaround might help us
handle the situations where we get intermittent errors
while pushing to the registry. This seems to be a common
error when an NGINX proxy is used to proxy the GitHub Registry, so
it is likely that retrying will work around the issue.
2020-10-11 06:02:46 +02:00
Jarek Potiuk 04973904c3
Constraints and PIP packages can be installed from local sources (#11382)
* Constraints and PIP packages can be installed from local sources

This is the final part of implementing #11171, based on feedback
from enterprise customers we worked with. They want to be able to
build the image using the official Dockerfile and binary wheel
packages that are available locally. This means that, besides the
official APT sources, the Dockerfile build should not need GitHub
or any other external files pulled from outside, including the PIP
repository.

This change also includes documentation on how to prepare a set of
such binaries ready for inspection and review by security teams
in Enterprise environment. Such sets of "known-working-binary-whl"
files can then be separately committed, tracked and scrutinized
in an artifact repository of such an Enterprise.

Fixes: #11171

* Update docs/production-deployment.rst
2020-10-10 12:58:09 +02:00
John Bampton 39fc961eec
Fix case of JavaScript. (#10957) 2020-10-10 00:50:31 +02:00
Jarek Potiuk d752575e78
Revert "Revert "Adds --install-wheels flag to breeze command line (#11317)" (#11348)" (#11356)
This reverts commit f67e6cb805.
2020-10-10 00:41:11 +02:00
Jarek Potiuk e198077f3e
Add pypirc initialization (#11386)
This PR needs to be merged first in order to handle #11385,
which requires .pypirc to be created before the Dockerfile gets built.

This means that the script change in this PR needs to be merged to
master first.
2020-10-09 22:55:03 +02:00
Jarek Potiuk f5b7bbcb92
Better diagnostics when there are problems with Kerberos (#11353) 2020-10-08 21:08:11 +02:00
Ash Berlin-Taylor f67e6cb805
Revert "Adds --install-wheels flag to breeze command line (#11317)" (#11348)
This reverts commit de07d135ae.
2020-10-08 14:35:04 +01:00
Jarek Potiuk de07d135ae
Adds --install-wheels flag to breeze command line (#11317)
If this flag is specified it will look for wheel packages placed in dist
folder and it will install the wheels from there after installing
Airflow. This is useful for testing backport packages as well as in the
future for testing provider packages for 2.0.
2020-10-08 10:06:53 +02:00
Jarek Potiuk e2655f60b3
Prints nicer message in case of git push errors (#11320)
We have started to get "unknown blob" errors more often when
pushing the images to the GitHub Registry. While this is clearly a
GitHub issue, its frequency and unclear message make it a good
candidate for an additional message with instructions for the
users, especially since they now have an easy way to get to that
information via status checks and links leading to the log file
when this problem happens during the image building process.

This way users will know that they should simply rebase or
amend/force-push their change to fix it.
2020-10-07 10:30:16 +02:00
Jarek Potiuk 22c6a843d7
Adds --no-rbac-ui flag for Breeze airflow 1.10 installation (#11315)
When installing airflow 1.10 via breeze we now enable rbac
by default, but we can disable it with --no-rbac-ui flag.

This is useful to test different variants of 1.10 when testing
release candidates in connection with the 'start-airflow'
command.
2020-10-07 01:00:00 +01:00
mucio 03e0ff24b1
Breeze start-airflow command wasn't able to initialize the db in 1.10.x (#11207) 2020-10-06 10:40:32 +02:00
Kaxil Naik 6dce7a6c26
Enable MySQL 8 CI jobs (#11247)
closes https://github.com/apache/airflow/issues/11164
2020-10-04 13:45:05 +02:00
Jarek Potiuk e89d384688
The bats script for CI image is now placed in the docker folder (#11262)
The script was previously placed in scripts/ci, which caused
a bit of a problem in the v1-10-test branch, where PRs were using
scripts/ci from the v1-10-test HEAD but were missing
the ci script from the PR.

The "ci" scripts are part of the host scripts that are
always taken from master when the image is built, but
everything else should be taken from the "docker"
folder, which is taken from the PR.
2020-10-04 08:30:11 +02:00
Kaxil Naik 3db2e7cbfb
Breeze: Fix issue with pulling an image via ID (#11255) 2020-10-03 12:56:19 +01:00
Jarek Potiuk ebd7150862
More customizable build process for Docker images (#11176)
* Allows more customizations for image building.

This is the third (and not the last) part of making the Production
image more corporate-environment friendly. It has been prepared
at the request of one of the big Airflow users (a company) that
has rather strict security requirements when it comes to
preparing and building images. They are committed to
keeping up with the progress of Apache Airflow 2.0 development,
and making the image customizable, so that they can build it using
only sources they control internally, was one of their important
requirements.

This change adds the possibility of customizing various steps in
the build process:

* adding custom scripts to be run before installation of both
  build image and runtime image. This allows, for example,
  installing custom GPG keys and adding custom sources.

* customizing the way NodeJS and Yarn are installed in the
  build image segment - as they might rely on their own way
  of installation.

* adding extra packages to be installed during both build and
  dev segment build steps. This is crucial to achieve the same
  size optimizations as the original image.

* defining additional environment variables (for example
  environment variables that indicate acceptance of the EULAs
  when installing proprietary packages that require
  EULA acceptance) both in the build image and runtime image
  (again, the goal is to keep the image optimized for size)

The image build process remains the same when no customization
options are specified, but having those options increases
flexibility of the image build process in corporate environments.

This is part of #11171.

This change also fixes some of the issues opened and raised by
other users of the Dockerfile.

Fixes: #10730
Fixes: #10555
Fixes: #10856

Input from those issues has been taken into account when this
change was designed, so that the cases described in those issues
could be implemented. An example from one of the issues landed as
an example of building a highly customized Airflow image
using those customization options.

Depends on #11174

* Update IMAGES.rst

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-09-29 15:30:00 +02:00
Jarek Potiuk 17c810ec36
Fixes image tag readonly failure (#11194)
The image builds fine, but produces an unnecessary error message.

Bug Introduced in c9a34d2ef9
2020-09-29 13:07:51 +02:00
Omair Khan 68e0eb6976
in_container bats pre-commit hook and updated bats-tests hook (#11179) 2020-09-29 11:59:06 +02:00
Jarek Potiuk c9a34d2ef9
Optionally tags image when building with Breeze (#11181)
Breeze tags the image based on the default Python version,
branch and type of the image, but you might want to tag the image
in the same command, especially when the image is built
automatically via CI scripts, or by security teams that tag the image
based on external factors (build time, person etc.).

This is part of #11171 which makes the image easier to build in
corporate environments.
2020-09-29 11:45:37 +02:00
Jarek Potiuk 044b441257
Conditional MySQL Client installation (#11174)
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing the MySQL client on Debian requires reaching out
to Oracle deb repositories, which might not be approved by security
teams when you build the images. Also, not everyone needs the MySQL
client, and some might want to install their own MySQL or MariaDB
client from their own repositories.

This change separates the installation step out into a
script (with a prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.), but in the "Final" segment of the image only runtime libraries
are needed.

Part of #11171

Depends on #11173.
2020-09-27 18:56:58 +02:00
mucio 0db7a30782
New Breeze command start-airflow, it replaces the previous flag (#11157) 2020-09-27 18:31:50 +02:00
Jarek Potiuk f16354bc02
Optionally disables PIP cache from GitHub during the build (#11173)
This is the first step of implementing a corporate-environment
friendly way of building images: in a corporate
environment it might not be possible to install the packages
using the GitHub cache initially.

Part of #11171
2020-09-27 18:00:03 +02:00
Jarek Potiuk 620b0989b8
Add Helm Chart linting (#11108) 2020-09-24 13:02:11 +02:00
Kaxil Naik 7644c37082
Revert "Introducing flags to skip example dags and default connections (#11099)" (#11110)
This reverts commit 0edc3dd579.
2020-09-23 19:47:43 +01:00
Kaxil Naik ccfbc319dd
Fix sort-in-the-wild pre-commit on Mac (#11103) 2020-09-23 15:10:15 +01:00
mucio 0edc3dd579
Introducing flags to skip example dags and default connections (#11099) 2020-09-23 14:56:29 +02:00
Jarek Potiuk 3db4d3b04d
All versions in CI yamls are not hard-coded any more (#10959)
GitHub Actions allow using the `fromJson` method to read arrays
or even more complex JSON objects into the CI workflow yaml files.

This, combined with `::set-output` commands, allows us to read the
list of allowed versions, as well as the default ones, from the
environment variables configured in
./scripts/ci/libraries/initialization.sh

This means that we can have one place in which versions are
configured. We also need to do it in "breeze-complete", as this is
a standalone script that should not source anything, so we added
BATS tests to verify that the versions in breeze-complete
correspond with those defined in initialization.sh.

Also, we no longer limit tests in regular PRs: we run
all combinations of available versions. Our tests run quite a
bit faster now, so we should be able to run more complete
matrices. We can still exclude individual values of the matrices
if this turns out to be too much.

MySQL 8 is disabled in Breeze for now. I plan a separate follow-up
PR where we will run MySQL 8 tests (they have not been run so far).
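As a rough sketch of how a version list can be handed to a workflow, the Python function below formats a `::set-output` command whose value is a JSON array that a later step can read back with `fromJson`. The function and the output name `python-versions` are hypothetical. GitHub has since deprecated `::set-output` in favour of writing to the `$GITHUB_OUTPUT` file.

```python
import json
from typing import Sequence


def set_output_command(name: str, values: Sequence[str]) -> str:
    """Format a GitHub Actions `::set-output` command carrying a JSON array."""
    return f"::set-output name={name}::{json.dumps(list(values))}"


if __name__ == "__main__":
    print(set_output_command("python-versions", ["3.6", "3.7", "3.8"]))
    # prints ::set-output name=python-versions::["3.6", "3.7", "3.8"]
```

A job matrix could then use an expression such as `${{ fromJson(needs.build-info.outputs.python-versions) }}`; the `build-info` job name is also hypothetical.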
2020-09-21 20:02:04 +02:00
mucio 17faea0b5c
Starting breeze will run an init script after the environment is setup (#11029)
Added the possibility to run an init script
2020-09-21 11:58:30 +01:00
Daniel Imberman cba51d49ee
Simplify the K8sExecutor and K8sPodOperator (#10393)
* Simplify Airflow on Kubernetes Story

Removes thousands of lines of code that essentially amount to us
re-creating the Kubernetes API. Will offer a faster, simpler
KubernetesExecutor for 2.0

* Fix podgen tests

* fix documentation

* simplify validate function

* @mik-laj comments

* spellcheck

* spellcheck

* Update airflow/executors/kubernetes_executor.py

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-09-17 08:40:20 -07:00
Jarek Potiuk 4a46f4368b
Allows to build production images for 1.10.2 and 1.10.1 Airflow (#10983)
Airflow 1.10.2 and earlier required the SLUGIFY_USES_TEXT_UNIDECODE env
variable to be set to yes.

Our production Dockerfile and Breeze support building images
for any version of Airflow >= 1.10.1, but building failed for
1.10.2 and 1.10.1 because this variable was not set.

You can now set the variable when building the image manually,
and Breeze does it automatically if the image is for 1.10.1 or 1.10.2.

Fixes #10974
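The automatic behaviour in Breeze amounts to a small version check. A minimal Python sketch, with `extra_build_env` as a hypothetical name (Breeze itself is written in bash):

```python
from typing import Dict

# Versions that, per this change, need the variable set when building the image.
NEEDS_SLUGIFY_FLAG = {"1.10.1", "1.10.2"}


def extra_build_env(airflow_version: str) -> Dict[str, str]:
    """Return extra environment variables needed to build the given version."""
    if airflow_version in NEEDS_SLUGIFY_FLAG:
        return {"SLUGIFY_USES_TEXT_UNIDECODE": "yes"}
    return {}


if __name__ == "__main__":
    print(extra_build_env("1.10.2"))  # {'SLUGIFY_USES_TEXT_UNIDECODE': 'yes'}
    print(extra_build_env("1.10.12"))  # {}
```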
2020-09-17 14:25:34 +02:00
Kaxil Naik a575c79cee
Fix 'Upload documentation' step in CI (#10981) 2020-09-16 19:57:25 -07:00
Kaxil Naik e066260ef8
Improve the Error message in Breeze for invalid params (#10980)
Changed `Is` to `Passed`

Before:

```

ERROR:  Allowed backend: [ sqlite mysql postgres ]. Is: 'dpostgres'.

Switch to supported value with --backend flag.
```

After:

```

ERROR:  Allowed backend: [ sqlite mysql postgres ]. Passed: 'dpostgres'.

Switch to supported value with --backend flag.
```
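The check that produces this message can be sketched in Python. `check_allowed_value` is a hypothetical helper that mirrors the message format shown above; Breeze performs the real check in bash.

```python
from typing import Sequence


def check_allowed_value(name: str, value: str, allowed: Sequence[str], flag: str) -> None:
    """Raise ValueError with a Breeze-style message if `value` is not allowed."""
    if value not in allowed:
        raise ValueError(
            f"ERROR:  Allowed {name}: [ {' '.join(allowed)} ]. Passed: '{value}'.\n"
            f"\n"
            f"Switch to supported value with {flag} flag."
        )


if __name__ == "__main__":
    try:
        check_allowed_value("backend", "dpostgres", ["sqlite", "mysql", "postgres"], "--backend")
    except ValueError as err:
        print(err)
```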
2020-09-17 03:21:47 +01:00
Ash Berlin-Taylor 1ed621ea86
Fix static error (tabs) introduced in #10971 (#10973) 2020-09-16 14:39:17 +01:00
mucio c9f006b540
added environment configuration for using --start-airflow (#10971) 2020-09-16 13:41:58 +02:00
John Bampton ce19657ec6
Fix case of GitHub. (#10955)
Changed `Github` to `GitHub`.
2020-09-15 14:49:27 -04:00
Jarek Potiuk 45272a8e41
Github repository can be overridden in command line by Breeze (#10943)
During testing v1-10-test backport for Breeze the
--github-repository flag did not work. It turned out that
the lowercase variable was not re-set when the flag was
provided by Breeze.

This change causes the lowercasing to be run just before it
is used to make sure that the GITHUB_REPOSITORY value
is used after it's been overwritten.
2020-09-15 15:42:31 +02:00
Jarek Potiuk 14f27635f6
Fixes retrieval of correct branch in non-master related builds (#10912)
When we ported the new CI mechanism to v1-10-test, it turned out
that we have to correct the retrieval of DEFAULT_BRANCH
and DEFAULT_CONSTRAINTS_BRANCH.

Since we are building the images using the "master" scripts, we need to
make sure the branches are retrieved from the _initialization.sh of the
incoming PR, not from the one in the master branch.

Additionally, the Python 2.7 and 3.5 builds have to be merged to
master and excluded when the build targets the master branch.
2020-09-15 15:24:33 +02:00
Jarek Potiuk b2dc346062
Make breeze-complete Google Shell Guide compatible (#10708)
Also added unit tests for breeze-complete
Part of #10576
2020-09-14 10:21:09 +02:00
Kaxil Naik 69be4b8bce
Fix typos in scripts/ci/docker-compose/local.yml (#10906)
`an` -> `on` (grammatically that makes sense)
2020-09-13 22:41:59 +01:00
Kaxil Naik 9c866cd9ef
Fix syntax error in Dockerfile 'maintainer' Label (#10899) 2020-09-12 18:32:07 +02:00
Jarek Potiuk 106c0f556f
Add pre-commit to sort INTHEWILD.md file automatically (#10851) 2020-09-12 18:26:12 +02:00
mucio 47e592e3a0
Flag --start-airflow for breeze (#10837) 2020-09-11 23:26:56 +02:00
Daniel Imberman 56bd9b7d6b
Modify helm chart to use pod_template_file (#10872)
* Modify helm chart to use pod_template_file

Since we are deprecating most k8sexecutor arguments
we should use the pod_template_file when launching airflow
using the KubernetesExecutor

* fix tests

* one more nit

* fix dag command

* fix pylint
2020-09-11 10:47:59 -07:00
Jarek Potiuk a356656d44
Make dockerfiles Google Shell Guide Compliant (#10734)
Part of #10576
2020-09-09 14:04:16 +02:00
Jarek Potiuk 409ebc1097
Make scripts/ci/tools Google Shell Guide Compatible (#10811)
Part of #10576
2020-09-09 11:44:17 +02:00
Jarek Potiuk 40939dca86
Make airflow testing Google Shell Guide compatible (#10813)
Part of #10576
2020-09-09 11:43:25 +02:00
Jarek Potiuk c60fcccdb6
Fix integration tests being accidentally excluded (#10807)
The change from #10769 accidentally switched Integration tests
into far longer unit test runs (we effectively ran the unit tests
twice and did not run the integration tests).

This fixes the problem by removing readonly status from
INTEGRATIONS and only setting it after the integrations are
set.
2020-09-08 20:46:46 +02:00
Jarek Potiuk 3c6fdd84f8
Make ci/backport_packages Google Shell guide compliant (#10733) 2020-09-08 19:29:37 +02:00
Jarek Potiuk 71e1d09175
Fixed wrong "-e" on md5 file status check (#10803)
The "-e" flag was not reset properly in the md5 status check,
which in some cases could lead to removing the output of the flake check.
2020-09-08 19:23:07 +02:00
Jarek Potiuk 4f07463cf2
Make script/ci/kubernetes Google Shell Guide Compatible (#10746)
Part of #10576
2020-09-08 19:21:45 +02:00
Jarek Potiuk 43303f10aa
Make script/ci/images Google Shell Guide compatible (#10745)
Part of #10576
2020-09-08 19:20:26 +02:00
Jarek Potiuk cd0cc4ca86
Check that all pre-commits are synchronized code<>docs (#10789)
Until pre-commit implements export of all configured
checks, we need to keep the list updated manually.

We check both the pre-commit list in breeze-complete and
the descriptions in STATIC_CODE_CHECKS.rst.
2020-09-08 14:06:42 +02:00
Jarek Potiuk 4de67a6731
Move dev docker images to airflow registry (#9652)
Part of #9401
2020-09-08 10:07:10 +02:00
Jarek Potiuk b746f33fc6
Removes stable tests from quarantine (#10768)
We've observed the tests for the last couple of weeks, and it seems
most of the tests marked with the "quarantine" marker are succeeding
in a stable way (https://github.com/apache/airflow/issues/10118).
The removed tests have a success ratio of > 95% (20 runs without
problems), and this was verified a week ago as well,
so it seems they are rather stable.

Only a few tests are either failing or causing
the quarantined builds to hang. I manually reviewed the
master tests that failed over the last few weeks and added the
tests that are causing the build to hang.

It seems that stability has improved. This might be because the
problems we saw when we marked the tests as quarantined were
temporary, or because we were too "generous" when marking tests as
quarantined. The improvement might also come from #10368: the docker
engine and machines used to run the builds in GitHub experience far
less load (image builds are executed in separate builds), so
resource usage might have decreased. Another reason
might be GitHub Actions stability improvements.

Or perhaps those tests are simply more stable when run in isolation.

We might still add failing tests back as soon as we see them behave
in a flaky way.

The remaining quarantined tests that need to be fixed:
 * test_local_run (often hangs the build)
 * test_retry_handling_job
 * test_clear_multiple_external_task_marker
 * test_should_force_kill_process
 * test_change_state_for_tis_without_dagrun
 * test_cli_webserver_background

We also move some of those tests to the "heisentests" category.
Those tests run fine in isolation but fail
the builds when run with all other tests:
 * TestImpersonation tests

We might find that those heisentests can be fixed, but for
now we are going to run them in isolation.

Also, since the remaining quarantined tests fail more often,
the "num runs" tracked for them has been decreased to 10,
so that only the last 10 runs are tracked.
2020-09-08 07:36:12 +02:00
Jarek Potiuk ef0d639b34
Fixes pre-commit failing on build step (#10785)
When rebuilding the image during a commit, the kill command failed to
find the spinner job to kill (the kill is just a preventive measure)
and failed the rebuild step in pre-commit.

This is now fixed.
2020-09-07 22:38:39 +02:00
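A preventive kill like the one described here can be made harmless under `set -e` by tolerating a process that has already exited. The sketch below assumes hypothetical `spinner::start` and `spinner::stop` helpers and a `SPINNER_PID` variable. They are illustrative names, not the actual pre-commit code.

```shell
#!/usr/bin/env bash
# Hypothetical spinner helpers (names are illustrative, not Airflow's).
SPINNER_PID=""

function spinner::start() {
    # Print a dot to stderr every half second in a background subshell.
    ( while true; do printf '.' >&2; sleep 0.5; done ) &
    SPINNER_PID=$!
}

function spinner::stop() {
    if [[ -n "${SPINNER_PID}" ]]; then
        # Preventive kill: the spinner may already be gone, so a failing
        # kill or wait must not abort a caller running under `set -e`.
        kill "${SPINNER_PID}" 2>/dev/null || true
        # Reap the job and suppress the "Terminated" notice.
        wait "${SPINNER_PID}" 2>/dev/null || true
        SPINNER_PID=""
    fi
}
```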
Jarek Potiuk 1959d6aee2
Make static checks Google Shell Guide compatible (#10750)
Part of #10576
2020-09-07 08:05:07 +02:00
Jarek Potiuk 18b80f34e8
The scripts to run tests properly initialises constants (#10769)
The constants were initialised after the readonly status was set
for the constants in the test script.

This mainly affected the default values of those constants (but those
had already been handled by _script_init.sh); more importantly,
INTEGRATIONS was not properly initialized, which caused
some integration tests to be skipped.
2020-09-07 08:03:48 +02:00
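The ordering rule behind this fix is to compute every constant first and only then mark the constants read-only. The minimal sketch below illustrates it. Only `INTEGRATIONS` comes from the commit message. The function names, `BACKEND` and `ENABLED_INTEGRATIONS` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (names other than INTEGRATIONS are illustrative):
# compute every constant first, and only then make them read-only.

function initialization::initialize_constants() {
    # ':=' assigns the default only if the variable is unset or empty.
    : "${BACKEND:=sqlite}"
    : "${INTEGRATIONS:=}"
    # Derived values must be assigned *before* the variables are frozen.
    if [[ -n "${ENABLED_INTEGRATIONS:-}" ]]; then
        INTEGRATIONS="${ENABLED_INTEGRATIONS}"
    fi
    export BACKEND INTEGRATIONS
}

function initialization::make_constants_read_only() {
    # From here on, any further assignment fails with "readonly variable".
    readonly BACKEND
    readonly INTEGRATIONS
}
```

If `initialization::make_constants_read_only` ran before the `INTEGRATIONS` assignment, that assignment would fail. This is roughly the kind of ordering problem the commit addresses.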
Jarek Potiuk ebb0a97018
Make ci/scripts/pre-commit Google Shell Guide compatible (#10748)
Part of #10576
2020-09-06 20:00:54 +02:00
Jarek Potiuk fbce92e8e7
The verbose functions will not exit immediately if not asked to (#10731)
The docker(), helm() and kubectl() functions replace the real tools
to get verbose behaviour (we can print the exact command being
executed for those). But when 'set +e' was set before the command
was called, indicating that errors in those functions should be
ignored, they were not ignored. The functions set 'set -e' just
before returning the non-zero value, effectively exiting the
script right after. This made the first-time experience
poor.

The fix also corrects the behaviour of stdout and stderr for those
functions. Previously they were joined so that they could be
printed to OUTPUT_FILE, but this lost the stderr/stdout
distinction. Now both stdout and stderr are printed to the
output file, but they are also redirected to stdout/stderr
respectively, so that 2>/dev/null works as expected.

While fixing it, it turned out that one of the remove_images
functions was no longer used, so it was merged with the breeze one.
2020-09-06 19:56:35 +02:00
Jarek Potiuk e3c83da984
Check all dockerfiles with hadolint (#10754)
The hadolint check only checked the "main dir" Dockerfile
but we have more of them now. All of them are now checked.

The following problems are fixed:

 * DL3000 Use absolute WORKDIR
 * DL4000 MAINTAINER is deprecated
 * DL4006 Set the SHELL option -o pipefail before RUN with a pipe in it.
 * SC2046 Quote this to prevent word splitting.

The following problems are ignored:

 * DL3018 Pin versions in apk add. Instead of `apk add <package>` use `apk add
   <package>=<version>`
2020-09-06 18:06:05 +02:00
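The sketch below checks every Dockerfile rather than only the top-level one. It assumes:

* hadolint is on PATH;
* Dockerfiles are named `Dockerfile` or `Dockerfile.*`;
* the `dockerfiles::` function names are hypothetical.

`--ignore DL3018` mirrors the rule that the commit chose to ignore.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: find every Dockerfile in the repository and lint each one.

function dockerfiles::list() {
    local root="${1:-.}"
    # Match "Dockerfile" and "Dockerfile.<suffix>", skipping the .git directory.
    find "${root}" -type f \( -name 'Dockerfile' -o -name 'Dockerfile.*' \) \
        -not -path '*/.git/*' | sort
}

function dockerfiles::lint_all() {
    local root="${1:-.}"
    local dockerfile status=0
    while IFS= read -r dockerfile; do
        echo "Checking ${dockerfile}"
        # Keep going after a failure so that all problems are reported at once.
        hadolint --ignore DL3018 "${dockerfile}" </dev/null || status=1
    done < <(dockerfiles::list "${root}")
    return "${status}"
}
```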
Jarek Potiuk ba36f876dc
Make scripts/ci/openapi Google Shell Guide compatible (#10747)
Part of #10576
2020-09-05 22:25:39 +02:00
Kush 28c21cfd1d
clean-logs script for Dockerfile: trim logs before sleep (#10685)
If the pod restarts before the sleep time is over, the trim command will not run. I think it's better to reorder the commands so that the delete runs first and then the script goes to sleep. At the moment the sleep is 15 minutes, but people who want a longer sleep time will simply increase the EVERY value and can then encounter this bug.
2020-09-05 17:40:57 +02:00
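Here is a minimal sketch of the reordered loop. It assumes:

* log files end in `.log`;
* retention is given in days;
* the sleep interval is given in seconds.

The function names and parameters are hypothetical, with `every` standing in for the `EVERY` setting mentioned in the commit.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a clean-logs loop (names are illustrative).

function clean_logs::trim_once() {
    local directory="$1"
    local retention_days="$2"
    # Delete *.log files whose age exceeds RETENTION_DAYS whole days (find -mtime +N).
    find "${directory}" -type f -name '*.log' -mtime "+${retention_days}" -delete
}

function clean_logs::loop() {
    local directory="$1"
    local retention_days="$2"
    local every="$3"   # sleep interval in seconds
    while true; do
        # Trim first: if the pod restarts during a long sleep,
        # the trim has already run for this cycle.
        clean_logs::trim_once "${directory}" "${retention_days}"
        sleep "${every}"
    done
}
```

For example, `clean_logs::loop /path/to/logs 15 900` would trim first and then sleep 15 minutes per cycle. The path is a placeholder.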
Jarek Potiuk e4de7288a3
Switches to better BATS asserts (#10718)
BATS has additional libraries of asserts that make it much more
straightforward and nicer to write tests for bash scripts.

There is no dockerfile from BATS that contains those, so we
had to build our own. It follows the same structure
as #9652, where we keep the sources of our dev docker images
inside our repository and name the generated docker images
using the "apache/airflow:<tool>-CALVER-TOOLVER" format.

We have more BATS unit tests to add following #10576,
and this change will be of great help.
2020-09-04 22:25:29 +02:00
João Marques 5b6464f489
Migrate speccy to spectral in OpenAPI linting. (#10351) 2020-09-03 18:06:23 +02:00
Jarek Potiuk 4e09cb53ea
Add packages to function names in bash (#10670) (#10696)
Inspired by the Google Shell Guide, which mentions
separating package names with ::, I realized that this was
one of the missing pieces in our bash scripts.

While we already had packages (in the libraries folder),
it was difficult to tell which function is where.

By introducing packages named after the library file,
we are *almost* at the level of a structured language, and
it's easier to find the functions if you are looking for them.

Way easier, in fact.

Part of #10576

(cherry picked from commit cc551ba793)
(cherry picked from commit 2bba276f0f06a5981bdd7e4f0e7e5ca2fe84f063)
2020-09-02 21:58:37 +02:00
Jarek Potiuk 649ce4ba9d
Implement Google Shell Conventions for breeze script (#10695)
* Implement Google Shell Conventions for breeze script … (#10651)

Part of #10576

First (and the biggest) of the series of commits to introduce
Google Shell Conventions in our bash scripts.

This is about the biggest and most complex script, breeze,
so the change is rather huge, but it is difficult to split it into
smaller pieces.

The rules implemented (from the conventions):

 * constants and exported variables are CAPITALIZED, whereas
   local/temporary variables are lowercase

 * following the shell guide, once all the variables are set to their
   final values (either from exported variables, calculation or --switches),
   a single function makes all the variables read-only. That
   helped to clean up a lot of places where the same functions were called
   several times, or where variables were defined in a few places. Now the
   behavior should be rather consistent and we should easily catch
   duplications

 * function headers (following the guide) explain arguments,
   expected variables and variables modified in the functions.

 * setting the variables as read-only also helped to clean up the "ifs"
   where we often had ":=}" in variables and != "" or == "". Those are
   replaced with `=}` and tests are replaced with `-n` and `-z`, also
   following the shell guide (readonly helped to detect and clean all
   such cases). This also should be much more robust in the future.

 * reorganized initialization of those constants and variables: simplified
   a few places where initialization was overlapping. It should be much more
   straightforward and clean now

 * a number of internal breeze function variables are "local"; this
   helps avoid accidentally overwriting variables and keeps things localized

 * the trap_add function is separated out to help in cases where we had
   several traps handling the same signals.

(cherry picked from commit 46c8d6714c)
(cherry picked from commit c822fd7b4bf2a9c5a9bb3c6e783cbea9dac37246)

* fixup! Implement Google Shell Conventions for breeze script … (#10651)
2020-09-02 21:55:50 +02:00
Kaxil Naik 9a10f83ab0
Revert recent breeze changes (#10651 & #10670) (#10694)
* Revert "Add packages to function names in bash (#10670)"

This reverts commit cc551ba793.

* Revert "Implement Google Shell Conventions for breeze script … (#10651)"

This reverts commit 46c8d6714c.
2020-09-02 17:27:36 +01:00
Jarek Potiuk cc551ba793
Add packages to function names in bash (#10670)
Inspired by the Google Shell Guide, which mentions
separating package names with ::, I realized that this was
one of the missing pieces in our bash scripts.

While we already had packages (in the libraries folder),
it was difficult to tell which function is where.

By introducing packages named after the library file,
we are *almost* at the level of a structured language, and
it's easier to find the functions if you are looking for them.

Way easier, in fact.

Part of #10576
2020-09-01 13:40:06 +02:00
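The naming scheme from this commit can be illustrated with a hypothetical library. The file `libraries/_strings.sh` and its functions are invented for illustration. Each function is prefixed with the package name (the library file name) followed by `::`, which bash accepts in function names.

```shell
#!/usr/bin/env bash
# --- hypothetical library file: libraries/_strings.sh ---
# The package name "strings" matches the library file name, so a call to
# strings::trim tells the reader which file defines it.

function strings::to_upper() {
    local text="$1"
    printf '%s\n' "${text^^}"
}

function strings::trim() {
    local text="$1"
    # Strip leading, then trailing, whitespace using parameter expansion.
    text="${text#"${text%%[![:space:]]*}"}"
    text="${text%"${text##*[![:space:]]}"}"
    printf '%s\n' "${text}"
}
```

A script would `source` the library and call, for example, `strings::trim "${value}"`. Searching for `strings::` then finds every call site together with the defining file.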
Jarek Potiuk 46c8d6714c
Implement Google Shell Conventions for breeze script … (#10651)
Part of #10576

First (and the biggest) of the series of commits to introduce
Google Shell Conventions in our bash scripts.

This is about the biggest and most complex script, breeze,
so the change is rather huge, but it is difficult to split it into
smaller pieces.

The rules implemented (from the conventions):

 * constants and exported variables are CAPITALIZED, whereas
   local/temporary variables are lowercase

 * following the shell guide, once all the variables are set to their
   final values (either from exported variables, calculation or --switches),
   a single function makes all the variables read-only. That
   helped to clean up a lot of places where the same functions were called
   several times, or where variables were defined in a few places. Now the
   behavior should be rather consistent and we should easily catch
   duplications

 * function headers (following the guide) explain arguments,
   expected variables and variables modified in the functions.

 * setting the variables as read-only also helped to clean up the "ifs"
   where we often had ":=}" in variables and != "" or == "". Those are
   replaced with `=}` and tests are replaced with `-n` and `-z`, also
   following the shell guide (readonly helped to detect and clean all
   such cases). This also should be much more robust in the future.

 * reorganized initialization of those constants and variables: simplified
   a few places where initialization was overlapping. It should be much more
   straightforward and clean now

 * a number of internal breeze function variables are "local"; this
   helps avoid accidentally overwriting variables and keeps things localized

 * the trap_add function is separated out to help in cases where we had
   several traps handling the same signals.
2020-08-31 13:24:53 +02:00
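The trap_add idea from this commit can be sketched as follows. This is a hypothetical variant, not the breeze implementation. It keeps the handlers registered so far in an associative array (bash 4+), so adding a handler appends to the existing trap for that signal rather than replacing it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: append a handler to a signal's trap instead of replacing it.
# Handlers are tracked in an associative array rather than parsed from `trap -p`.
declare -A TRAP_HANDLERS=()

function trap_add() {
    local handler="$1"
    shift
    local signal existing
    for signal in "$@"; do
        existing="${TRAP_HANDLERS[${signal}]:-}"
        # Join with "; " so all registered handlers run in registration order.
        TRAP_HANDLERS[${signal}]="${existing:+${existing}; }${handler}"
        # shellcheck disable=SC2064  # expand now: the handler text is fixed here
        trap -- "${TRAP_HANDLERS[${signal}]}" "${signal}"
    done
}
```

If two libraries each call `trap_add` for `EXIT`, both handlers run on exit. With plain `trap`, the second call would silently replace the first.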
Kamil Breguła 7200835d0e
Improve output of check_environment.sh (#10631) 2020-08-28 21:53:11 +02:00
Jarek Potiuk cd1f794242
Bring back some inclusions before we solve cyclic deps problems (#10551) 2020-08-25 20:04:51 +02:00
Jarek Potiuk c6e6d6dedd
Helm Docker image sources are now included in the Airflow codebase (#9650)
We can now build all the images from Airflow sources in
a reproducible fashion, and our users can use the helm chart
based on the images built from official images + code in the
Airflow codebase.

We also have a consistent versioning scheme based on
the calver version of releasing the images, coupled with
the version of the original package.

Part of #9401
2020-08-25 16:01:39 +01:00
Jarek Potiuk 1775474484
Make configuration.py Pylint compatible (#10494) 2020-08-25 00:16:54 +02:00
Jarek Potiuk 4f6d53eaa7
Make models/taskinstance.py pylint compatible (#10499) 2020-08-25 00:16:05 +02:00
Jarek Potiuk 2f2d8dbfaf
Remove all "noinspection" comments native to IntelliJ (#10525)
We have already fixed a lot of problems that were marked
with those. Also, IntelliJ has gotten a bit smarter about not
detecting false positives and understands more
pylint annotations. Wherever the problem remained,
we replaced it with # noqa comments, as those are
also well understood by IntelliJ.
2020-08-25 00:01:37 +02:00
Jarek Potiuk f2da6b419f
Updated documentation for the CI with mermaid sequence diagrams (#10380) 2020-08-24 22:45:28 +02:00
Jarek Potiuk 8fdcc5760a
Make www/views.py pylint compatible (#10498) 2020-08-24 22:38:23 +02:00
Jarek Potiuk be1a67b93f
Make models/crypto.py Pylint-compatible (#10500) 2020-08-24 16:36:04 +01:00