This commit unifies the mechanism for rendering tabular data output.
This gives users the possibility to either display a tabular
representation of the data or render it as a valid JSON or YAML payload.
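A minimal sketch of the idea behind the unified rendering, with hypothetical function and option names (not the actual CLI code), assuming PyYAML is available:
```
import json

import yaml  # assumes PyYAML is available


def render(rows, output="table"):
    """Render a list of dicts as a plain table, or as a JSON/YAML payload."""
    if output == "json":
        return json.dumps(rows, indent=2)
    if output == "yaml":
        return yaml.safe_dump(rows)
    # fall back to a simple fixed-width table
    headers = list(rows[0]) if rows else []
    widths = [max(len(h), *(len(str(row[h])) for row in rows)) for h in headers]
    lines = [" | ".join(h.ljust(w) for h, w in zip(headers, widths))]
    for row in rows:
        lines.append(" | ".join(str(row[h]).ljust(w) for h, w in zip(headers, widths)))
    return "\n".join(lines)


print(render([{"conn_id": "my_db", "conn_type": "postgres"}], output="json"))
```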
Closes: #12699
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Previously the output of installing the remaining packages when testing
provider imports was only shown on error. However, it is useful
to know what's going on, even if it clutters the log.
Note that this installation is only needed until we include
apache-beam in the installed packages on CI.
Related to #12703
This PR always shows the output.
* Adds support for Hook discovery from providers
This PR extends provider discovery with a mechanism for
retrieving the mapping from connection type to hook.
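A rough, hypothetical illustration of such a mapping (the real logic lives in Airflow's providers manager and the provider.yaml schema differs; the dict shapes below are made up for the example):
```
from typing import Dict, Iterable

# Made-up provider-info dicts, only to illustrate the shape of the mapping;
# the real schema is defined by each provider.yaml / provider package.
PROVIDERS_INFO = [
    {"hooks": [{"conn_type": "postgres",
                "hook-class-name": "airflow.providers.postgres.hooks.postgres.PostgresHook"}]},
    {"hooks": [{"conn_type": "sqlite",
                "hook-class-name": "airflow.providers.sqlite.hooks.sqlite.SqliteHook"}]},
]


def connection_type_to_hook(providers: Iterable[dict]) -> Dict[str, str]:
    """Build a mapping from connection type to the hook class that handles it."""
    mapping: Dict[str, str] = {}
    for provider in providers:
        for hook in provider.get("hooks", []):
            mapping[hook["conn_type"]] = hook["hook-class-name"]
    return mapping


print(connection_type_to_hook(PROVIDERS_INFO)["postgres"])
```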
Fixes #12456
* fixup! Adds support for Hook discovery from providers
* fixup! fixup! Adds support for Hook discovery from providers
This change upgrades setup.py and setup.cfg to provide a non-conflicting,
`pip check`-valid set of constraints for the CI image.
Fixes #10854
Co-authored-by: Tomek Urbaszek <turbaszek@apache.org>
Co-authored-by: Tomek Urbaszek <turbaszek@apache.org>
This PR implements discovering and reading provider information from
packages (using entry_points) and - if found - from local
provider.yaml files for the built-in Airflow providers,
when they are found in the airflow.providers packages.
The provider.yaml files - if found - take precedence over the
package-provided ones.
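A simplified sketch of that discovery flow (the entry point group name, the callable convention, and the 'package-name' key are assumptions made for the example, not the exact implementation):
```
import importlib.metadata
from pathlib import Path

import yaml  # assumes PyYAML is available


def discover_providers(airflow_sources: Path) -> dict:
    """Collect provider info from entry points, then let local provider.yaml
    files for built-in providers override the package-provided info."""
    providers = {}
    try:
        entry_points = importlib.metadata.entry_points(group="apache_airflow_provider")
    except TypeError:  # older Pythons: entry_points() returns a dict of groups
        entry_points = importlib.metadata.entry_points().get("apache_airflow_provider", [])
    for entry_point in entry_points:
        info = entry_point.load()()  # assumed: entry point resolves to a callable returning a dict
        providers[info["package-name"]] = info
    for provider_yaml in airflow_sources.glob("airflow/providers/**/provider.yaml"):
        info = yaml.safe_load(provider_yaml.read_text())
        providers[info["package-name"]] = info  # local files win
    return providers
```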
Add displaying provider information in CLI
Closes: #12470
The images that are built on GitHub can be used to reproduce
the test errors in CI - they should then be mounted without
local sources. However, in some cases - when you are dealing with
dependencies, for example - it is useful to be able to mount the
sources.
This PR makes it possible.
You can now set a label on a PR that will force upgrading to the latest
dependencies in that PR. If a committer sets an
"upgrade to latest dependencies" label, it will cause the PR
to upgrade all dependencies to the latest versions
matching the setup.py + setup.cfg configuration.
Due to a bug in Breeze initialization code, we were always running
against Postgres 9.6 and MySQL 5.7, even when the matrix selected
something else.
(We were overwriting the POSTGRES_VERSION and MYSQL_VERSION environment
variables in initialization code)
* Fix Connection.description migration for MySQL8
Because MySQL8 tests were not being executed (fixed in #12591), the
description column added to the connection table was not compatible
with MySQL8 using the utf8mb4 character set.
This change adds a migration and fixes the previous migration
to make it compatible.
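For illustration only, the kind of Alembic migration this refers to might look roughly like the sketch below (the table/column types are assumptions, not the exact revision shipped with Airflow):
```
import sqlalchemy as sa
from alembic import op


def upgrade():
    # Illustrative: move the description column to a type that is safe under
    # MySQL 8 with the utf8mb4 character set.
    op.alter_column(
        "connection",
        "description",
        existing_type=sa.String(length=5000),
        type_=sa.Text(length=5000),
        existing_nullable=True,
    )


def downgrade():
    op.alter_column(
        "connection",
        "description",
        existing_type=sa.Text(length=5000),
        type_=sa.String(length=5000),
        existing_nullable=True,
    )
```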
* Fixes inconsistent setting of encoding on MySQL 5.7/8
We missed this when we added support
for different MySQL versions in #7717 and removed the default
character set setting for the database server.
This change forces the default on the database server to be
utf8mb4 - regardless of whether MySQL 5.7 or MySQL 8 is used.
Utf8mb4 is the default for MySQL 8, but latin1 is the default for MySQL 5.7.
The suspected root cause of the problem is described in
https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html:
the mysql client falls back to the default collation if
the MySQL 8 client is used with a 5.7 database, but this should be
no problem if the default DB character set is forced to be
utf8mb4.
This PR restores forcing the server-side encoding.
From Airflow 2.0, the `max_threads` config under the `[scheduler]` section has been renamed to `parsing_processes`.
This aligns the name with the actual code, where the Scheduler launches the number of processes defined by
`[scheduler] parsing_processes` to parse DAG files, calculate the next DagRun date for each DAG,
serialize them, and store them in the DB.
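For reference, the renamed option can be read through Airflow's configuration API like this (the fallback value here is just an illustration):
```
from airflow.configuration import conf

# [scheduler] parsing_processes replaces the pre-2.0 [scheduler] max_threads option
parsing_processes = conf.getint("scheduler", "parsing_processes", fallback=2)
print(parsing_processes)
```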
We do not need to add docker-context-files in CI before we run the
first "cache" PIP installation. Adding them might cause the cache
to always be invalidated in case someone has
added a file there before building and pushing the image.
This PR fixes the problem by adding docker-context-files later
in the Dockerfile and changing the constraints location
used in the "cache" step to always use the GitHub constraints in
this case.
Closes #12509
I've moved all the values that are "static" -- any form of dynamic or
interpolated values is left in setup.py.
If a value is passed both as a kwarg to setup() and in setup.cfg, the kwarg
wins out.
The ./build/bin content only depends on the versions of the tools used
(helm/kind/kubectl) and it does not depend on setup.py or
setup.cfg.
* Make the KubernetesPodOperator backwards compatible
This PR significantly reduces the pain of upgrading to Airflow 2.0
for users of the KubernetesPodOperator. Users will be able to
continue using the airflow.kubernetes custom classes.
* spellcheck
* spelling
* clean up unnecessary files in 1.10
* clean up unnecessary files in 1.10
* clean up unnecessary files in 1.10
When building a tagged image on DockerHub, the build has been
failing because it was trying to pull a cached version of the prod image;
the tagged image should be built from scratch, so the cache should
be disabled.
Fixes #12263
For Kubernetes tests, all tests can be executed with the same Python
version - the default one - no matter which PYTHON_MAJOR_MINOR version is
used. This is because we are testing Airflow deployed
via the production image. Thanks to that we can fix the Python version
to the default and avoid any Python version problems (this is
especially important for cherry-picking to 1.10, where we have
Python 2.7 and 3.5).
If we do not remove the cidfile, the subsequent write to it does
not change the content. The errors had been masked by the
stderr redirection, so they were invisible.
Rather than counting changed layers in the image (which was
enigmatic, difficult and prone to magic numbers) we now rely
on a random file generated while building the image.
We are using the docker image caching mechanism here. The random
file will be regenerated only when the previous layer (which is
about installing Airflow dependencies for the first time) gets
rebuilt. And for us this is the indication that building
the image will take quite some time. This layer should be
relatively static - even if setup.py changes, the CI image is
designed in such a way that the first-time installation of Airflow
dependencies is not invalidated.
This should lead to faster and less frequent rebuilds for people
using Breeze and static checks.
* K8s yaml templates not rendered by k8sexecutor
There is a bug in the yaml template rendering caused by the logic that
only generates yaml templates when the current executor is the
k8sexecutor. This is a problem as the templates are generated by the
task pod, which is itself running a LocalExecutor. This change also generates a
"base" template if the taskInstance has not run yet.
* fix tests
* fix taskinstance test
* fix taskinstance
* fix pod generator tests
* fix podgen
* Update tests/kubernetes/test_pod_generator.py
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
* @ashb comment
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
When you are building CI images locally, you now use the CI
base images from apache:airflow/python* to maintain
consistency and avoid frequent rebuilds. But when you built
prod images, you would accidentally override it with the
python base image available in the python repo, which might be
different (newer and not yet tested in CI). This PR
changes it to use the same base image, which is now
tagged in Apache Airflow's DockerHub repository.
K9s is a fantastic tool that helps to debug a running k8s
instance. It is a terminal-based, windowed CLI that makes you
several times more productive compared to using kubectl
commands. We've integrated k9s (it runs as a docker container
and is downloaded on demand). We've also separated out the KUBECONFIG
of the integrated kind cluster so that it does not mess with
any kubernetes configuration you might already have.
Also, together with that, the "surroundings" of the kubernetes
tests were simplified and improved so that the k9s integration
can be utilized well. Instead of kubectl port forwarding (which
caused a multitude of problems) we now utilize kind's
portMapping feature plus a custom NodePort resource that maps
port 8080 to NodePort 30007, which in turn maps to port 8080
of the Webserver. This way we do not have to establish
an external kubectl port forward, which is error-prone and requires
management - everything is brought up when Airflow gets
deployed to the Kind cluster and shut down when the Kind
cluster is stopped.
Yet another problem fixed was the killing of postgres by one of the
kubernetes tests ('test_integration_run_dag_with_scheduler_failure').
Instead of just killing the scheduler, it killed all pods - including
the Postgres one (it matched 'airflow-postgres.*'). That caused
various problems, as the database could be left in a strange state.
I changed the test to do what it claimed to be doing - killing only the
scheduler during the test. This seemed to improve the stability
of the tests immensely in my local setup.
* Providers in extras are properly configured and verified
This fixes #12255 - where we published the beta2 release with some
extras pulling non-existing providers.
The exact list of providers that had problems:
Wrongly named extras/providers:
* apache.presto: it was badly named -> renamed to 'presto'
* spark (badly pointing to spark instead of apache.spark)
* yandexcloud (the name remains, but we've also added a 'yandex' extra to correspond 1-1 with the 'yandex' provider)
Extras that were wrongly marked as having providers, while they had
none:
* dask
* rabbitmq
* sentry
* statsd
* tableau
* virtualenv
* Update scripts/ci/pre_commit/pre_commit_check_extras_have_providers.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
* Update scripts/ci/pre_commit/pre_commit_check_extras_have_providers.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
There was a problem that even if we pulled the right image
from the Airflow repository, we did not tag it properly.
Also added protection for people who have not yet pulled
the Python image from airflow at all, to force a pull the first time.
When extras are specified while airflow is installed, they now trigger
installation of the dependent provider packages. Each extra has a set
of provider packages that it needs, and they will be installed
automatically if the extra is specified.
For now we do not add any version specification until we agree on the
process in #11425; then we should be able to implement an
automated way of getting information about cross-package
version dependencies.
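A hypothetical sketch of how such a mapping can be folded into setup.py extras (names and package lists below are illustrative, not Airflow's actual setup.py):
```
# Illustrative only: which provider packages each extra pulls in (unpinned for
# now, pending the process discussed in #11425).
EXTRAS_PROVIDERS = {
    "amazon": ["apache-airflow-providers-amazon"],
    "google": ["apache-airflow-providers-google"],
    "postgres": ["apache-airflow-providers-postgres"],
}

EXTRAS_REQUIREMENTS = {
    "postgres": ["psycopg2-binary>=2.7.4"],
}


def add_providers_to_extras(extras: dict, providers: dict) -> dict:
    """Extend each extra with the provider packages it needs."""
    result = {}
    for extra in set(extras) | set(providers):
        result[extra] = list(extras.get(extra, [])) + list(providers.get(extra, []))
    return result


EXTRAS = add_providers_to_extras(EXTRAS_REQUIREMENTS, EXTRAS_PROVIDERS)
```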
Fixes: #11464
The "tmp" directory is mounted from the host (from tmp folder
in the source airflow directory). This is needed to get some
of our docker-in-docker tools (such as gcloud/aws/java) and
get them working on demand. Thanks to that we do not have
to increase the size of CI image unnecessarily.
Those tools were introduced and made to work in #9376
However this causes some of the standard tools (such as apt-get)
to not work inside the container unless the mounted /tmp
folder has write permission for groups/other.
This PR fixes it.
The change in #10806 made airflow work with implicit packages
when "airflow" is imported. This is a good change; however,
it has some unforeseen consequences. The 'provider_packages'
scripts copy all the provider code for backports in order
to refactor it into the empty "airflow" directory in the
provider_packages folder. The #10806 change turned that
empty folder into an 'airflow' package because it was in the
same directory as the provider_packages scripts.
Moving the scripts to dev solves this problem.
This is a fix for a problem introduced in #10806. The change
turned provider packages into namespace packages - which made
them ignored by the find_packages function from setuptools - thus
the production image built automatically and used by Kubernetes
tests did not have the provider packages installed.
This PR fixes it and adds future protection during CI tests of the
production image to make sure that provider packages are
actually installed.
Fixes #12150
When a new Python version is released (bugfix releases), we rebuild the CI image
and replace it with the new one; however, releases of the python
image and the CI image are often hours or even days apart (we only
release the CI image when tests pass in master with the new python
image). We already use a better approach for GitHub - we simply
push the new python image to our registry together with the CI
image, and the CI jobs always pull them from our registry,
knowing that the two - the python and CI images - are in sync.
This PR introduces the same approach. We push not only the CI image
but also the corresponding Python image to our registry. This has
no ill effect - DockerHub handles it automatically and reuses
the layers of the image directly from the Python one, so it is
merely a label stored in our registry that points to the
exact Python image that was used by the last pushed CI image.
There are a few more variables that (if not defined) prevent
using the CI image directly without breeze or the
CI scripts.
With this change you can run:
`docker run -it apache/airflow:master-python3.6-ci`
and enter the image without errors.
Those variables are defined in the GitHub environment, so when they
were recently added it was not obvious that they would cause failures
when running kubernetes tests locally.
This PR fixes that.
According to the manual https://manpages.debian.org/stretch/apt/apt-key.8.en.html, after Debian 9, instead of using "apt-key add", a keyring should be placed directly in the /etc/apt/trusted.gpg.d/ directory with a descriptive name and either "gpg" or "asc" as the file extension. Also added better redirection on the apt-key list command.
This test was bundled in with the existing needs-api tests, but then
performed its _own_ checks on whether it should run. This changes that to
have selective_ci_checks.sh do this check.
Additionally, CI_SOURCE_REPO was often wrong -- at least for me, as I
don't open PRs from ashb/airflow, and this led to a confusing message:
> https://github.com/ashb/airflow.git Branch my_branch does not exist
But all we were using this for was to find the "parent" commit, and
there is an easier way we can do that: HEAD^1 with a fetch depth of 2
passed to the checkout option.
So I've removed that calculation and its uses.
If we need to bring it back, we should use the output from the
`potiuk/get-workflow-origin` action -- that gets the correct value.
The `exit` and `quit` functions are actually `site.Quitter` objects and are loaded, at interpreter start-up, from site.py. However, if the interpreter is started with the `-S` flag, or a custom site.py is used, then exit and quit may not be present. It is recommended to use `sys.exit()`, which is built into the interpreter and is guaranteed to be present.
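A short example of the recommended pattern:
```
import sys


def main() -> int:
    # Do the actual work here and return a process exit code.
    return 0


if __name__ == "__main__":
    # sys.exit() is always available, unlike the site-provided exit()/quit()
    # helpers, which may be missing when the interpreter runs with -S.
    sys.exit(main())
```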
The change in #12050, which aimed at automating Docker image
building in DockerHub, had the undesired effect of overriding the
production image tag with the CI one.
This is fixed by this PR.
DockerHub uses `hooks/build` to build the image and it passes the
DOCKER_TAG variable when the script is called.
This PR makes DOCKER_TAG provide the default value for the tag
that is otherwise calculated from sources (taking the default branch and
python version). Since it is only set in the DockerHub build, it
should be safe.
Fixes #11937
There was a problem where documentation-only changes triggered
selective checks without a docs build (they resulted in
basic-checks-only and no images being built).
This occurred for example in #12025.
This PR fixes it by adding image-build and docs-build as two
separate outputs.
For example, this allows some providers to be installed in site packages
(`/usr/local/python3.7/...`) and others to be installed in the user folder
(`~/.local/lib/python3.7/...`) and both be importable.
If we didn't have code in `airflow/__init__.py` this would be much
easier to achieve (simply deleting the top-level init file would be
enough) - but sadly we can't take that route.
From the docs of pkgutil: https://docs.python.org/3/library/pkgutil.html#module-pkgutil
> This will add to the package’s __path__ all subdirectories of
> directories on sys.path named after the package. This is useful if one
> wants to distribute different parts of a single logical package as
> multiple directories.
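The usual way to apply that pkgutil mechanism is a pkgutil-style namespace declaration in the sub-package's __init__.py, roughly like this (shown as a general illustration, not necessarily the exact file contents used by this change):
```
# airflow/providers/__init__.py - declare a pkgutil-style namespace package so
# parts of airflow.providers can live in different sys.path locations.
__path__ = __import__("pkgutil").extend_path(__path__, __name__)
```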
Tested as follows:
```
$ pip install /wheels/apache_airflow-2.0.0.dev0-py3-none-any.whl
$ ls -ald $(python -c 'import os; print(os.path.dirname(__import__("airflow").__file__))')/providers
ls: cannot access '/usr/local/lib/python3.7/site-packages/airflow/providers': No such file or directory
$ pip install --constraint <(echo 'apache-airflow==2.0.0.dev0') apache-airflow-backport-providers-redis
$ pip install --user --constraint <(echo 'apache-airflow==2.0.0.dev0') apache-airflow-backport-providers-imap
$ python -c 'import airflow.providers.imap, airflow.providers.redis; print(airflow.providers.imap.__file__); print(airflow.providers.redis.__file__)'
/root/.local/lib/python3.7/site-packages/airflow/providers/imap/__init__.py
/usr/local/lib/python3.7/site-packages/airflow/providers/redis/__init__.py
```
* Migrate from helm-unittest to python unittest
* fixup! Migrate from helm-unittest to python unittest
* fixup! fixup! Migrate from helm-unittest to python unittest
Some tests (testing the structure and importability of the
examples) should always be run even if the core part was not modified.
That's why we move them to the "always" directory.
This PR implements an optimisation - only running the
default values of the build matrix when the PR does not have the
"okay to test" label.
The "okay to test" label is set when the PR gets approved
and it was not approved before; a comment is also generated
urging the committer to rebase the PR to run the full set of tests.
Additionally, an (in-progress) check is added that marks the PR as
not yet ready to be merged. Only after re-running it will the PR
become truly ready to be merged.
With this change we attempt to better diagnose some occasional
network docker-compose issues that have been plaguing us after
we solved or worked around other CI-related issues. Sometimes
the docker-compose jobs fail on checking if the container is
up and running with either of these two errors:
* 'forward host lookup failed: Unknown host'
* 'DNS fwd/rev mismatch'
Usually this happens in the rabbitMQ and openldap containers.
Both indicate a problem with the DNS of the docker engine, or maybe
some remnants of a previous docker run that do not allow us
to start those containers.
This change introduces a few improvements:
* added --volumes to the `docker system prune` command, which might
clean up some anonymous volumes left by the containers between
runs
* removed the docker-compose down --remove-orphans command
after failure, as we are currently always doing it a
few lines earlier (before the test). This change means
that our mechanism of logging container logs after failure
will likely give us more information in case the root
cause is the rabbitmq or openldap container failing to start
* increased the number of tries to 5 in case of failed containers.
The Presto DB integration is checked several times, which also means
that it is added several times to DISABLED_INTEGRATIONS in case
it is not enabled. This commit fixes that.
Postgres seems to be really stable when it comes to upgrades,
so we make the assumption that if we test 9.6 and 13, and they
work, all the versions in between will also work.
This PR changes Postgres 10 to 13 in tests and updates the documentation
with all the versions in between.
If you used the build context from the git repo, the .pypirc file was
missing, because COPY in the Dockerfile is not conditional.
This change copies the .pypirc conditionally from the
docker-context-files folder instead.
Also, it was needlessly copied into the main image where it is not
needed - and it was even dangerous to do so.
The packages lacked setup.py and they could not be installed.
This change automatically generates setup.py for the packages and
adds it to them.
Fixes: #11546
We've implemented the capability of running the tests in smaller
chunks and selectively running only some of them, but this
capability has been disabled by mistake by the default setting of
TEST_TYPE to "All" and by not removing it when TEST_TYPES is set
to the sets of tests that should be run.
This should speed up many of our tests and also hopefully
lower the chance of EXIT 137 errors.
The security scans take a long time, especially for python code
- about 18 minutes now. This PR reduces strain on
GitHub Actions by only running the scans in pull requests
when python or javascript code, respectively, has changed.
* Images are not built if the change does not touch code or docs.
* When we have no need for CI images, we run stripped-down
pre-commit checks which skip the long checks and only run for
changed files.
* If none of the CLI/Providers/Kubernetes/WWW files changed,
the relevant tests are skipped, unless some of the core files
changed as well.
* The selective checks logic is explained and documented.
This is the second attempt at the problem, with a better
strategy to get the list of files from the incoming PR.
The strategy now works better in a number of cases:
* when the PR comes from the same repo
* when the PR comes from the pull_repo
* when the PR contains more than one commit
* when the PR is based on an older master and GitHub creates a
merge commit
When we prepare pre-release versions, they are not intended to be
converted to final release versions, so there is no need to replace
the version number for them artificially.
For release candidates, on the other hand, we should internally use the
"final" version because those packages might simply be renamed to the
final "production" versions.
Fixes #11585
We do not dump airflow logs on success any more, but we dump them
and all the container logs in case of failure, so that we can
better investigate cases like #11543 - that includes enabling
full deadlock information dumping in our mysql database.
In case of very simple changes, there might be no merge commits
generated by GitHub. In such cases we should take the commit SHA
instead as the base of change calculation for selective tests.
It seems the trap with several steps and || true does not really
work the way I wanted, and when kill is run but the process is
already gone, we got an error in the script.
Looks like this approach with a sub-process kill will do it.
* Improves stability of K8S tests by caching binaries and retrying downloads
The K8S tests on CI are controlled from the host, not from
inside the CI container image. Therefore they need a virtualenv
to run the tests, as well as some tools such as helm, kubectl
and kind. While those tools can be downloaded and installed
on demand, from time to time the download fails intermittently.
This change introduces the following improvements:
* the commands to download and set up kind, helm and kubectl are
repeated up to 4 times in case they fail
* the "bin" directory where those binaries are downloaded is
cached between runs; only the same combination of
tool versions shares the same cache.
This way both cases - regular re-runs of the same jobs and
upgrades of the tools - will be much more stable.
* Images are not built if the change does not touch code or docs.
* When we have no need for CI images, we run stripped-down
pre-commit checks which skip the long checks and only run for
changed files.
* If none of the CLI/Providers/Kubernetes/WWW files changed,
the relevant tests are skipped, unless some of the core files
changed as well.
* The selective checks logic is explained and documented.
So far Breeze kept its data (mysql, redis,
postgres) inside the containers. This means that the data was kept only
as long as the containers were running. If you stopped Breeze via the
`stop` command, the data was always deleted.
This changes the behaviour - each of the Breeze containers now has
a named volume where data is kept. Those volumes are still deleted
by default when Breeze is stopped, but you can choose to preserve
them by adding ``--preserve-volumes`` when you run the ``stop`` or
``restart`` command.
Fixes: #11625
In Airflow 2.0 we decided to split Airflow into separate providers.
This means that when you prepare the core airflow package, providers
are not installed by default. This is not very convenient for
local development, though, nor for docker images built from sources,
where you would like to install all providers by default.
A new INSTALL_ALL_AIRFLOW_PROVIDERS environment variable controls
this behaviour now. If it is set to "true", all packages including
provider packages are installed. If missing or set to "false", only
the core airflow package is installed.
For Breeze, the default is set to "true", as in those cases you
want to install all providers in your environment. The same applies
when you build the production image from sources. However, when you
build the image using a github tag or a pip package, you should specify
the appropriate extras to install the required provider packages.
Note that if you install Airflow via 'pip install .' from sources
in a local virtualenv, provider packages are not going to be
installed unless you set INSTALL_ALL_AIRFLOW_PROVIDERS to "true".
Fixes #11489
The scripts were using docker compose, but they
can be plain docker run commands. Also, they no longer need to be
run by breeze directly in the CI image, because I've added traps
to run the commands at the exit of all "in_container" scripts.
The production image had the capability of installing packages from
wheels (for security teams/air-gapped systems). This capability
might also be useful when building the CI image, especially when
we install the core and provider packages separately and
do not yet have provider packages available in PyPI.
This is an intermediate step towards implementing #11490.
It seems that port forwarding during kubernetes tests started to behave
erratically - kubectl port forward sometimes hangs
indefinitely rather than connecting or failing.
We change the strategy a bit and try to allocate
increasing port numbers in case something like that happens.
It seems that splitting the tests into many small jobs has a bad
effect - since we only have a queue size of 180 for the whole Apache
organisation, we are competing with other projects for the jobs,
and with the jobs being so short we get starved much more than if
we had long jobs. Therefore we are re-combining the test types into
single jobs per Python version/database version and running all the
tests sequentially on those machines.
* Separate changes/readmes for backport and regular providers
We now have separate release notes for backport provider
packages and regular provider packages.
They have different versioning - backport provider
packages use CALVER, regular provider packages use
semver.
* Added support for provider packages for Airflow 2.0
This change consists of the following:
* adds provider package support for 2.0
* adds generation of package readmes and change notes
* versions are hard-coded to 0.0.1 for the first release for now
* adds automated tests for installation of the packages
* renames backport package readmes/changes to BACKPORT_*
* adds regular package readmes/changes
* updates documentation on generating the provider packages
* adds CI tests for the packages
* maintains backport package generation with the --backports flag
Fixes #11421 Fixes #11424
In preparation for adding provider packages to the 2.0 line, we
are renaming backport packages to provider packages.
We want to implement this in stages - first rename the
packages, then split out backport/2.0 providers as part of
the #11421 issue.
This is the final step of implementing #10507 - selective tests.
Depending on the files changed by the incoming commit, only a subset of
the tests is executed. The conditions below are evaluated in the
sequence specified (a rough sketch of the flow is shown after the list):
* In case of "push" and "schedule" type of events, all tests
are executed.
* If no important files or folders changed - no tests are executed.
This is the typical case for doc-only changes.
* If any of the environment files (Dockerfile/setup.py etc.) changed,
all tests are executed.
* If no "core/other" files are changed, only the relevant types
of tests are executed:
* API - if any of the API files/tests changed
* CLI - if any of the CLI files/tests changed
* WWW - if any of the WWW files/tests changed
* Providers - if any of the Providers files/tests changed
* Integration, Heisentests, Quarantined, Postgres and MySQL
runs are always run, unless all tests are skipped as in the
case of doc-only changes.
* If "Kubernetes" related files/tests are changed, the
"Kubernetes" tests with Kind are run. Note that those tests
are run separately using Host environment and those tests
are stored in "kubernetes_tests" folder.
* If some of the core/other files changed, all tests are run. This
is calculated by subtracting the file counts calculated
above from the total count of important files.
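The real logic lives in the CI shell scripts (selective_ci_checks.sh); the Python sketch below only illustrates the decision flow described above, with made-up path patterns:
```
from typing import Iterable, Set

# Made-up path patterns, only to illustrate the flow.
ENVIRONMENT_FILES = {"Dockerfile", "Dockerfile.ci", "setup.py", "setup.cfg"}
AREAS = {
    "API": ("airflow/api", "tests/api"),
    "CLI": ("airflow/cli", "tests/cli"),
    "WWW": ("airflow/www", "tests/www"),
    "Providers": ("airflow/providers", "tests/providers"),
}
ALWAYS = {"Integration", "Heisentests", "Quarantined", "Postgres", "MySQL"}


def selected_test_types(changed_files: Iterable[str]) -> Set[str]:
    """Illustrative selection of test types based on changed paths."""
    changed = list(changed_files)
    if any(f in ENVIRONMENT_FILES for f in changed):
        return {"All"}
    important = [f for f in changed if f.startswith(("airflow/", "tests/", "chart/"))]
    if not important:
        return set()  # doc-only change: skip all tests
    selected = set(ALWAYS)
    remaining = list(important)
    for test_type, prefixes in AREAS.items():
        matched = [f for f in remaining if f.startswith(prefixes)]
        if matched:
            selected.add(test_type)
            remaining = [f for f in remaining if f not in matched]
    if remaining:
        return {"All"}  # core/other files changed as well: run everything
    return selected
```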
Fixes: #10507
The constraints generation script was broken by recent changes
in the naming of the constraints URL variables and by moving generation
of the link to the Dockerfile.
This change restores the script's behaviour.
We seem to have a problem with running all tests at once - most
likely due to some resource problems in our CI - therefore it makes
sense to split the tests into more batches. This is not yet the full
implementation of selective tests, but it goes in that direction
by splitting into Core/Providers/API/CLI tests. The full selective
tests approach will be implemented as part of the #10507 issue.
This split is possible thanks to #10422, which moved image building
to a separate workflow - this way each image is only built once
and it is uploaded to a shared registry, from which it is quickly
downloaded rather than built by all the jobs separately - this
way we can have many more jobs, as there is very little per-job
overhead before the tests start running.
We have recently started to experience intermittent "unknown_blob"
errors with the GitHub Docker registry. We might eventually need
to migrate to GCR (which is eventually going to replace the
Docker Registry for GitHub).
A ticket has been opened with the Apache Infrastructure team to enable
access to GCR and to make some statements about Access
Rights management for GCR: https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20959
Also a ticket has been raised with GitHub Support about it
(https://support.github.com/ticket/personal/0/861667), as we
cannot delete our public images in the Docker registry.
But until this happens, the workaround might help us
to handle the situations where we get intermittent errors
while pushing to the registry. This seems to be a common
error when an NGINX proxy is used to proxy the GitHub Registry, so
it is likely that retrying will work around the issue.
* Constraints and PIP packages can be installed from local sources
This is the final part of implementing #11171, based on feedback
from enterprise customers we worked with. They want to have
the capability of building the image using binary wheel packages
that are locally available and the official Dockerfile. This means
that, besides the official APT sources, the Dockerfile build should
not need GitHub, nor any other external files pulled from outside,
including the PIP repository.
This change also includes documentation on how to prepare a set of
such binaries ready for inspection and review by security teams
in an Enterprise environment. Such sets of "known-working-binary-whl"
files can then be separately committed, tracked and scrutinized
in an artifact repository of such an Enterprise.
Fixes: #11171
* Update docs/production-deployment.rst
This PR needs to be merged first in order to handle #11385,
which requires .pypirc to be created before the Dockerfile gets built.
This means that the script change needs to be merged to master
first, in this PR.
If this flag is specified, it will look for wheel packages placed in the
dist folder and it will install the wheels from there after installing
Airflow. This is useful for testing backport packages as well as, in the
future, for testing provider packages for 2.0.
We have started to get "unknown blob" kinds of errors more often when
pushing the images to the GitHub Registry. While this is clearly a
GitHub issue, its frequency of occurrence and unclear message
make it a good candidate for writing an additional message with
instructions for the users, especially since they now have
an easy way to get to that information via status checks and
links leading to the log file when this problem happens during the
image building process.
This way users will know that they should simply rebase or
amend/force-push their change to fix it.
When installing airflow 1.10 via breeze we now enable rbac
by default, but we can disable it with the --no-rbac-ui flag.
This is useful for testing different variants of 1.10 when testing
release candidates in connection with the 'start-airflow'
command.
The script was previously placed in scripts/ci, which caused
a bit of a problem in the v1-10-test branch, where PRs were using
scripts/ci from the v1-10-test HEAD but were missing
the ci script from the PR.
The "ci" scripts are part of the host scripts that are
always taken from master when the image is built, but
all the other stuff should be taken from the "docker"
folder - which is taken from the PR.
* Allows more customizations for image building.
This is the third (and not last) part of making the Production
image more corporate-environment friendly. It's been prepared
at the request of one of the big Airflow users (a company) that
has rather strict security requirements when it comes to
preparing and building images. They are committed to
synchronizing with the progress of Apache Airflow 2.0 development,
and making the image customizable so that they can build it using
only sources controlled by them internally was one of their
important requirements.
This change adds the possibility of customizing various steps in
the build process:
* adding custom scripts to be run before installation in both the
build image and the runtime image. This allows, for example,
installing custom GPG keys and adding custom sources.
* customizing the way NodeJS and Yarn are installed in the
build image segment - as companies might rely on their own way
of installing them.
* adding extra packages to be installed during both the build and
dev segment build steps. This is crucial to achieve the same
size optimizations as the original image.
* defining additional environment variables (for example
environment variables that indicate acceptance of the EULAs
in case of installing proprietary packages that require
EULA acceptance) - both in the build image and the runtime image
(again, the goal is to keep the image optimized for size).
The image build process remains the same when no customization
options are specified, but having those options increases the
flexibility of the image build process in corporate environments.
This is part of #11171.
This change also fixes some of the issues opened and raised by
other users of the Dockerfile.
Fixes: #10730 Fixes: #10555 Fixes: #10856
Input from those issues has been taken into account when this
change was designed, so that the cases described in those issues
could be implemented. An example from one of the issues landed as
an example way of building a highly customized Airflow image
using those customization options.
Depends on #11174
* Update IMAGES.rst
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Breeze tags the image based on the default python version,
branch and type of the image, but you might want to tag the image
in the same command - especially in automated cases of building
the image via CI scripts, or for security teams that tag the image
based on external factors (build time, person etc.).
This is part of #11171, which makes the image easier to build in
corporate environments.
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing the MySQL client on Debian requires reaching out
to Oracle deb repositories, which might not be approved by security
teams when you build the images. Also, not everyone needs the MySQL
client, or they might want to install their own MySQL or MariaDB
client - from their own repositories.
This change separates the installation step out into a
script (with a prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.), but in the "Final" segment of the image only runtime libraries
are needed.
Part of #11171
Depends on #11173.
This is the first step of implementing the corporate-environment-friendly
way of building images, where - in a corporate
environment - it might not be possible to install the packages
using the GitHub cache initially.
Part of #11171
GitHub Actions allows using the `fromJson` method to read arrays
or even more complex json objects into the CI workflow yaml files.
This, combined with set::output commands, allows reading the
list of allowed versions as well as the default ones from the
environment variables configured in
./scripts/ci/libraries/initialization.sh.
This means that we can have one place in which versions are
configured. We also need to do it in "breeze-complete", as this is
a standalone script that should not source anything; we added
BATS tests to verify that the versions in breeze-complete
correspond with those defined in initialization.sh.
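The actual CI code is bash, but the mechanism can be sketched like this: the version lists are serialized as JSON on `::set-output` workflow commands, and a later workflow step expands them with `fromJson()` into its matrix (the values below are illustrative):
```
import json

# Illustrative version lists - in reality they come from the environment
# variables configured in ./scripts/ci/libraries/initialization.sh.
python_versions = ["3.6", "3.7", "3.8"]
postgres_versions = ["9.6", "13"]

# Emit GitHub Actions workflow commands; a later job can then use, e.g.,
# strategy.matrix.python-version: ${{ fromJson(needs.build-info.outputs.python-versions) }}
print(f"::set-output name=python-versions::{json.dumps(python_versions)}")
print(f"::set-output name=postgres-versions::{json.dumps(postgres_versions)}")
```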
Also, we no longer limit tests in regular PRs - we run
all combinations of available versions. Our tests run quite a
bit faster now, so we should be able to run more complete
matrices. We can still exclude individual values of the matrices
if this is too much.
MySQL 8 is disabled from breeze for now. I plan a separate follow-up
PR where we will run MySQL 8 tests (they have not been run so far).
* Simplify Airflow on Kubernetes Story
Removes thousands of lines of code that essentially amount to us
re-creating the Kubernetes API. Will offer a faster, simpler
KubernetesExecutor for 2.0.
* Fix podgen tests
* fix documentation
* simplify validate function
* @mik-laj comments
* spellcheck
* spellcheck
* Update airflow/executors/kubernetes_executor.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Airflow below 1.10.2 required the SLUGIFY_USES_TEXT_UNIDECODE env
variable to be set to yes.
Our production Dockerfile and Breeze support building images
for any version of airflow >= 1.10.1, but they failed on
1.10.2 and 1.10.1 because this variable was not set.
You can now set the variable when building the image manually,
and Breeze does it automatically if the image is 1.10.1 or 1.10.2.
Fixes #10974
Changed `Is` to `Passed`
Before:
```
ERROR: Allowed backend: [ sqlite mysql postgres ]. Is: 'dpostgres'.
Switch to supported value with --backend flag.
```
After:
```
ERROR: Allowed backend: [ sqlite mysql postgres ]. Passed: 'dpostgres'.
Switch to supported value with --backend flag.
```
While testing the v1-10-test backport for Breeze, the
--github-repository flag did not work. It turned out that
the lowercase variable was not re-set when the flag was
provided to Breeze.
This change causes the lowercasing to run just before the value
is used, to make sure that the GITHUB_REPOSITORY value
is used after it has been overwritten.
When we ported the new CI mechanism to v1-10-test, it turned out
that we had to correct the retrieval of DEFAULT_BRANCH
and DEFAULT_CONSTRAINTS_BRANCH.
Since we are building the images using the "master" scripts, we need to
make sure the branches are retrieved from the _initialization.sh of the
incoming PR, not from the one in the master branch.
Additionally, the Python 2.7 and 3.5 builds have to be merged to
master and excluded when the build is run targeting the master branch.
* Modify helm chart to use pod_template_file
Since we are deprecating most k8sexecutor arguments
we should use the pod_template_file when launching airflow
using the KubernetesExecutor
* fix tests
* one more nit
* fix dag command
* fix pylint
The change from #10769 accidentally switched integration tests
into far longer unit test runs (we effectively ran the tests
twice and did not run the integration tests).
This fixes the problem by removing the readonly status from
INTEGRATIONS and only setting it after the integrations are
set.
Until pre-commit implements exporting all configured
checks, we need to keep the list manually updated.
We check both the pre-commit list in breeze-complete and the
descriptions in STATIC_CODE_CHECKS.rst.
We've observed the tests for the last couple of weeks and it seems
most of the tests marked with the "quarantine" marker are succeeding
in a stable way (https://github.com/apache/airflow/issues/10118).
The removed tests have a success ratio of > 95% (20 runs without
problems), and this was verified a week ago as well,
so it seems they are rather stable.
There are literally a few that are either failing or causing
the Quarantined builds to hang. I manually reviewed the
master tests that failed over the last few weeks and added the
tests that are causing the builds to hang.
It seems that stability has improved - which might be caused
by some temporary problems present when we marked the quarantined
builds, or by a too "generous" way of marking tests as quarantined, or
maybe the improvement comes from #10368, as the docker engine
and machines used to run the builds in GitHub experience far
less load (image builds are executed in separate builds), so
it might be that resource usage has decreased. Another reason
might be GitHub Actions stability improvements.
Or simply those tests are more stable when run in isolation.
We might still add failing tests back as soon as we see them behave
in a flaky way.
The remaining quarantined tests that need to be fixed:
* test_local_run (often hangs the build)
* test_retry_handling_job
* test_clear_multiple_external_task_marker
* test_should_force_kill_process
* test_change_state_for_tis_without_dagrun
* test_cli_webserver_background
We also move some of those tests to the "heisentests" category.
Those tests run fine in isolation but fail
the builds when run with all other tests:
* TestImpersonation tests
We might find that those heisentests can be fixed, but for
now we are going to run them in isolation.
Also, since those quarantined tests are failing more often,
the "num runs" to track for them has been decreased to 10,
to keep track of the last 10 runs only.
When rebuilding the image during a commit, the kill command failed to
find the spinner job to kill (this is just a preventive measure)
and failed the rebuild step in pre-commit.
This is now fixed.
The constants were initialised after the readonly status was set
for them in the test script.
This was mainly about default values for those constants (which has
already been handled by _script_init.sh), but more importantly
the INTEGRATIONS were not properly initialized, which caused some
integration tests to be skipped.
The docker(), helm(), and kubectl() functions replace the real tools
to get verbose behaviour (we can print the exact command being
executed for them). But when 'set +e' was set before the command
was called - indicating that errors in those functions should be
ignored - this did not happen. The functions ran 'set -e' just
before returning the non-zero value, effectively exiting the
script right after. This made the first-time experience not
good.
The fix also fixes the behaviour of stdout and stderr for those
functions - previously they were joined to be able to be
printed to OUTPUT_FILE, but this lost the stderr/stdout
distinction. Now both stdout and stderr are printed to the
output file, but they are also redirected to stdout/stderr
respectively, so that 2>/dev/null works as expected.
While fixing it, it turned out that one of the remove_images
methods was not used any more - it was merged with the breeze one.
The hadolint check only checked the "main dir" Dockerfile,
but we have more of them now. All of them are now checked.
The following problems are fixed:
* DL3000 Use absolute WORKDIR
* DL4000 MAINTAINER is deprecated
* DL4006 Set the SHELL option -o pipefail before RUN with a pipe in it.
* SC2046 Quote this to prevent word splitting.
The following problems are ignored:
* DL3018 Pin versions in apk add. Instead of `apk add <package>` use `apk add
<package>=<version>`
If the pod restarts before the sleep time is over, the trim command will not run. I think it's better if we reorder the commands to execute the delete first and then go to sleep. At the moment the sleep is every 15 mins, but people will just increase the EVERY line if they want a longer sleep time, and then they can encounter this bug.
BATS has additional assert libraries that are much more
straightforward and nicer for writing tests for bash scripts.
There is no dockerfile from BATS that contains those, so we
had to build our own (but it follows the same structure
as #9652 - we keep our dev docker image
sources inside our repository and the generated docker images
in the "apache/airflow:<tool>-CALVER-TOOLVER" format).
We have more BATS unit tests to add - following #10576 -
and this change will be of great help.
Inspired by the Google Shell Guide, where they mention
separating package names with ::, I realized that this was
one of the missing pieces in our bash scripts.
While we already had packages (in the libraries folders),
it has been difficult to tell which function is where.
With the introduction of packages - equal to the library file name -
we are *almost* at the level of a structured language, and
it's easier to find the functions if you are looking for them.
Way easier, in fact.
Part of #10576
(cherry picked from commit cc551ba793)
(cherry picked from commit 2bba276f0f06a5981bdd7e4f0e7e5ca2fe84f063)
* Implement Google Shell Conventions for breeze script … (#10651)
Part of #10576
First (and the biggest) of the series of commits to introduce
Google Shell Conventions in our bash scripts.
This concerns the biggest and most complex script - breeze - so it is
rather huge, but it is difficult to split it into
smaller pieces.
The rules implemented (from the conventions):
* constants and exported variables are CAPITALIZED, while
local/temporary variables are lowercase
* following the shell guide, once all the variables are set to their
final values (either from exported variables, calculation or --switches),
a single function makes all the variables read-only. That
helped to clean up a lot of places where the same function was called
several times, or where variables were defined in a few places. Now the
behavior should be rather consistent and we should easily catch such
duplications
* function headers (following the guide) explain the arguments,
variables expected, and variables modified in the functions
* setting the variables as read-only also helped to clean up the "ifs"
where we often had ":=}" in variables and != "" or == "". Those are
replaced with `=}` and the tests are replaced with `-n` and `-z` - also
following the shell guide (readonly helped to detect and clean all
such cases). This should also be much more robust in the future.
* reorganized the initialization of those constants and variables - simplified
a few places where initialization was overlapping. It should be much more
straightforward and clean now
* a number of internal breeze function variables are "local" - this
helps avoid accidental variable overwriting and keeps stuff localized
* the trap_add function is separated out to help in cases where we had
several traps handling the same signals.
(cherry picked from commit 46c8d6714c)
(cherry picked from commit c822fd7b4bf2a9c5a9bb3c6e783cbea9dac37246)
* fixup! Implement Google Shell Conventions for breeze script … (#10651)
* Revert "Add packages to function names in bash (#10670)"
This reverts commit cc551ba793.
* Revert "Implement Google Shell Conventions for breeze script … (#10651)"
This reverts commit 46c8d6714c.
Inspired by the Google Shell Guide, where they mention
separating package names with ::, I realized that this was
one of the missing pieces in our bash scripts.
While we already had packages (in the libraries folders),
it has been difficult to tell which function is where.
With the introduction of packages - equal to the library file name -
we are *almost* at the level of a structured language, and
it's easier to find the functions if you are looking for them.
Way easier, in fact.
Part of #10576
Part of #10576
First (and the biggest) of the series of commits to introduce
Google Shell Conventions in our bash scripts.
This concerns the biggest and most complex script - breeze - so it is
rather huge, but it is difficult to split it into
smaller pieces.
The rules implemented (from the conventions):
* constants and exported variables are CAPITALIZED, while
local/temporary variables are lowercase
* following the shell guide, once all the variables are set to their
final values (either from exported variables, calculation or --switches),
a single function makes all the variables read-only. That
helped to clean up a lot of places where the same function was called
several times, or where variables were defined in a few places. Now the
behavior should be rather consistent and we should easily catch such
duplications
* function headers (following the guide) explain the arguments,
variables expected, and variables modified in the functions
* setting the variables as read-only also helped to clean up the "ifs"
where we often had ":=}" in variables and != "" or == "". Those are
replaced with `=}` and the tests are replaced with `-n` and `-z` - also
following the shell guide (readonly helped to detect and clean all
such cases). This should also be much more robust in the future.
* reorganized the initialization of those constants and variables - simplified
a few places where initialization was overlapping. It should be much more
straightforward and clean now
* a number of internal breeze function variables are "local" - this
helps avoid accidental variable overwriting and keeps stuff localized
* the trap_add function is separated out to help in cases where we had
several traps handling the same signals.
We can now build all the images from Airflow sources in
a reproducible fashion, and our users can use the helm chart
based on the images built from the official images plus the code in
the Airflow codebase.
We also have a consistent versioning scheme based on a
calver version for releasing the images, coupled with
the version of the original package.
Part of #9401
We have already fixed a lot of problems that were marked
with those; also, IntelliJ has gotten a bit smarter at not
detecting false positives, as well as understanding more
pylint annotations. Wherever the problem remained,
we replaced it with # noqa comments - as those are
also well understood by IntelliJ.