* Add capability of customising PyPI sources
This change adds the capability of customising the installation of PyPI
modules via a custom .pypirc file. This makes it possible to install
dependencies from an in-house, vetted PyPI registry.
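As a hedged sketch of what pointing installs at an in-house index can look like (the file path and registry URL are illustrative assumptions; pip reads index settings from a pip.conf-style file):

```shell
# Write a pip configuration pointing at a hypothetical in-house index
mkdir -p /tmp/pip-conf
cat > /tmp/pip-conf/pip.conf <<'EOF'
[global]
index-url = https://pypi.example.internal/simple
EOF
# Verify pip picks the custom config up via PIP_CONFIG_FILE
PIP_CONFIG_FILE=/tmp/pip-conf/pip.conf pip config list
```

The same file can be baked into the image during the build so that every `pip install` in the Dockerfile goes through the vetted registry.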
* Constraints and PIP packages can be installed from local sources
This is the final part of implementing #11171, based on feedback
from enterprise customers we worked with. They want the capability
of building the image from the official Dockerfile using binary
wheel packages that are available locally. This means that, besides
the official APT sources, the Dockerfile build should not need
GitHub or any other external files pulled from outside,
including the PIP repository.
This change also includes documentation on how to prepare a set of
such binaries, ready for inspection and review by security teams
in an enterprise environment. Such sets of "known-working-binary-whl"
files can then be separately committed, tracked and scrutinized
in the enterprise's artifact repository.
Fixes: #11171
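The offline installation pattern the vetted wheel set relies on can be sketched as follows (package name and paths are illustrative; a trivial wheel is built locally so the example is self-contained):

```shell
set -euo pipefail
# Build a trivial local package into a wheel, simulating a vetted binary set
mkdir -p /tmp/pkg /tmp/wheelhouse
cat > /tmp/pkg/setup.py <<'EOF'
from setuptools import setup
setup(name="demo-pkg", version="0.1", py_modules=["demo_mod"])
EOF
echo 'VALUE = 42' > /tmp/pkg/demo_mod.py
pip wheel --no-deps --no-build-isolation --wheel-dir /tmp/wheelhouse /tmp/pkg
# --no-index guarantees nothing is pulled from PyPI; only the local folder is used
pip install --no-index --find-links=/tmp/wheelhouse demo-pkg
```

In a locked-down build, the `wheelhouse` folder would contain the reviewed `.whl` files tracked in the artifact repository.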
* Update docs/production-deployment.rst
* Allows more customizations for image building.
This is the third (and not the last) part of making the production
image more corporate-environment friendly. It was prepared at
the request of one of the big Airflow users (a company) that
has rather strict security requirements when it comes to
preparing and building images. They are committed to keeping up
with the progress of Apache Airflow 2.0 development, and making
the image customizable so that they can build it using only
internally controlled sources was one of their important
requirements.
This change adds the possibility of customizing various steps in
the build process:
* adding custom scripts to be run before installation, in both
the build image and the runtime image. This allows, for example,
installing custom GPG keys and adding custom sources.
* customizing the way NodeJS and Yarn are installed in the
build image segment - as some environments rely on their own
way of installing them.
* adding extra packages to be installed during both the build
and dev segment build steps. This is crucial to achieve the same
size optimizations as the original image.
* defining additional environment variables (for example,
variables that indicate acceptance of EULAs when installing
proprietary packages that require it) - in both the build
image and the runtime image
(again, the goal is to keep the image optimized for size)
The image build process remains the same when no customization
options are specified, but having those options increases
flexibility of the image build process in corporate environments.
This is part of #11171.
This change also fixes some of the issues opened and raised by
other users of the Dockerfile.
Fixes: #10730
Fixes: #10555
Fixes: #10856
Input from those issues was taken into account when this change
was designed, so that the cases described in those issues can be
implemented. An example from one of those issues landed as a
demonstration of building a highly customized Airflow image
using these customization options.
Depends on #11174
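A hedged sketch of what such build hooks can look like in a Dockerfile (the `ARG` names are illustrative, not the exact names used by the official Dockerfile):

```dockerfile
# Hooks a build can expose so that extra packages and environment variables
# are injected at build time without editing the Dockerfile itself
ARG ADDITIONAL_PYTHON_DEPS=""
ARG ADDITIONAL_ENV_VAR=""
ENV ADDITIONAL_ENV_VAR=${ADDITIONAL_ENV_VAR}
RUN if [ -n "${ADDITIONAL_PYTHON_DEPS}" ]; then \
        pip install --no-cache-dir ${ADDITIONAL_PYTHON_DEPS}; \
    fi
```

Because every hook defaults to empty, builds that pass no `--build-arg` flags produce exactly the same image as before.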
* Update IMAGES.rst
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
This is the second step of making the production Docker image more
corporate-environment friendly, by making MySQL client installation
optional. Installing the MySQL client on Debian requires reaching
out to Oracle deb repositories, which might not be approved by
security teams when you build the images. Also, not everyone needs
the MySQL client, and some may want to install their own MySQL or
MariaDB client from their own repositories.
This change separates the installation step out into a script
(with a prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.), while in the "Final" segment of the image only runtime
libraries are needed.
Part of #11171
Depends on #11173.
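The prod/dev split described above can be sketched as a small helper (package names are illustrative; Debian's actual package names may differ by release):

```shell
# "dev" builds need the client headers for compiling bindings;
# "prod" images only need the runtime client
mysql_client_packages() {
  case "$1" in
    dev)  echo "default-libmysqlclient-dev default-mysql-client" ;;
    prod) echo "default-mysql-client" ;;
    *)    echo "Usage: mysql_client_packages dev|prod" >&2; return 1 ;;
  esac
}
mysql_client_packages prod
```

Each image segment would then call `apt-get install` with the list for its mode.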
This is first step of implementing the corporate-environment
friendly way of building images, where in the corporate
environment, this might not be possible to install the packages
using the GitHub cache initially.
Part of #11171
Airflow below 1.10.2 required the SLUGIFY_USES_TEXT_UNIDECODE
environment variable to be set to yes.
Our production Dockerfile and Breeze support building images
for any version of Airflow >= 1.10.1, but the build failed on
1.10.1 and 1.10.2 because this variable was not set.
You can now set the variable when building the image manually,
and Breeze does it automatically if the image is 1.10.1 or 1.10.2.
Fixes: #10974
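A hedged sketch of the version check Breeze performs (variable names are illustrative):

```shell
# Only the two affected versions need the variable set to "yes"
AIRFLOW_VERSION="1.10.2"
case "${AIRFLOW_VERSION}" in
  1.10.1|1.10.2) export SLUGIFY_USES_TEXT_UNIDECODE="yes" ;;
esac
echo "SLUGIFY_USES_TEXT_UNIDECODE=${SLUGIFY_USES_TEXT_UNIDECODE:-}"
```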
Since we run the Airflow image as the airflow user, the
entrypoint and clear-logs scripts should also be owned by the
airflow user. This had no impact if you actually ran as the root
user or when your group was root (which was recommended).
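In Dockerfile terms, the ownership fix amounts to something like this (script names and destination are illustrative):

```dockerfile
# Give the scripts to the airflow user and the root group, so both the
# airflow user and root-group users can run them
COPY --chown=airflow:root entrypoint.sh clean-logs.sh /
```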
After #10368, we changed the way we build the images
on CI. We override the CI scripts that we use
to build the image with the scripts taken from master,
to not give rogue PR authors the possibility of running
something with the write credentials.
We should not override the in_container scripts, however,
because they become part of the image, so we should use
those that came with the PR. That's why we had to move
the "in_container" scripts out of the "ci" folder and
only override the "ci" folder with the one from
master. We've made sure that the scripts in ci
are self-contained and do not need to reach outside of
that folder.
Also, the static checks are done with local files mounted
on CI, because we want to check all the files - not only
those that are embedded in the container.
The EMBEDDED dags were only really useful for testing,
but they required customising the built production image
(running with an extra --build-arg flag). This is not needed,
as it is better to extend the image with FROM
and add DAGs afterwards. This way you do not have
to rebuild the image while iterating on it.
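The extension pattern is a two-line Dockerfile (the tag and DAG paths are illustrative):

```dockerfile
# Extend the official image instead of rebuilding it with --build-arg,
# then bake the DAGs in afterwards
FROM apache/airflow:1.10.12
COPY dags/ /opt/airflow/dags/
```

Iterating on DAGs then only rebuilds this thin final layer, not the whole base image.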
* Constraint files are now maintained automatically
* No need to generate requirements when setup.py changes
* requirements are kept in separate orphan branches not in main repo
* merges to master verify that the latest requirements are working
and push the tested requirements to the orphaned branches
* we keep a history of requirement changes and can label them
individually for each version (by constraint-1.10.n tag name)
* consistently changed all references to be 'constraints' not
'requirements'
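Installing against a tested constraint set then follows this shape (the URL pattern and versions are illustrative; the command is echoed rather than executed so it can be inspected offline):

```shell
# Build the constraint URL from the Airflow and Python versions
AIRFLOW_VERSION="1.10.12"
PYTHON_VERSION="3.7"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
echo "pip install apache-airflow==${AIRFLOW_VERSION} --constraint ${CONSTRAINT_URL}"
```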
The Kubernetes tests are now run using the Helm chart
rather than the custom templates we used to have.
The Helm chart uses the locally built production image,
so the tests cover not only Airflow but also the
Helm chart and the production image - all at the
same time. Later on we will add more tests
covering more functionality of both the Helm chart
and the production image. This is the first step to
get all of those bundled together and
testable.
This change also introduces a 'shell' sub-command
for Breeze's kind-cluster command and
an EMBEDDED_DAGS build arg for the production image -
both useful for running the Kubernetes tests
more easily: without building two images,
and with a shell command that makes iterating
over tests easy and works without any
other development environment.
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Daniel Imberman <daniel@astronomer.io>
OpenShift (and other Kubernetes platforms) often use the approach
of starting containers with a random user and the root group. This is
described in https://docs.openshift.com/container-platform/3.7/creating_images/guidelines.html
All the files created by the "airflow" user now belong to the
'root' group, and the root group has the same access to those
files as the airflow user.
Additionally, the random user automatically gets an
/etc/passwd entry, named 'default'. The name of the user
can be set via the USER_NAME variable when starting the
container.
Closes: #9248
Closes: #8706
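The passwd-entry logic can be sketched like this (home directory and comment field are illustrative; gid 0 is the point - the root group owns whatever the arbitrary UID creates):

```shell
# Emit an /etc/passwd line for an arbitrary UID; the entrypoint would append
# this when the container starts with a UID that has no passwd entry
passwd_entry() {
  echo "${USER_NAME:-default}:x:$1:0:airflow user:/home/airflow:/bin/bash"
}
passwd_entry 100123
```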
For a long time, the way the entrypoint worked in CI scripts
was wrong. It was convoluted and little short of black
magic. It did not allow passing multiple test targets and
required separate execute-command scripts in Breeze.
This is all now straightened out: both the production and
CI images always use the right entrypoint by default,
and we can simply pass parameters to the image as usual, without
escaping strings.
This also allowed removing some Breeze commands and
renaming several Breeze flags to make them more
meaningful.
Both the CI and PROD images now have embedded scripts for log
cleaning.
A history of image releases is added for the 1.10.10-*
alpha-quality images.
* Add migration waiting script and log cleaner
This PR creates a "migration spinner" that allows the webserver to wait for all database migrations to complete before starting up. It is a necessary component before we can merge the Helm chart.
* Update airflow/cli/cli_parser.py
Co-Authored-By: Tomek Urbaszek <turbaszek@gmail.com>
Co-authored-by: Tomek Urbaszek <tomasz.urbaszek@polidea.com>
Co-authored-by: Tomek Urbaszek <turbaszek@gmail.com>
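The migration spinner above boils down to a retry loop; a hedged sketch (the wrapped check command is illustrative - here `true` stands in for the real migration check):

```shell
# Retry a command every second until it succeeds or the deadline passes
wait_for() {
  local deadline=$(( $(date +%s) + $1 )); shift
  until "$@"; do
    [ "$(date +%s)" -ge "${deadline}" ] && return 1
    sleep 1
  done
}
# In the real entrypoint, the wrapped command would be the migration check
wait_for 5 true && echo "migrations complete"
```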
It also installs properly on Mac, and it auto-detects
whether 'yarn prod' is needed, based on the presence of the
proper package.json in either www or www_rbac, which makes
remote installations simpler.
Switch to MySQL 5.7 in tests.
This fixes the utf8mb4 encoding issue, where utf8mb4 encoding
produces keys that are too long for MySQL to handle in the XCom
table. You can optionally specify a separate option to set the
encoding differently for the columns that are part of the
index: dag_id, task_id and key.
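As a hedged sketch of such a per-column override in airflow.cfg (the option name is an assumption modeled on Airflow's later `sql_engine_collation_for_ids` setting; verify against your version):

```ini
[core]
# Use a shorter collation only for the indexed id columns (dag_id, task_id,
# key) so utf8mb4 rows elsewhere still fit within MySQL's index key limit
sql_engine_collation_for_ids = utf8mb3_bin
```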
Each stage of the CI tests needs to pull our `ci` image. By removing
Java from it we can save 1-2 minutes from each test stage. This is
part of that work.
* adding singularity operator and tests
Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>
* removing encoding pragmas and fixing up dockerfile to pass linting
Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>
* make workdir in /tmp because AIRFLOW_SOURCES not defined yet
Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>
* curl needs to follow redirects with -L
Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>
* moving files to where they are supposed to be, more changes to mock, no clue
Signed-off-by: vsoch <vsochat@stanford.edu>
* removing trailing whitespace, moving example_dag for singularity, adding licenses to empty init files
Signed-off-by: vsoch <vsochat@stanford.edu>
* ran isort on example dags file
Signed-off-by: vsoch <vsochat@stanford.edu>
* adding missing init in example_dags folder for singularity
Signed-off-by: vsoch <vsochat@stanford.edu>
* removing code from __init__.py files for singularity operator to fix documentation generation
Signed-off-by: vsoch <vsochat@stanford.edu>
* forgot to update link to singularity in operators and hooks ref
Signed-off-by: vsoch <vsochat@stanford.edu>
* command must have been provided on init of singularity operator instance
Signed-off-by: vsoch <vsochat@stanford.edu>
* I guess I'm required to have a task_id?
Signed-off-by: vsoch <vsochat@stanford.edu>
* try adding working_dir to singularity operator type definitions
Signed-off-by: vsoch <vsochat@stanford.edu>
* disable too many arguments for pylint of singularity operator init
Signed-off-by: vsoch <vsochat@stanford.edu>
* move pylint disable up to line 64 - it doesn't catch at the end of the statement like other examples
Signed-off-by: vsoch <vsochat@stanford.edu>
* two spaces before inline comment
Signed-off-by: vsoch <vsochat@stanford.edu>
* I don't see task_id as a param for other providers, removing for singularity operator
Signed-off-by: vsoch <vsochat@stanford.edu>
* adding debug print
Signed-off-by: vsoch <vsochat@stanford.edu>
* allow for return of just image and/or lines
Signed-off-by: vsoch <vsochat@stanford.edu>
* don't understand how mock works, but the image should exist after it's pulled....
Signed-off-by: vsoch <vsochat@stanford.edu>
* try removing shutil, the client should handle pull folder instead
Signed-off-by: vsoch <vsochat@stanford.edu>
* try changing pull-file to same uri that is expected to be pulled
Signed-off-by: vsoch <vsochat@stanford.edu>
* import of AirflowException moved to exceptions
Signed-off-by: vsoch <vsochat@stanford.edu>
* DAG module was moved to airflow.models
Signed-off-by: vsoch <vsochat@stanford.edu>
* ensure pull is called with pull_folder
Signed-off-by: vsoch <vsochat@stanford.edu>
There are two parts to this PR:
1. Only copying www/webpack.config.js and www/static/ before running the
asset pipeline
2. Making sure that _all_ files (not just the critical ones) have the
same permissions.
The goal of both of these is to make sure that the "expensive"
operations (installing NPM modules, running the asset pipeline,
installing Python modules) are not re-run when the docker build
cache can be reused.
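The layer-ordering idea can be sketched in Dockerfile form (paths are illustrative, not the exact layout of the real Dockerfile):

```dockerfile
# Copy only the inputs each expensive step needs, so unrelated source
# changes do not invalidate the cached npm / asset-pipeline layers
COPY airflow/www/package.json airflow/www/yarn.lock /opt/airflow/airflow/www/
RUN yarn --cwd /opt/airflow/airflow/www install --frozen-lockfile
COPY airflow/www/webpack.config.js /opt/airflow/airflow/www/
COPY airflow/www/static/ /opt/airflow/airflow/www/static/
RUN yarn --cwd /opt/airflow/airflow/www run prod
```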
* Revert "[AIRFLOW-6662] Switch to --init docker flag for signal propagation (#7278)"
This reverts commit d1bf343ffe.
* [AIRFLOW-6662] return back dumb-init - installed by apt
We had stability problems in tests with the --init flag, so we are
going back to dumb-init.
Also, curl options now use the long format and include --fail
to protect against some temporary errors (5xx). The RAT download
now uses two possible sources and falls back to the
second if the first is not available.
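The fallback pattern can be sketched as follows (URLs are illustrative; `file://` URLs stand in for the real mirrors so the example is self-contained, and --fail makes curl return non-zero on HTTP errors so the second source gets a chance):

```shell
# Download with long-format curl options; try a second source on failure
download() {
  curl --fail --location --silent --show-error --output "$2" "$1"
}
echo "rat-jar-bytes" > /tmp/mirror-copy
download "file:///tmp/does-not-exist" /tmp/rat.jar 2>/dev/null \
  || download "file:///tmp/mirror-copy" /tmp/rat.jar
cat /tmp/rat.jar
```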
* Fixed a problem where the Kubernetes tests were testing the latest
master rather than what came from the local sources.
* Kind (Kubernetes in Docker) runs in the same Docker as the Breeze env
* Moved the Kubernetes scripts to the 'in_container' dir, where they belong now
* The Kubernetes cluster is reused until it is stopped
* The Kubernetes image is built from the image already in docker + mounted sources
* The kubectl version name is corrected in the Dockerfile
* KUBERNETES_VERSION can now be used to select the Kubernetes version
* Running the Kubernetes scripts is now easy in Breeze
* We can start/recreate/stop the cluster using --<ACTION>-kind-cluster
* The instructions on how to run the Kubernetes tests are updated
* The old "bare" environment is replaced by the --no-deps switch