The change #10445 caused empty descriptions for all packages.
This change restores the descriptions and also makes sure package creation works
when there is no README.md.
Resolves #10953.
A refreshed UI for the 2.0 release. The existing "theming" is a bit long in the tooth, and this PR attempts to give it a modern look and some freshness to complement all of the new features under the hood.
The majority of the changes to the UI have been made through updates to the Bootstrap theme contained in bootstrap-theme.css. These are simply overrides of the default styles packaged with Bootstrap.
This PR allows partial import error tracebacks to be exposed in the UI, if enabled. This extra context can be very helpful for users without access to the parsing logs when determining why their DAGs fail to import properly.
* Fixes an issue where cycle detection uses recursion
and overflows the stack after about 1000 tasks (see the iterative
sketch below)
(cherry picked from commit 63f1a180a17729aa937af642cfbf4ddfeccd1b9f)
* reduce test length
* slightly more efficient
* Update airflow/utils/dag_cycle_tester.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
* slightly more efficient
* actually works this time
Co-authored-by: Daniel Imberman <daniel@astronomer.io>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
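For illustration, a minimal sketch of cycle detection done iteratively with an explicit stack (a simplified stand-in that uses a plain task_id -> downstream ids mapping, not the exact code in dag_cycle_tester.py):
```python
def has_cycle(graph):
    """Detect a cycle using an explicit stack instead of recursion, so large
    graphs (thousands of tasks) do not hit Python's recursion limit."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / fully explored
    color = {node: WHITE for node in graph}

    for start in graph:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if color[child] == GRAY:
                    return True  # back edge found -> cycle
                if color[child] == WHITE:
                    color[child] = GRAY
                    stack.append((child, iter(graph[child])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False


assert has_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})
assert not has_cycle({"a": ["b", "c"], "b": ["c"], "c": []})
```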
* Allows more customization of image building.
This is the third (and not the last) part of making the Production
image more corporate-environment friendly. It was prepared
at the request of one of the big Airflow users (a company) that
has rather strict security requirements when it comes to
preparing and building images. They are committed to keeping up
with the progress of Apache Airflow 2.0 development, and making
the image customizable so that they can build it using only
sources controlled by them internally was one of their important
requirements.
This change adds the possibility of customizing various steps in
the build process:
* adding custom scripts to be run before installation in both the
build image and the runtime image. This allows, for example,
installing custom GPG keys and adding custom sources.
* customizing the way NodeJS and Yarn are installed in the
build image segment - as corporate environments might rely on
their own way of installing them.
* adding extra packages to be installed during both the build and
dev segment build steps. This is crucial to achieve the same
size optimizations as the original image.
* defining additional environment variables (for example,
environment variables that indicate acceptance of EULAs when
installing proprietary packages that require it) - both in the
build image and the runtime image (again, the goal is to keep
the image optimized for size).
The image build process remains the same when no customization
options are specified, but having those options increases the
flexibility of the image build process in corporate environments.
This is part of #11171.
This change also fixes some of the issues opened and raised by
other users of the Dockerfile.
Fixes: #10730
Fixes: #10555
Fixes: #10856
Input from those issues was taken into account when this
change was designed, so that the cases described in those issues
can be implemented. An example from one of the issues landed as
a sample way of building a highly customized Airflow image
using those customization options.
Depends on #11174
* Update IMAGES.rst
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Breeze tags the image based on the default Python version,
branch, and type of the image, but you might want to apply your
own tag in the same command - especially in automated cases where
the image is built via CI scripts, or where security teams tag the
image based on external factors (build time, person, etc.).
This is part of #11171, which makes the image easier to build in
corporate environments.
The built-in function `any()` supports short-circuiting (evaluation stops as soon as the overall return value is known), but this behavior is lost if you pass it a list comprehension instead of a generator expression. This affects performance.
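For illustration, the only difference is whether the predicate has already run for every element before `any()` sees the first result:
```python
def is_positive(n):
    print(f"checking {n}")
    return n > 0

values = [1, 2, 3]

# List comprehension: builds the whole list first, calling the predicate
# for every element, and only then hands it to any().
any([is_positive(n) for n in values])  # prints "checking" three times

# Generator expression: any() pulls items lazily and stops at the first
# truthy value, so short-circuiting is preserved.
any(is_positive(n) for n in values)    # prints "checking" once
```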
Example output (I forced one of the existing tests to fail)
```
E AssertionError: The expected number of db queries is 3. The current number is 2.
E
E Recorded query locations:
E scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:94: 1
E scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:101: 1
```
This makes it a bit easier to see what the queries are, without having
to re-run with full query tracing and then analyze the logs.
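A rough sketch of how such per-call-site counting can be wired up with SQLAlchemy's event hooks (the `count_queries` helper and its stack-trimming details are illustrative, not the exact test utility used in Airflow):
```python
from collections import Counter
from contextlib import contextmanager
import traceback

from sqlalchemy import event


@contextmanager
def count_queries(engine):
    """Count SQL statements grouped by a short summary of the call site,
    so a failing assertion can print where each query came from."""
    locations = Counter()

    def before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        frames = [
            f"{frame.filename.rsplit('/', 1)[-1]}:{frame.name}:{frame.lineno}"
            for frame in traceback.extract_stack()
            if "airflow" in frame.filename  # keep only application frames
        ]
        locations[">".join(frames[-3:])] += 1

    event.listen(engine, "before_cursor_execute", before_cursor_execute)
    try:
        yield locations
    finally:
        event.remove(engine, "before_cursor_execute", before_cursor_execute)
```
The test can then compare `sum(locations.values())` against the expected count and dump `locations` in the assertion message on failure.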
Some of the users of Airflow are using Kerberos to authenticate
their worker workflows. Airflow has basic support for Kerberos
for some of the operators, and it can refresh the
temporary Kerberos tokens via the `airflow kerberos` command.
This change adds support for a Kerberos sidecar that connects
to the Kerberos Key Distribution Center and retrieves the
token using a Keytab that should be deployed as a Kubernetes Secret.
A shared volume is used to share the temporary token. The nice
thing about setting it up as a sidecar is that the Keytab
is never shared with the workers - the secret is only mounted
by the sidecar, and the workers only have access to the temporary
token.
Depends on #11129
* Allow overrides for pod_template_file
A pod_template_file should be treated as a *template*, not a steadfast
rule.
This PR ensures that users can override individual values set by the
pod_template_file so that the same file can be used for multiple tasks
(see the sketch below).
* fix podtemplatetest
* fix name
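A minimal sketch of the intended usage pattern, assuming per-task overrides are passed via the task's `executor_config` with a `pod_override` key (the DAG id, image name, and callable below are made up for illustration):
```python
from datetime import datetime

from kubernetes.client import models as k8s

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG("pod_override_example", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    # The pod_template_file supplies the shared defaults; this task only
    # overrides the container image, so the same template serves many tasks.
    heavy_task = PythonOperator(
        task_id="heavy_task",
        python_callable=lambda: print("running"),
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",
                            image="my-registry/airflow-custom:latest",
                        )
                    ]
                )
            )
        },
    )
```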
The webserver did not have a Kubernetes Service Account defined, and
while we do not strictly need to use the service account for
anything now, having it defined makes it possible to grant
various capabilities to the webserver.
For example, when you are in a GCP environment, you can map
the Kubernetes service account to a GCP one using
Workload Identity, without the need to define any secrets
or perform additional authentication.
You can then grant that GCP service account
permission to write logs to a GCS bucket. Similar mechanisms
exist in AWS, and this also opens up on-premises configurations.
See more at
https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
Co-authored-by: Jacob Ferriero <jferriero@google.com>
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing the MySQL client on Debian requires reaching out
to Oracle deb repositories, which might not be approved by security
teams when you build the images. Also, not everyone needs the MySQL
client, or they might want to install their own MySQL client or MariaDB
client - from their own repositories.
This change separates the installation step out into a
script (with a prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.), but in the "Final" segment of the image only runtime libraries
are needed.
Part of #11171
Depends on #11173.
This is the first step of implementing a corporate-environment-friendly
way of building images, where it might not be possible to
install the packages using the GitHub cache initially.
Part of #11171
The previous query generated SQL like this:
```
WHERE (task_id = ? AND dag_id = ? AND execution_date = ?) OR (task_id = ? AND dag_id = ? AND execution_date = ?)
```
Which is fine for one or maybe even 100 TIs, but when testing DAGs at
extreme size (over 21k tasks!) this query was taking forever (162s on
Postgres, 172s on MySQL 5.7).
By changing the query to this:
```
WHERE task_id IN (?,?) AND dag_id = ? AND execution_date = ?
```
the time is reduced to about 1s! (1.03s on Postgres, 1.19s on MySQL)
Even on 100 TIs the reduction is large, but the overall time is not
significant (0.01451s -> 0.00626s on Postgres).
Times include SQLAlchemy query construction time (but not the time for
calling filter_for_tis, so it is a like-for-like comparison), not just
DB query time:
```python
ipdb> start_filter_20k = time.monotonic(); result_filter_20k = session.query(TI).filter(tis_filter).all(); end_filter_20k = time.monotonic()
ipdb> end_filter_20k - start_filter_20k
172.30647455298458
ipdb> in_filter = TI.dag_id == self.dag_id, TI.execution_date == self.execution_date, TI.task_id.in_([o.task_id for o in old_states.keys()]);
ipdb> start_20k_custom = time.monotonic(); result_custom_20k = session.query(TI).filter(in_filter).all(); end_20k_custom = time.monotonic()
ipdb> end_20k_custom - start_20k_custom
1.1882996069907676
```
I have also removed the check that was ensuring everything was of the
same type (all TaskInstance or all TaskInstanceKey), as it felt needless
- both types have the three required fields, so the "duck-typing"
approach at runtime (crash if it doesn't have the required property) plus
mypy checks felt Good Enough.
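A rough sketch of the reworked filter, assuming all task instances share a single dag_id and execution_date (an illustrative helper, not the exact filter_for_tis body):
```python
from sqlalchemy import and_

from airflow.models.taskinstance import TaskInstance as TI


def build_ti_filter(dag_id, execution_date, tis):
    """Collapse per-TI (task_id AND dag_id AND execution_date) clauses into a
    single IN clause over task_id, avoiding a huge OR chain in the SQL."""
    return and_(
        TI.dag_id == dag_id,
        TI.execution_date == execution_date,
        TI.task_id.in_([ti.task_id for ti in tis]),
    )


# Usage:
#   tis_filter = build_ti_filter(dag.dag_id, execution_date, task_instances)
#   session.query(TI).filter(tis_filter).all()
```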
There was a problem with the Mac version of the pgbouncer exporter
created and released previously. This commit releases the
latest version, making sure that Go on Linux is used to build
the pgbouncer exporter binary.