Commit graph

10078 commits

Author SHA1 Message Date
Tomek Urbaszek 7c66936985
Add GitHub Code Scanning (#11211)
GitHub just released GitHub Code Scanning:
https://github.blog/2020-09-30-code-scanning-is-now-available/
2020-10-03 15:11:11 +02:00
Tomek Urbaszek f697ff2381
Move test tools from tests.utils to tests.test_utils (#10889) 2020-10-03 14:27:06 +02:00
Kaxil Naik 3db2e7cbfb
Breeze: Fix issue with pulling an image via ID (#11255) 2020-10-03 12:56:19 +01:00
Kaxil Naik ee812665c9
Add missing "example" tag on example DAG (#11253)
`example_task_group` and `example_nested_branch_dag` didn't have the example tag while all the other ones do have it
2020-10-03 11:39:40 +01:00
Wyatt Shapiro 6d573e8abb
Add s3 key to template fields for s3/redshift transfer operators (#10890) 2020-10-03 12:23:26 +02:00
Ephraim Anierobi 4210618789
Ensure target_dedicated_nodes or enable_auto_scale is set in AzureBatchOperator (#11251) 2020-10-03 10:59:51 +01:00
Kaxil Naik b7183ded04
Update yamllint & isort pre-commit hooks (#11252)
yamllint: v1.24.2 -> v1.25.0
isort: 5.5.3 -> 5.5.4
2020-10-03 11:46:24 +02:00
Arunvel Sriram e4125666b5
Add option to bulk clear DAG Runs in Browse DAG Runs page (#11226)
closes: #11076
2020-10-03 10:30:08 +01:00
Kaxil Naik 0a0e1af800
Fix Broken Markdown links in Providers README TOC (#11249) 2020-10-03 10:00:27 +01:00
Kaxil Naik 96626260dc
Remove redundant parentheses (#11248) 2020-10-03 09:47:10 +01:00
Daniel Imberman 7338912a78
Add task adoption to CeleryKubernetesExecutor (#11244)
Routes task adoption based on queue name to CeleryExecutor
or KubernetesExecutor

Co-authored-by: Daniel Imberman <daniel@astronomer.io>
2020-10-02 11:51:11 -07:00
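The queue-based routing described in #11244 can be pictured with a minimal sketch. This is not the code from that commit: the queue name `"kubernetes"`, the dict-shaped task records, and the function `split_by_executor` are assumptions made purely for illustration.

```python
# Hypothetical queue name used to mark tasks meant for the Kubernetes side.
KUBERNETES_QUEUE = "kubernetes"


def split_by_executor(tasks, kubernetes_queue=KUBERNETES_QUEUE):
    """Partition task records into (celery_tasks, kubernetes_tasks) by queue name.

    `tasks` is a list of dicts with a "queue" key (an assumed shape, not
    Airflow's TaskInstance model).
    """
    celery_tasks, kubernetes_tasks = [], []
    for task in tasks:
        if task["queue"] == kubernetes_queue:
            kubernetes_tasks.append(task)
        else:
            celery_tasks.append(task)
    return celery_tasks, kubernetes_tasks


orphaned = [
    {"task_id": "extract", "queue": "default"},
    {"task_id": "train_model", "queue": "kubernetes"},
    {"task_id": "load", "queue": "default"},
]
celery_side, kubernetes_side = split_by_executor(orphaned)
```

Each resulting list would then be handed to the matching executor's own adoption logic, which this sketch does not model.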
Jarek Potiuk ca4238eb4d
Fixed month in backport packages to October (#11242) 2020-10-02 18:31:21 +02:00
Jarek Potiuk 6d7c143e8e
Small updates to provider preparation docs. (#11240) 2020-10-02 17:22:45 +02:00
Jarek Potiuk 983e5a62df
Restore description for provider packages. (#11239)
The change #10445 caused empty descriptions for all packages.

This change restores it and also makes sure package creation works
when there is no README.md
2020-10-02 17:20:08 +02:00
Ryan Hamilton 24d0ecf4ee
Airflow 2.0 UI Overhaul/Refresh (#11195)
Resolves #10953.

A refreshed UI for the 2.0 release. The existing "theming" is a bit long in the tooth and this PR attempts to give it a modern look and some freshness to complement all of the new features under the hood.

The majority of the changes to UI have been done through updates to the Bootstrap theme contained in bootstrap-theme.css. These are simply overrides to the default stylings that are packaged with Bootstrap.
2020-10-02 15:58:58 +01:00
Jarek Potiuk 5220e4c384
Prepare Backport release 2020.09.07 (#11238) 2020-10-02 16:01:14 +02:00
Tomek Urbaszek 0382f7728e
Use more meaningful message for DagBag timeouts (#11235)
Instead of 'Timeout, PID: 1234' we can use something more meaningful
that will help users understand the logs.
2020-10-02 13:04:51 +01:00
Patrick Cando e37dfc8588
Add Python version to Breeze cmd (#11228) 2020-10-02 11:04:19 +02:00
Tobiasz Kędzierski 18f6cf138b
Fix typo in command in CI.rst (#11233) 2020-10-02 10:50:28 +02:00
Satyasheel 720912f67b
Strict type check for multiple providers (#11229) 2020-10-02 02:13:39 +01:00
Jed Cunningham c74b3ac79a
Optional import error tracebacks in web ui (#10663)
This PR allows for partial import error tracebacks to be exposed on the UI, if enabled. This extra context can be very helpful for users without access to the parsing logs to determine why their DAGs are failing to import properly.
2020-10-01 21:48:48 +02:00
Daniel Imberman 3ca11eb9b0
Kubernetes executor can adopt tasks from other schedulers (#10996)
* KubernetesExecutor can adopt tasks from other schedulers

* simplify

* recreate tables properly

* fix pylint

Co-authored-by: Daniel Imberman <daniel@astronomer.io>
2020-10-01 12:07:38 -07:00
James Timmins 427a4a8f01
Replace get accessible dag ids (#11027) 2020-10-01 17:37:00 +01:00
Satyasheel b6d5d1e985
Strict type checking for SSH (#11216) 2020-10-01 11:44:33 +02:00
Satyasheel 5093245d6f
Strict type coverage for Oracle and Yandex provider (#11198)
* type coverage for yandex provider

* type coverage for oracle provider

* import optimisation and mypy fix

* import optimisation

* static check fix
2020-09-30 23:26:41 +02:00
Michał Słowikowski 00ffedb8c4
Add amazon glacier to GCS transfer operator (#10947)
Add Amazon Glacier to GCS transfer operator, Glacier job operator and sensor.
2020-09-30 14:59:26 +02:00
Daniel Imberman 9860719c72
[AIRFLOW-5545] Fixes recursion in DAG cycle testing (#6175)
* Fixes an issue where cycle detection uses recursion

and stack overflows after about 1000 tasks

(cherry picked from commit 63f1a180a17729aa937af642cfbf4ddfeccd1b9f)

* reduce test length

* slightly more efficient

* Update airflow/utils/dag_cycle_tester.py

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

* slightly more efficient

* actually works this time

Co-authored-by: Daniel Imberman <daniel@astronomer.io>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-09-29 11:34:55 -07:00
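The problem fixed in #6175 (recursive cycle detection overflowing the stack at around 1000 tasks) can be illustrated with an iterative depth-first search that keeps its own explicit stack. This is a minimal sketch, not the contents of `airflow/utils/dag_cycle_tester.py`; the adjacency-dict input and the name `has_cycle` are assumptions.

```python
def has_cycle(graph):
    """Return True if the directed graph contains a cycle.

    `graph` maps each node to a list of its downstream nodes (assumed shape).
    An explicit stack replaces recursion, so depth is not bounded by
    Python's recursion limit.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    for root in graph:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                state = color.get(child, WHITE)
                if state == GREY:
                    return True  # back edge to a node on the current path
                if state == WHITE:
                    color[child] = GREY
                    stack.append((child, iter(graph.get(child, ()))))
                    break  # descend into the child first
            else:
                # All children explored: this node is finished.
                color[node] = BLACK
                stack.pop()
    return False


# A chain far deeper than the default recursion limit.
chain = {i: [i + 1] for i in range(5000)}
chain[5000] = []
```

Calling `has_cycle(chain)` walks all 5001 nodes without raising `RecursionError`; adding an edge from the last node back to the first makes it report a cycle.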
Jarek Potiuk ebd7150862
More customizable build process for Docker images (#11176)
* Allows more customizations for image building.

This is the third (and not last) part of making the Production
image more corporate-environment friendly. It's been prepared
at the request of one of the big Airflow users (a company) that
has rather strict security requirements when it comes to
preparing and building images. They are committed to
synchronizing with the progress of Apache Airflow 2.0 development
and making the image customizable so that they can build it using
only sources controlled by them internally was one of the important
requirements for them.

This change adds the possibility of customizing various steps in
the build process:

* adding custom scripts to be run before installation of both
  build image and runtime image. This allows for example to
  add installing custom GPG keys, and adding custom sources.

* customizing the way NodeJS and Yarn are installed in the
  build image segment - as they might rely on their own way
  of installation.

* adding extra packages to be installed during both build and
  dev segment build steps. This is crucial to achieve the same
  size optimizations as the original image.

* defining additional environment variables (for example
  environment variables that indicate acceptance of the EULAs
  in case of installing proprietary packages that require
  EULA acceptance) - both in the build image and runtime image
  (again the goal is to keep the image optimized for size).

The image build process remains the same when no customization
options are specified, but having those options increases
flexibility of the image build process in corporate environments.

This is part of #11171.

This change also fixes some of the issues opened and raised by
other users of the Dockerfile.

Fixes: #10730
Fixes: #10555
Fixes: #10856

Input from those issues has been taken into account when this
change was designed so that the cases described in those issues
could be implemented. An example from one of those issues landed as
an example of building a highly customized Airflow image
using those customization options.

Depends on #11174

* Update IMAGES.rst

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-09-29 15:30:00 +02:00
Jarek Potiuk 17c810ec36
Fixes image tag readonly failure (#11194)
The image builds fine, but produces an unnecessary error message.

Bug Introduced in c9a34d2ef9
2020-09-29 13:07:51 +02:00
Omair Khan 68e0eb6976
in_container bats pre-commit hook and updated bats-tests hook (#11179) 2020-09-29 11:59:06 +02:00
Jarek Potiuk c9a34d2ef9
Optionally tags image when building with Breeze (#11181)
Breeze tags the image based on the default python version,
branch, type of the image, but you might want to tag the image
in the same command especially in automated cases of building
the image via CI scripts or security teams that tag the image
based on external factors (build time, person etc.).

This is part of #11171 which makes the image easier to build in
corporate environments.
2020-09-29 11:45:37 +02:00
Kaxil Naik 4ff1290d8d
Remove Unnecessary comprehension in 'any' builtin function (#11188)
The built-in function `any()` supports short-circuiting (evaluation stops as soon as the overall return value of the function is known), but this behavior is lost if you pass it a list comprehension, because the whole list is built first. This affects performance.
2020-09-29 07:47:32 +02:00
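The short-circuiting point made in #11188 can be seen by counting how many times a predicate runs. This is a small illustrative sketch; the helper names `is_match`, `evaluations_with_list`, and `evaluations_with_generator` are made up for the example.

```python
def is_match(value, calls):
    """Record each evaluation in `calls`, then test the value."""
    calls.append(value)
    return value == 2


def evaluations_with_list(values):
    calls = []
    # The list comprehension evaluates every element before any() runs.
    any([is_match(v, calls) for v in values])
    return len(calls)


def evaluations_with_generator(values):
    calls = []
    # The generator expression is consumed lazily; any() stops at the first True.
    any(is_match(v, calls) for v in values)
    return len(calls)


list_count = evaluations_with_list([1, 2, 3, 4])            # 4 evaluations
generator_count = evaluations_with_generator([1, 2, 3, 4])  # 2 evaluations
```

With the match at the second position, the list form still evaluates all four elements, while the generator form stops after two.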
Kaxil Naik 2ec12474ff
Fix typos in Dockerfile.ci (#11187)
Fixed some spellings
2020-09-29 07:41:05 +02:00
Michał Słowikowski 42f1da179d
Improve Google Transfer header in documentation index file (#11166) 2020-09-28 22:51:16 +02:00
Ash Berlin-Taylor 6694eaa831
Show the location of the queries when the assert_queries_count fails. (#11186)
Example output (I forced one of the existing tests to fail)

```
E   AssertionError: The expected number of db queries is 3. The current number is 2.
E
E   Recorded query locations:
E   	scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:94:	1
E   	scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:101:	1
```

This makes it a bit easier to see what the queries are, without having
to re-run with full query tracing and then analyze the logs.
2020-09-28 19:39:21 +01:00
Tomek Urbaszek e2dc706b08
Make kill log in DagFileProcessorProcess more informative (#11124) 2020-09-28 00:24:58 +02:00
Jarek Potiuk 4d2a787070
Enables Kerberos sidecar support (#11130)
Some of the users of Airflow are using Kerberos to authenticate
their worker workflows. Airflow has basic support for Kerberos
for some of the operators and it has support to refresh the
temporary Kerberos tokens via `airflow kerberos` command.

This change adds support for the Kerberos side-car that connects
to the Kerberos Key Distribution Center and retrieves the
token using Keytab that should be deployed as Kubernetes Secret.

It uses shared volume to share the temporary token. The nice
thing about setting it up as a sidecar is that the Keytab
is never shared with the workers - the secret is only mounted
by the sidecar and the workers have only access to the temporary
token.

Depends on #11129
2020-09-28 00:13:36 +02:00
Daniel Imberman a888198c27
Allow overrides for pod_template_file (#11162)
* Allow overrides for pod_template_file

A pod_template_file should be treated as a *template* not a steadfast
rule.

This PR ensures that users can override individual values set by the
pod_template_file s.t. the same file can be used for multiple tasks.

* fix podtemplatetest

* fix name
2020-09-27 23:39:35 +02:00
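The "template, not a steadfast rule" idea from #11162 amounts to layering per-task values over a shared base. The following is a minimal sketch using plain nested dicts; it is not Airflow's actual pod reconciliation logic, and `apply_overrides` along with the sample keys are hypothetical.

```python
import copy


def apply_overrides(template, overrides):
    """Return a copy of `template` with `overrides` merged in recursively.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the template value. Neither input is mutated.
    """
    result = copy.deepcopy(template)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_overrides(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical base template shared by many tasks.
base_pod = {
    "metadata": {"namespace": "default", "labels": {"app": "airflow"}},
    "spec": {"image": "base:1", "restartPolicy": "Never"},
}
# One task overrides only the image; everything else comes from the template.
task_pod = apply_overrides(base_pod, {"spec": {"image": "custom:2"}})
```

Because the merge works on a deep copy, the same `base_pod` can be reused for multiple tasks with different overrides.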
Jarek Potiuk 0ea3e611d3
Adds Kubernetes Service Account for the webserver (#11131)
Webserver did not have a Kubernetes Service Account defined and
while we do not strictly need to use the service account for
anything now, having the Service Account defined makes it possible to
define various capabilities for the webserver.

For example when you are in the GCP environment, you can map
the Kubernetes service account into a GCP one, using
Workload Identity without the need to define any secrets
and performing additional authentication.
Then you can have that GCP service account get
the permissions to write logs to GCS bucket. Similar mechanisms
exist in AWS and it also opens up on-premises configuration.

See more at
https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity

Co-authored-by: Jacob Ferriero <jferriero@google.com>
2020-09-27 23:39:14 +02:00
Satyasheel 54353f8745
Increase type coverage for five different providers (#11170)
* Increasing type coverage for five different providers
* Added more strict type
2020-09-27 20:00:27 +02:00
Ephraim Anierobi cb52fb0ae1
Add example DAG and system test for MySQLToGCSOperator (#10990) 2020-09-27 19:05:04 +02:00
Jarek Potiuk 044b441257
Conditional MySQL Client installation (#11174)
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing MySQL Client on Debian requires reaching out
to Oracle deb repositories which might not be approved by security
teams when you build the images. Also not everyone needs MySQL
client or might want to install their own MySQL client or MariaDB
client - from their own repositories.

This change makes the installation step separated out to
script (with prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.) but in "Final" segment of the image only runtime libraries
are needed.

Part of #11171

Depends on #11173.
2020-09-27 18:56:58 +02:00
mucio 0db7a30782
New Breeze command start-airflow, it replaces the previous flag (#11157) 2020-09-27 18:31:50 +02:00
Kamil Breguła 2d831fbbc5
Update UPDATING.md (#11172) 2020-09-27 18:24:24 +02:00
Jarek Potiuk f16354bc02
Optionally disables PIP cache from GitHub during the build (#11173)
This is the first step of implementing the corporate-environment
friendly way of building images, where in the corporate
environment, it might not be possible to install the packages
using the GitHub cache initially.

Part of #11171
2020-09-27 18:00:03 +02:00
Satyasheel 0161b5ea2b
Increasing type coverage for multiple provider (#11159) 2020-09-26 15:40:28 +01:00
Satyasheel 08dfd8cd00
Increase Type coverage for IMAP provider (#11154) 2020-09-25 21:08:26 +01:00
Ash Berlin-Taylor ee90807ace
Massively speed up the query returned by TI.filter_for_tis (#11147)
The previous query generated SQL like this:

```
WHERE (task_id = ? AND dag_id = ? AND execution_date = ?) OR (task_id = ? AND dag_id = ? AND execution_date = ?)
```

Which is fine for one or maybe even 100 TIs, but when testing DAGs at
extreme size (over 21k tasks!) this query was taking forever (162s on
Postgres, 172s on MySQL 5.7)

By changing this query to this

```
WHERE task_id IN (?,?) AND dag_id = ? AND execution_date = ?
```

the time is reduced to 1s! (1.03s on Postgres, 1.19s on MySQL)

Even on 100 tis the reduction is large, but the overall time is not
significant (0.01451s -> 0.00626s on Postgres).

Times include SQLA query construction time (but not the time for calling
filter_for_tis, so this is a like-for-like comparison), not just DB query time:

```python
ipdb> start_filter_20k = time.monotonic(); result_filter_20k = session.query(TI).filter(tis_filter).all(); end_filter_20k = time.monotonic()
ipdb> end_filter_20k - start_filter_20k
172.30647455298458
ipdb> in_filter = TI.dag_id == self.dag_id, TI.execution_date == self.execution_date, TI.task_id.in_([o.task_id for o in old_states.keys()]);
ipdb> start_20k_custom = time.monotonic(); result_custom_20k = session.query(TI).filter(in_filter).all(); end_20k_custom = time.monotonic()
ipdb> end_20k_custom - start_20k_custom
1.1882996069907676
```

I have also removed the check that was ensuring everything was of the
same type (all TaskInstance or all TaskInstanceKey) as it felt needless
- both types have the three required fields, so the "duck-typing"
approach at runtime (crash if it doesn't have the required property) plus mypy
checks felt Good Enough.
2020-09-25 20:49:11 +01:00
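The two WHERE shapes compared in #11147 can be reproduced with the standard-library `sqlite3` module to confirm that they select the same rows. This is a sketch under assumptions: the table layout, the helpers `build_or_query` and `build_in_query`, and the sample data are invented here, and the `IN` form relies on all requested task instances sharing one `dag_id` and `execution_date`. It says nothing about timing on Postgres or MySQL.

```python
import sqlite3


def build_or_query(keys):
    """Old shape: one (task_id, dag_id, execution_date) group per key, joined by OR."""
    clause = " OR ".join(
        "(task_id = ? AND dag_id = ? AND execution_date = ?)" for _ in keys
    )
    params = [part for key in keys for part in key]
    return f"SELECT task_id FROM task_instance WHERE {clause}", params


def build_in_query(task_ids, dag_id, execution_date):
    """New shape: a single IN list plus one dag_id and one execution_date."""
    placeholders = ",".join("?" for _ in task_ids)
    sql = (
        "SELECT task_id FROM task_instance "
        f"WHERE task_id IN ({placeholders}) AND dag_id = ? AND execution_date = ?"
    )
    return sql, [*task_ids, dag_id, execution_date]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, execution_date TEXT)"
    )
    rows = [(f"task_{i}", "big_dag", "2020-09-25") for i in range(50)]
    rows.append(("task_0", "other_dag", "2020-09-25"))  # must not be selected
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?)", rows)

    wanted = [f"task_{i}" for i in range(0, 50, 5)]
    or_sql, or_params = build_or_query([(t, "big_dag", "2020-09-25") for t in wanted])
    in_sql, in_params = build_in_query(wanted, "big_dag", "2020-09-25")

    or_result = sorted(row[0] for row in conn.execute(or_sql, or_params))
    in_result = sorted(row[0] for row in conn.execute(in_sql, in_params))
    conn.close()
    return or_result, in_result


or_result, in_result = demo()
```

The `IN` form needs only two extra parameters beyond the task IDs, whereas the `OR` form repeats three parameters and three comparisons per task instance.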
Kamil Breguła b92c60af8a
Add new member to Polidea (#11153) 2020-09-25 20:31:03 +02:00
Jarek Potiuk c65d46634c
Update to latest version of pgbouncer-exporter (#11150)
There was a problem with Mac version of pgbouncer exporter
created and released previously. This commit releases the
latest version making sure that Linux Go is used to build
the pgbouncer binary.
2020-09-25 18:55:26 +02:00