Commit Graph

10044 Commits

Author SHA1 Message Date
Ash Berlin-Taylor 6694eaa831
Show the location of the queries when the assert_queries_count fails. (#11186)
Example output (I forced one of the existing tests to fail)

```
E   AssertionError: The expected number of db queries is 3. The current number is 2.
E
E   Recorded query locations:
E   	scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:94:	1
E   	scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:101:	1
```

This makes it a bit easier to see what the queries are, without having
to re-run with full query tracing and then analyze the logs.
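
A minimal sketch of the idea, not Airflow's actual test helper: count "queries" per call site and include the locations in the assertion message. The `record_query` hook and the two sample functions are hypothetical stand-ins for a real database event listener and real scheduler code.

```python
import traceback
from collections import Counter
from contextlib import contextmanager

_active_counters = []  # hypothetical registry of currently recording counters


def record_query():
    """Stand-in for a DB hook: note the call site of every 'query'."""
    # Drop this function's own frame, then keep the two nearest callers.
    frames = traceback.extract_stack()[:-1][-2:]
    location = ">".join(f"{frame.name}:{frame.lineno}" for frame in frames)
    for counter in _active_counters:
        counter[location] += 1


@contextmanager
def assert_queries_count(expected):
    """Fail with the recorded query locations if the count does not match."""
    counter = Counter()
    _active_counters.append(counter)
    try:
        yield counter
    finally:
        _active_counters.remove(counter)
    total = sum(counter.values())
    if total != expected:
        lines = "\n".join(f"\t{loc}:\t{n}" for loc, n in counter.items())
        raise AssertionError(
            f"The expected number of db queries is {expected}. "
            f"The current number is {total}.\n\nRecorded query locations:\n{lines}"
        )


def slots_stats():
    record_query()  # first simulated query
    record_query()  # second simulated query


def emit_pool_metrics():
    slots_stats()
```

Wrapping `emit_pool_metrics()` in `with assert_queries_count(3):` raises an `AssertionError` whose message lists one line per call site, similar in spirit to the output shown above.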
2020-09-28 19:39:21 +01:00
Tomek Urbaszek e2dc706b08
Make kill log in DagFileProcessorProcess more informative (#11124) 2020-09-28 00:24:58 +02:00
Jarek Potiuk 4d2a787070
Enables Kerberos sidecar support (#11130)
Some Airflow users authenticate their worker workflows with
Kerberos. Airflow has basic support for Kerberos
in some of the operators, and it can refresh the
temporary Kerberos tokens via the `airflow kerberos` command.

This change adds support for a Kerberos sidecar that connects
to the Kerberos Key Distribution Center and retrieves the
token using a Keytab that should be deployed as a Kubernetes Secret.

It uses a shared volume to share the temporary token. The nice
thing about setting it up as a sidecar is that the Keytab
is never shared with the workers: the secret is only mounted
by the sidecar, and the workers only have access to the temporary
token.

Depends on #11129
2020-09-28 00:13:36 +02:00
Daniel Imberman a888198c27
Allow overrides for pod_template_file (#11162)
* Allow overrides for pod_template_file

A pod_template_file should be treated as a *template* not a steadfast
rule.

This PR ensures that users can override individual values set by the
pod_template_file, so that the same file can be used for multiple tasks.

* fix podtemplatetest

* fix name
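
To illustrate "template, not rule", here is a rough sketch that layers per-task overrides on top of a shared template. It uses plain dictionaries and a hypothetical `apply_overrides` helper; it is not how Airflow itself reconciles pod specs.

```python
import copy


def apply_overrides(template, overrides):
    """Return a copy of `template` with `overrides` merged in recursively."""
    result = copy.deepcopy(template)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested mappings instead of replacing them wholesale.
            result[key] = apply_overrides(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical template contents, shared by many tasks.
pod_template = {
    "metadata": {"name": "base-pod", "labels": {"app": "airflow"}},
    "spec": {"serviceAccountName": "default", "restartPolicy": "Never"},
}

# One task overrides only the values it cares about.
task_overrides = {
    "metadata": {"labels": {"team": "data"}},
    "spec": {"serviceAccountName": "etl"},
}

merged = apply_overrides(pod_template, task_overrides)
```

The template itself is left untouched, so the same file can back any number of tasks with different overrides.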
2020-09-27 23:39:35 +02:00
Jarek Potiuk 0ea3e611d3
Adds Kubernetes Service Account for the webserver (#11131)
The webserver did not have a Kubernetes Service Account defined, and
while we do not strictly need the service account for
anything now, having it defined makes it possible to
grant various capabilities to the webserver.

For example, in a GCP environment you can map
the Kubernetes service account to a GCP one using
Workload Identity, without the need to define any secrets
or perform additional authentication.
You can then grant that GCP service account
permission to write logs to a GCS bucket. Similar mechanisms
exist in AWS, and this also opens up on-premises configuration.

See more at
https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity

Co-authored-by: Jacob Ferriero <jferriero@google.com>
2020-09-27 23:39:14 +02:00
Satyasheel 54353f8745
Increase type coverage for five different providers (#11170)
* Increasing type coverage for five different providers
* Added more strict type
2020-09-27 20:00:27 +02:00
Ephraim Anierobi cb52fb0ae1
Add example DAG and system test for MySQLToGCSOperator (#10990) 2020-09-27 19:05:04 +02:00
Jarek Potiuk 044b441257
Conditional MySQL Client installation (#11174)
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing the MySQL client on Debian requires reaching out
to Oracle deb repositories, which might not be approved by security
teams when you build the images. Also, not everyone needs the MySQL
client; some might want to install their own MySQL or MariaDB
client from their own repositories.

This change separates the installation step out into a
script (with a prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.), but in the "Final" segment of the image only runtime libraries
are needed.

Part of #11171

Depends on #11173.
2020-09-27 18:56:58 +02:00
mucio 0db7a30782
New Breeze command start-airflow, it replaces the previous flag (#11157) 2020-09-27 18:31:50 +02:00
Kamil Breguła 2d831fbbc5
Update UPDATING.md (#11172) 2020-09-27 18:24:24 +02:00
Jarek Potiuk f16354bc02
Optionally disables PIP cache from GitHub during the build (#11173)
This is the first step of implementing a corporate-environment
friendly way of building images. In a corporate
environment, it might not be possible to install the packages
using the GitHub cache initially.

Part of #11171
2020-09-27 18:00:03 +02:00
Satyasheel 0161b5ea2b
Increasing type coverage for multiple provider (#11159) 2020-09-26 15:40:28 +01:00
Satyasheel 08dfd8cd00
Increase Type coverage for IMAP provider (#11154) 2020-09-25 21:08:26 +01:00
Ash Berlin-Taylor ee90807ace
Massively speed up the query returned by TI.filter_for_tis (#11147)
The previous query generated SQL like this:

```
WHERE (task_id = ? AND dag_id = ? AND execution_date = ?) OR (task_id = ? AND dag_id = ? AND execution_date = ?)
```

Which is fine for one or maybe even 100 TIs, but when testing DAGs at
extreme size (over 21k tasks!) this query was taking forever (162s on
Postgres, 172s on MySQL 5.7).

By changing this query to this

```
WHERE task_id IN (?,?) AND dag_id = ? AND execution_date = ?
```

the time is reduced to 1s! (1.03s on Postgres, 1.19s on MySQL)

Even on 100 tis the reduction is large, but the overall time is not
significant (0.01451s -> 0.00626s on Postgres).

Times include SQLA query construction time, not just DB query time
(but not the time for calling filter_for_tis, so it is a like-for-like comparison):

```python
ipdb> start_filter_20k = time.monotonic(); result_filter_20k = session.query(TI).filter(tis_filter).all(); end_filter_20k = time.monotonic()
ipdb> end_filter_20k - start_filter_20k
172.30647455298458
ipdb> in_filter = TI.dag_id == self.dag_id, TI.execution_date == self.execution_date, TI.task_id.in_([o.task_id for o in old_states.keys()]);
ipdb> start_20k_custom = time.monotonic(); result_custom_20k = session.query(TI).filter(in_filter).all(); end_20k_custom = time.monotonic()
ipdb> end_20k_custom - start_20k_custom
1.1882996069907676
```

I have also removed the check that was ensuring everything was of the
same type (all TaskInstance or all TaskInstanceKey) as it felt needless
- both types have the three required fields, so the "duck-typing"
approach at runtime (crash if doesn't have the required property)+mypy
checks felt Good Enough.
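
The two WHERE shapes can be compared with a small, self-contained sketch. It uses the standard-library `sqlite3` module and made-up table contents rather than Airflow's models, and it only shows that both forms select the same rows; it makes no claim about timings. The rewrite is equivalent only because every requested row shares one dag_id and execution_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, execution_date TEXT)"
)
rows = [(f"task_{i}", "big_dag", "2020-09-25") for i in range(50)]
rows.append(("task_0", "other_dag", "2020-09-25"))  # same task_id, other DAG
conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?)", rows)

wanted = [f"task_{i}" for i in range(0, 50, 5)]

# Old shape: one (task_id AND dag_id AND execution_date) group per TI, OR-ed.
or_clause = " OR ".join(
    ["(task_id = ? AND dag_id = ? AND execution_date = ?)"] * len(wanted)
)
or_params = [p for task_id in wanted for p in (task_id, "big_dag", "2020-09-25")]
or_rows = conn.execute(
    f"SELECT task_id FROM task_instance WHERE {or_clause} ORDER BY task_id",
    or_params,
).fetchall()

# New shape: a single IN list plus one dag_id and one execution_date check.
placeholders = ",".join("?" * len(wanted))
in_rows = conn.execute(
    "SELECT task_id FROM task_instance "
    f"WHERE task_id IN ({placeholders}) AND dag_id = ? AND execution_date = ? "
    "ORDER BY task_id",
    wanted + ["big_dag", "2020-09-25"],
).fetchall()
```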
2020-09-25 20:49:11 +01:00
Kamil Breguła b92c60af8a
Add new member to Polidea (#11153) 2020-09-25 20:31:03 +02:00
Jarek Potiuk c65d46634c
Update to latest version of pgbouncer-exporter (#11150)
There was a problem with the Mac version of the pgbouncer exporter
created and released previously. This commit releases the
latest version, making sure that Linux Go is used to build
the pgbouncer binary.
2020-09-25 18:55:26 +02:00
Ruben Laguna 33fe9a52cd
Make sure pgbouncer-exporter docker image is linux/amd64 (#11148)
Closes #11145
2020-09-25 17:26:44 +02:00
Ryan Hamilton edf803374d
Remove link to Dag Model view given the redundancy with DAG Details view (#11082) 2020-09-25 13:57:06 +01:00
Kaxil Naik 99accec29d
Fix incorrect Usage of Optional[str] & Optional[int] (#11141)
From https://docs.python.org/3/library/typing.html#typing.Optional

```
Optional[X] is equivalent to Union[X, None].
```

>Note that this is not the same concept as an optional argument, which is one that has a default. An optional argument with a default does not require the Optional qualifier on its type annotation just because it is optional.

There were incorrect usages where the default was already set to
a string or int value but still Optional was used
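
A short illustration of the distinction, using hypothetical function names: a parameter with a non-None default needs no `Optional`, while a parameter that may legitimately be `None` does.

```python
from typing import Optional, Union


def connect(host: str = "localhost", port: int = 5432) -> str:
    """Defaults are real str/int values, so plain str and int are enough."""
    return f"{host}:{port}"


def find_user(name: Optional[str] = None) -> str:
    """None is an accepted value here, so Optional[str] is appropriate."""
    return "anonymous" if name is None else name


# Optional[X] is only shorthand for Union[X, None].
same_type = Optional[str] == Union[str, None]
```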
2020-09-25 12:25:32 +01:00
Jarek Potiuk ce6b257de7
Fix gitSync user in the helm Chart (#11127)
There was a problem with the user in the Git Sync mode of the Helm Chart
in connection with the git-sync image and the official Airflow
image. Since we are using the official image, most of the
containers run as the "50000" user, but the user used by the
git-sync image is 65533, so we have to set it as the
default. We also exposed that value as a parameter, so that
another image can be used here as well.
2020-09-25 11:31:45 +01:00
Jarek Potiuk b40df1bf12
Fixes celery deployments for Airflow 2.0 (#11129)
The celery flower and worker commands have changed in Airflow 2.0.
The Helm Chart supported only 1.10 version of those commands and
this PR fixes it by adding both variants of them.
2020-09-25 11:31:28 +01:00
Ruben Laguna 1f0a7857f2
Fix user in helm chart pgbouncer deployment (#11143) 2020-09-25 11:06:30 +01:00
Kaxil Naik f4ec1f6b41
Move Backport Providers docs to our docsite (#11136) 2020-09-25 09:28:47 +02:00
Logan Attwood 37798f0d2a
Do not silently allow the use of undefined variables in jinja2 templates (#11016)
This can have *extremely* bad consequences. After this change, a jinja2
template like the one below will cause the task instance to fail, if the
DAG being executed is not a sub-DAG. This may also display an error on
the Rendered tab of the Task Instance page.

task_instance.xcom_pull('z', key='return_value', dag_id=dag.parent_dag.dag_id)

Prior to the change in this commit, the above template would pull the
latest value for task_id 'z', for the given execution_date, from *any DAG*.
If your task_ids between DAGs are all unique, or if DAGs using the same
task_id always have different execution_date values, this will appear to
act like dag_id=None.

Our current theory is SQLAlchemy/Python doesn't behave as expected when
comparing `jinja2.Undefined` to `None`.
2020-09-25 09:15:28 +02:00
Kaxil Naik 6970584e79
Upgrade to latest isort & pydocstyle (#11142)
isort: from 5.4.2 to 5.5.3
pydocstyle: from 5.0.2 to 5.1.1
2020-09-25 09:13:59 +02:00
Kaxil Naik 7c0541bbf0
Fix error message when checking literalinclude in docs (#11140)
Before:
```
literalinclude directive is is prohibited for example DAGs
```

After:

```
literalinclude directive is prohibited for example DAGs
```
2020-09-25 07:06:44 +02:00
Nadim Younes 68fa29bff0
Added support for encrypted private keys in SSHHook (#11097)
* Added support for encrypted private keys in SSHHook

* Fixed Styling issues and added unit testing

* fixed last pylint styling issue by adding newline to the end of the file

* re-fixed newline issue for pylint checks

* fixed pep8 styling issues and black formatted files to pass static checks

* added comma as per suggestion to fix static check

Co-authored-by: Nadim Younes <nyounes@kobo.com>
2020-09-25 07:02:16 +02:00
Satyasheel 45669bea4f
Increasing type coverage for salesforce provider (#11135) 2020-09-24 23:28:55 +01:00
Kaxil Naik 51052aa4e2
Fix FROM directive in docs/production-deployment.rst (#11139)
`FROM:` -> `FROM`
2020-09-24 23:21:35 +01:00
Kaxil Naik e3f96ce7a8
Fix incorrect Usage of Optional[bool] (#11138)
Optional[bool] = Union[None, bool]

There were incorrect usages where the default was already set to
a boolean value but still Optional was used
2020-09-24 23:18:19 +01:00
Kaxil Naik af3c67775b
README Doc: Link to Airflow directory in ASF Directory (#11137)
`https://downloads.apache.org` -> `https://downloads.apache.org/airflow` (links to precise dir)
2020-09-24 22:28:00 +01:00
Jarek Potiuk 620b0989b8
Add Helm Chart linting (#11108) 2020-09-24 13:02:11 +02:00
Jarek Potiuk e252a6064f
Adds timeout in CI/PROD waiting jobs (#11117)
In very rare cases, the waiting job might not be cancelled when
the "Build Image" job fails or gets cancelled on its own.

In the "Build Image" workflow we have this step:

- name: "Canceling the CI Build source workflow in case of failure!"
  if: cancelled() || failure()
  uses: potiuk/cancel-workflow-runs@v2
  with:
    token: ${{ secrets.GITHUB_TOKEN }}
    cancelMode: self
    sourceRunId: ${{ github.event.workflow_run.id }}

But when this step fails or gets cancelled on its own before
cancel is triggered, the "wait for image" steps could
run for up to 6 hours.

This change sets a 50-minute timeout for those jobs.

Fixes #11114
2020-09-24 10:46:43 +02:00
Satyasheel bcdd3bb7bb
Increasing type coverage FTP (#11107) 2020-09-24 00:13:58 +02:00
jmfreeman b83507acb8
Update initialize-database.rst (#11109)
* Update initialize-database.rst

Remove ambiguity in the language as only MySQL, Postgres and SQLite are supported backends.

* Update docs/howto/initialize-database.rst

Co-authored-by: Xiaodong DENG <xd.deng.r@gmail.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2020-09-23 23:15:36 +02:00
Kaxil Naik 7644c37082
Revert "Introducing flags to skip example dags and default connections (#11099)" (#11110)
This reverts commit 0edc3dd579.
2020-09-23 19:47:43 +01:00
Joe Harris 04b8adf69d
Add Opensignal to INTHEWILD.md (#11105) 2020-09-23 18:52:00 +02:00
Xiaodong DENG 8a34719346
Fix typo in README (#11106) 2020-09-23 17:39:34 +01:00
Kaxil Naik ccfbc319dd
Fix sort-in-the-wild pre-commit on Mac (#11103) 2020-09-23 15:10:15 +01:00
Tomek Urbaszek daf8f31080
Add template fields renderers for better UI rendering (#11061)
This PR adds the possibility to define template_fields_renderers for an
operator. This way, users can specify
which lexer should be used for rendering a particular field. This is
very useful for custom operators and gives more flexibility than
predefined keywords.
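
As a rough sketch of what such a declaration might look like, the class below is a plain-Python stand-in (a real operator would subclass Airflow's BaseOperator). The field names, the `renderer_for` helper, and the `"text"` fallback are hypothetical; the renderer keys `"sql"` and `"json"` are assumed to be among the supported lexers.

```python
class ReportOperator:
    """Hypothetical stand-in for a custom operator."""

    template_fields = ("sql", "config")
    # Map each templated field to the lexer the UI should use for it.
    template_fields_renderers = {"sql": "sql", "config": "json"}

    def __init__(self, sql, config):
        self.sql = sql
        self.config = config


def renderer_for(operator, field, default="text"):
    """Look up the renderer for a field, falling back to a default."""
    renderers = getattr(operator, "template_fields_renderers", {})
    return renderers.get(field, default)


op = ReportOperator(sql="SELECT 1", config='{"limit": 10}')
```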

Co-authored-by: Kamil Olszewski <34898234+olchas@users.noreply.github.com>
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
2020-09-23 15:31:40 +02:00
mucio 0edc3dd579
Introducing flags to skip example dags and default connections (#11099) 2020-09-23 14:56:29 +02:00
Kaxil Naik 51181e885e
Security upgrade lodash from 4.17.19 to 4.17.20 (#11095)
Details: https://snyk.io/vuln/SNYK-JS-LODASH-590103
2020-09-23 10:06:28 +01:00
Xiaodong DENG a0374a5f95
Fix for pydocstyle D202 (#11096)
'issues' introduced in https://github.com/apache/airflow/pull/10594
2020-09-22 23:48:53 +02:00
Xiaodong DENG 35c43987e5
Avoid redundant SET conversion (#11091)
* Avoid redundant SET conversion

get_accessible_dag_ids() returns a SET, so no need to apply set() again

* Add type annotation for get_accessible_dag_ids()
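
In miniature, with a hypothetical function body standing in for the real permission lookup: once the return type is annotated as a set, wrapping the result in `set()` again only makes an unnecessary copy.

```python
from typing import Set


def get_accessible_dag_ids() -> Set[str]:
    """Stand-in implementation; the real method inspects user permissions."""
    return {"example_dag", "etl_dag"}


dag_ids = get_accessible_dag_ids()  # already a set
redundant_copy = set(dag_ids)       # same contents, extra allocation
```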
2020-09-22 23:44:49 +02:00
Kaxil Naik 1a149827a2
Fix typo in STATIC_CODE_CHECKS.rst (#11094)
`realtive` -> `relative`
2020-09-22 21:36:38 +01:00
yuqian90 423a382678
SkipMixin: Add missing session.commit() and test (#10421) 2020-09-22 21:08:12 +01:00
yuqian90 e59ad5b2c6
Make Skipmixin handle empty branch properly (#10751)
closes: #10725

Make sure SkipMixin.skip_all_except() handles empty branches like this properly. When "task1" is followed, "join" must not be skipped even though it is considered to be immediately downstream of "branch".
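
The intended behaviour can be sketched on a tiny graph. This is not Airflow's implementation: the `tasks_to_skip` helper is hypothetical, and the graph shape (branch -> task1 -> join, plus a direct branch -> join edge for the empty branch) is an assumption based on the description above.

```python
def tasks_to_skip(downstream, branch_task, followed):
    """Return direct downstream tasks of `branch_task` that should be skipped."""
    # Everything reachable from a followed task must be kept.
    keep = set()
    stack = list(followed)
    while stack:
        task = stack.pop()
        if task not in keep:
            keep.add(task)
            stack.extend(downstream.get(task, []))
    return {task for task in downstream.get(branch_task, []) if task not in keep}


# Assumed graph: "join" is both directly downstream of "branch" (empty branch)
# and downstream of "task1".
downstream = {"branch": ["task1", "join"], "task1": ["join"], "join": []}

skipped_when_task1 = tasks_to_skip(downstream, "branch", ["task1"])
skipped_when_join = tasks_to_skip(downstream, "branch", ["join"])
```

When "task1" is followed, nothing is skipped because "join" is reachable from it; when only the empty branch is followed, "task1" is the one task skipped.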
2020-09-22 20:48:26 +01:00
James Timmins fbd994a4cf
Add permissions for stable API (#10594)
Related Github Issue: https://github.com/apache/airflow/issues/8112
2020-09-22 17:23:59 +01:00
Patrick Cando f3e87c5030
Add D202 pydocstyle check (#11032) 2020-09-22 16:17:24 +01:00
Jarek Potiuk 52fdb62314
Requirements might get upgraded without setup.py change (#10784)
I noticed that when there are no setup.py changes, the constraints
are not upgraded automatically. This is because of the Docker
caching strategy used: it simply does not know that the
upgrade of pip should happen.

I believe it is really good (from a security and incremental updates
POV) to attempt to upgrade at every successful merge. Note that
the upgrade will not be committed if any of the tests fail, and this
only happens on every merge to master or on a scheduled run.

This way we will have more frequent but smaller constraint changes.

Depends on #10828
2020-09-22 16:22:53 +02:00