Example output (I forced one of the existing tests to fail)
```
E AssertionError: The expected number of db queries is 3. The current number is 2.
E
E Recorded query locations:
E scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:94: 1
E scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:101: 1
```
This makes it a bit easier to see what the queries are, without having
to re-run with full query tracing and then analyze the logs.
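A rough sketch of how such an assertion can be put together with
SQLAlchemy events (this is not the exact test helper; the name
`assert_queries_count` and the frame filtering below are simplified):
```python
from collections import Counter
from contextlib import contextmanager
import traceback

from sqlalchemy import event


@contextmanager
def assert_queries_count(engine, expected_count):
    """Count queries executed inside the block and report where they came from."""
    locations = Counter()

    def after_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        # Drop SQLAlchemy's internal frames and this listener's own frame,
        # keeping the nearest caller as a short "file:line" tag.
        stack = [f for f in traceback.extract_stack() if "sqlalchemy" not in f.filename]
        caller = stack[-2]
        locations[f"{caller.filename.rsplit('/', 1)[-1]}:{caller.lineno}"] += 1

    event.listen(engine, "after_cursor_execute", after_cursor_execute)
    try:
        yield
    finally:
        event.remove(engine, "after_cursor_execute", after_cursor_execute)

    count = sum(locations.values())
    recorded = "\n".join(f"\t{loc}: {n}" for loc, n in locations.items())
    assert count == expected_count, (
        f"The expected number of db queries is {expected_count}. "
        f"The current number is {count}.\n\nRecorded query locations:\n{recorded}"
    )
```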
Some Airflow users use Kerberos to authenticate their worker
workflows. Airflow has basic Kerberos support for some of the
operators, and it can refresh the temporary Kerberos tokens via
the `airflow kerberos` command.
This change adds support for a Kerberos sidecar that connects
to the Kerberos Key Distribution Center and retrieves the
token using a Keytab that should be deployed as a Kubernetes
Secret. It uses a shared volume to share the temporary token.
The nice thing about setting it up as a sidecar is that the
Keytab is never shared with the workers - the secret is only
mounted by the sidecar, and the workers only have access to
the temporary token.
Depends on #11129
* Allow overrides for pod_template_file
A pod_template_file should be treated as a *template*, not a
steadfast rule.
This PR ensures that users can override individual values set by the
pod_template_file so that the same file can be used for multiple
tasks, as sketched below.
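For example, a per-task override like the following (a hedged sketch
using the Airflow 2.0 `executor_config`/`pod_override` convention;
names, values and import paths here are illustrative) changes only the
resource requests, while everything else still comes from the
pod_template_file:
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from kubernetes.client import models as k8s

with DAG("pod_override_example", start_date=datetime(2020, 1, 1), schedule_interval=None) as dag:
    heavy_task = DummyOperator(
        task_id="heavy_task",
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",  # the main worker container defined by the template
                            resources=k8s.V1ResourceRequirements(
                                requests={"memory": "2Gi"},
                            ),
                        )
                    ]
                )
            )
        },
    )
```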
* fix podtemplatetest
* fix name
The webserver did not have a Kubernetes Service Account defined and
while we do not strictly need to use the service account for
anything now, having the Service Account defined allows various
capabilities to be configured for the webserver.
For example, when you are in the GCP environment, you can map
the Kubernetes service account to a GCP one using
Workload Identity, without the need to define any secrets
or perform additional authentication.
Then you can have that GCP service account get
the permissions to write logs to a GCS bucket. Similar mechanisms
exist in AWS, and it also opens up on-premises configurations.
See more at
https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
Co-authored-by: Jacob Ferriero <jferriero@google.com>
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing the MySQL client on Debian requires reaching out
to Oracle deb repositories, which might not be approved by security
teams when you build the images. Also, not everyone needs the MySQL
client, or they might want to install their own MySQL or MariaDB
client from their own repositories.
This change separates the installation step out into a
script (with a prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.), but in the "Final" segment of the image only runtime libraries
are needed.
Part of #11171
Depends on #11173.
This is the first step of implementing a corporate-environment
friendly way of building images, where in a corporate
environment it might not be possible to install the packages
using the GitHub cache initially.
Part of #11171
The previous query generated SQL like this:
```
WHERE (task_id = ? AND dag_id = ? AND execution_date = ?) OR (task_id = ? AND dag_id = ? AND execution_date = ?)
```
This is fine for one or maybe even 100 TIs, but when testing DAGs at
extreme size (over 21k tasks!) this query was taking forever (162s on
Postgres, 172s on MySQL 5.7).
By changing the query to this:
```
WHERE task_id IN (?,?) AND dag_id = ? AND execution_date = ?
```
the time is reduced to 1s! (1.03s on Postgres, 1.19s on MySQL)
Even on 100 TIs the reduction is large, but the overall time is not
significant (0.01451s -> 0.00626s on Postgres).
Times include SQLAlchemy query construction time (but not the time for
calling filter_for_tis, so it is a like-for-like comparison), not just
the DB query time:
```python
ipdb> start_filter_20k = time.monotonic(); result_filter_20k = session.query(TI).filter(tis_filter).all(); end_filter_20k = time.monotonic()
ipdb> end_filter_20k - start_filter_20k
172.30647455298458
ipdb> in_filter = TI.dag_id == self.dag_id, TI.execution_date == self.execution_date, TI.task_id.in_([o.task_id for o in old_states.keys()]);
ipdb> start_20k_custom = time.monotonic(); result_custom_20k = session.query(TI).filter(in_filter).all(); end_20k_custom = time.monotonic()
ipdb> end_20k_custom - start_20k_custom
1.1882996069907676
```
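For reference, here is a simplified sketch of the IN-based filter
construction (not the actual `filter_for_tis` implementation; it
assumes all TIs share a single dag_id and execution_date, as in the
timing above, and the helper name is made up):
```python
from typing import Iterable

from sqlalchemy import and_

from airflow.models.taskinstance import TaskInstance as TI


def build_ti_filter(dag_id: str, execution_date, tis: Iterable):
    """Build one IN clause on task_id instead of a big OR of per-TI conjunctions."""
    return and_(
        TI.dag_id == dag_id,
        TI.execution_date == execution_date,
        TI.task_id.in_([ti.task_id for ti in tis]),
    )
```
`session.query(TI).filter(build_ti_filter(...)).all()` then returns the
same rows as the OR-based query, but with a single index-friendly
predicate.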
I have also removed the check that was ensuring everything was of the
same type (all TaskInstance or all TaskInstanceKey) as it felt needless
- both types have the three required fields, so the "duck-typing"
approach at runtime (crash if it doesn't have the required property)
plus mypy checks felt Good Enough.
There was a problem with the Mac version of the pgbouncer exporter
created and released previously. This commit releases the latest
version, making sure that Linux Go is used to build the pgbouncer
binary.
From https://docs.python.org/3/library/typing.html#typing.Optional
```
Optional[X] is equivalent to Union[X, None].
```
> Note that this is not the same concept as an optional argument, which is one that has a default. An optional argument with a default does not require the Optional qualifier on its type annotation just because it is optional.
There were incorrect usages where the default was already set to
a string or int value but Optional was still used.
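A minimal illustration of the pattern being fixed (function and
parameter names here are made up):
```python
from typing import Optional


def fetch(timeout: Optional[int] = 30):
    # Misleading: the default is never None, so Optional is not warranted.
    return timeout


def fetch_fixed(timeout: int = 30):
    # The annotation now matches the actual default.
    return timeout


def lookup(key: Optional[str] = None):
    # Correct use of Optional: the default really is None.
    return key
```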
There was a problem with the user in the Git Sync mode of the Helm
Chart, in connection with the git-sync image and the official Airflow
image. Since we are using the official image, most of the
containers run as the "50000" user, but the user in the git-sync
image is 65533, so we have to set it as the
default. We also exposed that value as a parameter, so that
another image could be used here as well.
The celery flower and worker commands have changed in Airflow 2.0.
The Helm Chart supported only the 1.10 versions of those commands, and
this PR fixes it by adding both variants of them.
This can have *extremely* bad consequences. After this change, a jinja2
template like the one below will cause the task instance to fail if the
DAG being executed is not a sub-DAG. This may also display an error on
the Rendered tab of the Task Instance page.
task_instance.xcom_pull('z', key='return_value', dag_id=dag.parent_dag.dag_id)
Prior to the change in this commit, the above template would pull the
latest value for task_id 'z', for the given execution_date, from *any DAG*.
If your task_ids between DAGs are all unique, or if DAGs using the same
task_id always have different execution_date values, this will appear to
act like dag_id=None.
Our current theory is that SQLAlchemy/Python doesn't behave as expected
when comparing `jinja2.Undefined` to `None`.
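A small standalone illustration of that theory (plain jinja2, no
Airflow involved):
```python
from jinja2 import Undefined

value = Undefined(name="parent_dag")

print(value is None)   # False -- an Undefined object is not None
print(value == None)   # False -- Undefined only compares equal to another Undefined
print(bool(value))     # False -- yet it is falsy, which is easy to mistake for None
```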
* Added support for encrypted private keys in SSHHook
* Fixed Styling issues and added unit testing
* fixed last pylint styling issue by adding newline to the end of the file
* re-fixed newline issue for pylint checks
* fixed pep8 styling issues and black formatted files to pass static checks
* added comma as per suggestion to fix static check
Co-authored-by: Nadim Younes <nyounes@kobo.com>
In very rare cases, the waiting job might not be cancelled when
the "Build Image" job fails or gets cancelled on its own.
In the "Build Image" workflow we have this step:
- name: "Canceling the CI Build source workflow in case of failure!"
  if: cancelled() || failure()
  uses: potiuk/cancel-workflow-runs@v2
  with:
    token: ${{ secrets.GITHUB_TOKEN }}
    cancelMode: self
    sourceRunId: ${{ github.event.workflow_run.id }}
But when this step fails or gets cancelled on its own before
the cancel is triggered, the "wait for image" steps could
run for up to 6 hours.
This change sets a 50-minute timeout for those jobs.
Fixes #11114
* Update initialize-database.rst
Remove ambiguity in the language as only MySQL, Postgres and SQLite are supported backends.
* Update docs/howto/initialize-database.rst
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Xiaodong DENG <xd.deng.r@gmail.com>
This PR adds the possibility to define template_fields_renderers for an
operator. In this way users will be able to specify which
lexer should be used for rendering a particular field. This is
super useful for custom operators and gives more flexibility than the
predefined keywords.
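For illustration, a hypothetical custom operator using the new
attribute (the field names and lexer keys below are examples):
```python
from airflow.models.baseoperator import BaseOperator


class MyTransformOperator(BaseOperator):
    # Render "query" with the SQL lexer and "config" as JSON in the UI.
    template_fields = ("query", "config")
    template_fields_renderers = {"query": "sql", "config": "json"}

    def __init__(self, *, query: str, config: dict, **kwargs):
        super().__init__(**kwargs)
        self.query = query
        self.config = config

    def execute(self, context):
        self.log.info("Running query %s with config %s", self.query, self.config)
```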
Co-authored-by: Kamil Olszewski <34898234+olchas@users.noreply.github.com>
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
* Avoid redundant SET conversion
get_accessible_dag_ids() returns a SET, so no need to apply set() again
* Add type annotation for get_accessible_dag_ids()
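A tiny sketch of the annotation change (the parameter list is elided
and the class is a stand-in; only the set-returning annotation
matters):
```python
from typing import Set


class SecurityManagerSketch:
    def get_accessible_dag_ids(self) -> Set[str]:
        # Already a set -- callers should not re-wrap the result in set().
        return {"example_dag"}
```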
closes: #10725
Make sure SkipMixin.skip_all_except() handles empty branches (where the branching task connects directly to the join task) properly. When "task1" is followed, "join" must not be skipped even though it is considered to be immediately downstream of "branch".
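A minimal sketch of the DAG shape described above (the operator choice
and 1.10-style import paths are illustrative):
```python
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator
from airflow.utils.dates import days_ago

with DAG("branch_with_empty_path", start_date=days_ago(1), schedule_interval=None) as dag:
    branch = BranchPythonOperator(task_id="branch", python_callable=lambda: "task1")
    task1 = DummyOperator(task_id="task1")
    join = DummyOperator(task_id="join")

    branch >> task1 >> join  # the followed path
    branch >> join           # the "empty" path straight to the join task
```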
I noticed that when there are no setup.py changes, the constraints
are not upgraded automatically. This is because of the docker
caching strategy used - it simply does not even know that the
upgrade of pip should happen.
I believe it is really good (from a security and incremental-updates
point of view) to attempt the upgrade at every successful merge. Note
that the upgrade will not be committed if any of the tests fail, and
this only happens on every merge to master or on a scheduled run.
This way we will have more frequent but smaller constraint changes.
Depends on #10828