The change #10445 caused empty descriptions for all packages.
This change restores the descriptions and also makes sure package creation works
when there is no README.md.
Resolves #10953.
A refreshed UI for the 2.0 release. The existing "theming" is a bit long in the tooth, and this PR attempts to give it a modern look and some freshness to complement all of the new features under the hood.
The majority of the changes to the UI have been made through updates to the Bootstrap theme contained in bootstrap-theme.css. These are simply overrides of the default styles packaged with Bootstrap.
This PR allows partial import error tracebacks to be exposed in the UI, if enabled. This extra context can be very helpful for users without access to the parsing logs when determining why their DAGs fail to import properly.
* Fixes an issue where cycle detection uses recursion
and overflows the stack after about 1000 tasks (see the iterative
sketch below)
(cherry picked from commit 63f1a180a17729aa937af642cfbf4ddfeccd1b9f)
* reduce test length
* slightly more efficient
* Update airflow/utils/dag_cycle_tester.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
* slightly more efficient
* actually works this time
Co-authored-by: Daniel Imberman <daniel@astronomer.io>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
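For illustration, a minimal sketch of cycle detection done iteratively with an explicit stack (a simplified stand-in that uses a plain task_id -> downstream ids mapping, not the exact code in dag_cycle_tester.py):
```python
def has_cycle(graph):
    """Detect a cycle using an explicit stack instead of recursion, so large
    graphs (thousands of tasks) do not hit Python's recursion limit."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / fully explored
    color = {node: WHITE for node in graph}

    for start in graph:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if color[child] == GRAY:
                    return True  # back edge found -> cycle
                if color[child] == WHITE:
                    color[child] = GRAY
                    stack.append((child, iter(graph[child])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False


assert has_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})
assert not has_cycle({"a": ["b", "c"], "b": ["c"], "c": []})
```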
* Allows more customization of image building.
This is the third (and not the last) part of making the Production
image more corporate-environment friendly. It was prepared
at the request of one of the big Airflow users (a company) that
has rather strict security requirements when it comes to
preparing and building images. They are committed to keeping up
with the progress of Apache Airflow 2.0 development, and making
the image customizable so that they can build it using only
sources controlled by them internally was one of their important
requirements.
This change adds the possibility of customizing various steps in
the build process:
* adding custom scripts to be run before installation in both the
build image and the runtime image. This allows, for example,
installing custom GPG keys and adding custom sources.
* customizing the way NodeJS and Yarn are installed in the
build image segment - as corporate environments might rely on
their own way of installing them.
* adding extra packages to be installed during both the build and
dev segment build steps. This is crucial to achieve the same
size optimizations as the original image.
* defining additional environment variables (for example,
environment variables that indicate acceptance of EULAs when
installing proprietary packages that require it) - both in the
build image and the runtime image (again, the goal is to keep
the image optimized for size).
The image build process remains the same when no customization
options are specified, but having those options increases the
flexibility of the image build process in corporate environments.
This is part of #11171.
This change also fixes some of the issues opened and raised by
other users of the Dockerfile.
Fixes: #10730
Fixes: #10555
Fixes: #10856
Input from those issues was taken into account when this
change was designed, so that the cases described in those issues
can be implemented. An example from one of the issues landed as
a sample way of building a highly customized Airflow image
using those customization options.
Depends on #11174
* Update IMAGES.rst
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Breeze tags the image based on the default Python version,
branch, and type of the image, but you might want to apply your
own tag in the same command - especially in automated cases where
the image is built via CI scripts, or where security teams tag the
image based on external factors (build time, person, etc.).
This is part of #11171, which makes the image easier to build in
corporate environments.
The built-in function `any()` supports short-circuiting (evaluation stops as soon as the overall return value is known), but this behavior is lost if you pass it a list comprehension instead of a generator expression. This affects performance.
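For illustration, the only difference is whether the predicate has already run for every element before `any()` sees the first result:
```python
def is_positive(n):
    print(f"checking {n}")
    return n > 0

values = [1, 2, 3]

# List comprehension: builds the whole list first, calling the predicate
# for every element, and only then hands it to any().
any([is_positive(n) for n in values])  # prints "checking" three times

# Generator expression: any() pulls items lazily and stops at the first
# truthy value, so short-circuiting is preserved.
any(is_positive(n) for n in values)    # prints "checking" once
```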
Example output (I forced one of the existing tests to fail)
```
E AssertionError: The expected number of db queries is 3. The current number is 2.
E
E Recorded query locations:
E scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:94: 1
E scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:101: 1
```
This makes it a bit easier to see what the queries are, without having
to re-run with full query tracing and then analyze the logs.
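A rough sketch of how such per-call-site counting can be wired up with SQLAlchemy's event hooks (the `count_queries` helper and its stack-trimming details are illustrative, not the exact test utility used in Airflow):
```python
from collections import Counter
from contextlib import contextmanager
import traceback

from sqlalchemy import event


@contextmanager
def count_queries(engine):
    """Count SQL statements grouped by a short summary of the call site,
    so a failing assertion can print where each query came from."""
    locations = Counter()

    def before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        frames = [
            f"{frame.filename.rsplit('/', 1)[-1]}:{frame.name}:{frame.lineno}"
            for frame in traceback.extract_stack()
            if "airflow" in frame.filename  # keep only application frames
        ]
        locations[">".join(frames[-3:])] += 1

    event.listen(engine, "before_cursor_execute", before_cursor_execute)
    try:
        yield locations
    finally:
        event.remove(engine, "before_cursor_execute", before_cursor_execute)
```
The test can then compare `sum(locations.values())` against the expected count and dump `locations` in the assertion message on failure.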
Some of the users of Airflow are using Kerberos to authenticate
their worker workflows. Airflow has basic support for Kerberos
for some of the operators, and it can refresh the
temporary Kerberos tokens via the `airflow kerberos` command.
This change adds support for a Kerberos sidecar that connects
to the Kerberos Key Distribution Center and retrieves the
token using a Keytab that should be deployed as a Kubernetes Secret.
A shared volume is used to share the temporary token. The nice
thing about setting it up as a sidecar is that the Keytab
is never shared with the workers - the secret is only mounted
by the sidecar, and the workers only have access to the temporary
token.
Depends on #11129
* Allow overrides for pod_template_file
A pod_template_file should be treated as a *template*, not a steadfast
rule.
This PR ensures that users can override individual values set by the
pod_template_file so that the same file can be used for multiple tasks
(see the sketch below).
* fix podtemplatetest
* fix name
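A minimal sketch of the intended usage pattern, assuming per-task overrides are passed via the task's `executor_config` with a `pod_override` key (the DAG id, image name, and callable below are made up for illustration):
```python
from datetime import datetime

from kubernetes.client import models as k8s

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG("pod_override_example", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    # The pod_template_file supplies the shared defaults; this task only
    # overrides the container image, so the same template serves many tasks.
    heavy_task = PythonOperator(
        task_id="heavy_task",
        python_callable=lambda: print("running"),
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",
                            image="my-registry/airflow-custom:latest",
                        )
                    ]
                )
            )
        },
    )
```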
The webserver did not have a Kubernetes Service Account defined, and
while we do not strictly need to use the service account for
anything now, having it defined makes it possible to grant
various capabilities to the webserver.
For example, when you are in a GCP environment, you can map
the Kubernetes service account to a GCP one using
Workload Identity, without the need to define any secrets
or perform additional authentication.
You can then grant that GCP service account
permission to write logs to a GCS bucket. Similar mechanisms
exist in AWS, and this also opens up on-premises configurations.
See more at
https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
Co-authored-by: Jacob Ferriero <jferriero@google.com>
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing the MySQL client on Debian requires reaching out
to Oracle deb repositories, which might not be approved by security
teams when you build the images. Also, not everyone needs the MySQL
client, or they might want to install their own MySQL client or MariaDB
client - from their own repositories.
This change separates the installation step out into a
script (with a prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.), but in the "Final" segment of the image only runtime libraries
are needed.
Part of #11171
Depends on #11173.
This is the first step of implementing a corporate-environment-friendly
way of building images, where it might not be possible to
install the packages using the GitHub cache initially.
Part of #11171
The previous query generated SQL like this:
```
WHERE (task_id = ? AND dag_id = ? AND execution_date = ?) OR (task_id = ? AND dag_id = ? AND execution_date = ?)
```
Which is fine for one or maybe even 100 TIs, but when testing DAGs at
extreme size (over 21k tasks!) this query was taking forever (162s on
Postgres, 172s on MySQL 5.7).
By changing the query to this:
```
WHERE task_id IN (?,?) AND dag_id = ? AND execution_date = ?
```
the time is reduced to about 1s! (1.03s on Postgres, 1.19s on MySQL)
Even on 100 TIs the reduction is large, but the overall time is not
significant (0.01451s -> 0.00626s on Postgres).
Times include SQLAlchemy query construction time (but not the time for
calling filter_for_tis, so it is a like-for-like comparison), not just
DB query time:
```python
ipdb> start_filter_20k = time.monotonic(); result_filter_20k = session.query(TI).filter(tis_filter).all(); end_filter_20k = time.monotonic()
ipdb> end_filter_20k - start_filter_20k
172.30647455298458
ipdb> in_filter = TI.dag_id == self.dag_id, TI.execution_date == self.execution_date, TI.task_id.in_([o.task_id for o in old_states.keys()]);
ipdb> start_20k_custom = time.monotonic(); result_custom_20k = session.query(TI).filter(in_filter).all(); end_20k_custom = time.monotonic()
ipdb> end_20k_custom - start_20k_custom
1.1882996069907676
```
I have also removed the check that was ensuring everything was of the
same type (all TaskInstance or all TaskInstanceKey), as it felt needless
- both types have the three required fields, so the "duck-typing"
approach at runtime (crash if it doesn't have the required property) plus
mypy checks felt Good Enough.
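A rough sketch of the reworked filter, assuming all task instances share a single dag_id and execution_date (an illustrative helper, not the exact filter_for_tis body):
```python
from sqlalchemy import and_

from airflow.models.taskinstance import TaskInstance as TI


def build_ti_filter(dag_id, execution_date, tis):
    """Collapse per-TI (task_id AND dag_id AND execution_date) clauses into a
    single IN clause over task_id, avoiding a huge OR chain in the SQL."""
    return and_(
        TI.dag_id == dag_id,
        TI.execution_date == execution_date,
        TI.task_id.in_([ti.task_id for ti in tis]),
    )


# Usage:
#   tis_filter = build_ti_filter(dag.dag_id, execution_date, task_instances)
#   session.query(TI).filter(tis_filter).all()
```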
There was a problem with the Mac version of the pgbouncer exporter
created and released previously. This commit releases the
latest version, making sure that Go on Linux is used to build
the pgbouncer exporter binary.