Push and scheduled builds should not be cancelled even if
they are duplicates. By seeing which of the master merges
failed, we have better visibility into which merge caused
a problem, and we can trace its origin faster even if the builds
take longer overall.
Scheduled builds also serve their purpose and they should
always be run to completion.
* Constraints and PIP packages can be installed from local sources
This is the final part of implementing #11171 based on feedback
from enterprise customers we worked with. They want to be able
to build the image using the official Dockerfile and binary wheel
packages that are available locally. This means that, apart from
the official APT sources, the Dockerfile build should not need
GitHub or any other external files pulled from outside,
including the PIP repository.
This change also includes documentation on how to prepare a set of
such binaries ready for inspection and review by security teams
in an Enterprise environment. Such sets of "known-working-binary-whl"
files can then be separately committed, tracked and scrutinized
in the artifact repository of such an Enterprise.
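As a minimal sketch of the idea, assuming the wheels sit in a local directory and a constraints file is available locally, the install step can avoid any network index by using pip's `--no-index` and `--find-links` options. The helper name, the directory name and the file name below are hypothetical and are not taken from the Dockerfile.

```python
from typing import List


def offline_pip_command(package: str, wheel_dir: str, constraints: str) -> List[str]:
    """Build a pip command that resolves packages only from local wheel files."""
    return [
        "pip", "install",
        "--no-index",                 # never contact PyPI or any other index
        "--find-links", wheel_dir,    # look for wheels in this local directory
        "--constraint", constraints,  # pin versions from a local constraints file
        package,
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(offline_pip_command("apache-airflow", "./wheels", "./constraints.txt")))
```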
Fixes: #11171
* Update docs/production-deployment.rst
* KubernetesPodOperator can retry log tailing in case of interruption
* fix failing test
* change read_pod_logs method formatting
* KubernetesPodOperator retry log tailing based on last read log timestamp
* fix test_parse_log_line test formatting
* add docstring to parse_log_line method
* fix kubernetes integration test
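The timestamp-based retry can be sketched roughly as follows, assuming each log line arrives as `<RFC 3339 timestamp> <message>` (the shape Kubernetes produces when timestamps are requested). Apart from the name `parse_log_line`, the helpers here are hypothetical, and this is not the operator's actual implementation.

```python
import math
from datetime import datetime, timezone
from typing import Optional, Tuple


def parse_log_line(line: str) -> Tuple[Optional[str], str]:
    """Split "<timestamp> <message>" into its two parts."""
    timestamp, sep, message = line.partition(" ")
    if not sep:
        # No space found: treat the whole line as a message without timestamp.
        return None, line.rstrip()
    return timestamp, message.rstrip()


def parse_timestamp(timestamp: str) -> datetime:
    """Parse a UTC timestamp, truncating nanoseconds to microseconds."""
    base, _, fraction = timestamp.rstrip("Z").partition(".")
    micros = int((fraction + "000000")[:6])
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def seconds_to_replay(last_timestamp: str, now: datetime) -> int:
    """Whole seconds of logs to request again after an interrupted stream."""
    delta = now - parse_timestamp(last_timestamp)
    return max(1, math.ceil(delta.total_seconds()))
```

After an interruption, the caller would reconnect and ask only for logs newer than the last line it read, rather than re-reading the whole log.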
The custom ClusterPolicyViolation was added in #10282.
This one adds a more comprehensive test for it.
Co-authored-by: Jacob Ferriero <jferriero@google.com>
* Fully support running more than one scheduler concurrently.
This PR implements scheduler HA as proposed in AIP-15. The high level
design is as follows:
- Move all scheduling decisions into SchedulerJob (requiring DAG
serialization in the scheduler)
- Use row-level locks to ensure schedulers don't stomp on each other
(`SELECT ... FOR UPDATE`)
- Use `SKIP LOCKED` for better performance when multiple schedulers are
running. (MySQL < 8 and MariaDB don't support this)
- Scheduling decisions are not tied to the parsing speed, but can
operate just on the database
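The effect of `SKIP LOCKED` can be illustrated with an in-memory analogy: each row has a lock, and a worker passes over rows that are already held instead of waiting for them. This is only a sketch of the semantics using `threading.Lock`, not database code; the function and variable names are hypothetical.

```python
import threading
from typing import Dict, List


def claim_rows(row_locks: Dict[int, threading.Lock], limit: int) -> List[int]:
    """Claim up to `limit` rows, skipping any row another worker already holds."""
    claimed: List[int] = []
    for row_id in sorted(row_locks):
        if len(claimed) == limit:
            break
        # A non-blocking acquire mirrors SKIP LOCKED: a held row is passed
        # over instead of making the caller wait for it.
        if row_locks[row_id].acquire(blocking=False):
            claimed.append(row_id)
    return claimed


if __name__ == "__main__":
    locks = {row_id: threading.Lock() for row_id in range(1, 6)}
    print(claim_rows(locks, 2))  # first "scheduler" claims two rows
    print(claim_rows(locks, 2))  # second one skips them and claims the next two
```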
*DagFileProcessorProcess*:
Previously this component was responsible for more than just parsing the
DAG files, despite what its name might imply. It was also responsible for
creating DagRuns and for making scheduling decisions about TIs, moving
them from "None" to "scheduled" state.
This commit changes it so that the DagFileProcessorProcess now will
update the SerializedDAG row for this DAG, and make no scheduling
decisions itself.
To make the scheduler's job easier (so that it can make as many
decisions as possible without having to load the possibly-large
SerializedDAG row) we store/update some columns on the DagModel table:
- `next_dagrun`: The execution_date of the next dag run that should be created (or
None)
- `next_dagrun_create_after`: The earliest point at which the next dag
run can be created
Pre-computing these values (and updating them every time the DAG is
parsed) reduces the overall load on the DB, as many decisions can be taken
by selecting just these two columns or the small DagModel row.
When max_active_runs is reached, or for `@once`, these columns will be
set to null, meaning "don't create any dag runs".
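A minimal sketch of how the two columns could be derived for a fixed-interval schedule follows. It assumes that a run is created once the interval it covers has ended, and it models `@once` as having no interval after the first run. The function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def next_dagrun_info(
    last_execution_date: datetime,
    interval: Optional[timedelta],
    active_runs: int,
    max_active_runs: int,
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Return (next_dagrun, next_dagrun_create_after).

    (None, None) means "don't create any dag runs".
    """
    if interval is None or active_runs >= max_active_runs:
        return None, None
    next_dagrun = last_execution_date + interval
    # Assumption: the run becomes creatable when its interval is over.
    return next_dagrun, next_dagrun + interval
```

The scheduler can then decide whether to create a run by comparing the second value with the current time, without loading the serialized DAG.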
*SchedulerJob*
The SchedulerJob used to only queue/send tasks to the executor after
they were parsed, and returned from the DagFileProcessorProcess.
This PR breaks the link between parsing and enqueuing of tasks. Instead
of looking at DAGs as they are parsed, we now:
- store a new datetime column, `last_scheduling_decision`, on the DagRun
table, signifying when a scheduler last examined a DagRun
- Each time around the loop the scheduler will get (and lock) the next
_n_ DagRuns via `DagRun.next_dagruns_to_examine`, prioritising DagRuns
which haven't been touched by a scheduler in the longest period
- SimpleTaskInstance etc have been almost entirely removed now, as we
use the serialized versions
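The prioritisation can be sketched in plain Python as below. This is an in-memory stand-in, not the real `DagRun.next_dagruns_to_examine` query; the class and function names are hypothetical, and putting never-examined runs first is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class DagRunRow:
    run_id: str
    last_scheduling_decision: Optional[datetime]


def pick_dagruns_to_examine(runs: List[DagRunRow], n: int) -> List[DagRunRow]:
    """Return the n runs that have gone longest without a scheduling decision."""
    ordered = sorted(
        runs,
        key=lambda run: (
            # Assumption: runs never examined (None) sort before all others.
            run.last_scheduling_decision is not None,
            run.last_scheduling_decision or datetime.min,
        ),
    )
    return ordered[:n]
```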
* Move callbacks execution from Scheduler loop to DagProcessorProcess
* Don’t run verify_integrity if the Serialized DAG hasn’t changed
dag_run.verify_integrity is slow, and we don't want to call it every time, just when the dag structure changes (which we can know now thanks to DAG Serialization)
* Add escape hatch to disable newly added "SELECT ... FOR UPDATE" queries
We are worried that these extra uses of row-level locking will cause
problems on MySQL 5.x (most likely deadlocks) so we are providing users
an "escape hatch" to be able to make these queries non-locking -- this
means that only a single scheduler should be run, but being able to run
one is better than having the scheduler crash.
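The escape hatch amounts to making the locking clause conditional. A rough sketch, with a hypothetical helper and hypothetical parameter names standing in for the real config option and dialect check:

```python
def with_row_locks(select_sql: str, use_row_level_locking: bool, supports_skip_locked: bool) -> str:
    """Append a row-locking clause to a SELECT, unless locking is disabled."""
    if not use_row_level_locking:
        # Escape hatch: plain SELECT, safe only with a single scheduler.
        return select_sql
    if supports_skip_locked:
        return select_sql + " FOR UPDATE SKIP LOCKED"
    return select_sql + " FOR UPDATE"


if __name__ == "__main__":
    query = "SELECT id FROM dag_run WHERE state = 'running'"
    print(with_row_locks(query, use_row_level_locking=True, supports_skip_locked=False))
```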
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
This PR needs to be merged first in order to handle #11385,
which requires .pypirc to be created before the Dockerfile gets built.
This means that the script change in this PR needs to be merged
to master first.
We can now add annotations to the service accounts in a generic
way. This allows, for example, adding Workload Identity in a GKE
environment, but it is not limited to it.
Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
Co-authored-by: Jacob Ferriero <jferriero@google.com>
Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
This is similar to #11327, but for Celery this time.
The impact is not quite as pronounced here (for simple dags at least),
but it takes the average queued-to-start delay from 1.5s to 0.4s.
Currently, upgrading dependencies in setup.py still runs with the previous versions of the packages for the PR, which fails.
This change upgrades only the packages that are required for the PR.
Spawning a whole new python process and then re-loading all of Airflow
is expensive. Although this time fades into insignificance for
long-running tasks, this delay gives a "bad" experience for new users when
they are just trying out Airflow for the first time.
For the LocalExecutor this cuts the "queued time" down from 1.5s to 0.1s
on average.
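The saving comes from reusing the already-loaded parent process. A minimal POSIX-only sketch of running a task in a forked child, assuming the task is a plain callable (the helper name is hypothetical and this is not the executor's actual code):

```python
import os
from typing import Callable


def run_in_fork(task: Callable[[], None]) -> int:
    """Run task() in a forked child process and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: modules imported by the parent are already in memory, so
        # there is no interpreter start-up or re-import cost here.
        status = 1
        try:
            task()
            status = 0
        finally:
            # Exit immediately without running the parent's cleanup handlers.
            os._exit(status)
    # Parent: wait for the child and translate its wait status.
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return -1
```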
If this flag is specified it will look for wheel packages placed in the dist
folder and it will install the wheels from there after installing
Airflow. This is useful for testing backport packages as well as in the
future for testing provider packages for 2.0.
I decided to move it to CONTRIBUTING.rst as it is important
documentation of the policies we have agreed to as a community, and
it is also a great resource for contributors to learn what
a committer's responsibilities are.
Fixes: #10179
As part of #11195 we re-styled the UI, changing a lot of the default
colours to make them look more modern. However, anyone upgrading from
1.10 to 2.0 and keeping their airflow.cfg would end up with things
looking a bit ugly, as the old navbar color would be kept.
This uses the existing config value upgrade feature to automatically
change the old default colour into the new default colour.
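The idea can be sketched as a lookup from a config option to its old and new defaults: a value is rewritten only when the user still has the old default. The table name, the function name, the section and key, and both colour values below are assumptions for illustration.

```python
from typing import Dict, Tuple

# (section, key) -> (old default, new default).
# The section/key and colour values are illustrative assumptions.
UPGRADED_DEFAULTS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("webserver", "navbar_color"): ("#007A87", "#fff"),
}


def upgrade_value(section: str, key: str, value: str) -> str:
    """Replace an untouched old default with the new one; keep custom values."""
    old_new = UPGRADED_DEFAULTS.get((section, key))
    if old_new and value.strip().lower() == old_new[0].lower():
        return old_new[1]
    return value
```

A colour the user picked deliberately is left alone, because it no longer matches the old default.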
We started to get "unknown blob" errors more often when
pushing the images to GitHub Registry. While this is clearly a
GitHub issue, its frequency of occurrence and unclear message
make it a good candidate for an additional message with
instructions for the users, especially now that they have
an easy way to get to that information via status checks and
links leading to the log file when this problem happens during
the image building process.
This way users will know that they should simply rebase or
amend/force-push their change to fix it.
When installing airflow 1.10 via breeze we now enable rbac
by default, but we can disable it with the --no-rbac-ui flag.
This is useful to test different variants of 1.10 when testing
release candidates in connection with the 'start-airflow'
command.