There is a bug caused by the scheduler_jobs refactor which leads to task
failures and the scheduler locking up.
Essentially, when there is an overflow of tasks going into the scheduler,
the tasks are set back to the scheduled state but are not removed from the
executor's queued_tasks queue.
This means that the executor will attempt to run tasks that are in the
scheduled state, but those tasks will fail their dependency checks.
Eventually the queue fills up with scheduled tasks and the scheduler can
no longer run.
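A minimal sketch of the kind of fix needed, assuming an executor object
with a `queued_tasks` dict keyed by task instance key (the names below
are illustrative, not the exact Airflow internals touched here):

```python
# Illustrative sketch only: attribute and method names are assumptions,
# not the exact code touched by this change.
from airflow.utils.state import State


def set_back_to_scheduled(task_instances, executor, session):
    """Return overflowed task instances to SCHEDULED and make the executor
    forget about them so it does not try to run them later."""
    for ti in task_instances:
        ti.state = State.SCHEDULED
        session.merge(ti)
        # The missing step behind the bug: also drop the task from the
        # executor's internal queue, otherwise it is picked up again while
        # in the scheduled state and fails its dependency checks.
        executor.queued_tasks.pop(ti.key, None)
    session.commit()
```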
Co-Authored-By: Kaxil Naik <kaxilnaik@gmail.com>
Co-Authored-By: Kevin Yang <kevin.yang@airbnb.com>
Cyclic imports are detected seemingly at random by pylint checks when some
of the PRs are run in CI.
It was not deterministic because pylint usually uses as many processes as
there are available processors and splits the list of .py files between
those processes - depending on how the split is done, the pylint check
might or might not detect the cycle. The cycle is always detected when all
the files are checked together.
In order to make it more deterministic, all pylint and mypy errors were
resolved across the executors package and in dag_processor.
At the same time, plugins_manager was also moved out of the executors
and out of all of the operators/hooks/sensors/macros, because it was also
causing cyclic dependencies, and it is far easier to untangle those
dependencies in the executors when the initialisation of all plugins is
moved to plugins_manager.
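As a rough illustration of the pattern used to break such a cycle (the
module and function names below are hypothetical, not the exact Airflow
code):

```python
# Hypothetical sketch: plugin initialisation is owned by plugins_manager
# and only imported when an executor is actually requested.
def _load_executor(executor_name):
    """Resolve an executor without importing plugin code at module level."""
    # Deferred import: the executors package no longer imports plugin
    # machinery at import time, so the plugins <-> executors cycle is gone.
    from airflow import plugins_manager

    plugins_manager.integrate_executor_plugins()  # hypothetical entry point
    ...
```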
Additionally, require_serial is set in the pre-commit configuration to
make sure that cycle detection is deterministic.
* [AIRFLOW-YYY] Lazy load API Client
* [AIRFLOW-YYY] Introduce order in CLI's function names
* [AIRFLOW-YYY] Create cli package
* [AIRFLOW-YYY] Move user and roles commands to separate files
* [AIRFLOW-YYY] Move sync_perm command to separate file
* [AIRFLOW-YYY] Move task commands to separate file
* [AIRFLOW-YYY] Move pool commands to separate file
* [AIRFLOW-YYY] Move variable commands to separate file
* [AIRFLOW-YYY] Move db commands to separate file
* fixup! [AIRFLOW-YYY] Move variable commands to separate file
* [AIRFLOW-YYY] Move connection commands to separate file
* [AIRFLOW-YYY] Move version command to separate file
* [AIRFLOW-YYY] Move scheduler command to separate file
* [AIRFLOW-YYY] Move worker command to separate file
* [AIRFLOW-YYY] Move webserver command to separate file
* [AIRFLOW-YYY] Move dag commands to separate file
* [AIRFLOW-YYY] Move serve logs command to separate file
* [AIRFLOW-YYY] Move flower command to separate file
* [AIRFLOW-YYY] Move kerberos command to separate file
* [AIRFLOW-YYY] Lazy load CLI commands
* [AIRFLOW-YYY] Fix migration
* fixup! [AIRFLOW-YYY] Fix migration
* fixup! fixup! [AIRFLOW-YYY] Fix migration
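The "Lazy load CLI commands" item above refers to deferring the import of
each command module until the command is actually invoked. A minimal
sketch of that pattern (the module path below is illustrative, not the
exact airflow.cli layout):

```python
# Rough sketch of lazily loading a CLI command.
import importlib


def lazy_command(import_path):
    """Return a callable that imports the real command only when invoked,
    so that `airflow --help` does not import every command module."""
    module_name, func_name = import_path.rsplit(".", 1)

    def command(args):
        module = importlib.import_module(module_name)
        return getattr(module, func_name)(args)

    return command


# Example registration (illustrative path): the heavy task command module
# is imported only when the corresponding command is actually run.
run_command = lazy_command("airflow.cli.commands.task_command.task_run")
```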
Currently, when a task in the DAG misses the SLA, Airflow traverses all
the tasks in the DAG and collects all the task-level emails. It then sends
an SLA miss email to all of those collected addresses, which adds
unnecessary noise for task owners whose tasks did not contribute to the
SLA miss.
Thus, the code is changed to only collect emails from the tasks that
missed the SLA.
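A hedged sketch of the new behaviour (the names below are illustrative;
the real change lives in the scheduler's SLA-miss handling):

```python
# Illustrative only: collect notification emails from the tasks that
# actually missed their SLA instead of from every task in the DAG.
def collect_sla_miss_emails(dag, sla_misses):
    """Return de-duplicated emails of owners of tasks that missed the SLA."""
    emails = []
    for sla in sla_misses:  # each entry is assumed to carry a task_id
        task = dag.get_task(sla.task_id)
        if not task.email:
            continue
        # task.email may be a single address or a list of addresses
        if isinstance(task.email, str):
            emails.append(task.email)
        else:
            emails.extend(task.email)
    # Preserve order while removing duplicates
    return list(dict.fromkeys(emails))
```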
If a LocalTaskJob fails to heartbeat for longer than
scheduler_zombie_task_threshold, it should shut itself down.
However, at some point, a change was made to catch exceptions inside the
heartbeat, so the LocalTaskJob thought it had managed to heartbeat
successfully.
This effectively means that zombie tasks don't shut themselves down.
When the scheduler reschedules the job, this means we could have two
instances of the task running concurrently.
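Roughly, the intended self-termination looks like this (a simplified
sketch; the attributes and methods on `job` are assumptions, not the real
LocalTaskJob API):

```python
import time


def maybe_terminate_zombie(job, zombie_threshold_secs):
    """Terminate the job if its last successful heartbeat is too old."""
    seconds_since_heartbeat = time.time() - job.latest_heartbeat_ts
    if seconds_since_heartbeat > zombie_threshold_secs:
        # Without this, the scheduler may reschedule the task and a second
        # copy can end up running concurrently with the original "zombie".
        job.terminate()
```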
1. Issue deprecation warnings properly for the old conf methods and remove the current usages of those old methods.
2. Unify configuration access as `from airflow.configuration import conf` (see the snippet below).
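For example:

```python
# Unified configuration access via the shared `conf` object.
from airflow.configuration import conf

executor_name = conf.get("core", "executor")
parallelism = conf.getint("core", "parallelism")
```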
Using the mock.assert_called_with method can result in flaky tests
(e.g. when iterating through a dict in Python 3.5, which does not
preserve the order of elements). That's why it's better to use the
assert_called_once_with or assert_has_calls methods.
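A minimal, self-contained example of the recommended assertion helpers:

```python
from unittest import mock


def send(payload):
    """Stand-in for the function under test."""
    ...


@mock.patch(__name__ + ".send")
def test_send(mock_send):
    send({"a": 1, "b": 2})
    # Asserts that exactly one call was made, with these exact arguments.
    mock_send.assert_called_once_with({"a": 1, "b": 2})
    # For multiple calls, assert_has_calls checks an expected call sequence.
    mock_send.assert_has_calls([mock.call({"a": 1, "b": 2})])
```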
Change SubDagOperator to use the Airflow scheduler to schedule
tasks in subdags instead of backfill.
In the past, SubDagOperator relied on the backfill scheduler
to schedule tasks in the subdags. Tasks in the parent DAG
were scheduled via the Airflow scheduler while tasks in
a subdag were scheduled via backfill, which complicated
the scheduling logic and made the two scheduling code paths
difficult to maintain.
This PR simplifies how tasks in subdags are scheduled.
SubDagOperator is responsible for creating a DagRun for the subdag
and waiting until all the tasks in the subdag finish. The Airflow
scheduler picks up the DagRun created by SubDagOperator and
creates and schedules the tasks accordingly.
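Roughly, the new flow looks like this (a simplified sketch; the method
names and run_id format are illustrative rather than the exact
implementation):

```python
import time

from airflow.utils.state import State


class SubDagOperatorSketch:
    """Illustrative outline of the new SubDagOperator behaviour."""

    def __init__(self, subdag, poke_interval=30):
        self.subdag = subdag
        self.poke_interval = poke_interval

    def execute(self, context):
        # 1. Create a DagRun for the subdag instead of launching a backfill.
        dag_run = self.subdag.create_dagrun(
            run_id="scheduled__{}".format(context["execution_date"].isoformat()),
            execution_date=context["execution_date"],
            state=State.RUNNING,
            external_trigger=False,
        )
        # 2. The normal Airflow scheduler picks up this DagRun and schedules
        #    its tasks; the operator simply waits for the run to finish.
        while True:
            dag_run.refresh_from_db()
            if dag_run.state != State.RUNNING:
                break
            time.sleep(self.poke_interval)
        if dag_run.state != State.SUCCESS:
            raise RuntimeError("Subdag %s did not succeed" % self.subdag.dag_id)
```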
Full matching is required in this case, so the regex should start with
"^" and end with "$". Partial matching might result in irrelevant task
instances being cleared.
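For illustration, the difference between anchored and unanchored matching
(a generic `re` example, not the actual clearing code):

```python
import re

dag_id = "parent_dag.section_1"

# Unanchored: also matches "parent_dag.section_1_backup" and similar ids,
# so unrelated task instances could be cleared.
loose = re.compile(re.escape(dag_id))

# Anchored with ^ and $: matches the intended dag_id only.
strict = re.compile("^" + re.escape(dag_id) + "$")

assert loose.search("parent_dag.section_1_backup")
assert not strict.match("parent_dag.section_1_backup")
assert strict.match("parent_dag.section_1")
```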
Also in this commit:
* Added independent test dag: `clear_subdag_test_dag`
* Polished related unit test: `test_subdag_clear_parentdag_downstream_clear`
`non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
are removed in favor of a real pool, e.g. `default_pool`.
By default, tasks run in `default_pool`.
`default_pool` is initialized with 128 slots and users can change the
number of slots through the UI/CLI. `default_pool` cannot be removed.
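For example, in a DAG file (a small illustrative snippet; `etl_pool` is a
made-up custom pool name):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

with DAG("pool_example", start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    # No `pool` argument: the task now runs in default_pool (128 slots).
    t1 = DummyOperator(task_id="uses_default_pool")
    # An explicit pool argument still behaves as before.
    t2 = DummyOperator(task_id="uses_custom_pool", pool="etl_pool")
    t1 >> t2
```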
Now that the webserver is more stateless, if the scheduler is not
running, the list of DAGs won't populate, making it harder for new
starters to work out what is going on.
The new dependency is licensed under BSD-2-Clause, which is Category A
under ASF licensing policy.