In the case of OperationalError (caused by deadlocks, network blips, etc.), the scheduler will now retry the affected database methods 3 times.
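A minimal sketch of this retry pattern using `tenacity` (the helper name, wait policy, and wrapped method below are illustrative, not the exact scheduler code):
```
import tenacity
from sqlalchemy.exc import OperationalError

# Illustrative: retry a DB-touching scheduler method up to 3 times when the
# database raises OperationalError (deadlock, dropped connection, etc.).
run_with_db_retries = tenacity.retry(
    retry=tenacity.retry_if_exception_type(OperationalError),
    stop=tenacity.stop_after_attempt(3),
    wait=tenacity.wait_exponential(multiplier=0.5),
    reraise=True,
)

@run_with_db_retries
def adopt_or_reset_orphaned_tasks(session):
    ...  # hypothetical body that issues queries via `session`
```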
closes: #11899
closes: #13668
(cherry picked from commit 914e9ce042)
Fixes the issue wherein regardless of what role anonymous users are assigned (via the `AUTH_ROLE_PUBLIC` env var), they can't see any DAGs.
Current behavior:
Anonymous users are handled as a special case by Airflow's DAG-related security methods (`.has_access()` and `.get_accessible_dags()`). Rather than checking the `AUTH_ROLE_PUBLIC` value for the role's permissions, these methods reject access to view or edit any DAGs.
Changes in this PR:
Rather than hardcoding permission rules inside the security methods, this change checks the `AUTH_ROLE_PUBLIC` value and gives anonymous users all permissions linked to the designated role.
**This places security in the hands of the Airflow users. If the value is set to `Admin`, anonymous users will have full admin functionality.**
This also changes how the `Public` role is created. Currently, the `Public` role is created automatically by Flask App Builder. This PR explicitly declares `Public` as a default role with no permissions in `security.py`. This change makes it easier to test.
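For reference, a minimal `webserver_config.py` sketch showing how a deployment opts in (the role name here is only an example):
```
# webserver_config.py -- Flask AppBuilder settings read by the Airflow webserver.
# Give anonymous (not-logged-in) users the permissions of an existing role.
# "Viewer" is only an example; setting this to "Admin" gives anonymous users
# full admin functionality, as noted above.
AUTH_ROLE_PUBLIC = "Viewer"
```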
closes: #13340
(cherry picked from commit 78aa921a71)
From 1.10.x -> 2.0, the required permissions to trigger a DAG changed from DAG.can_edit to DAG.can_read + DagRun.can_create. Since the Viewer role has DAG.can_read by default, it isn't possible to give a Viewer trigger access to a single DAG without giving access to all DAGs.
This fixes that discrepancy by making the trigger requirement DAG.can_edit + DagRun.can_create. Now, to trigger a DAG, a Viewer needs to be granted both permissions, as neither is included in the Viewer role by default.
This PR also hides the Trigger/Refresh buttons on the home page if the user doesn't have permission to perform those actions.
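As an illustration of the new requirement, a rough sketch using Flask AppBuilder's security manager (the dag_id `example_dag` is made up, and in practice the permissions would go on a custom role rather than the stock Viewer role):
```
from airflow.www.app import cached_app

# Illustrative sketch only: grant the two permissions now required to trigger
# a single DAG. Assumes the webserver app context is available.
sm = cached_app().appbuilder.sm

role = sm.find_role("Viewer")
# Per-DAG edit permission on one specific DAG ...
dag_edit = sm.find_permission_view_menu("can_edit", "DAG:example_dag")
# ... plus the global permission to create DAG runs.
dagrun_create = sm.find_permission_view_menu("can_create", "DagRun")

for perm in (dag_edit, dagrun_create):
    if perm is not None:
        sm.add_permission_role(role, perm)
```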
closes: #13891
related: #13891
(cherry picked from commit 629abfdbab)
closes: #11899
closes: #13668
This PR disables row-level locking for MySQL variants that do not support `SKIP LOCKED` and `NOWAIT` -- MySQL < 8 and MariaDB.
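A rough sketch of what the dialect-dependent locking looks like with SQLAlchemy (simplified; not the actual scheduler query):
```
from sqlalchemy.orm import Query

def maybe_lock_rows(query: Query, supports_skip_locked: bool) -> Query:
    # Illustrative only: apply SELECT ... FOR UPDATE SKIP LOCKED when supported.
    # MySQL < 8 and MariaDB do not support SKIP LOCKED / NOWAIT, so on those
    # backends the query is returned without row-level locking.
    if supports_skip_locked:
        return query.with_for_update(skip_locked=True)
    return query
```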
(cherry picked from commit 568327f01a)
closes https://github.com/apache/airflow/issues/13685
When the Scheduler is restarted or killed after creating a Dag Run in `Scheduler._create_dag_runs` but
before `Scheduler._update_dag_next_dagruns`, the Scheduler gets stuck in a loop because it will try
to create the Dag Run again in the next Scheduler Loop. However, as the DagRun already exists it will fail
with:
```
Traceback (most recent call last):
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "dag_run_dag_id_run_id_key"
DETAIL: Key (dag_id, run_id)=(scenario1_case2_02, scheduled__2021-01-25T00:00:00+00:00) already exists.
```
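A simplified sketch of the kind of guard needed here (illustrative, not the exact patch): check for an existing run before creating it, so a restarted Scheduler does not trip the unique constraint.
```
from airflow.models import DagRun
from airflow.utils.session import provide_session

@provide_session
def dag_run_already_exists(dag_id: str, run_id: str, session=None) -> bool:
    # Illustrative guard: look up (dag_id, run_id) before inserting a new
    # DagRun, instead of letting the insert hit the unique constraint.
    return (
        session.query(DagRun)
        .filter(DagRun.dag_id == dag_id, DagRun.run_id == run_id)
        .count()
        > 0
    )
```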
(cherry picked from commit 594069ee06)
Added an initial FAQ question and answer to the Upgrading to 2 doc, covering
the need for providers to be installed before connection types show up in
the UI.
(cherry picked from commit 8e0db6eae3)
* pass image_pull_policy to V1Container
image_pull_policy is not being passed into the V1Container in
KubernetesPodOperator. This commit fixes this.
* add test for image_pull_policy not set
image_pull_policy should be IfNotPresent by default if
it's not set. The test ensures the correct value is passed
to the V1Container object.
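For context, a minimal sketch of how the pull policy ends up on the Kubernetes client object (simplified; the real operator builds the container from many more fields):
```
from kubernetes.client import models as k8s

def build_container(image: str, image_pull_policy: str = "IfNotPresent") -> k8s.V1Container:
    # The configured image_pull_policy must reach the V1Container; the bug was
    # that it was dropped, so the cluster default applied instead.
    return k8s.V1Container(
        name="base",
        image=image,
        image_pull_policy=image_pull_policy,
    )
```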
(cherry picked from commit 7a560ab6de)
* Updated TaskFlow API doc to show dependency with sensor
Updated the TaskFlow API tutorial document to show how to set up a
dependency from a classic FileSensor task to a Python-based decorated task.
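Roughly the pattern the tutorial now demonstrates (the file path and task names here are made up for illustration):
```
from airflow.decorators import dag, task
from airflow.sensors.filesystem import FileSensor
from airflow.utils.dates import days_ago

@dag(schedule_interval=None, start_date=days_ago(1), catchup=False)
def file_processing():
    # Classic sensor task: wait for a (hypothetical) file to land.
    wait_for_file = FileSensor(task_id="wait_for_file", filepath="/tmp/order_data.csv")

    @task
    def process_file():
        print("processing /tmp/order_data.csv")  # hypothetical processing step

    # Explicit dependency: classic sensor task -> decorated task.
    wait_for_file >> process_file()

file_processing_dag = file_processing()
```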
(cherry picked from commit df11a1d7dc)
In #13923, all permissions were removed from the Public role. This adds a test to ensure that the default public role doesn't have any permissions.
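A sketch of what such a test can look like (the fixture providing the security manager is assumed here, not part of the actual test suite):
```
def test_default_public_role_has_no_permissions(security_manager):
    # `security_manager` is assumed to come from a fixture that builds the
    # Airflow webserver app; "Public" is the default anonymous role.
    public_role = security_manager.find_role("Public")
    assert public_role is not None
    assert public_role.permissions == []
```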
related: #13923
(cherry picked from commit a52e77d0b4)
K8S pod names follow the DNS_SUBDOMAIN naming convention, which can be
broken down into one or more DNS_LABELs separated by `.`.
While the max length of a pod name (DNS_SUBDOMAIN) is 253, each label
component (DNS_LABEL) of the name cannot be longer than 63. Pod names
generated by the k8s executor currently contain only one label, which means
the total effective name length cannot be greater than 63.
This patch concatenates the uuid to pod_id using `.` to generate the pod name,
thus extending the max name length to 63 + len(uuid).
Reference: https://github.com/kubernetes/kubernetes/blob/release-1.1/docs/design/identifiers.md
Relevant discussion: https://github.com/kubernetes/kubernetes/issues/79351#issuecomment-505228196
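A rough sketch of the naming scheme described above (the helper name is made up; the executor also sanitizes the id):
```
import uuid

MAX_LABEL_LEN = 63  # max length of a single DNS_LABEL component

def make_pod_name(pod_id: str) -> str:
    # Join the (possibly truncated) pod_id and a uuid with '.', so each
    # dot-separated DNS_LABEL stays within 63 chars while the full
    # DNS_SUBDOMAIN name may use up to 253.
    return f"{pod_id[:MAX_LABEL_LEN]}.{uuid.uuid4().hex}"
```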
(cherry picked from commit 862443f6d3)
Resolves Issue #10186 (Tips & Tricks for Oracle shops should be moved to Airflow Docs).
Fixes a broken link, adds a UI connection documentation link, and adds connection tips.
(cherry picked from commit 74da0faa7b)
By default, SQLAlchemy passes query params as-is to DB dialect drivers for
query execution. This causes inconsistent evaluation of query params
between different DB drivers. For example, MySQLdb will convert
`DagRunType.SCHEDULED` to the string `'DagRunType.SCHEDULED'`
instead of the string `'scheduled'`.
see https://github.com/apache/airflow/pull/11621 for relevant
discussions.
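A small illustration of the mismatch (simplified enum; the fix amounts to binding the enum's underlying string value rather than the member itself):
```
from enum import Enum

class DagRunType(str, Enum):  # mirrors how Airflow defines run types
    SCHEDULED = "scheduled"

# A driver that naively stringifies bind parameters may serialize the member
# as 'DagRunType.SCHEDULED' instead of 'scheduled'. Binding the value
# explicitly removes the ambiguity:
run_type_param = DagRunType.SCHEDULED.value  # 'scheduled'
```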
(cherry picked from commit 53e8283871)
closes https://github.com/apache/airflow/issues/13667
The following error happens when a Serialized DAG exists in the Webserver or Scheduler but has just been removed from the `serialized_dag` table,
mainly due to the removal of the DAG file.
```
Traceback (most recent call last):
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
self._run_scheduler_loop()
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1516, in _do_scheduling
self._schedule_dag_run(dag_run, active_runs_by_dag_id.get(dag_run.dag_id, set()), session)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1629, in _schedule_dag_run
dag = dag_run.dag = self.dagbag.get_dag(dag_run.dag_id, session=session)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/utils/session.py", line 62, in wrapper
return func(*args, **kwargs)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/models/dagbag.py", line 187, in get_dag
if sd_last_updated_datetime > self.dags_last_fetched[dag_id]
```
A simple fix is to check that `sd_last_updated_datetime` is not `None`, i.e. that a Serialized DAG for that dag_id actually exists.
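A condensed sketch of that guard (illustrative, not the exact DagBag code):
```
from airflow.models.serialized_dag import SerializedDagModel

def serialized_dag_is_newer(dag_id, last_fetched, session):
    # Illustrative: treat a missing serialized DAG (None timestamp) as
    # "nothing to refresh" instead of comparing None against a datetime.
    sd_last_updated = SerializedDagModel.get_last_updated_datetime(
        dag_id=dag_id, session=session
    )
    if sd_last_updated is None:
        return False  # the row was removed, e.g. the DAG file was deleted
    return sd_last_updated > last_fetched
```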
(cherry picked from commit 8958d125cd)
When running `airflow dags unpause` with a DAG that does not exist, it
currently shows this error
```
root@6f086ba87198:/opt/airflow# airflow dags unpause example_bash_operatoredd
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 33, in <module>
sys.exit(load_entry_point('apache-airflow', 'console_scripts', 'airflow')())
File "/opt/airflow/airflow/__main__.py", line 40, in main
args.func(args)
File "/opt/airflow/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/opt/airflow/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/opt/airflow/airflow/cli/commands/dag_command.py", line 160, in dag_unpause
set_is_paused(False, args)
File "/opt/airflow/airflow/cli/commands/dag_command.py", line 170, in set_is_paused
dag.set_is_paused(is_paused=is_paused)
AttributeError: 'NoneType' object has no attribute 'set_is_paused'
```
This commit changes it to show a helpful error message instead:
```
root@6f086ba87198:/opt/airflow# airflow dags unpause example_bash_operatoredd
DAG: example_bash_operatoredd does not exist in 'dag' table
```
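A sketch of the check involved (simplified; the real CLI command does more):
```
from airflow.models import DagModel

def set_is_paused(is_paused: bool, dag_id: str, session) -> None:
    # Illustrative: fail with a readable message instead of an AttributeError
    # when the dag_id is unknown.
    dag = session.query(DagModel).filter(DagModel.dag_id == dag_id).first()
    if dag is None:
        raise SystemExit(f"DAG: {dag_id} does not exist in 'dag' table")
    dag.set_is_paused(is_paused=is_paused)
```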
(cherry picked from commit 8723b1feb8)
closes https://github.com/apache/airflow/issues/13504
Currently, the DagFileProcessor parses the DAG files, writes them to the
`dag` table and then writes the DAGs to the `serialized_dag` table.
At the same time, the scheduler loop is constantly looking for the next
DAGs to process based on the ``next_dagrun_create_after`` column of the `dag`
table.
It might happen that as soon as the DagFileProcessor writes a DAG to the `dag`
table, the scheduling loop in the Scheduler picks up that DAG for processing.
However, as the DagFileProcessor has not written to the serialized DAG table yet,
the scheduler errors with a "Serialized Dag not Found" error.
This would mainly happen with dynamic DAGs, where the result of one DAG file
creates multiple DAGs.
This commit changes the order of writing the DAG and the Serialized DAG, so that
a DAG is written to the `serialized_dag` table before it is written to the `dag` table.
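Schematically, the new ordering (simplified; the actual change lives in the DAG-file processing path):
```
from airflow.models import DAG
from airflow.models.serialized_dag import SerializedDagModel

def sync_to_db(dags, session):
    # Write serialized DAGs first ...
    for dag in dags:
        SerializedDagModel.write_dag(dag, session=session)
    # ... and only then expose the DAGs to the scheduler via the `dag` table.
    DAG.bulk_write_to_db(dags, session=session)
```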
(cherry picked from commit b9eb51a0fb)