Commit Graph

11446 Commits

Author SHA1 Message Date
Kaxil Naik 3fbbe3ee7f Retry critical methods in Scheduler loop in case of OperationalError (#14032)
In the case of OperationalError (caused by deadlocks or network blips), the scheduler will now retry those methods 3 times.
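
A minimal sketch of the retry idea described above, not the scheduler's actual code. The `OperationalError` class, the `retry_db_operation` decorator and `flaky_query` are hypothetical stand-ins, and treating "3 times" as three attempts in total is an assumption.

```python
import functools
import time


class OperationalError(Exception):
    """Hypothetical stand-in for a database driver's transient error."""


def retry_db_operation(max_attempts=3, delay_seconds=0.0):
    """Retry the wrapped callable when it raises OperationalError."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except OperationalError:
                    # Give up and re-raise once the last attempt has failed.
                    if attempt == max_attempts:
                        raise
                    time.sleep(delay_seconds)
        return wrapper
    return decorator


calls = {"count": 0}


@retry_db_operation(max_attempts=3)
def flaky_query():
    """Fail twice with a transient error, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise OperationalError("deadlock detected")
    return "ok"


result = flaky_query()
```

Only the transient error type is retried here; any other exception propagates immediately, which keeps real bugs visible.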

closes #11899
closes #13668

(cherry picked from commit 914e9ce042)
2021-02-04 14:34:41 +00:00
James Timmins 912b903fe6 Make the role assigned to anonymous users customizable (#14042)
Fixes the issue wherein regardless of what role anonymous users are assigned (via the `AUTH_ROLE_PUBLIC` env var), they can't see any DAGs.

Current behavior causes:
Anonymous users are handled as a special case by Airflow's DAG-related security methods (`.has_access()` and `.get_accessible_dags()`). Rather than checking the `AUTH_ROLE_PUBLIC` value to check for role permissions, the methods reject access to view or edit any DAGs.

Changes in this PR:
Rather than hardcoding permission rules inside the security methods, this change checks the `AUTH_ROLE_PUBLIC` value and gives anonymous users all permissions linked to the designated role.

**This places security in the hands of the Airflow users. If the value is set to `Admin`, anonymous users will have full admin functionality.**

This also changes how the `Public` role is created. Currently, the `Public` role is created automatically by Flask App Builder. This PR explicitly declares `Public` as a default role with no permissions in `security.py`. This change makes it easier to test.

closes: #13340
(cherry picked from commit 78aa921a71)
2021-02-04 14:34:41 +00:00
Ryan Hamilton d72e2de3e0 Utilize util method to yield versioned doc link (#14047)
(cherry picked from commit 570d322b7f)
2021-02-04 14:34:41 +00:00
James Timmins 6c690d2817 Bugfix: Fix permissions to triggering only specific DAGs (#13922)
From 1.10.x -> 2.0, the required permissions to trigger a dag have changed from DAG.can_edit to DAG.can_read + DagRun.can_create. Since the Viewer role has DAG.can_read by default, it isn't possible to give a Viewer trigger access to a single DAG without giving access to all DAGs.

This fixes that discrepancy by making the trigger requirement DAG.can_edit + DagRun.can_create. Now, to trigger a DAG, a viewer will need to be given both permissions, as neither is included in the Viewer role by default.
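
The rule above can be sketched as a simple set check. This is an illustration only: the `can_trigger_dag` helper and the `(action, resource)` tuple layout, including the `DAG:<dag_id>` resource naming, are assumptions made for the example and not Airflow's security manager API.

```python
def can_trigger_dag(permissions, dag_id):
    """Return True only if both permissions needed to trigger the DAG are present."""
    required = {("can_edit", f"DAG:{dag_id}"), ("can_create", "DagRun")}
    return required <= permissions


# A viewer who can only read the DAG cannot trigger it.
viewer = {("can_read", "DAG:etl")}

# Granting both permissions for one DAG allows triggering that DAG only.
viewer_with_trigger = viewer | {("can_edit", "DAG:etl"), ("can_create", "DagRun")}

viewer_allowed = can_trigger_dag(viewer, "etl")
granted_allowed = can_trigger_dag(viewer_with_trigger, "etl")
other_dag_allowed = can_trigger_dag(viewer_with_trigger, "reports")
```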

This PR also hides the Trigger/Refresh buttons on the home page if the user doesn't have permission to perform those actions.

closes: #13891
related: #13891
(cherry picked from commit 629abfdbab)
2021-02-04 14:34:41 +00:00
Kaxil Naik 3870392356 Disable row level locking for Mariadb and MySQL <8 (#14031)
closes #11899
closes #13668

This PR disables row-level locking for MySQL variants that do not support skip_locked and no_wait -- MySQL < 8 and MariaDB

(cherry picked from commit 568327f01a)
2021-02-04 14:34:41 +00:00
Aaron D. Gonzalez c87bf1f204 Fix typos in Upgrade Check Doc (#14035)
(cherry picked from commit 019389d034)
2021-02-04 14:34:41 +00:00
Ash Berlin-Taylor 2d647caddc Make v1/config endpoint respect webserver expose_config setting (#14020)
(cherry picked from commit 0fdc03b764)
2021-02-04 14:34:41 +00:00
Kaxil Naik ccfeb3677f Bugfix: Don't try to create a duplicate Dag Run in Scheduler (#13920)
closes https://github.com/apache/airflow/issues/13685

When the Scheduler is restarted or killed after creating a Dag Run in `Scheduler._create_dag_runs` but
before `Scheduler.self._update_dag_next_dagruns`, the Scheduler falls into a loop because it will try
to create the Dag Run again in the Scheduler Loop. However, as the DagRun already exists, it will fail
with:

```
Traceback (most recent call last):
  File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    cursor, statement, parameters, context
  File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "dag_run_dag_id_run_id_key"
DETAIL:  Key (dag_id, run_id)=(scenario1_case2_02, scheduled__2021-01-25T00:00:00+00:00) already exists.
```
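
One way to avoid the unique-constraint failure shown above is to check for an existing row before inserting. The sketch below uses an in-memory SQLite table as a stand-in for the real `dag_run` table; the `create_dag_run_if_missing` helper is hypothetical and is not necessarily how the commit implements the fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, UNIQUE (dag_id, run_id))"
)


def create_dag_run_if_missing(conn, dag_id, run_id):
    """Insert the run only if it is absent; return True when a row was added."""
    exists = conn.execute(
        "SELECT 1 FROM dag_run WHERE dag_id = ? AND run_id = ?",
        (dag_id, run_id),
    ).fetchone()
    if exists:
        return False
    conn.execute(
        "INSERT INTO dag_run (dag_id, run_id) VALUES (?, ?)", (dag_id, run_id)
    )
    return True


run_id = "scheduled__2021-01-25T00:00:00+00:00"
first = create_dag_run_if_missing(conn, "scenario1_case2_02", run_id)
# Simulates the scheduler repeating the same step after a restart.
second = create_dag_run_if_missing(conn, "scenario1_case2_02", run_id)
row_count = conn.execute("SELECT COUNT(*) FROM dag_run").fetchone()[0]
```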

(cherry picked from commit 594069ee06)
2021-02-04 14:34:41 +00:00
Kaxil Naik bf73930eb3 Fix structure and typo in Updating.md (#14005)
(cherry picked from commit 72983287a1)
2021-02-04 14:34:41 +00:00
vikram Jadhav 0e7177051b Added missing return parameter in read function of FileTaskHandler (#14001)
This issue occurs when an invalid try_number value is passed to the get logs API.
FIXES: #13638

(cherry picked from commit 2366f861ee)
2021-02-04 14:34:41 +00:00
Jarek Potiuk bd01549ef0 Update wording in upgrading documentation (#13862)
* Update wording in upgrading documentation

* Update docs/apache-airflow/upgrading-to-2.rst

Co-authored-by: Tomek Urbaszek <tomasz.urbaszek@polidea.com>

* Update docs/apache-airflow/upgrading-to-2.rst

Co-authored-by: Vikram Koka <vikram@astronomer.io>

Co-authored-by: Tomek Urbaszek <tomasz.urbaszek@polidea.com>
Co-authored-by: Vikram Koka <vikram@astronomer.io>
(cherry picked from commit 32d2c25e2d)
2021-02-04 14:34:41 +00:00
Ephraim Anierobi 366ae338ac Fix ./breeze exec command error: /bin/bash: -c: option requires an argument (#13998)
(cherry picked from commit 8eddc8b501)
2021-02-04 14:34:41 +00:00
Kaxil Naik 7c152906d5 Fix docs url in tests (#13992)
https://github.com/apache/airflow/pull/13250 missed a case to update
URL in `tests/www/test_views.py`.

(cherry picked from commit 9840e406fd)
2021-02-04 14:34:40 +00:00
Kaxil Naik cba60dc870 Stop loading Extra Operator links in Scheduler (#13932)
closes #13099

(cherry picked from commit 7034529303)
2021-02-04 14:34:40 +00:00
Kaxil Naik 958e25836f Bugfix: Manual DagRun trigger should not skip scheduled runs (#13963)
closes https://github.com/apache/airflow/issues/13434

(cherry picked from commit de277c69e7)
2021-02-04 14:34:40 +00:00
Vikram Koka 073d0b13c9 Added a FAQ section to the Upgrading to 2 doc (#13979)
Added a FAQ question to the Upgrading to 2 doc and added an initial
question and answer around needing providers to be installed before
connection types show up in the UI.

(cherry picked from commit 8e0db6eae3)
2021-02-04 14:34:40 +00:00
Kaxil Naik 5270fb4480 Ignore import order error in docs 2021-02-04 14:34:40 +00:00
Kaxil Naik 8132f32f5c Use 2.0.1 in docker-compose Quick start guide 2021-02-04 14:34:40 +00:00
Vladimir Mikhaylov f0fa496595 Add deprecated config options to docs (#13883)
closes: #12772

(cherry picked from commit 65e49fc56f)
2021-02-04 14:34:40 +00:00
Kevin Yuen 95b02c4710 Pass image_pull_policy in KubernetesPodOperator correctly (#13289)
* pass image_pull_policy to V1Container

image_pull_policy is not being passed into the V1Container in
KubernetesPodOperator. This commit fixes this.

* add test for image_pull_policy not set

image_pull_policy should be IfNotPresent by default if
it's not set. The test ensures the correct value is passed
to the V1Container object.

(cherry picked from commit 7a560ab6de)
2021-02-04 14:34:40 +00:00
Vladimir Mikhaylov 60493a764c Add params to the DAG details endpoint (#13790)
(cherry picked from commit 10b8ecc86f)
2021-02-04 14:34:40 +00:00
Kaxil Naik 669e14d7ee Fix DB Migration for SQLite to upgrade to 2.0 (#13921)
closes https://github.com/apache/airflow/issues/13877

(cherry picked from commit 7f45e62fdf)
2021-02-04 14:34:40 +00:00
Vikram Koka c62f118e5a Updated taskflow api doc to show dependency with sensor (#13968)
* Updated taskflow api doc to show dependency with sensor
Updated the taskflow api tutorial document to show how to setup a
dependency to a python-based decorated task from a classic
FileSensor task.

(cherry picked from commit df11a1d7dc)
2021-02-04 14:34:40 +00:00
Ephraim Anierobi 9a075be9d1 Bugfix: Allow getting details of a DAG with null start_date (REST API) (#13959)
(cherry picked from commit fdb83c7439)
2021-02-04 14:34:40 +00:00
James Timmins 6d2db67436 Add test for Public role permissions. (#13965)
In #13923, all permissions were removed from the Public role. This adds a test to ensure that the default public role doesn't have any permissions.

related: #13923
(cherry picked from commit a52e77d0b4)
2021-02-04 14:34:40 +00:00
Jed Cunningham 9650992355 Docs: Fix FAQ on scheduler latency (#13969)
(cherry picked from commit ddc424283c)
2021-02-04 14:34:40 +00:00
Kaxil Naik 91d636cc2d Only allow passing JSON Serializable conf to TriggerDagRunOperator (#13964)
closes https://github.com/apache/airflow/issues/13414

(cherry picked from commit b4885b2587)
2021-02-04 14:34:40 +00:00
QP Hou 0037104d5e Fix invalid value error caused by long k8s pod name (#13299)
K8S pod names follow the DNS_SUBDOMAIN naming convention, which can be
broken down into one or more DNS_LABELs separated by `.`.

While the max length of a pod name (DNS_SUBDOMAIN) is 253, each label
component (DNS_LABEL) of the name cannot be longer than 63. Pod names
generated by the k8s executor right now contain only one label, which means
the total effective name length cannot be greater than 63.

This patch concatenates uuid to pod_id using `.` to generate the pod name,
thus extending the max name length to 63 + len(uuid).
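
A rough sketch of the naming rule described above. The `make_pod_name` and `is_valid_subdomain` helpers are hypothetical, and truncating `pod_id` to 63 characters is an assumption made for the example; only the 63 and 253 limits come from the text.

```python
import uuid

MAX_LABEL_LEN = 63       # limit for each DNS_LABEL component
MAX_SUBDOMAIN_LEN = 253  # limit for the whole DNS_SUBDOMAIN name


def make_pod_name(pod_id, unique_suffix):
    """Join a truncated pod_id and a unique suffix as two separate labels."""
    label = pod_id[:MAX_LABEL_LEN].rstrip("-.")
    return f"{label}.{unique_suffix}"


def is_valid_subdomain(name):
    """Check the overall length and the length of every dot-separated label."""
    return len(name) <= MAX_SUBDOMAIN_LEN and all(
        0 < len(label) <= MAX_LABEL_LEN for label in name.split(".")
    )


suffix = uuid.uuid4().hex  # 32 hexadecimal characters
name = make_pod_name("a" * 100, suffix)
```

Because the suffix is its own label, it no longer competes with `pod_id` for the 63 characters a single label allows.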

Reference: https://github.com/kubernetes/kubernetes/blob/release-1.1/docs/design/identifiers.md
Relevant discussion: https://github.com/kubernetes/kubernetes/issues/79351#issuecomment-505228196

(cherry picked from commit 862443f6d3)
2021-02-04 14:34:40 +00:00
Kamil Breguła c4c7a17977 Add information about all access methods to the environment (#13940)
(cherry picked from commit 0400f09cb5)
2021-02-04 14:34:40 +00:00
Alexey Sanko 87e3a0165a Fix link in INTHEWILD.md (#13958)
Dropbox link fix

(cherry picked from commit 8cb85e8235)
2021-02-04 14:34:40 +00:00
Alexey Sanko def84c94c5 Dropbox uses Airflow (#13956)
(cherry picked from commit a7b85b0761)
2021-02-04 14:34:40 +00:00
James Timmins f0aa930b2b Don't add Website.can_read access to default roles. (#13923)
related: #13856
(cherry picked from commit 70ce0d8142)
2021-02-04 14:34:40 +00:00
James Timmins 46ea50736d Don't add User role perms to custom roles. (#13856)
closes: #9245 #13511
(cherry picked from commit 35b5a38313)
2021-02-04 14:34:40 +00:00
mattluudo 87c4f7bcdc Updates Oracle.rst documentation (#13871)
Resolves Issue #10186 (Move Tips & Tricks for Oracle shops should be moved to Airflow Docs)

Fixes broken link, adds UI connection documentation link, and connection tips.

(cherry picked from commit 74da0faa7b)
2021-02-04 14:34:40 +00:00
Kaxil Naik 8ea6e3b908 Update Mongodb inventory URL to fix docs build (#13939)
Master is failing on docs build because of the following error:

```
Failed to fetch inventory: https://api.mongodb.com/python/current/objects.inv
```

This is because the URL has changed to https://pymongo.readthedocs.io/en/stable/objects.inv

(cherry picked from commit 7a5aafce08)
2021-02-04 14:34:40 +00:00
Lieven Govaerts d3200279f7 Update ``airflow_local_settings.py`` to fix an error message (#13927)
Fix error message, no functional change: remote_base_log_folder is now in the logging section, not in core.

(cherry picked from commit 495181b4f2)
2021-02-04 14:34:40 +00:00
Ian Carroll 21cedff205 Add authentication to lineage endpoint for experimental API (#13870)
(cherry picked from commit 24a54242d5)
2021-02-04 14:34:40 +00:00
QP Hou 4b1a6f78d1 Fix dag run type enum query for mysqldb driver (#13278)
By default sqlalchemy passes query params as is to db dialect drivers for
query execution. This causes inconsistent behavior of query param
evaluation between different db drivers. For example, MySQLdb will
convert `DagRunType.SCHEDULED` to string `'DagRunType.SCHEDULED'`
instead of string `'scheduled'`.
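
The difference between the two strings can be reproduced with a plain Python enum. The class below is a simplified stand-in for Airflow's `DagRunType`, and passing `.value` explicitly is shown as one way to avoid the mismatch, not necessarily what this commit does.

```python
import enum


class DagRunType(enum.Enum):
    """Simplified stand-in with a single member."""
    SCHEDULED = "scheduled"


# What a driver gets if it simply calls str() on the enum member.
as_string = str(DagRunType.SCHEDULED)

# What the database column actually stores.
as_value = DagRunType.SCHEDULED.value
```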

see https://github.com/apache/airflow/pull/11621 for relevant
discussions.

(cherry picked from commit 53e8283871)
2021-02-04 14:34:40 +00:00
Kaxil Naik 8659d930a1 Fix docker-compose command to initialize the environment (#13914)
(cherry picked from commit 7f4c88c068)
2021-02-04 14:34:40 +00:00
Kaxil Naik 3881ad423f Only compare updated time when Serialized DAG exists (#13899)
closes https://github.com/apache/airflow/issues/13667

The following error happens when a Serialized DAG exists in the Webserver or Scheduler but has just been removed from the serialized_dag table,
mainly due to the removal of the DAG file.

```
Traceback (most recent call last):
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
    self._run_scheduler_loop()
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
    num_queued_tis = self._do_scheduling(session)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1516, in _do_scheduling
    self._schedule_dag_run(dag_run, active_runs_by_dag_id.get(dag_run.dag_id, set()), session)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1629, in _schedule_dag_run
    dag = dag_run.dag = self.dagbag.get_dag(dag_run.dag_id, session=session)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/utils/session.py", line 62, in wrapper
    return func(*args, **kwargs)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/models/dagbag.py", line 187, in get_dag
    if sd_last_updated_datetime > self.dags_last_fetched[dag_id]
```

A simple fix is to just check that `sd_last_updated_datetime` is not `None`, i.e. that the Serialized DAG for that dag_id still exists.
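
The guard can be sketched as follows. The `serialized_dag_is_newer` helper is hypothetical and only mirrors the comparison on the last line of the traceback; in Python, comparing `None` with a `datetime` using `>` raises `TypeError`, which is why the check comes first.

```python
from datetime import datetime, timedelta


def serialized_dag_is_newer(sd_last_updated_datetime, last_fetched):
    """Return True only when a serialized DAG exists and is newer than the cached copy."""
    if sd_last_updated_datetime is None:
        # The row was removed from serialized_dag, so there is nothing to compare.
        return False
    return sd_last_updated_datetime > last_fetched


fetched_at = datetime(2021, 1, 25, 12, 0, 0)
missing = serialized_dag_is_newer(None, fetched_at)
newer = serialized_dag_is_newer(fetched_at + timedelta(minutes=5), fetched_at)
older = serialized_dag_is_newer(fetched_at - timedelta(minutes=5), fetched_at)
```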

(cherry picked from commit 8958d125cd)
2021-02-04 14:34:40 +00:00
Kamil Breguła 67a96ce510 Add quick start for Airflow on Docker (#13660)
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
(cherry picked from commit ffb472cf9e)
2021-02-04 14:34:40 +00:00
Jun 15016a0258 Fix db shell for sqlite (#13907)
closes: #12806

(cherry picked from commit 0d1c39ad2d)
2021-02-04 14:34:40 +00:00
Vladimir Mikhaylov 51ac3b68a7 Fix typo in CLI error (#13913)
(cherry picked from commit c2266aac48)
2021-02-04 14:34:40 +00:00
Kaxil Naik 09e2c78eb2 Improve the error when DAG does not exist when running dag pause command (#13900)
When running `airflow dags unpause` with a DAG that does not exist, it
currently shows this error

```
root@6f086ba87198:/opt/airflow# airflow dags unpause example_bash_operatoredd
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 33, in <module>
    sys.exit(load_entry_point('apache-airflow', 'console_scripts', 'airflow')())
  File "/opt/airflow/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/opt/airflow/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/opt/airflow/airflow/utils/cli.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/opt/airflow/airflow/cli/commands/dag_command.py", line 160, in dag_unpause
    set_is_paused(False, args)
  File "/opt/airflow/airflow/cli/commands/dag_command.py", line 170, in set_is_paused
    dag.set_is_paused(is_paused=is_paused)
AttributeError: 'NoneType' object has no attribute 'set_is_paused'
```

This commit changes the error to show a helpful error:

```
root@6f086ba87198:/opt/airflow# airflow dags unpause example_bash_operatoredd
DAG: example_bash_operatoredd does not exit in 'dag' table
```

(cherry picked from commit 8723b1feb8)
2021-02-04 14:34:40 +00:00
Ryan Hamilton abdf805719 Fix to ensure 100vh min plays nicely w/ Linux+Chrome (#13857)
(cherry picked from commit f72be51aec)
2021-02-04 14:34:40 +00:00
Kaxil Naik 253d20ad1c Fix race condition when using Dynamic DAGs (#13893)
closes https://github.com/apache/airflow/issues/13504

Currently, the DagFileProcessor parses the DAG files, writes them to the
`dag` table and then writes DAGs to the `serialized_dag` table.

At the same time, the scheduler loop is constantly looking for the next
DAGs to process based on ``next_dagrun_create_after`` column of the DAG
table.

It might happen that as soon as the DagFileProcessor writes DAG to `dag`
table, the scheduling loop in the Scheduler picks up the DAG for processing.
However, as the DagFileProcessor has not written to serialized DAG table yet
the scheduler will error with "Serialized Dag not Found" error.

This would mainly happen when the DAGs are dynamic, where the result of one DAG
creates multiple DAGs.

This commit changes the order of writing DAG and Serialized DAG and hence
before a DAG is written to `dag` table it will be written to `serialized_dag` table.

(cherry picked from commit b9eb51a0fb)
2021-02-04 14:34:40 +00:00
Jarek Potiuk aa05140469 Clarifies differences between extras and provider packages (#13810)
(cherry picked from commit dbd0262279)
2021-02-04 14:34:40 +00:00
Kaxil Naik 9a15fbd026 Make Smart Sensors DB Migration idempotent (#13892)
(cherry picked from commit d7f7c63ca8)
2021-02-04 14:34:40 +00:00
Vivek Bhojawala eadf9245d1 Add extra field to get_connection REST endpoint (#13885)
(cherry picked from commit adf7755eaa)
2021-02-04 14:34:40 +00:00
Mahesh Panati 6eb9b02125 Added Aviva Plc to INTHEWILD.md (#13875)
(cherry picked from commit c4b723f324)
2021-02-04 14:34:40 +00:00