3020 commits

Author SHA1 Message Date
Ash Berlin-Taylor f603b36aa4
Ensure that manually creating a DAG run doesn't "block" the scheduler (#11732)
It was possible to "block" the scheduler such that it would not
schedule or queue tasks for a dag if you triggered a DAG run when the
DAG was already at the max active runs.

This approach works around the problem for now, but a better longer term
fix for this would be to introduce a "queued" state for DagRuns, and
then when manually creating dag runs (or clearing) set it to queued, and
only have the scheduler set DagRuns to running, nothing else -- this
would mean we wouldn't need to examine active runs in the TI part of the
scheduler loop, only in DagRun creation part.

Fixes #11582
2020-10-23 09:51:03 +01:00
Darwin Yip 0df60b7736
Add reattach flag to ECSOperator (#10643)
...so that when the Airflow server restarts, it does not leave rogue ECS tasks behind. Instead, the operator looks for any running instance and attaches to it.
2020-10-23 09:10:07 +02:00
Jarek Potiuk 0647888c15
Enables splitting tests into smaller chunks (#11659)
We've implemented the capability of running the tests in smaller
chunks and selectively running only some of them, but this
capability was disabled by mistake: TEST_TYPE defaulted to "All"
and was not removed when TEST_TYPES was set to the sets of tests
that should be run.

This should speed up many of our tests and also hopefully
lower the chance of EXIT 137 errors.
2020-10-22 23:25:00 +02:00
yuqian90 4f2e0cf173
Speed up `dag.clear()` when clearing lots of ExternalTaskSensor and ExternalTaskMarker (#11184)
This is an improvement to the UI response time when clearing dozens of DagRuns of large DAGs (thousands of tasks) containing many ExternalTaskSensor + ExternalTaskMarker pairs. In the current implementation, clearing tasks can get slow especially if the user chooses to clear with Future, Downstream and Recursive all selected.

This PR speeds it up. There are two major improvements:

- Updating self._task_group in dag.sub_dag() no longer deep copies _task_group, because that is a waste of time. Instead, it is handled like dag.task_dict: set it to None first and then copy explicitly.
- The TaskInstances already visited are passed down the recursive calls of dag.clear() as visited_external_tis. This speeds up the example in test_clear_overlapping_external_task_marker almost fivefold.

For really large DAGs containing 500 tasks set up in a similar manner, the time it takes to clear 30 DagRuns is cut from around 100s to less than 10s.
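
The second improvement is a general technique: share one "already visited" set across recursive calls so that overlapping branches are not processed twice. The following is only a minimal sketch of that idea, not the code from the PR; the `GRAPH` mapping and the `clear` function are hypothetical stand-ins for DAG tasks and `dag.clear()`.

```python
from typing import Dict, List, Optional, Set

# Hypothetical dependency graph: each task maps to the tasks it triggers.
# "d" is reachable through both "b" and "c", so the branches overlap.
GRAPH: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}


def clear(task: str, visited: Optional[Set[str]] = None) -> int:
    """Clear `task` and everything downstream; return how many tasks were cleared."""
    if visited is None:
        visited = set()
    if task in visited:
        # Already cleared through another branch: skip the repeated work.
        return 0
    visited.add(task)
    cleared = 1
    for child in GRAPH[task]:
        # The same set is passed down, so every recursive call sees it.
        cleared += clear(child, visited)
    return cleared
```

Without the shared set, "d" would be cleared once per path that reaches it, and the cost grows quickly when many markers overlap.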
2020-10-22 15:37:36 +01:00
Kaxil Naik 7c6dfcb0bf
Use unittest.mock instead of backported mock library (#11643)
mock is now part of the Python standard library, available as unittest.mock in Python 3.3 onwards.
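
For illustration, a test that previously used `import mock` can import from the standard library instead. This is a minimal sketch; `fetch_status` and the fake client are hypothetical and do not come from the Airflow code base.

```python
from unittest import mock  # standard library; replaces the backported `mock` package


def fetch_status(client) -> str:
    """Hypothetical function under test: query a client and normalise the answer."""
    return client.get("/health").upper()


# Build a stand-in client whose `get` method returns a canned value.
fake_client = mock.Mock()
fake_client.get.return_value = "ok"

result = fetch_status(fake_client)

# Verify the interaction with the mock.
fake_client.get.assert_called_once_with("/health")
```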
2020-10-22 13:23:15 +01:00
Ash Berlin-Taylor 8045cc215d
Stop scheduler from thinking that upstream_failed tasks are running (#11730)
This was messing up the "max_active_runs" calculation, and this fix is a
"hack" until we add a better approach of adding a queued state to
DagRuns -- at which point we don't even have to do this calculation at
all.
2020-10-22 13:11:37 +01:00
Martijn Pieters 0eaa688796
Ensure task logs go to the correct try number file (#11723)
The run context (logging context) accesses task instance attributes via the log_filename_template configuration.

Fixes #11717.
2020-10-22 11:31:52 +01:00
Tanjin Panna 91503308c7
Add Google Cloud Memorystore Memcached Operators (#10121)
Co-authored-by: Tobiasz Kędzierski <tobiasz.kedzierski@polidea.com>
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-10-22 03:50:40 +02:00
Katsunori Kanda b9d677cdd6
Add type hints to aws provider (#11531)
* Added type hints to aws provider

* Update airflow/providers/amazon/aws/log/s3_task_handler.py

* Fix expectation for submit_job

* Fix documentation

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-10-22 02:49:22 +02:00
Ash Berlin-Taylor ae791e1916
Fix formatting errors introduced in #11720 (#11733) 2020-10-21 23:49:32 +01:00
Joe Beeson 1fb3c28e1a
Add support for setting ciphers for SFTPHook (#11720) 2020-10-21 22:13:50 +02:00
Kosteev Eugene 950c16d0b0
Retry requests in case of error in Google ML Engine Hook (#11712) 2020-10-21 21:29:31 +02:00
John Bampton 172820db4d
Fix case of GitHub (#11398) 2020-10-21 14:32:41 +02:00
Kamil Breguła 53e6062105
Enforce strict rules for yamllint (#11709) 2020-10-21 12:24:32 +02:00
Tomek Urbaszek 7ef0b3c929
Revert "Refactor celery worker command (#11336)" (#11698)
This reverts commit 02ce45cafe.
That commit refactored the Celery worker to be compatible with 5.0. However,
this introduced some incompatibilities.

Closes: #11622
Closes: #11697
2020-10-21 11:14:10 +02:00
Kamil Breguła 3a45f1f84d
Extract Kubernetes command to separate file (#11669)
* Move Kubernetes command to separate file

* fixup! Move Kubernetes command to separate file
2020-10-20 14:32:19 -07:00
Ryan Hamilton 080a470944
Improve example DAGs data by diversifying "tags" value (#11665) 2020-10-20 13:46:32 +01:00
Kamil Breguła 1543923c19
Add Kerberos Auth for PrestoHook (#10488) 2020-10-20 13:43:18 +02:00
Martijn Pieters 26ae8e93e8
StreamLogWriter: Provide (no-op) close method. (#10884)
Some contexts try to close their reference to the stderr stream at logging shutdown; this ensures they don't break.

* Make pylint happy

An explicit `pass` is better here, but the docstring _is_ a statement.
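
A rough sketch of the idea follows. `LogWriterSketch` is a simplified, hypothetical stand-in and not Airflow's actual `StreamLogWriter`; it only shows a file-like object that forwards writes to a logger and whose `close` body consists of nothing but a docstring.

```python
import logging


class LogWriterSketch:
    """Hypothetical file-like object that redirects writes to a logger."""

    def __init__(self, logger: logging.Logger, level: int) -> None:
        self.logger = logger
        self.level = level
        self._buffer = ""

    def write(self, message: str) -> None:
        # Buffer partial lines; emit one log record per completed line.
        self._buffer += message
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            self.logger.log(self.level, line)

    def flush(self) -> None:
        # Emit whatever is left in the buffer.
        if self._buffer:
            self.logger.log(self.level, self._buffer)
            self._buffer = ""

    def close(self) -> None:
        """No-op, so callers that close the stream at shutdown do not fail."""
```

The docstring alone is a valid function body, which is why no explicit `pass` is needed to satisfy the interpreter.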

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
2020-10-20 09:20:39 +01:00
Alex Begg 3ee618623b
Switch PagerdutyHook from pypd to use pdpyras instead (#11151) 2020-10-20 09:17:58 +01:00
Jarek Potiuk 9a90ebeabe
Bats tests should be much faster now for pre-commits. (#11662)
For a pre-commit run, only the tests corresponding to the
changed .sh files and changed .bats files should be run.
2020-10-20 09:21:28 +02:00
Jarek Potiuk b5d1ab9409
Introduced deterministic order in connection export (#11670)
The tests for connection export failed when CLI tests are
run in isolation. The problem was with non-deterministic
sequence of returned rows from connection export query.

Rather than fixing the test to accept the non-deterministic
sequence, it is a better idea to always return them in
connection_id order. This does not change functionality and
is backwards compatible, but at the same time it gives stability
in the export, which might be important if someone uses the export
to determine, for example, whether some connections were added or removed.
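
The underlying point is that a SQL query without `ORDER BY` gives no ordering guarantee. A minimal sketch using the standard-library `sqlite3` module is shown below; the table name `connections_sketch` and its rows are made up for illustration and are not Airflow's schema or export code.

```python
import sqlite3

# In-memory database with a hypothetical, simplified connections table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE connections_sketch (conn_id TEXT, conn_type TEXT)")
db.executemany(
    "INSERT INTO connections_sketch VALUES (?, ?)",
    [
        ("mysql_default", "mysql"),
        ("aws_default", "aws"),
        ("postgres_default", "postgres"),
    ],
)

# An explicit ORDER BY makes the export order stable across runs and backends.
rows = db.execute(
    "SELECT conn_id FROM connections_sketch ORDER BY conn_id"
).fetchall()
exported = [row[0] for row in rows]
```

With a stable order, two exports can be compared line by line to spot added or removed connections.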
2020-10-20 08:34:38 +02:00
Joe Harris a221ccb956
Improvement: Populate 'Configuration JSON' form with DAG default params json in the Trigger-DAG UI (#10839) 2020-10-19 21:30:40 +01:00
daemon-demon a4dc11fae6
Change to pass all extra connection parameters to psycopg2 (#11019)
Closes #10505

Co-authored-by: priyankagovindaraju <Pallavika.05>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
2020-10-19 15:14:51 +01:00
Daniel Burkhardt Cerigo 2d854c3505
Add service_account to Google ML Engine operator (#11619) 2020-10-19 14:42:50 +02:00
Shekhar Singh 91898e8421
Add Plugins View in web UI (#10770) 2020-10-19 12:19:10 +02:00
Ephraim Anierobi 7206fd7d54
Allow null schedule_interval in OpenAPI spec for DAGs (#11532) 2020-10-19 11:03:17 +01:00
Kaxil Naik 89e5acc1e2
Use Python 3 Style super calls (#11644) 2020-10-19 09:32:42 +01:00
Gabriel Montañola 674368f66c
Fixes MySQLToS3 float to int conversion (#10437)
* fix: 🐛 Float to Int columns conversion

The `_fix_int_dytpes` method is applying the `astype` transformation to
the return of a `np.where` call. I added an extra step to the method in
order to apply this to the whole pd.Series. Note that Int64Dtype must be
used as an instance, since Pandas will raise an Exception if a class is
used.

* test: Add dtype test for integers

* style: Change line length
2020-10-19 09:53:18 +02:00
Kaxil Naik fd8b07c6bb
Remove usage of six (#11645)
Since we support Python 3.6, there is no need for the six library.
2020-10-19 09:03:48 +02:00
Kaxil Naik 885db908d5
Fix minor typos in tests (#11638)
`cllient` -> `client`
`environement` -> `environment`
`naamespace` -> `namespace`
`alllow` -> `allow`
2020-10-18 21:12:05 +02:00
Ephraim Anierobi f8ff217e2f
Fix incorrect typing and move config args out of extra connection config to operator args (#11635) 2020-10-18 20:45:55 +02:00
Jarek Potiuk 66ced72fca
Name and optionally preserve data volumes in Breeze (#11628)
So far Breeze kept its data (mysql, redis, postgres) inside the
containers. This means that the data was kept only as long as the
containers were running. If you stopped Breeze via the `stop` command,
the data was always deleted.

This changes the behaviour - each of the Breeze containers has
a named volume where data is kept. Those volumes are also deleted
by default when Breeze is stopped, but you can choose to preserve
them by adding ``--preserve-volumes`` when you run ``stop`` or
``restart`` command.

Fixes: #11625
2020-10-18 16:39:44 +02:00
Jarek Potiuk db3fe0926b
Teardown of webserver tests is not picky about processes. (#11616)
Fixes random failure when processes are still running
on teardown of some webserver tests. We simply ignore that
after we send sigkill to those processes.

Fixes #11615
2020-10-18 11:52:23 +02:00
Ash Berlin-Taylor f507180b90
Make DagRunType inherit from `str` too for easier use. (#11621)
This approach is documented in https://docs.python.org/3.6/library/enum.html#others:

```
While IntEnum is part of the enum module, it would be very simple to
implement independently:

class IntEnum(int, Enum):
    pass
```

We just extend this to a str -- this means SQLAlchemy has no trouble
putting these into queries, and `"scheduled" == DagRunType.SCHEDULED`
is true.

This change makes it simpler to use `dagrun.run_type`.
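
A minimal sketch of the same pattern, assuming a hypothetical `RunType` enum rather than Airflow's actual `DagRunType` definition:

```python
from enum import Enum


class RunType(str, Enum):
    """Hypothetical enum mixing in `str`, mirroring the IntEnum recipe."""

    SCHEDULED = "scheduled"
    MANUAL = "manual"


# Members compare equal to plain strings because they are str instances.
is_equal = "scheduled" == RunType.SCHEDULED

# Looking a member up by its value still returns the enum member itself.
member = RunType("manual")
```

Because each member is a real `str`, it can be handed to code that expects a string without calling `.value` first.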
2020-10-18 08:18:48 +01:00
James Timmins 728518224b
Use permission constants (#11389)
Use constants for permission resource and action names.
2020-10-18 02:46:11 +01:00
Tomek Urbaszek 112f7d7169
Add creating_job_id to DagRun table (#11396)
This PR introduces creating_job_id column in DagRun table that links a
DagRun to job that created it. Part of #11302

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-10-17 12:31:07 +02:00
Michał Misiewicz 210a948658
Fix tcp keepalive parameters parsing (#11594) 2020-10-16 17:34:11 -07:00
Gerard Casas Saez 84dc2fbd2e
Set doc_md when using task decorator and function has __doc__ (#11598) 2020-10-17 01:09:01 +01:00
Daniel Imberman 00dd7586fb
Raises a warning for provide_context instead of killing the task (#11597)
* raises a warning for provide_context instead of killing the task

* Update airflow/operators/python.py

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

* static checks

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-10-16 15:18:55 -07:00
Ash Berlin-Taylor 0c5bbe83c6
Replace methods on state with frozenset properties (#11576)
Although these lists are short, there's no need to re-create them each
time, and also no need for them to be a method.

I have made them lowercase (`finished`, `running`) instead of uppercase
(`FINISHED`, `RUNNING`) to distinguish them from the actual states.
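
As an illustration only, the difference can be sketched as follows. `StateSketch` and the particular states it groups are assumptions made for the example; they are not copied from Airflow's `State` class.

```python
class StateSketch:
    """Hypothetical state holder with groups built once, at class creation."""

    # Individual states are uppercase constants.
    SUCCESS = "success"
    FAILED = "failed"
    RUNNING = "running"

    # Groups of states are lowercase, immutable, and created a single time,
    # instead of being rebuilt on every call to a method.
    finished = frozenset([SUCCESS, FAILED])
    running = frozenset([RUNNING])


# Membership checks read the shared frozenset directly.
done = "success" in StateSketch.finished
```

The lowercase `running` group and the uppercase `RUNNING` state can coexist because attribute names are case-sensitive.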
2020-10-16 21:09:36 +01:00
Martijn Pieters 4d611f2ffd
Clean up _trigger_dag function (#11584)
- The dag_run argument is only there for test mocks, and only to access a static method. Removing this simplifies the function, reduces confusion.
- Give optional arguments a default value, reduce indentation of arg list to PEP / Black standard.
- Clean up tests for readability
2020-10-16 20:22:40 +01:00
Michał Misiewicz 91484b938f
Pass SQLAlchemy engine options to FAB based UI (#11395)
Co-authored-by: Tomek Urbaszek <turbaszek@gmail.com>
2020-10-16 19:55:41 +02:00
Kamil Breguła 3c10ca6504
Add DataflowStartFlexTemplateOperator (#8550) 2020-10-16 18:28:23 +02:00
Martijn Pieters 3163016450
Guard against kubernetes not being installed (#11558)
If the `kubernetes.client` import fails, then `airflow.kubernetes.pod_generator` also can't be imported, and there won't be attributes on `k8s` to use in `isinstance()` calls.

Instead of setting `k8s` to `None`, use an explicit flag so later code can disable kubernetes-specific branches explicitly.

Also, de-serializing a Kubernetes pod with no kubernetes library installed is an error.
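
The pattern described here can be sketched roughly as below. The flag name `HAS_KUBERNETES` and the helper functions are assumptions for illustration and may differ from what the commit actually added; the sketch runs whether or not the `kubernetes` package is installed.

```python
try:
    from kubernetes.client import models as k8s

    HAS_KUBERNETES = True  # hypothetical flag name
except ImportError:
    HAS_KUBERNETES = False


def is_pod(obj) -> bool:
    """Return True only if kubernetes is importable and `obj` is a V1Pod."""
    # The flag short-circuits, so `k8s` is never touched when it is undefined.
    return HAS_KUBERNETES and isinstance(obj, k8s.V1Pod)


def require_kubernetes() -> None:
    """Raise a clear error for kubernetes-only operations (hypothetical helper)."""
    if not HAS_KUBERNETES:
        raise RuntimeError("Cannot handle a pod: kubernetes is not installed")
```

An explicit boolean keeps the intent visible at each call site, whereas a `k8s = None` sentinel would fail later with a less helpful `AttributeError`.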
2020-10-16 16:23:20 +01:00
James Timmins 7ab62100af
Prepend `DAG:` to dag permissions (#11189)
This adds the prefix DAG: to newly created dag permissions. It supports checking permissions on both prefixed and un-prefixed DAG permission names.

This will make it easier to identify permissions that are related to granular dag access.

This PR does not modify existing dag permission names to use the new prefixed naming scheme. That will come in a separate PR.
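
Checking both naming schemes could look roughly like this. The helper names and the plain-set representation of granted permissions are hypothetical and much simpler than Airflow's security manager.

```python
from typing import Set, Tuple

DAG_PREFIX = "DAG:"  # prefix described in the commit message


def permission_names_for(dag_id: str) -> Tuple[str, str]:
    """Return the prefixed name and the legacy un-prefixed name for a DAG."""
    return (f"{DAG_PREFIX}{dag_id}", dag_id)


def can_access(dag_id: str, granted: Set[str]) -> bool:
    """Accept a grant under either the new or the old naming scheme."""
    return any(name in granted for name in permission_names_for(dag_id))
```

Accepting both names lets existing un-prefixed permissions keep working until they are migrated.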

Related to issue #10469
2020-10-16 00:32:38 +01:00
Songkran Nethan 0646849e3d
Add protocol_version to conn_config for Cassandrahook (#11036) 2020-10-14 21:30:30 +02:00
kukigai 6c8cf6aebe
Add reset_dag_run option on dagrun_operator to clear existing dag run (#11484)
* Add reset_dag_run option on dagrun_operator so that user can clear target dag run if exists.

* Logging coding style changes.

* Make pylint check pass.

* Make pylint check pass.

* Make pylint check pass on unit test file.

* Make static check pass.

* Use settings.STORE_SERIALIZED_DAGS

Co-authored-by: Kaz Ukigai <kukigai@apple.com>
2020-10-14 14:15:42 +02:00
Vikram Koka 095756c6e8
Airflow tutorial to use Decorated Flows (#11308)
Created a new Airflow tutorial to use Decorated Flows (a.k.a. functional
DAGs). Also created a DAG to perform the same operations without using
functional DAGs to be compatible with Airflow 1.10.x and to show the
difference.

* Apply suggestions from code review

It makes sense to simplify the return variables being passed around without needlessly converting to JSON and then reconverting back.

* Update tutorial_functional_etl_dag.py

Fixed data passing between tasks to be more natural without converting to JSON and converting back to variables.

* Updated dag options and task doc formatting

Based on feedback on the PR, updated the DAG options (including schedule) and fixed the task documentation to avoid indentation.

* Added documentation file for functional dag tutorial

Added the tutorial documentation to the docs directory. Fixed linting errors in the example dags.
Tweaked some doc references in the example dags for inclusion into the tutorial documentation.
Added the example dags to example tests.

* Removed multiple_outputs from task defn

Had a multiple_outputs=True defined in the Extract task defn, which was unnecessary. - Removed based on feedback.

Co-authored-by: Gerard Casas Saez <casassg@users.noreply.github.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
2020-10-13 16:59:20 +01:00
Kamil Breguła 5772d4d150
Add endpoints for task instances (#9597) 2020-10-13 11:58:07 +01:00