It was possible to "block" the scheduler such that it would not
schedule or queue tasks for a DAG if you triggered a DAG run while
the DAG was already at its max active runs.
This approach works around the problem for now, but a better
long-term fix would be to introduce a "queued" state for DagRuns:
when manually creating DAG runs (or clearing), set them to queued,
and only have the scheduler set DagRuns to running, nothing else --
this would mean we wouldn't need to examine active runs in the TI
part of the scheduler loop, only in the DagRun creation part.
Fixes #11582
...so that whenever the Airflow server restarts, it does not leave rogue ECS Tasks. Instead, the operator will look for any running instance and attach to it.
We've implemented the capability of running the tests in smaller
chunks and selectively running only some of them, but this
capability was disabled by mistake by the default setting of
TEST_TYPE to "All" and by not removing it when TEST_TYPES is set
to the sets of tests that should be run.
This should speed up many of our tests and also hopefully
lower the chance of EXIT 137 errors.
This is an improvement to the UI response time when clearing dozens of DagRuns of large DAGs (thousands of tasks) containing many ExternalTaskSensor + ExternalTaskMarker pairs. In the current implementation, clearing tasks can get slow, especially if the user chooses to clear with Future, Downstream and Recursive all selected.
This PR speeds it up. There are two major improvements:
Updating self._task_group in dag.sub_dag() is improved to not deep copy _task_group, because that is a waste of time. Instead, handle it like dag.task_dict: set it to None first and then copy it explicitly (a sketch of this pattern follows below).
Pass the TaskInstances already visited down the recursive calls of dag.clear() as visited_external_tis. This speeds up the example in test_clear_overlapping_external_task_marker almost fivefold.
For real large DAGs containing 500 tasks set up in a similar manner, the time it takes to clear 30 DagRuns is cut from around 100s to less than 10s.
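A minimal sketch of the deep-copy avoidance pattern described above, using hypothetical names rather than the actual DAG internals: pre-seed the deepcopy memo so the heavy attribute is skipped, then rebuild only the part that is needed.
```
import copy


class Graph:
    """Hypothetical container with one attribute that is expensive to deep copy."""

    def __init__(self, nodes, heavy_index):
        self.nodes = nodes                # e.g. list of task ids
        self.heavy_index = heavy_index    # big structure we don't want deep-copied

    def partial_copy(self, keep):
        # Seed the memo so deepcopy replaces the heavy attribute with None
        # instead of walking it, then copy just the subset we need explicitly.
        memo = {id(self.heavy_index): None}
        new = copy.deepcopy(self, memo)
        new.nodes = [n for n in self.nodes if n in keep]
        new.heavy_index = {n: self.heavy_index[n] for n in new.nodes}
        return new
```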
This was messing up the "max_active_runs" calculation, and this fix is a
"hack" until we take the better approach of adding a queued state to
DagRuns -- at which point we won't have to do this calculation at
all.
This reverts commit 02ce45cafe.
That refactored the Celery worker to be compatible with 5.0. However, this
introduced some incompatibilities.
Closes: #11622
Closes: #11697
Some contexts try to close their reference to the stderr stream at logging shutdown; this change ensures those contexts don't break.
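For illustration only, a hypothetical stream wrapper (not the actual Airflow logging code) that tolerates a close() call on the shared stream:
```
class NonClosingStream:
    """Proxy that forwards writes to the wrapped stream but ignores close()."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, data):
        return self._stream.write(data)

    def flush(self):
        self._stream.flush()

    def close(self):
        # Deliberately a no-op: another context "closing" its reference at
        # logging shutdown must not close the shared stderr for everyone else.
        pass
```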
* Make pylint happy
An explicit `pass` is better here, but the docstring _is_ a statement.
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
The tests for connection export failed when CLI tests were
run in isolation. The problem was the non-deterministic
sequence of rows returned by the connection export query.
Rather than fixing the test to accept the non-deterministic
sequence, it is a better idea to always return the rows in
connection_id order. This does not change functionality and
is backwards compatible, but at the same time it gives stability
to the export, which might be important if someone uses the export
to determine, for example, whether some connections were added or removed.
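A sketch of the ordering meant here, assuming the export reads connections via SQLAlchemy; the query shape is illustrative, not the exact CLI code.
```
from airflow.models import Connection
from airflow.utils.session import create_session

with create_session() as session:
    # Deterministic export: always return connections sorted by conn_id.
    connections = session.query(Connection).order_by(Connection.conn_id).all()
```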
* fix: 🐛 Float to Int columns conversion
The `_fix_int_dytpes` method applies the `astype` transformation to
the return value of a `np.where` call. I added an extra step to the method in
order to apply this to the whole pd.Series. Note that Int64Dtype must be
used as an instance, since Pandas will raise an exception if the class is
used.
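A minimal standalone illustration of the dtype-instance requirement (not the hook's actual code):
```
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan])
# Cast the whole Series to the nullable integer dtype; the dtype has to be
# passed as an instance, pd.Int64Dtype(), not as the class itself.
s = s.astype(pd.Int64Dtype())
print(s.dtype)  # Int64
```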
* test: Add dtype test for integers
* style: Change line length
So far Breeze used in-container storage for persisting data
(MySQL, Redis, Postgres). This meant that the data was kept only
as long as the containers were running. If you stopped Breeze via
the `stop` command, the data was always deleted.
This changes the behaviour - each of the Breeze containers now has
a named volume where data is kept. Those volumes are still deleted
by default when Breeze is stopped, but you can choose to preserve
them by adding ``--preserve-volumes`` when you run the ``stop`` or
``restart`` command.
Fixes: #11625
Fixes random failures when processes are still running
on teardown of some webserver tests. We simply ignore that
after we send SIGKILL to those processes.
Fixes #11615
This approach is documented in https://docs.python.org/3.6/library/enum.html#others:
```
While IntEnum is part of the enum module, it would be very simple to
implement independently:

class IntEnum(int, Enum):
    pass
```
We just extend this to str -- this means SQLAlchemy has no trouble
putting these into queries, and `"scheduled" == DagRunType.SCHEDULED`
is true.
This change makes it simpler to use `dagrun.run_type`.
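A condensed sketch of the resulting pattern (the members shown are illustrative, not the full enum):
```
from enum import Enum


class DagRunType(str, Enum):
    """The str mixin makes members compare equal to plain strings."""

    SCHEDULED = "scheduled"
    MANUAL = "manual"


assert "scheduled" == DagRunType.SCHEDULED
assert DagRunType.SCHEDULED.value == "scheduled"
```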
This PR introduces a creating_job_id column in the DagRun table that links a
DagRun to the job that created it. Part of #11302
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Although these lists are short, there's no need to re-create them each
time, and also no need for them to be a method.
I have made them lowercase (`finished`, `running`) instead of uppercase
(`FINISHED`, `RUNNING`) to distinguish them from the actual states.
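A rough sketch of the shape described, with the state constants abridged (not the full State class):
```
class State:
    SUCCESS = "success"
    FAILED = "failed"
    RUNNING = "running"
    QUEUED = "queued"

    # Built once at class-definition time; lowercase names distinguish these
    # collections from the individual state constants above.
    finished = frozenset([SUCCESS, FAILED])
    running = frozenset([RUNNING, QUEUED])
```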
- The dag_run argument is only there for test mocks, and only to access a static method. Removing it simplifies the function and reduces confusion.
- Give optional arguments a default value, and reduce the indentation of the arg list to the PEP / Black standard.
- Clean up tests for readability
If the `kubernetes.client` import fails, then `airflow.kubernetes.pod_generator` also can't be imported, and there won't be attributes on `k8s` to use in `isinstance()` calls.
Instead of setting `k8s` to `None`, use an explicit flag so later code can disable kubernetes-specific branches explicitly.
Also, de-serializing a Kubernetes pod when the kubernetes library is not installed is now an error.
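A rough sketch of the guarded-import idea; the flag name is illustrative, not necessarily the one used in the serialization code.
```
try:
    from kubernetes.client import models as k8s

    HAS_KUBERNETES = True
except ImportError:
    k8s = None
    HAS_KUBERNETES = False


def looks_like_pod(obj):
    # Only touch attributes of k8s when the import actually succeeded;
    # checking an explicit flag is clearer than probing whether k8s is None.
    return HAS_KUBERNETES and isinstance(obj, k8s.V1Pod)
```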
This adds the prefix DAG: to newly created dag permissions. It supports checking permissions on both prefixed and un-prefixed DAG permission names.
This will make it easier to identify permissions that relate to granular dag access.
This PR does not modify existing dag permission names to use the new prefixed naming scheme. That will come in a separate PR.
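An illustrative helper (names hypothetical, not the security-manager API) showing the dual lookup over prefixed and un-prefixed names:
```
DAG_PREFIX = "DAG:"


def candidate_permission_names(dag_id: str):
    """Return both the new prefixed permission name and the legacy one."""
    if dag_id.startswith(DAG_PREFIX):
        return [dag_id, dag_id[len(DAG_PREFIX):]]
    return [f"{DAG_PREFIX}{dag_id}", dag_id]
```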
Related to issue #10469
* Add reset_dag_run option on dagrun_operator so that the user can clear the target dag run if it exists (see the sketch after this list).
* Logging coding style changes.
* Make pylint check pass.
* Make pylint check pass.
* Make pylint check pass on unit test file.
* Make static check pass.
* Use settings.STORE_SERIALIZED_DAGS
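A minimal usage sketch of the new option; the import path and surrounding arguments are assumed from the 2.0-era operator and should be treated as illustrative.
```
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

trigger = TriggerDagRunOperator(
    task_id="trigger_target",
    trigger_dag_id="target_dag",
    # With reset_dag_run=True, an existing run of the target DAG for the same
    # execution date is cleared and re-run instead of raising an error.
    reset_dag_run=True,
)
```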
Co-authored-by: Kaz Ukigai <kukigai@apple.com>
Created a new Airflow tutorial to use Decorated Flows (a.k.a. functional
DAGs). Also created a DAG to perform the same operations without using
functional DAGs to be compatible with Airflow 1.10.x and to show the
difference.
* Apply suggestions from code review
It makes sense to simplify the return values being passed around rather than needlessly converting them to JSON and then converting back.
* Update tutorial_functional_etl_dag.py
Fixed data passing between tasks to be more natural, without converting to JSON and back to variables (see the sketch after this list).
* Updated dag options and task doc formatting
Based on feedback on the PR, updated the DAG options (including the schedule) and fixed the task documentation to avoid indentation issues.
* Added documentation file for functional dag tutorial
Added the tutorial documentation to the docs directory. Fixed linting errors in the example dags.
Tweaked some doc references in the example dags for inclusion into the tutorial documentation.
Added the example dags to example tests.
* Removed multiple_outputs from task defn
Had multiple_outputs=True defined in the Extract task definition, which was unnecessary; removed based on feedback.
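A minimal sketch, assuming the Airflow 2.0 TaskFlow decorators, of the natural data passing mentioned above; task names and values are illustrative, not the tutorial's actual code.
```
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule_interval=None, start_date=datetime(2021, 1, 1), catchup=False)
def tutorial_sketch():
    @task()
    def extract():
        # Return a plain Python dict; no manual JSON round-trip needed.
        return {"a": 1, "b": 2}

    @task()
    def load(totals: dict):
        print(sum(totals.values()))

    load(extract())


tutorial_sketch_dag = tutorial_sketch()
```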
Co-authored-by: Gerard Casas Saez <casassg@users.noreply.github.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>