This approach is documented in https://docs.python.org/3.6/library/enum.html#others:
```
While IntEnum is part of the enum module, it would be very simple to
implement independently:

    class IntEnum(int, Enum):
        pass
```
We just extend this to `str` -- this means SQLAlchemy has no trouble
putting these into queries, and `"scheduled" == DagRunType.SCHEDULED`
is true.
This change makes it simpler to use `dagrun.run_type`.
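A minimal sketch of the resulting pattern (the MANUAL member and the __str__ override here are illustrative):
```
from enum import Enum

class DagRunType(str, Enum):
    SCHEDULED = "scheduled"
    MANUAL = "manual"

    def __str__(self) -> str:
        return self.value

# Because each member *is* a str, plain-string comparison and DB binding both work:
assert "scheduled" == DagRunType.SCHEDULED
assert isinstance(DagRunType.SCHEDULED, str)
```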
This PR introduces a creating_job_id column in the DagRun table that links a
DagRun to the job that created it. Part of #11302
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
This adds the prefix DAG: to newly created DAG permissions. It supports checking permissions on both prefixed and un-prefixed DAG permission names.
This will make it easier to identify permissions related to granular DAG access.
This PR does not modify existing dag permission names to use the new prefixed naming scheme. That will come in a separate PR.
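As an illustration only (the helper name and the permission-set representation are assumptions, not the code in this PR), the dual check could look roughly like:
```
RESOURCE_PREFIX = "DAG:"

def can_access_dag(granted_resources: set, dag_id: str) -> bool:
    # Accept both the new prefixed name ("DAG:example_dag") and the
    # legacy un-prefixed name ("example_dag") until existing
    # permissions are migrated to the new scheme.
    return f"{RESOURCE_PREFIX}{dag_id}" in granted_resources or dag_id in granted_resources
```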
Related to issue #10469
* Fully support running more than one scheduler concurrently.
This PR implements scheduler HA as proposed in AIP-15. The high level
design is as follows:
- Move all scheduling decisions into SchedulerJob (requiring DAG
serialization in the scheduler)
- Use row-level locks to ensure schedulers don't stomp on each other
(`SELECT ... FOR UPDATE`)
- Use `SKIP LOCKED` for better performance when multiple schedulers are
running. (MySQL < 8 and MariaDB don't support this)
- Scheduling decisions are not tied to the parsing speed, but can
operate just on the database
*DagFileProcessorProcess*:
Previously this component was responsible for more than just parsing the
DAG files, despite what its name might imply. It was also responsible for
creating DagRuns and for making scheduling decisions about TIs, moving
them from "None" to "scheduled" state.
This commit changes it so that the DagFileProcessorProcess now will
update the SerializedDAG row for this DAG, and make no scheduling
decisions itself.
To make the scheduler's job easier (so that it can make as many
decisions as possible without having to load the possibly-large
SerializedDAG row) we store/update some columns on the DagModel table:
- `next_dagrun`: The execution_date of the next dag run that should be created (or
None)
- `next_dagrun_create_after`: The earliest point at which the next dag
run can be created
Pre-computing these values (and updating them every time the DAG is
parsed) reduces the overall load on the DB, as many decisions can be taken
by selecting just these two columns/the small DagModel row.
When max_active_runs is reached, or for `@once` DAGs, these columns will be
set to null, meaning "don't create any dag runs"
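A much-simplified sketch of that pre-computation (assuming a cron-style schedule handled by croniter; the real code also deals with timezones, catchup and max_active_runs):
```
from datetime import datetime
from croniter import croniter

def compute_next_dagrun(schedule, latest_execution_date: datetime):
    """Return (next_dagrun, next_dagrun_create_after), or (None, None)."""
    if schedule in (None, "@once"):
        return None, None
    it = croniter(schedule, latest_execution_date)
    next_dagrun = it.get_next(datetime)               # execution_date of the next run
    next_dagrun_create_after = it.get_next(datetime)  # end of that schedule interval
    return next_dagrun, next_dagrun_create_after
```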
*SchedulerJob*
The SchedulerJob used to queue/send tasks to the executor only after
they were parsed and returned from the DagFileProcessorProcess.
This PR breaks the link between parsing and enqueuing of tasks: instead
of looking at DAGs as they are parsed, we now:
- store a new datetime column, `last_scheduling_decision` on DagRun
table, signifying when a scheduler last examined a DagRun
- Each time around the loop the scheduler will get (and lock) the next
  _n_ DagRuns via `DagRun.next_dagruns_to_examine`, prioritising DagRuns
  which haven't been touched by a scheduler in the longest period (a
  sketch of this query follows below)
- SimpleTaskInstance etc have been almost entirely removed now, as we
use the serialized versions
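A sketch of the locking query mentioned above (not the actual `DagRun.next_dagruns_to_examine` implementation; the filter, ordering and session handling are simplified assumptions):
```
from sqlalchemy import nullsfirst

from airflow.models import DagRun
from airflow.utils.session import provide_session
from airflow.utils.state import State

@provide_session
def next_dagruns_to_examine(limit, session=None):
    return (
        session.query(DagRun)
        .filter(DagRun.state == State.RUNNING)
        # never-examined runs (NULL) first, then the longest-untouched
        .order_by(nullsfirst(DagRun.last_scheduling_decision))
        .limit(limit)
        # row-level lock; SKIP LOCKED lets other schedulers pass over locked rows
        .with_for_update(skip_locked=True)
        .all()
    )
```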
* Move callbacks execution from Scheduler loop to DagProcessorProcess
* Don’t run verify_integrity if the Serialized DAG hasn’t changed
dag_run.verify_integrity is slow, and we don't want to call it every time -- only when the DAG structure changes (which we can now know thanks to DAG Serialization)
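Roughly (a sketch, assuming the serialized DAG row exposes a content hash and the DagRun stores the last hash it verified against; names here are illustrative):
```
def maybe_verify_integrity(dag_run, latest_serialized_dag_hash, session):
    # Only pay the cost of verify_integrity when the DAG structure changed
    if dag_run.dag_hash != latest_serialized_dag_hash:
        dag_run.dag_hash = latest_serialized_dag_hash
        dag_run.verify_integrity(session=session)
```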
* Add escape hatch to disable newly added "SELECT ... FOR UPDATE" queries
We are worried that these extra uses of row-level locking will cause
problems on MySQL 5.x (most likely deadlocks) so we are providing users
an "escape hatch" to be able to make these queries non-locking -- this
means that only a single scheduler should be run, but being able to run
one is better than having the scheduler crash.
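A sketch of what the escape hatch could look like in airflow.cfg (the option name here is an assumption):
```
[scheduler]
# Fall back to non-locking queries; safe only when a single scheduler is running
use_row_level_locking = False
```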
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
This can have *extremely* bad consequences. After this change, a jinja2
template like the one below will cause the task instance to fail, if the
DAG being executed is not a sub-DAG. This may also display an error on
the Rendered tab of the Task Instance page.
task_instance.xcom_pull('z', key='return_value', dag_id=dag.parent_dag.dag_id)
Prior to the change in this commit, the above template would pull the
latest value for task_id 'z', for the given execution_date, from *any DAG*.
If your task_ids between DAGs are all unique, or if DAGs using the same
task_id always have different execution_date values, this will appear to
act like dag_id=None.
Our current theory is that SQLAlchemy/Python doesn't behave as expected
when comparing `jinja2.Undefined` to `None`.
The __lshift__ and __rshift__ methods should return `other`, not `self`.
This PR fixes XComArg implementation to support chain like this one:
BaseOperator >> XComArg >> BaseOperator
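A stripped-down illustration (not the real BaseOperator/XComArg code) of why returning `other` is what makes `a >> b >> c` chain correctly:
```
class Node:
    def __init__(self, name):
        self.name = name
        self.downstream = []

    def __rshift__(self, other):
        self.downstream.append(other)
        return other   # returning ``other`` lets the next ``>>`` act on it

    def __lshift__(self, other):
        other.downstream.append(self)
        return other

a, b, c = Node("a"), Node("b"), Node("c")
a >> b >> c                   # evaluated as (a >> b) >> c
assert b.downstream == [c]    # only true because __rshift__ returned ``other``
```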
Related to: #10153
The `@provide_session` wrapper will already commit the transaction when
returned, unless an explicit session is passed in -- removing this
parameter changes the behaviour to be:
- If session explicitly passed in: don't commit (caller's
responsibility)
- If no session passed in, `@provide_session` will commit for us already.
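In other words (a sketch with an illustrative function, assuming the Airflow 2.x import path for the decorator):
```
from airflow.utils.session import provide_session

@provide_session
def set_duration(task_instance, duration, session=None):
    task_instance.duration = duration
    session.merge(task_instance)
    # No explicit session.commit() here:
    #  - called without a session, @provide_session commits on return
    #  - called with an explicit session, the caller owns the transaction
```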
We have already fixed a lot of problems that were marked
with those; also, IntelliJ has gotten a bit smarter about not
detecting false positives, as well as understanding more
pylint annotations. Wherever the problem remained
we replaced it with # noqa comments - as these are
also well understood by IntelliJ.
This change will allow users to throw exceptions other than `DagCycleException` (namely `AirflowClusterPolicyViolation`) as part of Cluster Policies.
This can be helpful for running checks on tasks / DAGs (e.g. asserting that a task has a non-airflow owner) and refusing to run tasks that aren't compliant with these checks.
This is meant more as a tool for Airflow admins to prevent user mistakes (especially in shared Airflow infrastructure with newbies) than as a strong technical control for security/compliance posture.
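For example, a policy like the following in airflow_local_settings.py could reject tasks that keep the default owner (the task_policy hook name and the owner check are illustrative):
```
from airflow.exceptions import AirflowClusterPolicyViolation

def task_policy(task):
    # Fail DAG import for tasks that keep the default "airflow" owner
    if task.owner == "airflow":
        raise AirflowClusterPolicyViolation(
            f"Task {task.task_id} in DAG {task.dag_id} must set a non-default owner"
        )
```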
While doing a trigger_dag from the UI, the DagRun gets created first and then the WebServer starts creating TIs. Meanwhile, the Scheduler also picks up the DagRun and starts creating the TIs, which results in an IntegrityError as the primary key constraint gets violated. This happens when a DAG has a good number of tasks.
Also, this replaces the TIs list with a set for faster lookups in DAGs with many tasks.
Before this change, if DAG Serialization was enabled the Webserver would not update the DAGs once they were fetched from the DB. The default worker_refresh_interval was `30`, so whenever the gunicorn workers were restarted they used to pull the updated DAGs when needed.
This change will allow us to have a larger worker_refresh_interval (e.g. 30 mins or even 1 day)
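e.g. in airflow.cfg (assuming the value stays expressed in seconds):
```
[webserver]
# refresh gunicorn workers every 30 minutes instead of every 30 seconds
worker_refresh_interval = 1800
```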
We should not update the "last_updated" column unnecessarily. This is the first of a few optimizations to DAG Serialization that would also aid DAG Versioning.
PR https://github.com/apache/airflow/pull/9554 introduced this error, and because of a GitHub issue at the time (GitHub was down / had degraded performance) the CI didn't run fully.
It is slower to call e.g. dict() than to use the empty literal, because the name dict must be looked up in the global scope in case it has been rebound. The same goes for the other two types, list() and tuple().
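A quick way to see the difference with timeit (absolute numbers vary by machine, but the literal consistently wins):
```
import timeit

print(timeit.timeit("dict()", number=1_000_000))  # global name lookup + call
print(timeit.timeit("{}", number=1_000_000))      # single BUILD_MAP bytecode op
print(timeit.timeit("list()", number=1_000_000))
print(timeit.timeit("[]", number=1_000_000))
```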
The issue was caused because the `rendered_task_instance_fields` table did not have precision on its timestamp column, hence causing `_mysql_exceptions.IntegrityError`.
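A sketch of the kind of fix this implies (the column definition is illustrative, not the exact migration): give the MySQL timestamp fractional-second precision so rows created within the same second don't collide on the primary key:
```
import sqlalchemy as sa
from sqlalchemy.dialects import mysql

# fsp=6 keeps microseconds on MySQL instead of truncating to whole seconds
execution_date = sa.Column(
    sa.TIMESTAMP(timezone=True).with_variant(mysql.TIMESTAMP(fsp=6), "mysql"),
    primary_key=True,
)
```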
closes https://github.com/apache/airflow/issues/9148
* Resolve upstream tasks when template field is XComArg
closes: #8054
* fixup! Resolve upstream tasks when template field is XComArg
* Resolve task relations in DagRun and DagBag
* Add tests for serialized DAG
* Set dependencies only in bag_dag, refactor tests
* Traverse template_fields attribute
* Use provide_test_dag_bag in all tests
* fixup! Use provide_test_dag_bag in all tests
* Use metaclass + setattr
* Add prepare_for_execution method
* Check signature of __init__ not class
* Apply suggestions from code review
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
* Update airflow/models/baseoperator.py
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>