incubator-airflow

History

Ash Berlin-Taylor ee90807ace Massively speed up the query returned by TI.filter_for_tis (#11147)		2020-09-25 20:49:11 +01:00

The previous query generated SQL like this:

```
WHERE (task_id = ? AND dag_id = ? AND execution_date = ?) OR (task_id = ? AND dag_id = ? AND execution_date = ?)
```

Which is fine for one or maybe even 100 TIs, but when testing DAGs at extreme size (over 21k tasks!) this query was taking forever (162s on Postgres, 172s on MySQL 5.7).

By changing this query to this

```
WHERE task_id IN (?,?) AND dag_id = ? AND execution_date = ?
```

the time is reduced to 1s! (1.03s on Postgres, 1.19s on MySQL)

Even on 100 TIs the reduction is large, but the overall time is not significant (0.01451s -> 0.00626s on Postgres).

Times include SQLA query construction time (but not the time for calling filter_for_tis, so this is a like-for-like comparison), not just DB query time:

```python
ipdb> start_filter_20k = time.monotonic(); result_filter_20k = session.query(TI).filter(tis_filter).all(); end_filter_20k = time.monotonic()
ipdb> end_filter_20k - start_filter_20k
172.30647455298458
ipdb> in_filter = TI.dag_id == self.dag_id, TI.execution_date == self.execution_date, TI.task_id.in_([o.task_id for o in old_states.keys()]);
ipdb> start_20k_custom = time.monotonic(); result_custom_20k = session.query(TI).filter(in_filter).all(); end_20k_custom = time.monotonic()
ipdb> end_20k_custom - start_20k_custom
1.1882996069907676
```

I have also removed the check that was ensuring everything was of the same type (all TaskInstance or all TaskInstanceKey) as it felt needless: both types have the three required fields, so the "duck-typing" approach at runtime (crash if it doesn't have the required property) plus mypy checks felt Good Enough.
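The two WHERE-clause shapes from the commit message can be compared side by side in a minimal sketch. It uses the standard-library `sqlite3` module rather than Airflow's SQLAlchemy models, so the table layout and the helper names (`or_of_ands_query`, `in_list_query`, `run_demo`) are hypothetical and illustrative only. It also assumes that every key shares one `dag_id` and one `execution_date`, which is the condition under which the `IN` form returns the same rows as the OR-of-ANDs form.

```python
import sqlite3


def or_of_ands_query(keys):
    """Old shape: one (task_id, dag_id, execution_date) group per key."""
    if not keys:
        raise ValueError("keys must not be empty")
    group = "(task_id = ? AND dag_id = ? AND execution_date = ?)"
    where = " OR ".join(group for _ in keys)
    # Flatten [(task, dag, date), ...] into one positional parameter list.
    params = [value for key in keys for value in key]
    return f"SELECT task_id FROM task_instance WHERE {where}", params


def in_list_query(keys):
    """New shape: only valid when all keys share dag_id and execution_date."""
    dag_ids = {dag_id for _, dag_id, _ in keys}
    dates = {date for _, _, date in keys}
    if len(dag_ids) != 1 or len(dates) != 1:
        raise ValueError("keys must share exactly one dag_id and execution_date")
    placeholders = ",".join("?" for _ in keys)
    sql = (
        "SELECT task_id FROM task_instance "
        f"WHERE task_id IN ({placeholders}) AND dag_id = ? AND execution_date = ?"
    )
    params = [task_id for task_id, _, _ in keys] + [dag_ids.pop(), dates.pop()]
    return sql, params


def run_demo(n_tasks=50, n_keys=20):
    """Run both query shapes against an in-memory table and return both results."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        "task_id TEXT, dag_id TEXT, execution_date TEXT, "
        "PRIMARY KEY (task_id, dag_id, execution_date))"
    )
    # Two DAGs with identical task ids, so the dag_id condition matters.
    rows = [
        (f"task_{i}", dag_id, "2020-09-25")
        for dag_id in ("example_dag", "other_dag")
        for i in range(n_tasks)
    ]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?)", rows)
    keys = [(f"task_{i}", "example_dag", "2020-09-25") for i in range(n_keys)]
    old = sorted(row[0] for row in conn.execute(*or_of_ands_query(keys)))
    new = sorted(row[0] for row in conn.execute(*in_list_query(keys)))
    conn.close()
    return old, new


if __name__ == "__main__":
    old_result, new_result = run_demo()
    print(old_result == new_result, len(new_result))
```

The sketch only checks that the two shapes select the same rows. It does not attempt to reproduce the timings quoted in the commit message, which were measured on Postgres and MySQL with over 21k tasks.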
..
__init__.py	[AIRFLOW-3964][AIP-17] Consolidate and de-dup sensor tasks using Smart Sensor (#5499)	2020-09-08 22:47:59 +01:00
base.py	PyDocStyle: No whitespaces allowed surrounding docstring text (#10533)	2020-08-25 09:50:21 +01:00
baseoperator.py	Add template fields renderers for better UI rendering (#11061)	2020-09-23 15:31:40 +02:00
connection.py	Add D204 pydocstyle check (#11031)	2020-09-21 11:45:06 +01:00
crypto.py	Add D204 pydocstyle check (#11031)	2020-09-21 11:45:06 +01:00
dag.py	Do not silently allow the use of undefined variables in jinja2 templates (#11016)	2020-09-25 09:15:28 +02:00
dagbag.py	Add D202 pydocstyle check (#11032)	2020-09-22 16:17:24 +01:00
dagcode.py	Add D204 pydocstyle check (#11031)	2020-09-21 11:45:06 +01:00
dagpickle.py	[AIRFLOW-6714] Remove magic comments about UTF-8 (#7338)	2020-02-02 22:18:19 +01:00
dagrun.py	Fix incorrect Usage of Optional[bool] (#11138)	2020-09-24 23:18:19 +01:00
errors.py	Add D204 pydocstyle check (#11031)	2020-09-21 11:45:06 +01:00
kubernetes.py	Add D204 pydocstyle check (#11031)	2020-09-21 11:45:06 +01:00
log.py	[AIRFLOW-6946] Switch to MySQL 5.7 in 2.0 as base (#7570)	2020-03-14 22:24:03 +01:00
pool.py	Add D204 pydocstyle check (#11031)	2020-09-21 11:45:06 +01:00
renderedtifields.py	PyDocStyle: No whitespaces allowed surrounding docstring text (#10533)	2020-08-25 09:50:21 +01:00
sensorinstance.py	[AIRFLOW-3964][AIP-17] Consolidate and de-dup sensor tasks using Smart Sensor (#5499)	2020-09-08 22:47:59 +01:00
serialized_dag.py	Add D204 pydocstyle check (#11031)	2020-09-21 11:45:06 +01:00
skipmixin.py	SkipMixin: Add missing session.commit() and test (#10421)	2020-09-22 21:08:12 +01:00
slamiss.py	Add D204 pydocstyle check (#11031)	2020-09-21 11:45:06 +01:00
taskfail.py	[AIRFLOW-6946] Switch to MySQL 5.7 in 2.0 as base (#7570)	2020-03-14 22:24:03 +01:00
taskinstance.py	Massively speed up the query returned by TI.filter_for_tis (#11147)	2020-09-25 20:49:11 +01:00
taskmixin.py	[AIP-34] TaskGroup: A UI task grouping concept as an alternative to SubDagOperator (#10153)	2020-09-19 01:51:37 +01:00
taskreschedule.py	Query TaskReschedule only if task is UP_FOR_RESCHEDULE (#9087)	2020-06-09 14:17:13 +02:00
variable.py	Add D202 pydocstyle check (#11032)	2020-09-22 16:17:24 +01:00
xcom.py	Add D204 pydocstyle check (#11031)	2020-09-21 11:45:06 +01:00
xcom_arg.py	[AIP-34] TaskGroup: A UI task grouping concept as an alternative to SubDagOperator (#10153)	2020-09-19 01:51:37 +01:00