There is a bug caused by the scheduler_jobs refactor which leads to task
failures and scheduler locking.
Essentially, when there is an overflow of tasks going into the scheduler, the
tasks are set back to scheduled, but are not removed from the executor's
queued_tasks queue.
This means that the executor will attempt to run tasks that are in the scheduled
state, but those tasks will fail dependency checks. Eventually the queue is
filled with scheduled tasks, and the scheduler can no longer run.
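A minimal sketch of the idea behind the fix, with hypothetical names (the helper and its arguments are illustrative, not the actual patch):

```python
from airflow.utils.state import State

def defer_overflowed_tasks(executor, task_instances, session):
    """Illustrative only: flip overflowed task instances back to SCHEDULED and also
    purge them from the executor's queued_tasks, so the executor does not later try
    to run tasks that will fail their dependency checks."""
    for ti in task_instances:
        ti.state = State.SCHEDULED
        session.merge(ti)
        # The missing step that caused the lock-up: remove the TI from the executor queue.
        executor.queued_tasks.pop(ti.key, None)
    session.commit()
```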
Co-Authored-By: Kaxil Naik <kaxilnaik@gmail.com>
Co-Authored-By: Kevin Yang <kevin.yang@airbnb.com>
The list of tests for autocomplete is now generated automatically when you enter Breeze.
It takes about 40 seconds to generate the list; until it is done there are
no autocompletions, but they appear as soon as the list is ready.
* [AIRFLOW-5631] Change way of running GCP system tests
This commit proposes a new way of running GCP related system tests.
It uses the SystemTests base class and authentication is provided by a
context manager, so it is easier to understand what is going on.
* [AIRFLOW-5580] Add base class for system test
This commit proposes a base class for running system tests in Airflow. The
main concept is to create an example DAG and run it for test purposes. This
is especially important in the case of integration with third-party services.
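A rough sketch of the pattern described by these two commits, under assumed names (the base class, context manager, key path and DAG locations are illustrative, not the code actually merged):

```python
import contextlib
import os
import unittest

class SystemTest(unittest.TestCase):
    """Sketch of the proposed base class: the test runs an example DAG end to end."""

    def run_dag(self, dag_id, dag_folder):
        from airflow.models import DagBag
        dag = DagBag(dag_folder=dag_folder, include_examples=False).get_dag(dag_id)
        dag.clear()
        dag.run(ignore_first_depends_on_past=True)

@contextlib.contextmanager
def provide_gcp_context(key_file):
    """Hypothetical authentication context manager: point GCP clients at a service
    account key for the duration of the test, then clean up."""
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_file
    try:
        yield
    finally:
        os.environ.pop("GOOGLE_APPLICATION_CREDENTIALS", None)

class TestGcsExampleDagsSystem(SystemTest):
    def test_run_example_dag(self):
        with provide_gcp_context("/files/gcp_key.json"):
            self.run_dag("example_gcs", "airflow/gcp/example_dags")
```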
Since we switched to using sub-processes to parse the DAG files sometime
back in 2016(!), the metrics we have been emitting about dag bag size and
parsing have been incorrect.
We have also been emitting metrics from the webserver, which is going to
become wrong as we move towards a stateless webserver.
To fix both of these issues I have stopped emitting the metrics from
models.DagBag and only emit them from inside the
DagFileProcessorManager.
(There was also a bug in the `dag.loading-duration.*` metric we were emitting
from the DagBag code where the "dag_file" part of that metric was empty.
I have fixed that even though I have now deprecated that metric. The
webserver was emitting the right metric though, so many people wouldn't have
noticed.)
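A hedged sketch of the intended emission (names and placement are approximations of the file-processor side, not copied from it), with the previously-empty file-name part of `dag.loading-duration.*` filled in:

```python
import os
import time

from airflow.models import DagBag
from airflow.stats import Stats  # on older 1.10 releases Stats lives in airflow.settings

def parse_file_and_emit_metrics(file_path):
    """Illustrative only: emit the dag bag metrics while parsing a single DAG file,
    i.e. from the file-processing side rather than from models.DagBag/the webserver."""
    start = time.time()
    dagbag = DagBag(dag_folder=file_path, include_examples=False)
    duration = time.time() - start

    dag_file = os.path.splitext(os.path.basename(file_path))[0]  # no longer empty
    Stats.gauge("dagbag_size", len(dagbag.dags))
    Stats.timing("dag.loading-duration.{}".format(dag_file), duration)
    return dagbag
```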
`non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
are removed in favor of a real pool, e.g. `default_pool`.
By default, tasks run in `default_pool`.
`default_pool` is initialized with 128 slots and users can change the
number of slots through the UI/CLI. `default_pool` cannot be removed.
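A small sketch of what this means for DAG authors (the DAG, task, and pool names are made up):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG("pool_example", start_date=datetime(2019, 1, 1), schedule_interval=None) as dag:
    # No pool given: the task now runs in `default_pool` (128 slots unless changed via UI/CLI).
    in_default = BashOperator(task_id="in_default_pool", bash_command="echo default")

    # Tasks can still opt into an explicit pool, which must exist before they are queued.
    in_custom = BashOperator(
        task_id="in_custom_pool",
        bash_command="echo custom",
        pool="reporting_pool",  # hypothetical pool created via the UI/CLI
    )
```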
This adds ASF license headers to all the .rst and .md files with the
exception of the Pull Request template (as that is included verbatim
when opening a Pull Request on GitHub, which would be messy).
The different UtcDateTime implementations all have issues.
Either they replace tzinfo directly without converting
or they do not convert to UTC at all.
We also ensure all MySQL connections are in UTC
in order to keep things sane, as MySQL will ignore the
timezone of a field when inserting/updating.
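A minimal sketch of the behaviour this change standardises (not the exact class Airflow ships): reject naive datetimes and genuinely convert aware ones to UTC instead of just replacing tzinfo:

```python
import datetime

from sqlalchemy.types import DateTime, TypeDecorator

class UtcDateTime(TypeDecorator):
    """Illustrative only: always store timezone-aware datetimes as UTC."""
    impl = DateTime(timezone=True)

    def process_bind_param(self, value, dialect):
        if value is None:
            return None
        if value.tzinfo is None:
            raise ValueError("naive datetime is not allowed: {!r}".format(value))
        # astimezone() converts; value.replace(tzinfo=utc) would silently relabel it.
        return value.astimezone(datetime.timezone.utc)
```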
When using impersonation via `run_as_user`, the PYTHONPATH environment
variable is not propagated, hence there may be issues when depending on
specific custom packages used in DAGs.
This PR propagates only the PYTHONPATH in the process creating the
sub-process with impersonation, if any.
Tested in staging environment; impersonation tests in airflow are not
very portable and fixing them would take additional work, leaving as
TODO and tracking with jira ticket:
https://issues.apache.org/jira/browse/AIRFLOW-1901
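A hedged sketch of the idea (the helper, user, and command are illustrative, not the exact change): carry PYTHONPATH into the impersonated sub-process explicitly, since sudo drops most of the caller's environment by default:

```python
import os
import subprocess

def build_impersonated_command(run_as_user, airflow_cmd):
    """Illustrative only: prefix the task command with sudo for impersonation and,
    if the parent process has a PYTHONPATH, pass it along as an explicit VAR=value
    assignment on the sudo command line."""
    prefix = ["sudo", "-E", "-H", "-u", run_as_user]
    pythonpath = os.environ.get("PYTHONPATH")
    if pythonpath:
        prefix.append("PYTHONPATH={}".format(pythonpath))
    return prefix + airflow_cmd

# Hypothetical usage:
cmd = build_impersonated_command("etl_user", ["airflow", "run", "my_dag", "my_task", "2018-01-01"])
subprocess.check_call(cmd)
```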
Closes #2860 from edgarRd/erod-pythonpath_run_as_user
In all the popular languages the variable name `log` is the de facto
standard for logging. Rename LoggingMixin.py to logging_mixin.py
to comply with the Python standard.
When `.logger` is used, a deprecation warning will be emitted.
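A short sketch of what this looks like in operator code (the operator itself is made up):

```python
from airflow.models import BaseOperator

class MyOperator(BaseOperator):
    """Hypothetical operator showing the renamed attribute."""

    def execute(self, context):
        # Preferred: the `log` attribute provided by LoggingMixin.
        self.log.info("running task %s", self.task_id)
        # Deprecated: `self.logger` still works but emits a DeprecationWarning.
        self.logger.info("this still works, with a warning")
```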
Closes #2604 from Fokko/AIRFLOW-1604-logger-to-log
Here is the original PR with Max's LGTM:
https://github.com/aoen/incubator-airflow/pull/1
Since then I have made some fixes but this PR is essentially the same.
It could definitely use more eyes as there are likely still issues.
**Goals**
- Simplify, consolidate, and make consistent the logic of whether or not
a task should be run
- Provide a view/better logging that gives insight into why a task
instance is not currently running (no more viewing the scheduler logs
to find out why a task instance isn't running for the majority of
cases):
![image](https://cloud.githubusercontent.com/assets/1592778/17637621/aa669f5e-6099-11e6-81c2-d988d2073aac.png)
**Notable Functional Changes**
- Webserver view + task_failing_deps CLI command to explain why a given
task instance isn't being run by the scheduler (a sketch of querying the
failing dep statuses follows this list)
- Running a backfill in the command line and running a task in the UI
will now display detailed error messages based on which dependencies
were not met for a task instead of appearing to succeed but actually
failing silently
- Maximum task concurrency and pools are now respected by backfills
- Backfill now has the equivalent of the old force flag to run even for
successful tasks
This will break one use case:
Using pools to restrict some resource on airflow executors themselves
(rather than an external resource like a DB), e.g. some task uses 60%
of cpu on a worker so we restrict that task's pool size to 1 to
prevent two of the tasks from running on the same host. When
backfilling a task of this type, now the backfill will wait on the
pool to have slots open up before running the task even though we
don't need to do this if backfilling on a different host outside of
the pool. I think breaking this use case is OK since the use case is a
hack due to not having a proper resource isolation solution (e.g.
mesos should be used in this case instead).
- To make things less confusing for users, there is now an "ignore all
dependencies" option for running tasks; "ignore dependencies" has been
renamed to "ignore task dependencies", and "force" has been renamed to
"ignore task instance state". The new "Ignore all dependencies" flag
will ignore the following:
- task instance's pool being full
- execution date for a task instance being in the future
- a task instance being in the retry waiting period
- the task instance's task ending prior to the task instance's
execution date
- task instance is already queued
- task instance has already completed
- task instance is in the shutdown state
- WILL NOT IGNORE a task instance that is already running
- SLA miss emails will now include all tasks that did not finish for a
particular DAG run, even if the tasks didn't run because depends_on_past
was not met
- Tasks with pools won't get queued automatically the first time they
reach a worker; if they are ready to run they will be run immediately
- Running a task via the UI or via the command line (backfill/run
commands) will now log why a task could not get run if one of its
dependencies isn't met. For tasks kicked off via the web UI this
means that tasks don't silently fail to get queued despite a
successful message in the UI.
- Queuing a task into a pool that doesn't exist will now get stopped in
the scheduler instead of a worker
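As a sketch of what the explainer view/CLI surfaces (the method and attribute names here are assumptions about the dependency-engine API, not guaranteed): the failing dependency statuses for a task instance can be listed programmatically:

```python
from airflow.models import DagBag, TaskInstance

# Hypothetical usage: explain why a specific task instance is not being run.
dag = DagBag().get_dag("my_dag")
ti = TaskInstance(dag.get_task("my_task"), execution_date=dag.latest_execution_date)

for dep_status in ti.get_failed_dep_statuses():  # assumed helper from the dep engine
    print(dep_status.dep_name, "->", dep_status.reason)
```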
**Follow Up Items**
- Update the docs to reference the new explainer views/CLI command
Closes #1729 from aoen/ddavydov/blockedTIExplainerRebasedMaster