Commit graph

217 commits

Author SHA1 Message Date
Kaxil Naik 4e32546faf
Mask Password in Log table when using the CLI (#11468) 2020-10-12 19:27:01 +01:00
John Bampton b786327041
Fix spelling in CeleryExecutor (#11407) 2020-10-11 18:06:26 +02:00
Jarek Potiuk 5bc5994c2c
Split tests to more sub-types (#11402)
We seem to have a problem with running all tests at once - most
likely due to some resource problems in our CI, therefore it makes
sense to split the tests into more batches. This is not yet full
implementation of selective tests but it is going in this direction
by splitting to Core/Providers/API/CLI tests. The full selective
tests approach will be implemented as part of #10507 issue.

This split is possible thanks to #10422 which moved building image
to a separate workflow - this way each image is only built once
and it is uploaded to a shared registry, where it is quickly
downloaded from rather than built by all the jobs separately - this
way we can have many more jobs as there is very little per-job
overhead before the tests start running.
2020-10-11 07:40:31 -07:00
Ash Berlin-Taylor 73b9163a8f
Fully support running more than one scheduler concurrently (#10956)
* Fully support running more than one scheduler concurrently.

This PR implements scheduler HA as proposed in AIP-15. The high level
design is as follows:

- Move all scheduling decisions into SchedulerJob (requiring DAG
  serialization in the scheduler)
- Use row-level locks to ensure schedulers don't stomp on each other
  (`SELECT ... FOR UPDATE`)
- Use `SKIP LOCKED` for better performance when multiple schedulers are
  running. (MySQL < 8 and MariaDB don't support this)
- Scheduling decisions are not tied to the parsing speed, but can
  operate just on the database

*DagFileProcessorProcess*:

Previously this component was responsible for more than just parsing the
DAG files as its name might imply. It also was responsible for creating
DagRuns, and also making scheduling decisions of TIs, sending them from
"None" to "scheduled" state.

This commit changes it so that the DagFileProcessorProcess now will
update the SerializedDAG row for this DAG, and make no scheduling
decisions itself.

To make the scheduler's job easier (so that it can make as many
decisions as possible without having to load the possibly-large
SerializedDAG row) we store/update some columns on the DagModel table:

- `next_dagrun`: The execution_date of the next dag run that should be created (or
  None)
- `next_dagrun_create_after`: The earliest point at which the next dag
  run can be created

Pre-computing these values (and updating them every time the DAG is
parsed) reduces the overall load on the DB as many decisions can be taken
by selecting just these two columns/the small DagModel row.

In case of max_active_runs, or `@once` these columns will be set to
null, meaning "don't create any dag runs"

*SchedulerJob*

The SchedulerJob used to only queue/send tasks to the executor after
they were parsed, and returned from the DagFileProcessorProcess.

This PR breaks the link between parsing and enqueuing of tasks, instead
of looking at DAGs as they are parsed, we now:

-  store a new datetime column, `last_scheduling_decision` on DagRun
  table, signifying when a scheduler last examined a DagRun
- Each time around the loop the scheduler will get (and lock) the next
  _n_ DagRuns via `DagRun.next_dagruns_to_examine`, prioritising DagRuns
  which haven't been touched by a scheduler in the longest period
- SimpleTaskInstance etc have been almost entirely removed now, as we
  use the serialized versions

* Move callbacks execution from Scheduler loop to DagProcessorProcess

* Don’t run verify_integrity if the Serialized DAG hasn’t changed

dag_run.verify_integrity is slow, and we don't want to call it every time, just when the dag structure changes (which we can know now thanks to DAG Serialization)

* Add escape hatch to disable newly added "SELECT ... FOR UPDATE" queries

We are worried that these extra uses of row-level locking will cause
problems on MySQL 5.x (most likely deadlocks) so we are providing users
an "escape hatch" to be able to make these queries non-locking -- this
means that only a single scheduler should be run, but being able to run
one is better than having the scheduler crash.

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-10-09 22:44:27 +01:00
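The selection step described in the commit above (each loop, examine the DagRuns that a scheduler has gone longest without touching) can be sketched with the standard-library `sqlite3` module. This is only an illustration under assumptions: the table layout and the helper names `pick_dagruns` and `mark_examined` are hypothetical and are not Airflow's code. SQLite has no `SELECT ... FOR UPDATE SKIP LOCKED`, so the locking clause the commit relies on appears only in a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run ("
    "id INTEGER PRIMARY KEY, dag_id TEXT, last_scheduling_decision TEXT)"
)
conn.executemany(
    "INSERT INTO dag_run (dag_id, last_scheduling_decision) VALUES (?, ?)",
    [
        ("dag_a", "2020-10-09 10:00:05"),
        ("dag_b", None),  # never examined by any scheduler
        ("dag_c", "2020-10-09 10:00:01"),
        ("dag_d", "2020-10-09 10:00:03"),
    ],
)


def pick_dagruns(conn, limit):
    """Return up to `limit` rows, never-examined first, then oldest decision."""
    # On PostgreSQL or MySQL 8+ a real scheduler would append
    # "FOR UPDATE SKIP LOCKED" here so concurrent schedulers get disjoint rows.
    return conn.execute(
        "SELECT id, dag_id FROM dag_run "
        "ORDER BY last_scheduling_decision IS NOT NULL, last_scheduling_decision "
        "LIMIT ?",
        (limit,),
    ).fetchall()


def mark_examined(conn, run_ids, now):
    """Record that a scheduler has just looked at these rows."""
    conn.executemany(
        "UPDATE dag_run SET last_scheduling_decision = ? WHERE id = ?",
        [(now, run_id) for run_id in run_ids],
    )


first = pick_dagruns(conn, 2)
mark_examined(conn, [run_id for run_id, _ in first], "2020-10-09 10:00:10")
second = pick_dagruns(conn, 2)

first_batch = [dag_id for _, dag_id in first]
second_batch = [dag_id for _, dag_id in second]
print(first_batch, second_batch)
```

Because every examined row gets a fresh timestamp, the next pass naturally moves on to the rows that were skipped, which is the fairness property the commit message describes.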
Tomek Urbaszek f697ff2381
Move test tools from tests.utils to tests.test_utils (#10889) 2020-10-03 14:27:06 +02:00
Daniel Imberman 9860719c72
[AIRFLOW-5545] Fixes recursion in DAG cycle testing (#6175)
* Fixes an issue where cycle detection uses recursion

and stack overflows after about 1000 tasks

(cherry picked from commit 63f1a180a17729aa937af642cfbf4ddfeccd1b9f)

* reduce test length

* slightly more efficient

* Update airflow/utils/dag_cycle_tester.py

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

* slightly more efficient

* actually works this time

Co-authored-by: Daniel Imberman <daniel@astronomer.io>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-09-29 11:34:55 -07:00
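The problem named in the commit above (recursive cycle detection overflowing the stack at around 1000 tasks) can be avoided with an explicit stack. The following is a minimal sketch of an iterative depth-first search, not the code in `airflow/utils/dag_cycle_tester.py`; the function name `has_cycle` and the dict-of-lists input format are assumptions made for the example.

```python
def has_cycle(downstream):
    """Detect a cycle in a task graph without recursion.

    `downstream` maps a task id to the list of task ids that run after it.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2

    # Collect every task, including those that only appear as a child.
    state = {}
    for task, children in downstream.items():
        state.setdefault(task, UNVISITED)
        for child in children:
            state.setdefault(child, UNVISITED)

    for root in state:
        if state[root] != UNVISITED:
            continue
        state[root] = IN_PROGRESS
        # Each stack entry keeps the iterator so we resume where we left off.
        stack = [(root, iter(downstream.get(root, ())))]
        while stack:
            task, children = stack[-1]
            for child in children:
                if state[child] == IN_PROGRESS:
                    return True  # back edge: child is on the current path
                if state[child] == UNVISITED:
                    state[child] = IN_PROGRESS
                    stack.append((child, iter(downstream.get(child, ()))))
                    break
            else:
                # All children explored: this task is finished.
                state[task] = DONE
                stack.pop()
    return False


# A 5000-task chain is deeper than Python's default recursion limit.
chain = {f"task_{i}": [f"task_{i + 1}"] for i in range(5000)}
acyclic_result = has_cycle(chain)

chain["task_5000"] = ["task_0"]  # close the loop
cyclic_result = has_cycle(chain)
print(acyclic_result, cyclic_result)
```

The depth of the search is now bounded by available memory rather than by the interpreter's recursion limit.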
yuqian90 49c193fb87
[AIP-34] TaskGroup: A UI task grouping concept as an alternative to SubDagOperator (#10153)
This commit introduces TaskGroup, which is a simple UI task grouping concept.

- TaskGroups can be collapsed/expanded in Graph View when clicked
- TaskGroups can be nested
- TaskGroups can be put upstream/downstream of tasks or other TaskGroups with >> and << operators
- Search box, hovering, focusing in Graph View treats TaskGroup properly. E.g. searching for tasks also highlights TaskGroup that contains matching task_id. When TaskGroup is expanded/collapsed, the affected TaskGroup is put in focus and moved to the centre of the graph.


What this commit does not do:

- This commit does not change or remove SubDagOperator. Although TaskGroup is intended as an alternative for SubDagOperator, deprecating SubDagOperator will need to be discussed/implemented in the future.
- This PR only implemented TaskGroup handling in the Graph View. In places such as Tree View, it will look as if TaskGroup does not exist and all tasks are in the same flat DAG.

GitHub Issue: #8078
AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator
2020-09-19 01:51:37 +01:00
Ash Berlin-Taylor 59dad1a4ea
Allow CeleryExecutor to "adopt" an orphaned queued or running task (#10949)
This can happen when a task is enqueued by one executor, and then that
scheduler dies/exits.

The default fallback behaviour is unchanged -- that queued tasks are
cleared and then later rescheduled.

But for Celery, we can do better -- if we record the Celery-generated
task_id, we can then re-create the AsyncResult objects for orphaned
tasks at a later date.

However, since Celery just reports all AsyncResult as "PENDING", even if
they aren't tasks currently in the broker queue, we need to apply a
timeout to "unblock" these tasks in case they never actually made it to
the Celery broker.

This all means that we can adopt tasks that have been enqueued by another
CeleryExecutor if it dies, without having to clear the task and slow
down. This is especially useful as the task may have already started
running, and while clearing it would stop it, it's better if we don't
have to reset it!

Co-authored-by: Kaxil Naik <kaxilnaik@apache.org>
2020-09-16 20:10:30 +01:00
Robert Grizzell 2aec99c228
Fix empty asctime field in JSON formatted logs (#10515) 2020-09-16 17:50:27 +01:00
Ping Zhang 96165185f1
Add CeleryKubernetesExecutor (#10901)
it consists of CeleryExecutor and KubernetesExecutor, which allows users
to route their tasks to either Kubernetes or Celery based on the queue
defined on a task
2020-09-15 09:42:55 +02:00
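The routing idea in the commit above (send a task to Kubernetes or Celery depending on its queue) can be illustrated with a small wrapper. Everything here is a hypothetical sketch: `RecordingExecutor`, `QueueRoutingExecutor`, the `queue_task` method and the default queue name `"kubernetes"` are assumptions for the example and do not reproduce the real executor's interface.

```python
class RecordingExecutor:
    """Stand-in executor that only remembers which tasks it was given."""

    def __init__(self, name):
        self.name = name
        self.queued = []

    def queue_task(self, task_id):
        self.queued.append(task_id)


class QueueRoutingExecutor:
    """Delegate each task to one of two executors based on the task's queue."""

    def __init__(self, celery_executor, kubernetes_executor,
                 kubernetes_queue="kubernetes"):
        self.celery_executor = celery_executor
        self.kubernetes_executor = kubernetes_executor
        self.kubernetes_queue = kubernetes_queue

    def queue_task(self, task_id, queue):
        # Tasks on the designated queue go to Kubernetes, all others to Celery.
        if queue == self.kubernetes_queue:
            target = self.kubernetes_executor
        else:
            target = self.celery_executor
        target.queue_task(task_id)
        return target.name


celery = RecordingExecutor("celery")
kubernetes = RecordingExecutor("kubernetes")
router = QueueRoutingExecutor(celery, kubernetes)

router.queue_task("light_task", queue="default")
router.queue_task("heavy_task", queue="kubernetes")
print(celery.queued, kubernetes.queued)
```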
Ash Berlin-Taylor 1a95361122
Fix and unquarantine TestDagFileProcessorAgent.test_parse_once (#10862)
The SmartSensor PR introduced slightly different behaviour in
list_py_files when it is given a file path directly.

Prior to that PR, if given a file path it would not include examples.

After that PR was merged, it would return that path and the example dags
(assuming they were enabled.)
2020-09-10 17:04:14 +01:00
Jarek Potiuk ff72327614
Move parse_once to quarantine (#10857) 2020-09-10 13:20:23 +01:00
Yingbo Wang ac943c9e18
[AIRFLOW-3964][AIP-17] Consolidate and de-dup sensor tasks using Smart Sensor (#5499)
Co-authored-by: Yingbo Wang <yingbo.wang@airbnb.com>
2020-09-08 22:47:59 +01:00
Jarek Potiuk b746f33fc6
Removes stable tests from quarantine (#10768)
We've observed the tests for the last couple of weeks and it seems
most of the tests marked with "quarantine" marker are succeeding
in a stable way (https://github.com/apache/airflow/issues/10118)
The removed tests have success ratio of > 95% (20 runs without
problems) and this has been verified a week ago as well,
so it seems they are rather stable.

There are literally a few that are either failing or causing
the Quarantined builds to hang. I manually reviewed the
master tests that failed for the last few weeks and added the
tests that are causing the build to hang.

Seems that stability has improved - which might be caused
by some temporary problems when we marked the quarantined builds
or too "generous" way of marking test as quarantined, or
maybe improvement comes from the #10368 as the docker engine
and machines used to run the builds in GitHub experience far
less load (image builds are executed in separate builds) so
it might be that resource usage is decreased. Another reason
might be Github Actions stability improvements.

Or simply those tests are more stable when run in isolation.

We might still add failing tests back as soon as we see them behave
in a flaky way.

The remaining quarantined tests that need to be fixed:
 * test_local_run (often hangs the build)
 * test_retry_handling_job
 * test_clear_multiple_external_task_marker
 * test_should_force_kill_process
 * test_change_state_for_tis_without_dagrun
 * test_cli_webserver_background

We also move some of those tests to the "heisentests" category.
Those tests run fine in isolation but fail
the builds when run with all other tests:
 * TestImpersonation tests

We might find that those heisentests can be fixed but for
now we are going to run them in isolation.

Also - since those quarantined tests are failing more often
the "num runs" to track for those has been decreased to 10
to keep track of 10 last runs only.
2020-09-08 07:36:12 +02:00
Kaxil Naik 9ac882e6cc
[AIRFLOW-5948] Replace SimpleDag with SerializedDag (#7694) 2020-09-03 16:52:27 +01:00
Tomek Urbaszek 913397c1c6
Make Cloud Build system tests setup runnable (#10692)
This change fixes error: open(quickstart.sh): Permission denied
that was raised during git add.
2020-09-03 13:20:10 +02:00
Kaxil Naik 725bf330ef
Revert Clean up DAG serializations based on last_updated (#7424) (#10613)
This PR reverts the behavior of https://github.com/apache/airflow/pull/7424
2020-08-27 20:56:41 +01:00
Jarek Potiuk 2f2d8dbfaf
Remove all "noinspection" comments native to IntelliJ (#10525)
We have already fixed a lot of problems that were marked
with those, and IntelliJ has gotten a bit smarter about not
detecting false positives as well as understanding more
pylint annotations. Wherever the problem remained
we replaced it with # noqa comments - as it is
also well understood by IntelliJ.
2020-08-25 00:01:37 +02:00
Jarek Potiuk 82369fadde
Removed the prerequisite for perf-kit path augmentation (#10492) 2020-08-23 15:50:25 +02:00
Jarek Potiuk 7ee7d7cf3f
Move perf_kit to tests.utils (#10470)
Perf_kit was a separate folder and it was a problem when we tried to
build it from Docker-embedded sources, because there was a hidden,
implicit dependency between tests (conftest) and perf.

Perf_kit is now moved to tests to be available in the CI image
also when we run tests without the sources mounted.
This is changing back in #10441 and we need to move perf_kit
for it to work.
2020-08-22 21:53:07 +02:00
Kaxil Naik 44a36b9ab3
Use assertEqual instead of assertTrue in tests/utils/test_dates.py for proper diff (#10457)
assertEqual will show the proper diff instead of just "False is not True" error
2020-08-22 10:43:26 +02:00
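The difference the commit above refers to can be seen by capturing the failure message from each assertion style. This is a small standalone sketch using the standard-library `unittest` module; the `_Probe` class, the `failure_message` helper and the sample date lists are made up for the illustration and are not taken from `tests/utils/test_dates.py`.

```python
import unittest


class _Probe(unittest.TestCase):
    """Bare TestCase instance so the assert* methods can be called directly."""

    def runTest(self):
        pass


def failure_message(check):
    """Run `check(probe)` and return the AssertionError text, or None."""
    probe = _Probe()
    try:
        check(probe)
    except AssertionError as exc:
        return str(exc)
    return None


expected = ["2020-01-01", "2020-01-02"]
actual = ["2020-01-01", "2020-01-03"]

# assertTrue only sees the boolean result of the comparison.
msg_true = failure_message(lambda t: t.assertTrue(actual == expected))

# assertEqual sees both operands and can describe how they differ.
msg_equal = failure_message(lambda t: t.assertEqual(actual, expected))

print(msg_true)
print(msg_equal)
```

With `assertTrue` the report carries no information about the two lists, while `assertEqual` names the first differing element and prints a diff.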
Ignacio Peluffo 27d08b76a2
Amazon SES Hook (#10391)
* Add Amazon SES hook

* Add SES Hook to operators-and-hooks documentation.

* Fix arguments for parent class constructor call (PR feedback)

* Fix indentation in operators-and-hooks documentation

* Fix mypy error for argument on call to parent class constructor

* Simplify logic on constructor (PR feedback)

* Add custom headers and other relevant options to hook

* Change pylint exception rule to apply it only to function instead of module (PR feedback)

* Fix spellcheck error

* Vendorize airflow.utils.email

* fixup! Vendorize airflow.utils.email

Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
2020-08-21 09:32:25 +02:00
Jarek Potiuk 19bc97d0ce
Revert "Add Amazon SES hook (#10004)" (#10276)
This reverts commit f06fe616e6.
2020-08-10 16:30:40 +02:00
Ignacio Peluffo f06fe616e6
Add Amazon SES hook (#10004)
- refactor airflow.utils.email and add typing
2020-08-10 11:58:55 +02:00
Leon Yuan 24c8e4c2d6
Changes to all the constructors to remove the args argument (#10163) 2020-08-06 13:42:51 +01:00
Kamil Breguła 9126f7061f
Deprecate experimental API (#9888) 2020-07-20 12:03:46 +02:00
Ephraim Anierobi a79e2d4c4a
Move provider's log task handlers to the provider package (#9604) 2020-07-06 09:05:40 +02:00
Mauricio De Diana 01044ff549
Fix use of GCP credentials in StackdriverTaskHandler (#9668) 2020-07-05 21:58:21 +02:00
Ephraim Anierobi ee20086b8c
Move S3TaskHandler to the AWS provider package (#9602) 2020-07-02 12:45:58 +02:00
Mauricio De Diana e50e94613a
Task logging handlers can provide custom log links (#9354)
Use a mixin to define log handlers based on remote services. The main
changes are:
 - Create RemoteLoggingMixin to define remote log handlers.
 - Remove explicit mentions to Elasticsearch in dag.html.
 - Rename the /elasticsearch endpoint in views.py to
   /redirect_to_remote_log and dispatch the remote URL building to the
   log handler.

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-07-02 11:15:53 +02:00
Kamil Breguła a97400d0d8
Move out sendgrid emailer from airflow.contrib (#9355) 2020-06-28 12:59:27 +02:00
Kaxil Naik 87fdbd0708
Use literal syntax instead of function calls to create data structure (#9516)
It is slower to call e.g. dict() than using the empty literal, because the name dict must be looked up in the global scope in case it has been rebound. Same for the other two types like list() and tuple().
2020-06-25 16:35:37 +01:00
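The name lookup mentioned in the commit above is visible in the bytecode. The sketch below uses the standard-library `dis` module to compare the two spellings; the function names are invented for the example, and exact instruction listings vary between CPython versions, so only the presence of the lookup and build instructions is checked.

```python
import dis


def make_with_call():
    # `dict` is resolved by name at run time, then called.
    return dict()


def make_with_literal():
    # The literal compiles straight to a map-building instruction.
    return {}


def opnames(func):
    """Return the list of bytecode instruction names for `func`."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


call_ops = opnames(make_with_call)
literal_ops = opnames(make_with_literal)

print("dict():", call_ops)
print("{}:    ", literal_ops)
```

Both functions return an equal empty dict; the literal form simply skips the name lookup and the call.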
Ignacio Peluffo d7de735e52
Move out weekday from airflow.contrib (#9388)
* Move out weekday from airflow.contrib

* Add changelog about weekday enum refactor into UPDATING.md
2020-06-22 10:14:13 +02:00
Ignacio Peluffo 2190e5036a
Move modules in `airflow.contrib.utils.log` to `airflow.utils.log` (#9395) 2020-06-21 22:08:06 +02:00
Ephraim Anierobi eb8683a725
Extract TaskLogReader from views.py (#9391)
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-06-19 16:38:42 +02:00
Kamil Breguła 5b680e27e8
Don't use connection to store task handler credentials (#9381) 2020-06-19 16:27:25 +02:00
Kaxil Naik 34d0c2d981
Fix Failing test for JSON Formatter on Python 3.8 (#9278) 2020-06-13 19:09:12 +01:00
crhyatt c41192fa1f
Upgrade pendulum to latest major version ~2.0 (#9184) 2020-06-10 17:12:27 +02:00
Ash Berlin-Taylor 6350fd6ebb
Don't use the term "whitelist" - language matters (#9174)
It's fairly common to say whitelisting and blacklisting to describe
desirable and undesirable things in cyber security. However just because
it is common doesn't mean it's right.

However, there's an issue with the terminology. It only makes sense if
you equate white with 'good, permitted, safe' and black with 'bad,
dangerous, forbidden'. There are some obvious problems with this.

You may not see why this matters. If you're not adversely affected by
racial stereotyping yourself, then please count yourself lucky. For some
of your friends and colleagues (and potential future colleagues), this
really is a change worth making.

From now on, we will use 'allow list' and 'deny list' in place of
'whitelist' and 'blacklist' wherever possible. Which, in fact, is
clearer and less ambiguous. So as well as being more inclusive of all,
this is a net benefit to our understandability.

(Words mostly borrowed from
<https://www.ncsc.gov.uk/blog-post/terminology-its-not-black-and-white>)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2020-06-08 10:01:46 +01:00
Ash Berlin-Taylor 82de6f74ae
Spend less time waiting for DagFileProcessor processes to complete (#8814)
In debugging another test I noticed that the scheduler was spending a
long time waiting for a "simple" dag to be parsed. But upon closer
inspection the parsing process itself was done in a few milliseconds,
but we just weren't harvesting the results in a timely fashion.

This change uses the `sentinel` attribute of multiprocessing.Connection
(added in Python 3.3) to be able to wait for all the processes, so that
as soon as one has finished we get woken up and can immediately harvest
and pass on the parsed dags.

This makes test_scheduler_job.py about twice as quick, and also reduces
the time the scheduler spends between tasks.

In real workloads, or where there are lots of dags, this likely won't
equate to such a huge speed up, but it does for our (synthetic) elastic
performance test dag.

These were the timings for the dag to run all the tasks in a single dag
run to completion, with PERF_SCHEDULE_INTERVAL='1d' PERF_DAGS_COUNT=1

I also have

PERF_SHAPE=linear PERF_TASKS_COUNT=12:

**Before**: 45.4166s

**After**: 16.9499s

PERF_SHAPE=linear PERF_TASKS_COUNT=24:

**Before**: 82.6426s

**After**: 34.0672s

PERF_SHAPE=binary_tree PERF_TASKS_COUNT=24:

**Before**: 20.3802s

**After**: 9.1400s

PERF_SHAPE=grid PERF_TASKS_COUNT=24:

**Before**: 27.4735s

**After**: 11.5607s

If you have many more dag **files**, this likely won't be your bottleneck.
2020-05-15 22:17:55 +01:00
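The "wake up as soon as one has finished" behaviour described in the commit above is what `multiprocessing.connection.wait` provides. The sketch below is a simplified stand-in, not the scheduler's code: it uses threads and pipe connections in place of parser processes so that it runs anywhere, and the names `fake_parse` and `harvest_in_completion_order` are hypothetical. The same `wait()` call also accepts the `sentinel` attribute of `Process` objects.

```python
import threading
import time
from multiprocessing import Pipe
from multiprocessing.connection import wait


def fake_parse(writer, name, seconds):
    """Pretend to parse a DAG file, then report the result over the pipe."""
    time.sleep(seconds)
    writer.send(name)
    writer.close()


def harvest_in_completion_order(jobs):
    """Start all jobs and collect results as each one becomes ready."""
    readers = []
    for name, seconds in jobs:
        reader, writer = Pipe(duplex=False)
        threading.Thread(
            target=fake_parse, args=(writer, name, seconds), daemon=True
        ).start()
        readers.append(reader)

    finished = []
    while readers:
        # Blocks until at least one connection has data, instead of polling
        # each job in turn on a fixed interval.
        for reader in wait(readers):
            finished.append(reader.recv())
            readers.remove(reader)
            reader.close()
    return finished


order = harvest_in_completion_order([("slow_dag.py", 0.3), ("fast_dag.py", 0.05)])
print(order)
```

Results are harvested in the order the work completes, so a fast job is never held up behind a slow one that happened to start first.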
Ash Berlin-Taylor fe4219112a
Don't use ProcessorAgent to test ProcessorManager (#8871)
Some of our tests (when I was looking at another change) were using the
ProcessorAgent to run and test the behaviour of our ProcessorManager in
certain cases. Having that extra process in the middle is not critical
for the tests, and makes it harder to debug the problem if
something breaks.

To make this possible I have made a small refactor to the loop of
DagFileProcessorManager (to give us a method we can call in tests that
doesn't do `os.setsid`).
2020-05-14 16:49:12 +01:00
Kallam Reddy 78a48db75b
Add support for non-default orientation in `dag show` command (#8834) 2020-05-12 05:02:26 +02:00
James Timmins 4375607410
Fix typo. 'zobmies' => 'zombies'. (#8832) 2020-05-12 04:34:16 +02:00
Ramiro Charriol b59adaba36
Support cron presets in date_range function (#7777) 2020-05-11 09:57:32 +02:00
jhtimmins bd29ee3ad1
Ensure test_logging_config.test_reload_module works in spawn mode. (#8741)
Co-authored-by: James Timmins <james@astronomer.io>
2020-05-06 20:47:05 +01:00
jhtimmins 520aeedec8
Fix pickling failure when spawning processes (#8671)
* Pull processor_factory out of _execute and move to the class scope.

* Change the value of pickle_dags from True to False which is the default value of pickle_dags to be passed.

* Fix how the FakeDagFileProcessorRunner class is referenced, since the place where it is defined has changed.

* Fix configuration inheritance issue using multiprocessing with spawn mode.
* Add a new CI entry for spawn-multiprocessing method.

* Add testcases for multiprocessing with spawn mode.

Co-authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Co-authored-by: James Timmins <james@astronomer.io>
2020-05-06 10:43:28 +01:00
Tomek Urbaszek caa60b1141
Remove config side effects from tests (#8607)
* Remove config side effects

* Fix LatestOnlyOperator return type to be json serializable

* Fix tests/test_configuration.py

* Fix tests/executors/test_dask_executor.py

* Fix tests/jobs/test_scheduler_job.py

* Fix tests/models/test_cleartasks.py

* Fix tests/models/test_taskinstance.py

* Fix tests/models/test_xcom.py

* Fix tests/security/test_kerberos.py

* Fix tests/test_configuration.py

* Fix tests/test_logging_config.py

* Fix tests/utils/test_dag_processing.py

* Apply isort

* Fix tests/utils/test_email.py

* Fix tests/utils/test_task_handler_with_custom_formatter.py

* Fix tests/www/api/experimental/test_kerberos_endpoints.py

* Fix tests/www/test_views.py

* Code refactor

* Fix tests/www/api/experimental/test_kerberos_endpoints.py

* Fix requirements

* fixup! Fix tests/www/test_views.py
2020-05-04 12:29:09 +02:00
QP Hou 379a884d64
fix: aws hook should work without conn id (#8534)
This patch makes behavior of hook consistent with documentation.

AWS hooks should support falling back to using default credential chain
lookup behavior when connection id is not specified.
add test for conn_id equals None
more elegant way to set role session name
2020-04-28 10:59:57 +08:00
MatthewRBruce 6450834d97
[AIRFLOW-6796] Clean up DAG serializations based on last_updated (#7424)
DAG serializations were previously deleted based on whether the
DagFileProcessorManager had processed a particular python file.  This
changes that to be based on the last time a DAG was processed by the
scheduler.

Also moves cleaning up of stale dags to the DagFileProcessorManager to
support long running schedulers
2020-04-27 18:51:31 +01:00
Kamil Breguła 5a864f0e45
User-friendly error messages when the configuration is incorrect (#8463)
* Clearer error messages when the configuration is incorrect
2020-04-27 09:37:07 +02:00