Commit graph

9012 commits

Author SHA1 Message Date
QP Hou dd57ec9e26
Fix task and dag stats on home page (#8865)
d.dag_id is not a valid attribute. In order to use the dag_id variable
in a closure callback, it needs to be passed in as a function so that the
right value is captured on each iteration of the for loop.
2020-05-19 10:25:39 +01:00
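The same late-binding pitfall exists in Python closures. The sketch below is illustrative only: it is not the code from this commit, and the function names are hypothetical. A lambda created inside a loop looks up `dag_id` when it is called, so every callback sees the last value unless the value is captured through a factory function.

```python
def make_callbacks_buggy(dag_ids):
    callbacks = []
    for dag_id in dag_ids:
        # The lambda reads `dag_id` when it is *called*, not when it is
        # created, so every callback sees the loop variable's final value.
        callbacks.append(lambda: dag_id)
    return callbacks


def make_callbacks_fixed(dag_ids):
    def make_callback(captured_id):
        # Each call to the factory creates a new scope, so each callback
        # keeps the value that was passed in for its own iteration.
        return lambda: captured_id

    return [make_callback(dag_id) for dag_id in dag_ids]


print([cb() for cb in make_callbacks_buggy(["a", "b", "c"])])  # ['c', 'c', 'c']
print([cb() for cb in make_callbacks_fixed(["a", "b", "c"])])  # ['a', 'b', 'c']
```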
crazy-2020 841d816647
Allow setting the polling time in DLPHook (#8824)
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
2020-05-19 04:55:41 +02:00
Jarek Potiuk 2121f494c3
Avoid failure on transient requirements in CI image (#8892)
When you build from scratch and some transient requirements
fail, the initial installation step might fail.

We now use the latest valid constraints from the DEFAULT_BRANCH
branch to avoid this.
2020-05-17 22:41:48 +02:00
Jarek Potiuk 12c5e5d8ae
Prepare release candidate for backport packages (#8891)
After preparing the 2020.5.19 release candidate and
reviewing the packages, some changes turned out to be necessary.

Therefore the date was changed to 2020.5.20 with the following
fixes:

* cncf.kubernetes.example_dags were hard-coded and added to all
  packages; they have been removed
* The version suffix is only used to rename the binary packages, not for
  the version itself
* The release process description has been updated
* The package version is consistent: leading 0s are dropped from the month
  and day
2020-05-17 20:38:46 +02:00
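The "no leading zeros" rule for the date-based package version can be sketched as follows. The helper name is hypothetical and this is not the release script from the commit; it only shows the formatting rule.

```python
import datetime


def backport_version(release_date: datetime.date) -> str:
    """Build a YYYY.M.D version string with no leading zeros in month or day."""
    # Formatting the integers directly drops leading zeros: May 20 -> "5.20".
    return f"{release_date.year}.{release_date.month}.{release_date.day}"


print(backport_version(datetime.date(2020, 5, 20)))  # 2020.5.20
```

Writing the version this way also agrees with PEP 440 integer normalization, under which `2020.05.20` and `2020.5.20` denote the same version.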
Pranjal Mittal ff342fc230
Added SalesforceHook missing method to return only dataframe (#8565) (#8644)
* add feature for skipping writing to file

* add SalesforceHook missing method to return dataframe only

The write_object_to_file function is split: object_to_df returns the DataFrame, and write_object_to_file then uses object_to_df as its first step before exporting to a file

* fixed exception message

* fix review comments - removed filename check for None
2020-05-17 17:09:04 +02:00
Daniel Imberman 8985df0bfc
Monitor pods by labels instead of names (#6377)
* Monitor k8sPodOperator pods by labels

To prevent situations where the scheduler starts a
second k8sPodOperator pod after a restart, we now check
for existing pods using kubernetes labels

* Update airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

* Update airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

* add docs

* Update airflow/kubernetes/pod_launcher.py

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

Co-authored-by: Daniel Imberman <daniel@astronomer.io>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-05-16 14:13:58 -07:00
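A minimal sketch of looking pods up by labels instead of by name. Kubernetes equality-based selectors are comma-separated `key=value` pairs, which is the string format the official Python client's `CoreV1Api.list_namespaced_pod` accepts through its `label_selector` argument. The label keys and helper names below are hypothetical, and the pod list is plain dictionaries so the sketch runs without a cluster.

```python
def build_label_selector(labels):
    """Turn {"dag_id": "d1", "task_id": "t1"} into "dag_id=d1,task_id=t1"."""
    # Sorting keeps the selector string deterministic.
    return ",".join(f"{key}={value}" for key, value in sorted(labels.items()))


def pods_matching(pods, labels):
    """Return the pods whose labels include every requested key=value pair."""
    return [
        pod
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in labels.items())
    ]


running = [
    {"name": "task-a-1234", "labels": {"dag_id": "etl", "task_id": "extract"}},
    {"name": "task-b-5678", "labels": {"dag_id": "etl", "task_id": "load"}},
]
wanted = {"dag_id": "etl", "task_id": "extract"}
print(build_label_selector(wanted))  # dag_id=etl,task_id=extract
# After a restart, an existing matching pod can be reused instead of starting a second one.
print([pod["name"] for pod in pods_matching(running, wanted)])  # ['task-a-1234']
```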
Daniel Huang a546a10b13
Add Snowflake system test (#8422) 2020-05-16 14:04:12 -07:00
Jonathan Stern 707bb0c725
[AIRFLOW-6535] Add AirflowFailException to fail without any retry (#7133)
* use preferred boolean check idiom

Co-Authored-By: Jarek Potiuk <jarek@potiuk.com>

* add test coverage for AirflowFailException

* add docs for some exception usage patterns

* autoformatting

* remove extraneous newline, poke travis build

* clean up TaskInstance.handle_failure

Try to reduce nesting and repetition of logic for different conditions.
Also try to tighten up the scope of the exception handling ... it looks
like the large block that catches an Exception and logs it as a failure
to send an email may have been swallowing some TypeErrors coming out
of trying to compose a log info message and calling strftime on
start_date and end_date when they're set to None; this is why I've added
lines in the test to set those values on the TaskInstance objects.

* let sphinx generate docs for exceptions module

* keep session kwarg last in handle_failure

* explain allowed_top_level

* add black-box tests for retry/fail immediately cases

* don't lose safety measures in logging date attrs

* fix flake8 too few blank lines

* grammar nitpick

* add import to AirflowFailException example

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2020-05-16 18:53:12 +01:00
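The "fail without any retry" semantics can be illustrated with a standalone retry loop. This is a sketch under assumptions: `FailWithoutRetry` is a local stand-in for `AirflowFailException`, and `run_with_retries` is hypothetical, not how TaskInstance actually handles failures.

```python
class FailWithoutRetry(Exception):
    """Hypothetical stand-in for AirflowFailException."""


def run_with_retries(task, retries):
    """Call task() up to retries + 1 times and return (result, attempts_used)."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(1, retries + 2):
        try:
            return task(), attempt
        except FailWithoutRetry:
            # Non-retryable failure: give up now, however many retries remain.
            raise
        except Exception:
            # Ordinary failure: re-raise only if this was the last attempt.
            if attempt == retries + 1:
                raise


calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("transient error")
    return "ok"


print(run_with_retries(flaky, retries=3))  # ('ok', 3)
```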
S S Rohit f6d591747e
Updated docs for experimental API /dags/<DAG_ID>/dag_runs (#8800) 2020-05-16 19:13:04 +02:00
Jarek Potiuk f3521fb0e3
Regenerate readme files for backport package release (#8886) 2020-05-16 14:03:45 +02:00
Jarek Potiuk a3a3411838
Fix master failing on generating requirements (#8885)
By default, GitHub Actions checks out only the latest commit, but in order to
see whether there are any changes since the last README was generated
we need the whole history, so we need to fetch it all.

We also skip generating the new README in case there is only one
commit in the history since the last release. The nature of readme
generation is that the commit with the README itself will never
be in the list of commits for the previous release so there is
always at least one commit more than the one listed in the readme.
2020-05-16 12:53:30 +02:00
Kaxil Naik 15273f0ea0
Check for same task instead of Equality to detect Duplicate Tasks (#8828) 2020-05-16 11:21:12 +01:00
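The difference between an equality check and an identity check for duplicate tasks can be sketched with a toy DAG. `MiniDag`, `Task`, and the local `DuplicateTaskIdFound` are hypothetical stand-ins, not Airflow's classes; the point is that a separately built task can compare equal yet still be a duplicate.

```python
class DuplicateTaskIdFound(Exception):
    """Local stand-in for the exception a real DAG would raise."""


class Task:
    def __init__(self, task_id, command):
        self.task_id = task_id
        self.command = command

    def __eq__(self, other):
        # Attribute-based equality: two separately built tasks can compare equal.
        return isinstance(other, Task) and vars(self) == vars(other)


class MiniDag:
    def __init__(self):
        self.task_dict = {}

    def add_task(self, task):
        existing = self.task_dict.get(task.task_id)
        # Identity check: adding the very same object again is harmless, but a
        # *different* object with the same task_id is a duplicate even if it
        # compares equal (an `!=` check would let it through).
        if existing is not None and existing is not task:
            raise DuplicateTaskIdFound(f"Task id {task.task_id!r} has already been added")
        self.task_dict[task.task_id] = task


dag = MiniDag()
first = Task("extract", "echo extract")
dag.add_task(first)
dag.add_task(first)  # same object again: allowed

twin = Task("extract", "echo extract")
print(twin == first)  # True
try:
    dag.add_task(twin)
except DuplicateTaskIdFound as err:
    print(err)  # Task id 'extract' has already been added
```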
Ash Berlin-Taylor f4edd90a94
Speed up TestAwsLambdaHook by not actually running a function (#8882)
Moto's mock_lambda _actually runs the code_ in a docker container. This
is useful if you are testing a Lambda function but is massively overkill
for testing that we make a request to a function -- Airflow doesn't care
what the function does.

This is our slowest individual test in CI right now, taking 20s on
Github Actions.
2020-05-16 10:41:17 +02:00
mschickensoup a3a4bac446
JIRA and Github issues explanation (#8539) 2020-05-16 10:38:31 +02:00
Ash Berlin-Taylor 82de6f74ae
Spend less time waiting for DagFileProcessor processes to complete (#8814)
In debugging another test I noticed that the scheduler was spending a
long time waiting for a "simple" dag to be parsed. But upon closer
inspection the parsing process itself was done in a few milliseconds,
but we just weren't harvesting the results in a timely fashion.

This change uses the `sentinel` attribute of multiprocessing.Process
(added in Python 3.3) to be able to wait for all the processes, so that
as soon as one has finished we get woken up and can immediately harvest
and pass on the parsed dags.

This makes test_scheduler_job.py about twice as quick, and also reduces
the time the scheduler spends between tasks.

In real workloads, or where there are lots of dags, this likely won't
equate to such a huge speed-up, but it does for our (synthetic) elastic
performance test dag.

These were the timings for the dag to run all the tasks in a single dag
run to completion, with PERF_SCHEDULE_INTERVAL='1d' PERF_DAGS_COUNT=1

plus the following settings for each case:

PERF_SHAPE=linear PERF_TASKS_COUNT=12:

**Before**: 45.4166s

**After**: 16.9499s

PERF_SHAPE=linear PERF_TASKS_COUNT=24:

**Before**: 82.6426s

**After**: 34.0672s

PERF_SHAPE=binary_tree PERF_TASKS_COUNT=24:

**Before**: 20.3802s

**After**: 9.1400s

PERF_SHAPE=grid PERF_TASKS_COUNT=24:

**Before**: 27.4735s

**After**: 11.5607s

If you have many more dag **files**, this likely won't be your bottleneck.
2020-05-15 22:17:55 +01:00
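The "wake up as soon as one finishes" pattern relies on `multiprocessing.connection.wait` (Python 3.3+), which accepts `Connection` objects and `Process.sentinel` handles in the same list and returns only the ready ones. This sketch uses only pipes so it runs in a single process; the "parser" framing and the helper name are hypothetical, not the scheduler's code.

```python
from multiprocessing import Pipe
from multiprocessing.connection import wait


def harvest_ready(readers, timeout=None):
    """Receive from whichever connections are ready, instead of polling each in turn."""
    # wait() blocks until at least one object is ready (or the timeout
    # expires) and returns only the ready ones.
    return [conn.recv() for conn in wait(readers, timeout=timeout)]


# One pipe per (hypothetical) parser; only the second one has finished.
pipes = [Pipe(duplex=False) for _ in range(3)]
readers = [reader for reader, _ in pipes]
writers = [writer for _, writer in pipes]
writers[1].send("dag_b parsed")

print(harvest_ready(readers, timeout=1))  # ['dag_b parsed']
print(harvest_ready(readers, timeout=0))  # []
```

Adding `process.sentinel` values to the same list would make `wait` also return when a child process exits.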
Jarek Potiuk 92585ca4cb
Added automated release notes generation for backport operators (#8807)
We now have a mechanism to keep the release notes for the
backport operators updated in an automated way.

It generates all the necessary information:

* summary of requirements for each backport package
* list of dependencies (including extras to install them) when a package
  depends on other provider packages
* table of new hooks/operators/sensors/protocols/secrets
* table of moved hooks/operators/sensors/protocols/secrets with
  information about where they were moved from
* changelog of all the changes to the provider package (this will be
  automatically updated with an incremental changelog whenever we decide to
  release separate packages)

The system is fully automated: we will be able to produce release notes
automatically (per package) whenever we decide to release a new version of
the package in the future.
2020-05-15 19:00:15 +02:00
Daniel Saiz f82ad452b0
Fix KubernetesPodOperator pod name length validation (#8829)
* Fix KubernetesPodOperator pod name length validation

* Add test, verify Exception is raised
2020-05-15 08:41:52 -07:00
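For reference, the Kubernetes documentation requires pod names to be valid DNS-1123 subdomain names. The sketch below encodes that rule; whether the operator's check is written this way is an assumption, and the constant and function names are hypothetical.

```python
import re

# Kubernetes pod names must be DNS-1123 subdomains: at most 253 characters of
# lowercase alphanumerics, '-' or '.', starting and ending with an alphanumeric.
DNS_1123_SUBDOMAIN = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*")
MAX_POD_NAME_LEN = 253


def validate_pod_name(name):
    """Return name unchanged if it is a valid pod name, else raise ValueError."""
    if len(name) > MAX_POD_NAME_LEN:
        raise ValueError(f"pod name is {len(name)} characters long; the maximum is {MAX_POD_NAME_LEN}")
    # fullmatch (rather than match) rejects names with trailing junk.
    if not DNS_1123_SUBDOMAIN.fullmatch(name):
        raise ValueError(f"{name!r} is not a valid DNS-1123 subdomain")
    return name


print(validate_pod_name("etl-extract-1a2b3c"))  # etl-extract-1a2b3c
```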
Xinbin Huang 85bbab27db
Add EMR operators howto docs (#8863) 2020-05-15 10:53:48 +02:00
Ash Berlin-Taylor 35c523fa11
Fix list formatting of plugins doc. (#8873)
This was causing it to be picked up as a `<dl>/<dd>` containing a list,
instead of a paragraph and a list.

```
<dl class="simple">
  <dt>This will create a hook, and an operator accessible at:</dt>
  <dd>
    <ul class="simple">
      <li><p><code><span class="pre">airflow.hooks.my_namespace.MyHook</span></code></p></li>
      <li><p><code><span class="pre">airflow.operators.my_namespace.MyOperator</span></code></p></li>
    </ul>
  </dd>
</dl>
```
2020-05-15 10:14:51 +02:00
James Timmins 4813b94ec5
Create log file w/abs path so tests pass on MacOS (#8820)
* Set conf vals as env vars so spawned process can access values.

* Create custom env_vars context manager to control simple environment variables.

* Use env_vars instead of conf_vars when using .

* When creating temporary environment variables, remove them if they didn't exist.
2020-05-14 23:17:54 +01:00
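A context manager that sets environment variables and removes any that did not exist beforehand could look like the sketch below. The single-dict signature is an assumption and this is not the test helper from the commit; the `AIRFLOW__{SECTION}__{KEY}` variable name follows Airflow's convention for overriding config values through the environment.

```python
import contextlib
import os


@contextlib.contextmanager
def env_vars(overrides):
    """Temporarily set environment variables, restoring the previous state on exit."""
    # Remember the old value, or None if the variable was not set at all.
    previous = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old_value in previous.items():
            if old_value is None:
                # The variable did not exist before, so remove it entirely.
                os.environ.pop(key, None)
            else:
                os.environ[key] = old_value


with env_vars({"AIRFLOW__CORE__LOAD_EXAMPLES": "False"}):
    # A child process started here inherits os.environ, so it sees this value too.
    print(os.environ["AIRFLOW__CORE__LOAD_EXAMPLES"])  # False
```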
Ash Berlin-Taylor fe4219112a
Don't use ProcessorAgent to test ProcessorManager (#8871)
Some of our tests (when I was looking at another change) were using the
ProcessorAgent to run and test the behaviour of our ProcessorManager in
certain cases. Having that extra process in the middle is not critical
for the tests, and makes it harder to debug the problem when
something breaks.

To make this possible I have made a small refactor to the loop of
DagFileProcessorManager (to give us a method we can call in tests that
doesn't do `os.setsid`).
2020-05-14 16:49:12 +01:00
Tomek Urbaszek 961c710526
Make Custom XCom backend a subsection of XCom docs (#8869) 2020-05-14 16:29:04 +02:00
Kamil Breguła fc862a3edd
Do not create a separate process for one task in CeleryExecutor (#8855) 2020-05-14 06:01:13 +02:00
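The idea of skipping the process pool for a single task can be sketched generically. `send_tasks`, `send_one`, and `pool_map` are hypothetical hooks (`pool_map` stands in for something like a process pool's `map`); this does not reproduce CeleryExecutor's code, and the overhead rationale in the comment is an assumption.

```python
def send_tasks(tasks, send_one, pool_map):
    """Send tasks, bypassing the pool when there is only one task."""
    if len(tasks) == 1:
        # Assumed rationale: starting worker processes costs more than
        # sending one task, so handle it inline in the current process.
        return [send_one(tasks[0])]
    return list(pool_map(send_one, tasks))


pool_calls = []


def fake_pool_map(func, items):
    # Records that the "pool" was used, then maps in-process.
    pool_calls.append(len(items))
    return map(func, items)


print(send_tasks(["t1"], str.upper, fake_pool_map))  # ['T1']
print(pool_calls)  # []
print(send_tasks(["t1", "t2"], str.upper, fake_pool_map))  # ['T1', 'T2']
print(pool_calls)  # [2]
```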
Xinbin Huang e61b9bb9bb
Add AWS EMR System tests (#8618)
- add create_emr_default_roles to amazon_system_helpers.py
2020-05-13 21:14:45 +02:00
Felix Uellendall 2878f17630
Relax Flask-Appbuilder version to ~=2.3.4 (#8857)
"Bump jQuery to 3.5" was reverted, so we can upgrade and remove the email_validator dependency.
See also: https://github.com/dpgaspar/Flask-AppBuilder/blob/master/CHANGELOG.rst#improvements-and-bug-fixes-on-234
2020-05-13 19:42:51 +01:00
QP Hou 81fb9d64ad
Add metric for monitoring email notification failures (#8771) 2020-05-13 19:41:26 +01:00
Ash Berlin-Taylor c3af681edf
Convert tests/jobs/test_base_job.py to pytest (#8856)
I would like to (create) and use a pytest fixture as a parameter, but
they cannot be used on unittest.TestCase functions:

> unittest.TestCase methods cannot directly receive fixture arguments as
> implementing that is likely to inflict on the ability to run general
> unittest.TestCase test suites.
2020-05-13 14:00:46 +01:00
Jarek Potiuk f1dc2e0b0e
The librabbitmq library stopped installing for python3.7 (#8853)
When preparing backport releases I found that rabbitmq was not
included in the "devel_ci" extras. It turned out that librabbitmq was
not installing on python3.7, because librabbitmq has not been
maintained for 2 years and has been replaced by the py-amqp library.
The python py-amqp library has been improved using Cython compilation,
so it became production ready and librabbitmq has been abandoned.

We are switching to the py-amqp library here and adding
rabbitmq back to "devel_ci" dependencies.

Details in: https://github.com/celery/librabbitmq/issues/153
2020-05-13 11:16:15 +02:00
Ash Berlin-Taylor ed3f5131a2
Correctly pass sleep time from AWSAthenaOperator down to the hook. (#8845)
Sleep time in AthenaHook was defined as a kwarg-only key.

This one change makes tests go from 270s to 0.3s :D
2020-05-13 09:37:37 +01:00
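A keyword-only parameter is easy to miss when forwarding arguments. The sketch below shows one common way such a value gets silently ignored: passed positionally, it lands in `*args` and the default is used. Whether the operator passed the value positionally or not at all is not stated here, and `HookSketch` is hypothetical.

```python
class HookSketch:
    """Hypothetical hook whose sleep_time is keyword-only (it follows *args)."""

    def __init__(self, *args, sleep_time=30, **kwargs):
        self.args = args
        self.sleep_time = sleep_time


# Passed positionally, the value lands in *args and the default is used silently.
wrong = HookSketch("aws_default", 0)
# Passed by keyword, the value actually reaches sleep_time.
right = HookSketch("aws_default", sleep_time=0)
print(wrong.sleep_time, right.sleep_time)  # 30 0
```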
Kaxil Naik 8a94d18c04
Fix Environment Variable in perf/scheduler_dag_execution_timing.py (#8847) 2020-05-13 09:28:13 +01:00
Wai Yan e1e833bb26
Update GoogleBaseHook to not follow 308 and use 60s timeout (#8816) 2020-05-13 07:04:54 +02:00
Kaxil Naik 7d69987edd
Remove duplicate code from perf_kit (#8843) 2020-05-12 16:55:45 +01:00
Tomek Urbaszek 8b54919711
Refactor BigQuery hook methods to use python library (#8631)
* Refactor create_external_table

* Refactor patch_table and add update_table

* Refactor run_grant_dataset_view_access

* Refactor run_upsert_table

* Refactor insert_all

* Refactor delete_table

* Refactor list_rows

* Fix types

* Fix test

* Refactor poll_job_complete

* Refactor cancel_query

* Refactor run_with_configuration

* Refactor run_* methods

* Fix self.project_id issue

* Fixup run_table_upsert
2020-05-12 17:02:33 +02:00
abdulbasitds 7236862a1f
[AIRFLOW-2310] Enable AWS Glue Job Integration (#6007)
Co-authored-by: olalekanelesin <elesin.olalekan@gmail.com>
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-05-12 15:54:39 +01:00
Sergio Kef 578fc514cd
[AIRFLOW-4543] Update slack operator to support slackclient v2 (#5519)
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Kaxil Naik <8811558+kaxil@users.noreply.github.com>
2020-05-12 15:15:18 +01:00
Jarek Potiuk 01db738ded
Azure storage 0.37.0 is not installable any more (#8833)
The 2020.05.12 release of azure-storage is not installable any more
(it is deprecated). For now we should switch to the latest working
version.
2020-05-12 13:28:55 +02:00
Tomek Urbaszek 6911dfe837
Fix template fields in Google operators (#8840) 2020-05-12 12:48:49 +02:00
Kaxil Naik 4b06fde0f1
Fix Flake8 errors (#8841) 2020-05-12 11:07:29 +01:00
Tomek Urbaszek 1d12c347cb
Refactor BigQuery check operators (#8813)
* Refactor BigQuery check operators

This commit applies some code formatting to existing BigQuery
check operators. It also adds location parameter to
BigQueryIntervalCheckOperator and BigQueryValueCheckOperator.

* fixup! Refactor BigQuery check operators
2020-05-12 11:37:20 +02:00
James Timmins 7533378df9
Access function to be pickled as attribute, not method, to avoid error. (#8823)
* Access function to be pickled as attribute, not method, to avoid error.

* Access type attribute to allow pickling.

* Use getattr instead of type(self) to fix linting error.
2020-05-12 08:55:39 +01:00
Kallam Reddy 78a48db75b
Add support for non-default orientation in `dag show` command (#8834) 2020-05-12 05:02:26 +02:00
James Timmins 4375607410
Fix typo. 'zobmies' => 'zombies'. (#8832) 2020-05-12 04:34:16 +02:00
Kaxil Naik 3ad4f96bae
[AIRFLOW-1156] BugFix: Unpausing a DAG with catchup=False creates an extra DAG run (#8776) 2020-05-11 22:25:45 +01:00
James Timmins f410d64de5
Use fork when test relies on mock.patch in parent process. (#8794)
* Use 'fork' in test because 'spawn' breaks mocks.

* Use fork when making the process in test_scheduler_executor_overflow.
2020-05-11 21:42:38 +01:00
João Ponte d590e5e767
Add option to propagate tags in ECSOperator (#8811)
Co-authored-by: Joao Ponte <jpe@plista.com>
2020-05-11 19:48:13 +02:00
Jarek Potiuk 1fb9f0722a
Synchronize extras between airflow and providers (#8819) 2020-05-11 19:25:15 +02:00
Teddy Hartanto 2ec0130099
[AIRFLOW-4549] Allow skipped tasks to satisfy wait_for_downstream (#7735)
Previously, tasks in the SUCCESS or SKIPPED state satisfied the
depends_on_past check, but only tasks in the SUCCESS state
satisfied the wait_for_downstream check. The inconsistency in behavior
made the API less intuitive to users.
2020-05-11 15:22:31 +01:00
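After this change both checks accept the same set of states. A minimal sketch, assuming a simplified `State` enum and hypothetical helper names rather than Airflow's dependency classes:

```python
from enum import Enum


class State(str, Enum):
    SUCCESS = "success"
    SKIPPED = "skipped"
    FAILED = "failed"


# Both checks now accept the same set of states.
DEPENDENCY_MET_STATES = frozenset({State.SUCCESS, State.SKIPPED})


def depends_on_past_met(previous_state):
    return previous_state in DEPENDENCY_MET_STATES


def wait_for_downstream_met(previous_downstream_states):
    # Every downstream task from the previous run must have succeeded or been skipped.
    return all(state in DEPENDENCY_MET_STATES for state in previous_downstream_states)


print(wait_for_downstream_met([State.SUCCESS, State.SKIPPED]))  # True
print(wait_for_downstream_met([State.SUCCESS, State.FAILED]))  # False
```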
Ash Berlin-Taylor 5ae76d8cc0
Option to set end_date for performance testing dag. (#8817)
I wanted an option to run a specific number of dag runs to completion,
so this feature lets me control the end_date of the dag without having
to know exactly what value it would have.

The "is string" check is more simplistic than it needs to be, but it's
Good Enough for now for an optional feature.
2020-05-11 14:42:16 +01:00
Mustafa Gök 0c3db84c3c
[AIRFLOW-7068] Create EC2 Hook, Operator and Sensor (#7731)
Co-Authored-By: Ash Berlin-Taylor <ash_github@firemirror.com>
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
2020-05-11 15:18:05 +02:00
Ash Berlin-Taylor a6434a5287
Fix bash command in performance test dag (#8812) 2020-05-11 11:21:07 +01:00