Commit Graph

7071 Commits

Author SHA1 Message Date
Tobiasz Kędzierski 70bf307f38
Add How To Guide for Dataflow (#13461) 2021-01-21 11:41:36 +01:00
Kaxil Naik f7fe363255
Fix Deprecation for configuration.getsection (#13804) 2021-01-21 06:26:34 +01:00
Ashmeet Lamba 3e25795099
BaseBranchOperator will push to xcom by default. (#13704) (#13763)
This change makes BaseBranchOperator do an XCom push of the branch it chooses to follow.
It will also add support to use the do_xcom_push parameter.

The added change returns the result received by running choose_branch().

Closes: #13704
2021-01-21 01:16:32 +00:00
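A minimal sketch of the behaviour described in #13763 above, written as plain Python rather than against Airflow itself. `BranchSketch`, `WeekdayBranch` and `run_task` are hypothetical stand-ins; only `choose_branch()`, `do_xcom_push` and the default `return_value` XCom key come from Airflow.

```python
class BranchSketch:
    """Hypothetical stand-in for a branching operator (not Airflow's class)."""

    def __init__(self, do_xcom_push=True):
        self.do_xcom_push = do_xcom_push

    def choose_branch(self, context):
        raise NotImplementedError

    def execute(self, context):
        # Returning the chosen branch is what makes it available for XCom.
        return self.choose_branch(context)


class WeekdayBranch(BranchSketch):
    def choose_branch(self, context):
        return "weekday_task" if context["weekday"] < 5 else "weekend_task"


def run_task(op, context, xcom):
    """Hypothetical runner: stores the return value when pushing is enabled."""
    result = op.execute(context)
    if op.do_xcom_push and result is not None:
        xcom["return_value"] = result
    return result
```

With `do_xcom_push=False` the branch is still chosen, but nothing is stored in the `xcom` dictionary.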
Griffin Cosgrove 3fd5ef3555
Add missing logos for integrations (#13717)
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2021-01-21 01:22:34 +01:00
Andrii Soldatenko 29730d7200
Add acl_policy to S3CopyObjectOperator (#13773)
closes https://github.com/apache/airflow/issues/13774

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2021-01-20 15:16:25 +00:00
Jennifer Melot 9923d606d2
Use DAG context manager in examples (#13297) 2021-01-20 13:16:12 +01:00
Kaxil Naik b4c8a0406e
Fix SQL syntax to check duplicate connections (#13783)
closes https://github.com/apache/airflow/issues/13679
2021-01-20 08:23:28 +01:00
André Amaral 1602ec97c8
Add a new argument for HttpSensor to accept a list of HTTP status codes to continue poking (#13499)
closes: #13451
2021-01-20 00:02:08 +00:00
drago-f5a 7a742cb033
Change log level from debug to info when spawning new gunicorn workers (#13780) 2021-01-19 23:38:55 +00:00
Brent Bovenzi d65cf77552
Add description to hint if conn_type is missing (#13778)
- add plaintext description to add/edit conn_type to make sure people remember to install necessary provider packages
2021-01-19 23:38:29 +00:00
drago-f5a 8a4bd3c73e
Fix webserver exiting when gunicorn master crashes (#13518)
* Correct the logic for webserver choosing number of workers to spawn (#13469)

A key consequence of this fix is that webserver will properly
exit when gunicorn master dies and stops responding to signals.
2021-01-19 22:23:40 +00:00
JavierLopezT c065d32189
AllowDiskUse parameter and docs in MongotoS3Operator (#12033)
Co-authored-by: RosterIn <48057736+RosterIn@users.noreply.github.com>
Co-authored-by: javier.lopez <javier.lopez@promocionesfarma.com>
2021-01-19 13:25:53 +01:00
QP Hou f1d4f54b34
Fix race conditions in task callback invocations (#10917)
This race condition resulted in task success and failure callbacks being
called more than once. Here is the order of events that could lead to
this issue:

* task started running within process 2
* (process 1) local_task_job checked for task return code, returns None
* (process 2) task exited with failure state, task state updated as failed in DB
* (process 2) task failure callback invoked through taskinstance.handle_failure method
* (process 1) local_task_job heartbeat noticed task state set to
  failure, mistook it for the state being updated externally, and also invoked
  the task failure callback

To avoid this race condition, we need to make sure task callbacks are
only invoked within a single process.
2021-01-18 23:39:41 +00:00
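The ordering of events in #10917 above can be replayed in a small, single-threaded simulation. This is only an illustrative sketch: `simulate` and its flag are hypothetical, and which process owns the callbacks in the actual fix is not stated in the commit message.

```python
def simulate(task_process_runs_callbacks):
    """Replay the event order and return who invoked the failure callback."""
    calls = []
    state = {"task": "running"}

    # (process 1) supervisor polls the return code before the task exits.
    return_code = None

    # (process 2) task fails and records the failure in the "DB".
    state["task"] = "failed"
    if task_process_runs_callbacks:
        calls.append("task process")

    # (process 1) heartbeat sees a failed state without a return code,
    # treats it as an external update, and runs the callback itself.
    if state["task"] == "failed" and return_code is None:
        calls.append("supervisor")

    return calls
```

`simulate(True)` reproduces the double invocation, while `simulate(False)` shows a single invocation once only one process is responsible for callbacks.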
Kaxil Naik 6410f07106
Add __repr__ for Executors (#13753)
Before:

```python
>>> from airflow.executors.local_executor import LocalExecutor
>>> LocalExecutor()
<airflow.executors.local_executor.LocalExecutor object at 0x7f49b47f8d68>
```

After:

```python
>>> from airflow.executors.local_executor import LocalExecutor
>>> LocalExecutor()
LocalExecutor(parallelism=32)
```
2021-01-18 22:10:18 +00:00
Ash Berlin-Taylor 31d31adb58
Setting `max_tis_per_query` to 0 now correctly removes the limit (#13512)
This config setting is documented as 0==unlimited, but in my HA
scheduler work I rewrote the code that used this and mistakenly didn't
keep this behaviour.

This re-introduces the correct behaviour and also adds a test so that it
stays working in the future.

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2021-01-18 21:24:37 +00:00
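A sketch of the "0 means unlimited" rule from #13512 above. The helper name and the `open_slots` argument are assumptions made for illustration; the scheduler's real code is not shown in this log.

```python
def effective_query_limit(max_tis_per_query, open_slots):
    """Return how many task instances one query may fetch.

    A setting of 0 removes the per-query cap, so only the number of
    open slots limits the query.
    """
    if max_tis_per_query == 0:
        return open_slots
    return min(max_tis_per_query, open_slots)
```

Treating 0 as a sentinel rather than as a literal limit avoids the regression where a query would otherwise fetch no rows at all.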
Kengo Seki 85a3ce1a47
Fix S3ToSnowflakeOperator to support uploading all files in the specified stage (#12505)
* Fix S3ToSnowflakeOperator to support uploading all files in the specified stage

Currently, users have to specify each file to upload as
the "s3_keys" parameter when using S3ToSnowflakeOperator.
But the `COPY INTO` statement, which S3ToSnowflakeOperator
leverages internally, allows omitting this parameter
so that users can upload all files in the specified stage.
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#syntax

This PR makes S3ToSnowflakeOperator's s3_keys parameter optional
so as to support this functionality.
2021-01-18 18:39:43 +01:00
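The optional `s3_keys` behaviour from #12505 above can be sketched as a statement builder that emits the `FILES` clause only when keys are given. `build_copy_into` is a hypothetical helper, not the operator's code, and the plain string interpolation is for illustration only (it is not safe for untrusted input).

```python
def build_copy_into(table, stage, file_format, s3_keys=None):
    """Build a Snowflake COPY INTO statement; FILES is omitted without keys."""
    parts = [f"COPY INTO {table}", f"FROM @{stage}"]
    if s3_keys:
        files = ", ".join(f"'{key}'" for key in s3_keys)
        parts.append(f"FILES = ({files})")
    parts.append(f"FILE_FORMAT = ({file_format})")
    return "\n".join(parts)
```

Calling `build_copy_into("my_table", "my_stage", "TYPE = CSV")` leaves out the `FILES` clause, which lets Snowflake load every file in the stage.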
Tomek Urbaszek 309788e5e2
Refactor DataprocOperators to support google-cloud-dataproc 2.0 (#13256) 2021-01-18 17:49:19 +01:00
Jarek Potiuk f74da5025d
Disables provider's manager warning for source-installed prod image. (#13729)
When production image is built for development purpose, by default
it installs all providers from sources, but not all dependencies
are installed for all providers. Many providers require more
dependencies and when you try to import those packages via
provider's manager, they fail to import and print warnings.

Those warnings are now turned into debug messages, in case
AIRFLOW_INSTALLATION_METHOD=".", which is set when
production image is built locally from sources. This is helpful
especially when you use a locally built production image to
run K8S tests - otherwise the logs are flooded with
warnings.

This problem does not happen in CI, because there by default
production image is built from locally prepared packages
and it does not contain sources from providers that are not
installed via packages.
2021-01-18 09:00:32 +01:00
QP Hou 1ec63123c4
Fix backfill crash on task retry or reschedule (#13712)
When a retry happens, task key needs to be recorded with try number + 1
to avoid KeyError exception.
2021-01-17 19:18:36 -08:00
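A sketch of the idea behind #13712 above: when a task is retried, it is tracked under a key whose try number is one higher, so a later lookup with the new key does not raise `KeyError`. `TaskKey` and `record_for_retry` are hypothetical names used only for this illustration.

```python
from collections import namedtuple

# Hypothetical key shape: the try number is part of the identity of a run.
TaskKey = namedtuple("TaskKey", ["dag_id", "task_id", "execution_date", "try_number"])


def record_for_retry(running, key):
    """Track a retried task under the key it will have on its next try."""
    next_key = key._replace(try_number=key.try_number + 1)
    running[next_key] = "up_for_retry"
    return next_key
```

Recording the task under the old key instead would make any lookup by the next try's key fail.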
Sara Hamilton 7ec858c452
updated Google DV360 Hook to fix SDF issue (#13703)
Co-authored-by: Sara Hamilton <sarahamilton@google.com>
2021-01-17 13:47:35 +01:00
phucbui95 ab5fe56ac4
Fix bug in GCSToS3Operator (#13718) 2021-01-16 21:47:16 +00:00
JavierLopezT dbf751112f
Add connection arguments in S3ToSnowflakeOperator (#12564)
* Add connection arguments in S3ToSnowflakeOperator

* delete database

* add database

* indent

Co-authored-by: javier.lopez <javier.lopez@promocionesfarma.com>
2021-01-16 10:42:41 -08:00
Kaxil Naik 1ab19b40fd
Add Missing Email configs in Configuration doc (#13709)
closes https://github.com/apache/airflow/issues/13697
2021-01-16 01:11:35 +00:00
Kaxil Naik 32f59534cb
Stop creating duplicate Dag File Processors (#13662)
When a dag file is executed via Dag File Processors and multiple callbacks are
created either via zombies or executor events, the dag file is added to
the _file_path_queue and the manager will launch a new process to
process it, which it should not since the dag file is currently under
processing. This will bypass the _parallelism eventually especially when
it takes a long time to process some dag files and since self._processors
is just a dict with file path as the key. So multiple processors with the same key
count as one and hence parallelism is bypassed.

This addresses the same issue as https://github.com/apache/airflow/pull/11875
but instead does not exclude file paths that are recently processed and that
run at the limit (which is only used in tests) when Callbacks are sent by the
Agent. This is by design as the execution of Callbacks is critical. This is done
with a caveat to avoid duplicate processors, i.e. if a processor exists,
the file path is removed from the queue. This means that the processor with
the file path to run the callback will still be run when the file path is added again in the
next loop.

Tests are added to check the same.

closes https://github.com/apache/airflow/issues/13047 
closes https://github.com/apache/airflow/pull/11875
2021-01-15 16:40:20 +00:00
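The caveat described in #13662 above (drop a queued file path if a processor for it already exists) can be sketched as follows. `start_processors` and the string placeholders for processors are hypothetical; the manager's real code is not shown in this log.

```python
def start_processors(file_path_queue, processors, parallelism):
    """Start processors for queued files without duplicating a running one."""
    while file_path_queue and len(processors) < parallelism:
        file_path = file_path_queue.pop(0)
        if file_path in processors:
            # Already being processed: drop it from the queue. It can be
            # queued again on a later loop once the processor has finished.
            continue
        processors[file_path] = f"processor for {file_path}"
    return processors
```

Because `processors` is keyed by file path, starting a second processor for the same path would silently replace the first entry and undercount the running processes, which is how the parallelism limit was bypassed.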
Jun 614b70805a
Add verify_ssl config for kubernetes (#13516) 2021-01-15 15:59:55 +00:00
Jyoti Dhiman 3558538883
Support tables in DAG docs (#13533) 2021-01-15 12:13:22 +01:00
Kaxil Naik dc80fa4cbc
Bugfix: Return XCom Value in the XCom Endpoint API (#13684)
* Bugfix: Return XCom Value in the XCom Endpoint API

closes https://github.com/apache/airflow/issues/13676
2021-01-15 10:18:44 +00:00
Brent Bovenzi 2fef2ab1bf
Add JSON linter to DAG Trigger UI (#13551)
* Add JSON linter to Variable/DAG Trigger UIs

Adding codemirror and jshint to lint the text input for add/edit a Variable and for config when triggering a DAG.

variable_add whitespace

Remove JSON linter for add/edit Variables

Variable values can be either plain text or json which makes linting more complicated and not worth it for now.

* Add JSON linter to DAG Trigger UI

Adding codemirror and jshint to lint the text input for config when triggering a DAG.

variable_add whitespace

update trigger dag conf test

Fixed failing test by adding `id="json"` to the expected html in the `test_trigger_dag_params_conf` test
2021-01-14 15:26:40 -05:00
Ryan Hamilton 87645b331a
Configurable API response (CORS) headers (#13620)
* Allow setting of API response (CORS) headers via config

* Fix RST syntax

* Register function to only API instead of all views in app

* Add missing/required property

* Update spelling dictionary
2021-01-14 15:17:43 -05:00
Kanthi 1d2977f6a4
Add Neo4j hook and operator (#13324)
Close: #12873
2021-01-14 16:27:50 +00:00
Kaxil Naik c128aa744e
BugFix: Dag-level Callback Requests were not run (#13651)
In https://github.com/apache/airflow/pull/13163 - I attempted to only run
Callback requests when they are defined on DAG. But I just found out
that while we were storing the task-level callbacks as string in Serialized
JSON, we were not storing DAG level callbacks and hence they defaulted to None
when the Serialized DAG was deserialized which meant that the DAG callbacks
were not run.

This PR fixes it, we don't need to store DAG level callbacks as string, as
we don't display them in the Webserver and the actual contents are not used anywhere
in the Scheduler itself. Scheduler just checks if the callbacks are defined and sends
it to DagFileProcessorProcess to run with the actual DAG file. So instead of storing
the actual callback as string which would have resulted in larger JSON blob, I have
added properties to determine whether a callback is defined or not.

(`dag.has_on_success_callback` and `dag.has_on_failure_callback`)

Note: SLA callbacks don't have this issue, as we currently check whether SLAs are defined on
any tasks or not; if yes, we send them to DagFileProcessorProcess, which then executes
the SLA callback defined on the DAG.
2021-01-14 15:46:58 +00:00
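A sketch of the approach in #13651 above: rather than serializing the callback itself, only a boolean flag is stored. The dictionary-based `serialize_dag` is a hypothetical simplification; only the `has_on_success_callback` and `has_on_failure_callback` names come from the commit message.

```python
import json


def serialize_dag(dag):
    """Serialize a DAG-like dict, recording only whether callbacks exist."""
    data = {"dag_id": dag["dag_id"]}
    if dag.get("on_success_callback"):
        data["has_on_success_callback"] = True
    if dag.get("on_failure_callback"):
        data["has_on_failure_callback"] = True
    return data


def notify(context):
    """Placeholder callback; its body never reaches the serialized form."""


serialized = serialize_dag({"dag_id": "example", "on_failure_callback": notify})
blob = json.dumps(serialized)  # works because no callable is stored
```

The flag keeps the JSON blob small while still telling the consumer that the real callback must be run from the actual DAG file.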
Kaxil Naik e4b8ee63b0
Increase the default ``min_file_process_interval`` to decrease CPU Usage (#13664)
With the previous default of `0`, the CPU usage mostly stays around 100%.
Since the scheduling decisions were moved out of
DagFileProcessor to the Scheduler in Airflow 2.0.0, we can keep this number high.

closes https://github.com/apache/airflow/issues/13637
2021-01-14 13:08:12 +00:00
Kamil Breguła ef8617ec9d
Support google-cloud-tasks>=2.0.0 (#13347) 2021-01-14 12:18:49 +01:00
Kaxil Naik 61b1ea368d
Update outdated docs in scheduler_job.py (#13663)
As part of Airflow 2.0.0 and Scheduler HA, we updated the logic
of what happens in DagFileProcessor and SchedulerJob.

This PR updates the docstrings to match the code.
2021-01-14 10:48:48 +00:00
Kaxil Naik aef89478e4
Add missing Dag Tag for Example DAGs (#13665)
`example_dag_decorator` and `tutorial_taskflow_api_etl` were missing
`example` dag tag. All the other example DAGs had it.

This makes it consistent.
2021-01-14 10:48:14 +00:00
JavierLopezT 04d278f93f
Add S3ToFTPOperator (#11747)
Co-authored-by: javier.lopez <javier.lopez@promocionesfarma.com>
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
Co-authored-by: Tobiasz Kędzierski <tobiaszkedzierski@gmail.com>
2021-01-13 16:50:08 +01:00
Jun 475f1ab267
Fix invalid continue_token for cleanup list pods (#13563) 2021-01-13 14:52:01 +00:00
Griffin Cosgrove 548d082008
Update external docs URL for Segment (#13645) 2021-01-13 13:07:17 +01:00
Kamil Breguła 189af54043
Add system tests for Stackdriver operators (#13644) 2021-01-13 12:45:22 +01:00
Jarek Potiuk b007fc33d4
Fixes problems with extras for custom connection types (#13640)
The custom providers with custom connections can define
extra widgets and fields, however there were problems with
those custom fields in Airflow 2.0.0:

* When connection type was a subset of another connection
  type (for example jdbc and jdbcx) widgets from the
  'subset' connection type appeared also in the 'superset'
  one due to prefix matching in javascript.

* Each connection when saved received 'full set' of extra
  fields from other connection types (with empty values).
  This problem is likely present in Airflow 1.10 but due
  to limited number of connections supported it had no
  real implications besides slightly bigger dictionary
  stored in 'extra' field.

* The extra field values were not saved for custom connections.
  Only the predefined connection types could save extras in
  extras field.

This PR fixes it by:

* adding __ matching for javascript to match only full connection
  types not prefixes
* saving only the fields matching extra__<conn_type> when the
  connection is saved
* removing filtering on 'known' connection types (the above
  filtering on `extra__` results in empty extra for
  connections that do not have any extra field defined).

Fixes #13597
2021-01-13 00:32:49 +01:00
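The prefix problem from #13640 above is easy to reproduce in a few lines. This sketch uses Python rather than the JavaScript the commit touched; `extras_for` and the `extra__jdbcx__flag` field are hypothetical, while the `extra__<conn_type>` naming follows the commit message.

```python
def extras_for(conn_type, form_fields):
    """Keep only the extra fields that belong to exactly this connection type."""
    # The trailing "__" stops "jdbc" from also matching "jdbcx" fields.
    prefix = f"extra__{conn_type}__"
    return {k: v for k, v in form_fields.items() if k.startswith(prefix)}


fields = {
    "extra__jdbc__drv_path": "/opt/driver.jar",
    "extra__jdbcx__flag": "1",
}
loose = {k for k in fields if k.startswith("extra__jdbc")}  # matches both
strict = extras_for("jdbc", fields)  # matches only the jdbc field
```

Filtering on the full `extra__<conn_type>__` prefix also means a saved connection no longer collects empty fields from other connection types.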
Kaxil Naik c4112e2e9d
Make the tooltip to Pause / Unpause a DAG clearer (#13642)
closes https://github.com/apache/airflow/issues/13624
2021-01-12 19:14:31 +00:00
Kaxil Naik 8ecdef3e50
Audit Log records View should not contain link if dag_id is None (#13619)
closes https://github.com/apache/airflow/issues/13602
2021-01-12 10:16:01 +00:00
Kaxil Naik 6c458f29c0
Change the default celery worker_concurrency to 16 (#13612)
This change was unintentional -- https://github.com/apache/airflow/pull/7205

That PR just changed it to work with breeze. Since we had `16` as default in 1.10.x
and to get better performance and keep in line with `dag_concurrency` and
`max_active_runs_per_dag` -- I think `16` makes more sense.
2021-01-11 23:40:58 +00:00
Jarek Potiuk ad2a030b9e
Introduces separate runtime provider schema (#13488)
The provider.yaml contains more information than required at
runtime (specifically about documentation building). Those
fields are not needed at runtime and their presence is optional.
Also the runtime check for provider information should be more
relaxed and allow for future compatibility (with
additional properties set to false). This way we can add new,
optional fields to provider.yaml without worrying about breaking
future-compatibility of providers with future airflow versions.

This change restores 'additionalProperties': false in the
main, development-focused provider.yaml schema and introduces a
new runtime schema that is used to verify the provider info when
providers are discovered by airflow.

This 'runtime' version should change very rarely, as a change that
adds a new required property to it breaks compatibility of
providers with already released versions of Airflow.

We also trim-down the provider.yaml file when preparing provider
packages to only contain those fields that are required in the
runtime schema.
2021-01-11 23:10:44 +01:00
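The trimming step mentioned in #13488 above can be sketched as a simple filter over the parsed provider.yaml. The `RUNTIME_FIELDS` set and the sample keys are assumptions made for illustration; the actual runtime schema is not reproduced in this log.

```python
# Hypothetical set of fields kept at runtime; the real schema may differ.
RUNTIME_FIELDS = {"package-name", "name", "description", "versions"}


def trim_provider_info(provider_yaml):
    """Drop development-only fields before packaging a provider."""
    return {k: v for k, v in provider_yaml.items() if k in RUNTIME_FIELDS}


full = {
    "package-name": "example-provider",
    "name": "Example",
    "description": "Example provider",
    "versions": ["1.0.0"],
    "integrations": [{"integration-name": "Example"}],  # docs-only, assumed
}
trimmed = trim_provider_info(full)
```

Shipping only the runtime fields keeps released packages valid even when new optional, documentation-only fields are later added to the development schema.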
baxievski 8d42d9ed69
add xcom push for ECSOperator (#12096)
This pushes the last cloudwatch event to xcom when do_xcom_push is True

Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
2021-01-11 10:07:10 +01:00
Kamil Breguła a6f999b62e
Support google-cloud-automl >=2.1.0 (#13505) 2021-01-11 09:39:44 +01:00
Kamil Breguła 947dbb73bb
Support google-cloud-datacatalog>=3.0.0 (#13534) 2021-01-11 09:39:19 +01:00
Ryan Hamilton 87a7557f8b
Display message and docs link when no plugins are loaded (#13599) 2021-01-10 12:55:07 -05:00
Kamil Breguła 5954ef5f41
Warn about precedence of env var when getting variables (#13501) 2021-01-10 10:35:09 +01:00
Xiaodong DENG 4f740db57a
Minor grammar fix in OpenAPI YAML (#13586) 2021-01-09 09:47:49 +00:00