Commit Graph

132 Commits

Author SHA1 Message Date
Zacharya c9e2d04fde [AIRFLOW-4085] FileSensor now takes glob patterns for `filepath` (#5358) 2019-09-04 11:02:28 +01:00
Jarek Potiuk 286aa7a581 [AIRFLOW-3611] Simplified development environment (#4932) 2019-08-27 14:39:36 -04:00
Tao Feng 45176c8d76 [AIRFLOW-5274] dag loading duration metric name too long (#5890) 2019-08-26 13:29:09 -07:00
Jarek Potiuk e090744787 [AIRFLOW-5206] Common licence in all .md files, TOC + removed TODO.md (#5809) 2019-08-21 23:27:54 -04:00
Felix Uellendall 6d27ced85a [AIRFLOW-5056] Add argument to filter mails in ImapHook and related operators (#5672)
- changes the order of arguments for `has_mail_attachment`, `retrieve_mail_attachments` and `download_mail_attachments`
- add `get_conn` function
- refactor code
- fix pylint issues
- add imap_mail_filter arg to ImapAttachmentToS3Operator
- add mail_filter arg to ImapAttachmentSensor
- remove superfluous tests
- changes the order of arguments in the sensors + operators __init__
2019-08-16 18:31:29 +01:00
Chao-Han Tsai 0be39219cd [AIRFLOW-4509] SubDagOperator using scheduler instead of backfill (#5498)
Change SubDagOperator to use Airflow scheduler to schedule
tasks in subdags instead of backfill.

In the past, SubDagOperator relied on the backfill scheduler
to schedule tasks in the subdags. Tasks in the parent DAG
are scheduled via the Airflow scheduler while tasks in
a subdag are scheduled via backfill, which complicates
the scheduling logic and makes the two scheduling code
paths harder to maintain.

This PR simplifies how tasks in subdags are scheduled.
SubDagOperator is responsible for creating a DagRun for the subdag
and waits until all the tasks in the subdag finish. The Airflow
scheduler picks up the DagRun created by SubDagOperator, then
creates and schedules the tasks accordingly.
2019-08-07 21:17:50 +02:00
Bas Harenslak c2227fcf41 [AIRFLOW-4192] Remove end_date and latest_date from task context (#5725) 2019-08-07 16:46:21 +02:00
Kamil Breguła b229f78845 [AIRFLOW-5128] Move provide_gcp_credential_file decorator to GoogleCloudBaseHook (#5741) 2019-08-07 07:39:06 +02:00
Ash Berlin-Taylor 098b78d4b5 [AIRFLOW-XXX] Update changelog and updating for 1.10.4 (#5739) 2019-08-06 21:54:12 +01:00
Felix Uellendall fc99998212 [AIRFLOW-5057] Provide bucket name to functions in S3 Hook when none is specified (#5674)
Note: The order of arguments has changed for `check_for_prefix`.
The `bucket_name` is now optional. It falls back to the `connection schema` attribute.
- refactor code
- complete docs
2019-07-30 10:05:02 +02:00
Joshua Carp 30defe130d [AIRFLOW-3998] Use nested commands in cli. (#4821) 2019-07-19 08:40:14 +01:00
Xiaodong dd08ae3469 [AIRFLOW-3502] Update config template to reflect supporting different Celery pool implementation (#5477)
Support for different Celery pool implementations was added
in https://github.com/apache/airflow/pull/4308.

But it's not reflected in the default_airflow.cfg yet, even though
that file is the main reference for config options for most users.
2019-06-26 12:08:33 +01:00
Kaxil Naik f520d02cc1 [AIRFLOW-XXX] Update Mailing List link for removing Mesos Executor (#5476) 2019-06-25 18:08:29 +08:00
Tomek 7d904467d6 [AIRFLOW-4782] Make GCP hooks Pylint compatible (#5431) 2019-06-24 17:44:13 +02:00
Chao-Han Tsai 2c99ec624b [AIRFLOW-4591] Make default_pool a real pool (#5349)
`non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
are removed in favor of a real pool, e.g. `default_pool`.

By default, tasks run in `default_pool`.
`default_pool` is initialized with 128 slots, and users can change the
number of slots through the UI/CLI. `default_pool` cannot be removed.
2019-06-20 10:16:50 -07:00
Tomek 62ebc7d61a [AIRFLOW-4784] Make GCP operators Pylint compatible (#5432) 2019-06-19 21:25:14 +02:00
Joshua Carp 929b8fd187 [AIRFLOW-4423] Improve date handling in mysql to gcs operator. (#5196)
* Handle TIME columns
* Ensure DATETIME and TIMESTAMP columns treated as UTC
2019-06-18 11:18:31 +02:00
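The UTC handling mentioned in the AIRFLOW-4423 entry above can be illustrated with a small sketch. This is not the operator's actual code; the helper name `naive_utc_to_epoch` is hypothetical, and the example only shows what "treating a timezone-naive value as UTC" means when converting it to epoch seconds.

```python
import calendar
from datetime import datetime, timezone


def naive_utc_to_epoch(value: datetime) -> int:
    """Interpret a timezone-naive datetime as UTC and return epoch seconds.

    Hypothetical helper for illustration only.
    """
    # calendar.timegm reads the time tuple as UTC, unlike time.mktime,
    # which would apply the local timezone of the machine doing the export.
    return calendar.timegm(value.timetuple())


# A naive value, as a database driver might return for a DATETIME column.
row_value = datetime(2019, 6, 18, 11, 18, 31)
epoch = naive_utc_to_epoch(row_value)

# The result matches an explicit UTC interpretation of the same value.
assert epoch == int(row_value.replace(tzinfo=timezone.utc).timestamp())
```

Using the local timezone instead would make the exported value depend on where the worker happens to run, which is presumably what the change is meant to avoid.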
Gordon Ball 201e67100c [AIRFLOW-4731] Fix GCS hook with google-storage-client 1.16 (#5368)
google-storage-client 1.16 introduced a breaking change where the
signature of client.get_bucket changed from (bucket_name) to
(bucket_or_name). Calls with named arguments to this method now fail.
This commit makes all calls positional to work around this.
2019-06-10 11:44:31 +01:00
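The failure mode described in the AIRFLOW-4731 entry above can be reproduced without the real client library. In the sketch below, `get_bucket_old` and `get_bucket_new` are hypothetical stand-ins that only mimic the parameter rename from `bucket_name` to `bucket_or_name`; they are not the library's functions.

```python
def get_bucket_old(bucket_name):
    """Stand-in for the signature before the rename (hypothetical)."""
    return "bucket:" + bucket_name


def get_bucket_new(bucket_or_name):
    """Stand-in for the signature after the rename (hypothetical)."""
    return "bucket:" + bucket_or_name


# Positional calls do not depend on the parameter name, so they work
# against both signatures.
for get_bucket in (get_bucket_old, get_bucket_new):
    assert get_bucket("my-bucket") == "bucket:my-bucket"

# A keyword call written against the old name breaks after the rename.
try:
    get_bucket_new(bucket_name="my-bucket")
    keyword_call_failed = False
except TypeError:
    keyword_call_failed = True
```

Passing the argument positionally is the workaround the commit message describes; it trades a little readability for compatibility with both versions.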
Andrii Soldatenko 0da976a0e1 [AIRFLOW-3370] Add stdout output options to Elasticsearch task log handler (#5048)
When using potentially larger offsets than JavaScript can handle, they can get parsed incorrectly on the client, resulting in the offset query getting stuck on a certain number. This patch ensures that we return a string to the client so that it is not parsed as a number. When we run the query, we ensure the offset is set as an integer.

Add unnecessary prefix_ in config for elastic search section
2019-06-04 22:50:26 +01:00
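The offset problem described in the AIRFLOW-3370 entry above comes from JavaScript numbers being double-precision floats, which represent integers exactly only up to 2**53 - 1. The following is a minimal sketch of the string round trip, with hypothetical helper names (`encode_offset`, `decode_offset`); it is not the log handler's actual code.

```python
import json

# Largest integer a JavaScript Number can represent exactly.
MAX_SAFE_INTEGER = 2**53 - 1


def encode_offset(offset: int) -> str:
    """Serialize the offset as a JSON string so a client never parses it as a number."""
    return json.dumps({"offset": str(offset)})


def decode_offset(payload: str) -> int:
    """Convert the offset back to an integer before it is used in a query."""
    return int(json.loads(payload)["offset"])


big = MAX_SAFE_INTEGER + 2

# A double cannot hold this value exactly; this is the precision loss a
# JavaScript client would hit when parsing the offset as a number.
assert int(float(big)) != big

# Sent as a string, the value survives the round trip unchanged.
assert decode_offset(encode_offset(big)) == big
```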
Pier-Luc Caron St-Pierre 386ece44fc [AIRFLOW-XXX] Clarify documentation related to autodetect parameter in GCS_to_BQ Op (#5294) 2019-05-24 22:36:47 +01:00
Martijn van de Grift 3f276fd4fb [AIRFLOW-4471] Dataproc operator templated fields improvements (#5250) 2019-05-09 10:56:08 +02:00
Kaxil Naik 85899b3aee [AIRFLOW-4334] Remove deprecated GCS features & Rename built-in params (#5087) 2019-04-18 15:38:41 +01:00
Kaxil Naik e26e340e7c [AIRFLOW-4313] Remove the Mesos executor (#5115)
* [AIRFLOW-4313] Remove the Mesos executor

* Update UPDATING.md
2019-04-17 18:28:58 +08:00
Fokko Driesprong c63ddccf8d [AIRFLOW-3934] Increase standard Dataproc PD size (#4749) 2019-04-15 19:02:47 +01:00
Kaxil Naik c6efd01264 [AIRFLOW-4255] Make GCS Hook Backwards compatible (#5089)
* [AIRFLOW-4255] Make GCS Hook Backwards compatible

* Update UPDATING.md

* Add option to stop warnings

* Update test_gcs_hook.py

* Add tests
2019-04-14 21:22:06 +02:00
Felix Uellendall 3eb2f547ac [AIRFLOW-3993] Add tests for salesforce hook (#4829)
- refactor code
- update docs
- change sign_in to get_conn
- add salesforce to devel_all packages
- add note to UPDATING.md

Co-Authored-By: mik-laj <mik-laj@users.noreply.github.com>
2019-04-14 21:07:43 +02:00
OmerJog 8ed8346c18 [AIRFLOW-2421] HTTPHook verifies HTTPS certificates by default (#4855)
Change the default value of verify from False to True
2019-04-11 13:33:33 +01:00
Kaxil Naik ec7c67ff95 [AIRFLOW-4255] Replace Discovery based api with client based for GCS (#5054) 2019-04-09 19:46:00 +01:00
Ash Berlin-Taylor e8cd3e23e0 [AIRFLOW-XXX] CHANGELOG and UPDATING for 1.10.3 2019-04-06 10:04:23 +01:00
Felix Uellendall b93f2649ae [AIRFLOW-4220] Change CloudantHook to a new major version and add tests (#5023)
- upgrade cloudant version from `>=0.5.9,<2.0` to `>=2.0`
- remove the use of the `schema` attribute in the connection
- remove `db` function since the database object can also be retrieved by calling `cloudant_session['database_name']`
- update docs
- refactor code
2019-04-05 23:12:13 +01:00
Kaxil Naik e732006fdd revert [AIRFLOW-4122] Remove chain function
Reverts 2 commits:
- ee71a8bb10
- 430efc9afb
2019-04-05 23:08:26 +01:00
Felix Uellendall 55aca52d1b [AIRFLOW-4014] Change DatastoreHook and add tests (#4842)
- update default used version for connecting to the Admin API from v1beta1 to v1
- move the establishment of the connection to the function calls instead of the hook init
- change get_conn signature to be able to pass an is_admin arg to set an admin connection
- rename GoogleCloudBaseHook._authorize function to GoogleCloudBaseHook.authorize
- rename the `partialKeys` argument of function `allocate_ids` to `partial_keys`.
- add tests
- update docs
- refactor code

Move version attribute from get_conn to __init__

- revert renaming of authorize function
- improve docs
- refactor code
2019-03-31 20:56:13 +02:00
saurabh gulati 06d2f53a32 [AIRFLOW-4172] Fix changes for driver class path option in Spark Subm… (#4992)
* [AIRFLOW-4172] Fix changes for driver class path option in Spark Submit Operator

* [AIRFLOW-4172] Fix changes for driver class path option in Spark Submit
2019-03-31 18:55:46 +02:00
Jiajie Zhong ffe1412d5e [AIRFLOW-4062] Improve docs on install extra package commands (#4966)
Some commands for installing extra packages, such as
`pip install apache-airflow[devel]`, cause errors
in certain shells. We should fix them
by adding quotes, as in
`pip install 'apache-airflow[devel]'`
2019-03-25 12:14:43 +00:00
Ash Berlin-Taylor 1c43cde65c [AIRFLOW-3743] Unify different methods of working out AIRFLOW_HOME (#4705)
There were a few ways of getting the AIRFLOW_HOME directory used
throughout the code base, giving possibly conflicting answers if they
weren't kept in sync:

- the AIRFLOW_HOME environment variable
- core/airflow_home from the config
- settings.AIRFLOW_HOME
- configuration.AIRFLOW_HOME

Since the home directory is used to compute the default path of the
config file to load, specifying the home directory again in the config
file didn't make any sense to me, and I have deprecated that.

This commit makes everything in the code base use
`settings.AIRFLOW_HOME` as the source of truth, and deprecates the
core/airflow_home config option.

There was an import cycle from settings -> logging_config ->
module_loading -> settings that needed to be broken on Python 2, so I
have moved all adjusting of sys.path into the settings module.

(This issue caused me a problem where the RBAC UI wouldn't work as it
didn't find the right webserver_config.py)
2019-03-25 11:10:28 +00:00
Ryan Yuan e92f09b565 [AIRFLOW-3987] Unify GCP's Connection IDs (#4818) 2019-03-25 11:03:26 +00:00
Ash Berlin-Taylor c159e8e391 Revert "[AIRFLOW-4062] Improve docs on install extra package commands (#4897)" (#4965)
This reverts commit d4655c506e as it causes doc test warnings/failures.
2019-03-24 12:05:23 +00:00
Jiajie Zhong d4655c506e [AIRFLOW-4062] Improve docs on install extra package commands (#4897)
Some commands for installing extra packages, such as
`pip install apache-airflow[devel]`, should be
changed to the quoted form
`pip install 'apache-airflow[devel]'`

[ci skip]
2019-03-24 11:34:59 +00:00
Kamil Breguła 4c6a591a90 [AIRFLOW-3659] Create Google Cloud Transfer Service Operators (#4792)
Co-authored-by: Antoni Smolinski <antoni.smolinski@polidea.com>
2019-03-23 22:52:26 +00:00
Tao Feng 430efc9afb [AIRFLOW-XXX] Note removal, not deprecation of chain in UPDATING.md (#4953) 2019-03-21 16:27:29 +00:00
zhongjiajie ee71a8bb10 [AIRFLOW-4122] Remove chain function (#4940)
* [AIRFLOW-4122] Remove chain function

Bitshift operators like `>>` or `<<` are the suggested way
to set dependencies, as they are more visual and easier to
explain, and having multiple ways to do it is confusing

* change UPDATING.md as recommended
[ci skip]
2019-03-19 19:27:39 -07:00
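The `>>` and `<<` dependency syntax mentioned in the AIRFLOW-4122 entry above relies on Python operator overloading. The toy `Task` class below is a hypothetical illustration of that mechanism, not Airflow's operator implementation.

```python
class Task:
    """Toy task that records its downstream tasks (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.downstream = []

    def __rshift__(self, other):
        # `a >> b` means: b runs after a.
        self.downstream.append(other)
        return other  # returning the right-hand side allows chaining

    def __lshift__(self, other):
        # `a << b` means: a runs after b.
        other.downstream.append(self)
        return other


extract, transform, load = Task("extract"), Task("transform"), Task("load")

# Reads left to right as the execution order.
extract >> transform >> load

report, cleanup = Task("report"), Task("cleanup")

# Same relationship written in the other direction: report runs after cleanup.
report << cleanup
```

Returning the right-hand operand from both methods is a design choice of this sketch; it is what makes a chained expression like `extract >> transform >> load` build a linear sequence.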
Kristian Yrjölä 781a82f638 [AIRFLOW-3997] Extend Variable.get so it can return None when var not found (#4819)
This will not change existing regular functions in the `Variable` class. If
variable `foo` doesn't exist:

```
foo = Variable.get("foo")  # raises KeyError
```

When `default_var=None` is passed to `get`, `None` is returned instead:
```
foo = Variable.get("foo", default_var=None)
if foo is None:
    handle_missing_foo()
```
2019-03-13 16:14:28 +00:00
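The behaviour described in the AIRFLOW-3997 entry above requires telling "no default given" apart from "default is `None`". One common way to do that is a private sentinel object. The sketch below assumes that approach and uses hypothetical names (`_NO_DEFAULT`, `_STORE`, `get_variable`); it does not reproduce the `Variable` class.

```python
# Unique marker meaning "the caller did not pass a default".
_NO_DEFAULT = object()

# Stand-in for the variable store (hypothetical).
_STORE = {"bar": "42"}


def get_variable(key, default_var=_NO_DEFAULT):
    """Return the stored value, the supplied default, or raise KeyError."""
    if key in _STORE:
        return _STORE[key]
    if default_var is _NO_DEFAULT:
        # No default was supplied, so a missing key is an error.
        raise KeyError("Variable {} does not exist".format(key))
    # An explicit default was supplied, including an explicit None.
    return default_var


assert get_variable("bar") == "42"
assert get_variable("foo", default_var=None) is None

try:
    get_variable("foo")
    raised = False
except KeyError:
    raised = True
```

Using `None` itself as the "not provided" marker would make it impossible to request `None` as the fallback, which is why a separate sentinel is used here.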
Tao Feng dda309e662 [AIRFLOW-4020] Remove viewer DAG edit permissions (#4845) 2019-03-05 15:21:26 -08:00
Ash Berlin-Taylor 0230055190 [AIRFLOW-3353] Upgrade Redis client (#4834)
Now that Celery/Kombu have updated and work with RedisPy 3.x (they in
fact force us to use 3.2) we should re-introduce this change.
2019-03-04 19:04:14 +01:00
Xiaodong 6abcdfd496 [AIRFLOW-3793] Decommission configuration items for Flask-Admin web UI & related codes (#4637) 2019-03-04 15:13:29 +00:00
Kamil Breguła a84fb73627 [AIRFLOW-3867] Rename GCP's subpackage (#4690) 2019-02-27 15:21:57 +01:00
Joshua Carp 4e88726a8d [AIRFLOW-3932] Optionally skip dag discovery heuristic. (#4746) 2019-02-23 08:55:34 -08:00
Ryan Yuan 5c170f0594 [AIRFLOW-3933] Fix various typos (#4747)
Fix typos
2019-02-21 11:50:05 +01:00
marengaz 067a1e3f4a [AIRFLOW-3249] Make all take the same named `do_xcom_push` flag (#4345) 2019-02-15 15:25:41 +00:00
Felix 9c11f41d31 [AIRFLOW-XXX] Fix headlines in UPDATING.md (#4697) 2019-02-12 16:01:20 +01:00