Commit graph

147 Commits

Author SHA1 Message Date
mislo 3cfe4a1c9d [AIRFLOW-5632] Rename ComputeEngine operators (#6306) 2019-10-22 15:20:24 +02:00
Kamil Breguła 0261ed755b
[AIRFLOW-5617] Add fallback for connection's project id in MLEngine hook (#6286)
* [AIRFLOW-5617] Add fallback for connection's project id in MLEngine hook

* fixup! [AIRFLOW-5617] Add fallback for connection's project id in MLEngine hook

* fixup! fixup! [AIRFLOW-5617] Add fallback for connection's project id in MLEngine hook

* fixup! fixup! fixup! [AIRFLOW-5617] Add fallback for connection's project id in MLEngine hook
2019-10-10 08:16:09 +02:00
Kamil Breguła 31db280fef [AIRFLOW-5614] Enable Fernet by default (#6282) 2019-10-09 13:18:51 +01:00
Dan MacTough debd164995 [AIRFLOW-XXX] Add message about breaking change in DAG#get_task_instances in 1.10.4 (#6226) 2019-10-04 11:09:33 +01:00
Ry Walker 426d3a9b5e [AIRFLOW-XXX] Make it clear that 1.10.5 wasn't accidentally omitted from UPDATING.md (#6240) 2019-10-03 08:11:14 +02:00
Tobiasz Kędzierski 265a1d5f14 [AIRFLOW-5477] Rewrite Google PubSub Hook to Google Cloud Python (#6096) 2019-10-02 16:16:30 +01:00
Marek Šuppa 781d001863 [AIRFLOW-5280] conn: Remove aws_default's default region name (#5879)
The `aws_default` connection by default specifies the `region_name` as
`us-east-1` in its `extra` field. This causes trouble when the desired
AWS account uses a different region, because this default value takes priority
over the $AWS_REGION and $AWS_DEFAULT_REGION environment variables,
gets passed directly to `botocore`, and does not seem to be documented.

This commit removes the default region name from the `aws_default`'s
extra field. This means that it will have to be set manually, which
would follow the "explicit is better than implicit" philosophy.
2019-09-30 09:52:09 +01:00
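Since `aws_default` no longer carries a built-in region, the region now has to be set explicitly. A minimal sketch of one way to do that, assuming Airflow 1.10's `Connection` model and that the AWS hook reads `region_name` from the connection's JSON `extra` field; the connection id is standard, the region value is illustrative:

```python
# Sketch: explicitly pinning a region on the aws_default connection after the
# built-in us-east-1 default was removed. Assumes Airflow 1.10's Connection model
# and that the AWS hook picks up "region_name" from the JSON extras.
import json

from airflow import settings
from airflow.models import Connection

session = settings.Session()
conn = session.query(Connection).filter(Connection.conn_id == "aws_default").one()
conn.extra = json.dumps({"region_name": "eu-west-1"})  # choose the region explicitly
session.add(conn)
session.commit()
session.close()
```

Alternatively, leaving the extras empty lets $AWS_REGION / $AWS_DEFAULT_REGION take effect, which is exactly what the old hard-coded default was overriding.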
Kamil Breguła fd8de3e48e [AIRFLOW-5555] Remove Hipchat integration (#6184) 2019-09-26 10:45:14 +08:00
Ash Berlin-Taylor 5f9ab7a1d5
[AIRFLOW-774] Fix long-broken DAG parsing Statsd metrics (#6157)
Since we switched to using sub-processes to parse the DAG files sometime
back in 2016(!), the metrics we have been emitting about dag bag size and
parsing have been incorrect.

We have also been emitting metrics from the webserver, which is going to
become wrong as we move towards a stateless webserver.

To fix both of these issues I have stopped emitting the metrics from
models.DagBag and only emit them from inside the
DagFileProcessorManager.

(There was also a bug in the `dag.loading-duration.*` metric we were emitting
from the DagBag code, where the "dag_file" part of that metric was empty.
I have fixed that even though I have now deprecated that metric. The
webserver was emitting the right metric, though, so many people wouldn't
notice.)
2019-09-24 10:23:51 +01:00
Kamil Breguła 86b4caac9a
[AIRFLOW-5434] Use hook to provide credentials in GKEPodOperator (#6050) 2019-09-21 22:06:19 +02:00
TobKed 7e4330cce0 [AIRFLOW-5475] Normalize gcp_conn_id in operators and hooks (#6093)
Co-Authored-By: Kamil Breguła <kamil.bregula@polidea.com>
2019-09-20 11:08:50 +02:00
Andrei deec7548c2 [AIRFLOW-5147] extended character set for k8s worker pod annotations (#5819)
* [AIRFLOW-5147] extended character set for k8s worker pod annotations

* updated UPDATING.md with new breaking changes

* excluded pylint too-many-statement check from constructor due to its nature
2019-09-18 09:49:00 -07:00
Tomek 3a6f79b181 [AIRFLOW-XXX] Add note about moving GCP from contrib to core (#6119) 2019-09-16 20:59:30 +02:00
Fokko Driesprong dd175fa8db [AIRFLOW-5390] Remove provide context (#5990) 2019-09-10 15:17:03 +02:00
Tobias Kaymak 4ab6982f0b [AIRFLOW-5072] gcs_hook should download once (#5685)
When a user supplies a filename, the expected behaviour is that Airflow
downloads the file and does not return its content as a string.
2019-09-05 17:54:07 +02:00
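A hedged sketch of the corrected behaviour, assuming Airflow 1.10's contrib `GoogleCloudStorageHook` (bucket and object names are illustrative):

```python
# Sketch of the behaviour described above, assuming Airflow 1.10's
# airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook.
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook

hook = GoogleCloudStorageHook(google_cloud_storage_conn_id="google_cloud_default")

# With a filename, the object is downloaded straight to disk once,
# instead of also being returned as a string.
hook.download(bucket="my-bucket", object="data/report.csv", filename="/tmp/report.csv")

# Without a filename, the object's bytes are returned to the caller.
content = hook.download(bucket="my-bucket", object="data/report.csv")
```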
Zacharya c9e2d04fde [AIRFLOW-4085] FileSensor now takes glob patterns for `filepath` (#5358) 2019-09-04 11:02:28 +01:00
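With this change, `filepath` accepts a glob pattern. A minimal sketch, assuming Airflow 1.10's contrib `FileSensor` and an existing `fs_default` connection whose path points at the base directory:

```python
# Sketch: FileSensor poking for any file matching a glob pattern. Assumes the
# contrib FileSensor of Airflow 1.10 and an fs_default filesystem connection.
from airflow.contrib.sensors.file_sensor import FileSensor

wait_for_any_csv = FileSensor(
    task_id="wait_for_any_csv",
    fs_conn_id="fs_default",
    filepath="incoming/*.csv",  # glob pattern instead of an exact path
    poke_interval=60,
    dag=dag,                    # assumes a surrounding DAG object named `dag`
)
```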
Jarek Potiuk 286aa7a581
[AIRFLOW-3611] Simplified development environment (#4932) 2019-08-27 14:39:36 -04:00
Tao Feng 45176c8d76
[AIRFLOW-5274] dag loading duration metric name too long (#5890) 2019-08-26 13:29:09 -07:00
Jarek Potiuk e090744787
[AIRFLOW-5206] Common licence in all .md files, TOC + removed TODO.md (#5809) 2019-08-21 23:27:54 -04:00
Felix Uellendall 6d27ced85a [AIRFLOW-5056] Add argument to filter mails in ImapHook and related operators (#5672)
- changes the order of arguments for `has_mail_attachment`, `retrieve_mail_attachments` and `download_mail_attachments`
- add `get_conn` function
- refactor code
- fix pylint issues
- add imap_mail_filter arg to ImapAttachmentToS3Operator
- add mail_filter arg to ImapAttachmentSensor
- remove superfluous tests
- changes the order of arguments in the sensors + operators __init__
2019-08-16 18:31:29 +01:00
Chao-Han Tsai 0be39219cd [AIRFLOW-4509] SubDagOperator using scheduler instead of backfill (#5498)
Change SubDagOperator to use the Airflow scheduler to schedule
tasks in subdags instead of backfill.

In the past, SubDagOperator relied on the backfill scheduler
to schedule tasks in its subdags. Tasks in the parent DAG
were scheduled via the Airflow scheduler while tasks in
a subdag were scheduled via backfill, which complicated
the scheduling logic and made the two scheduling code paths
hard to maintain.

This PR simplifies how tasks in subdags are scheduled.
SubDagOperator is responsible for creating a DagRun for the subdag
and waiting until all the tasks in the subdag finish. The Airflow
scheduler picks up the DagRun created by SubDagOperator and
creates and schedules the tasks accordingly.
2019-08-07 21:17:50 +02:00
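The operator's interface is not changed by the scheduling rework; a minimal sketch of a parent DAG wiring in a subdag, assuming Airflow 1.10's `SubDagOperator` and the usual `parent_id.child_id` naming convention:

```python
# Sketch: a SubDagOperator whose DagRun is now picked up by the main scheduler
# rather than run via backfill (interface unchanged). Assumes Airflow 1.10.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

ARGS = {"start_date": datetime(2019, 8, 1)}


def build_subdag(parent_dag_id, child_id, args):
    # The subdag's dag_id must be "<parent>.<child>"; its schedule
    # conventionally mirrors the parent's.
    subdag = DAG("{}.{}".format(parent_dag_id, child_id),
                 default_args=args, schedule_interval="@daily")
    DummyOperator(task_id="inner_task", dag=subdag)
    return subdag


with DAG("parent_dag", default_args=ARGS, schedule_interval="@daily") as dag:
    section_1 = SubDagOperator(
        task_id="section_1",
        subdag=build_subdag("parent_dag", "section_1", ARGS),
    )
```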
Bas Harenslak c2227fcf41 [AIRFLOW-4192] Remove end_date and latest_date from task context (#5725) 2019-08-07 16:46:21 +02:00
Kamil Breguła b229f78845 [AIRFLOW-5128] Move provide_gcp_credential_file decorator to GoogleCloudBaseHook (#5741) 2019-08-07 07:39:06 +02:00
Ash Berlin-Taylor 098b78d4b5 [AIRFLOW-XXX] Update changelog and updating for 1.10.4 (#5739) 2019-08-06 21:54:12 +01:00
Felix Uellendall fc99998212 [AIRFLOW-5057] Provide bucket name to functions in S3 Hook when none is specified (#5674)
Note: The order of arguments has changed for `check_for_prefix`.
The `bucket_name` is now optional. It falls back to the connection's `schema` attribute.
- refactor code
- complete docs
2019-07-30 10:05:02 +02:00
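Because the positional order changed, keyword arguments are the safest way to call it; a hedged sketch assuming Airflow 1.10's `S3Hook` (bucket and prefix values are illustrative):

```python
# Sketch: check_for_prefix after the argument reorder. Keyword arguments sidestep
# the positional change; omitting bucket_name falls back to the bucket stored in
# the connection's schema field. Assumes Airflow 1.10's S3Hook.
from airflow.hooks.S3_hook import S3Hook

hook = S3Hook(aws_conn_id="aws_default")

# Explicit bucket.
hook.check_for_prefix(prefix="logs/2019/", delimiter="/", bucket_name="my-bucket")

# No bucket given: the hook falls back to the connection's schema attribute.
hook.check_for_prefix(prefix="logs/2019/", delimiter="/")
```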
Joshua Carp 30defe130d [AIRFLOW-3998] Use nested commands in cli. (#4821) 2019-07-19 08:40:14 +01:00
Xiaodong dd08ae3469 [AIRFLOW-3502] Update config template to reflect supporting different Celery pool implementation (#5477)
Support for different Celery pool implementations was added
in https://github.com/apache/airflow/pull/4308.

But it's not reflected in the default_airflow.cfg yet, even though that file
is the main portal of config options for most users.
2019-06-26 12:08:33 +01:00
Kaxil Naik f520d02cc1 [AIRFLOW-XXX] Update Mailing List link for removing Mesos Executor (#5476) 2019-06-25 18:08:29 +08:00
Tomek 7d904467d6 [AIRFLOW-4782] Make GCP hooks Pylint compatible (#5431) 2019-06-24 17:44:13 +02:00
Chao-Han Tsai 2c99ec624b [AIRFLOW-4591] Make default_pool a real pool (#5349)
`non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
are removed in favor of a real pool, e.g. `default_pool`.

By default, tasks run in `default_pool`.
`default_pool` is initialized with 128 slots and users can change the
number of slots through the UI/CLI. `default_pool` cannot be removed.
2019-06-20 10:16:50 -07:00
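A minimal sketch of what this means for DAG authors, assuming the behaviour described above (the custom pool name is illustrative):

```python
# Sketch: tasks with no explicit pool now occupy slots in default_pool (128 slots
# by default, resizable via UI/CLI); an explicit pool argument still overrides it.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG("pool_example", start_date=datetime(2019, 6, 1), schedule_interval="@daily") as dag:
    # No pool argument: consumes a slot in default_pool.
    plain = BashOperator(task_id="plain_task", bash_command="echo hello")

    # Explicit pool: competes for slots in a user-created pool instead.
    throttled = BashOperator(task_id="throttled_task", bash_command="echo heavy",
                             pool="heavy_queries")
```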
Tomek 62ebc7d61a [AIRFLOW-4784] Make GCP operators Pylint compatible (#5432) 2019-06-19 21:25:14 +02:00
Joshua Carp 929b8fd187 [AIRFLOW-4423] Improve date handling in mysql to gcs operator. (#5196)
* Handle TIME columns
* Ensure DATETIME and TIMESTAMP columns are treated as UTC
2019-06-18 11:18:31 +02:00
Gordon Ball 201e67100c [AIRFLOW-4731] Fix GCS hook with google-storage-client 1.16 (#5368)
google-storage-client 1.16 introduced a breaking change where the
signature of client.get_bucket changed from (bucket_name) to
(bucket_or_name). Calls with named arguments to this method now fail.
This commit makes all calls positional to work around this.
2019-06-10 11:44:31 +01:00
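A short illustration of the breaking change and the workaround, assuming google-cloud-storage's `Client.get_bucket` (the bucket name is illustrative):

```python
# Illustration of the workaround: the parameter changed from bucket_name to
# bucket_or_name in google-cloud-storage 1.16, so positional calls work on both
# versions while the old keyword form breaks on 1.16+.
from google.cloud import storage

client = storage.Client()

bucket = client.get_bucket("my-bucket")                # positional: works before and after 1.16
# bucket = client.get_bucket(bucket_name="my-bucket")  # keyword: TypeError on 1.16+
```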
Andrii Soldatenko 0da976a0e1 [AIRFLOW-3370] Add stdout output options to Elasticsearch task log handler (#5048)
When using offsets potentially larger than JavaScript can handle, they can get parsed incorrectly on the client, resulting in the offset query getting stuck on a certain number. This patch ensures that we return a string to the client to avoid it being parsed as a number. When we run the query, we ensure the offset is set as an integer.

Add unnecessary prefix_ in config for elastic search section
2019-06-04 22:50:26 +01:00
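The underlying issue is JavaScript's integer precision limit; a small illustration of the string/int round trip described above (the numbers are only examples):

```python
# Illustration: JavaScript numbers lose integer precision above 2**53 - 1, so large
# log offsets are sent to the browser as strings and cast back to int only when the
# Elasticsearch query is built on the server.
MAX_SAFE_INTEGER = 2 ** 53 - 1          # JavaScript's Number.MAX_SAFE_INTEGER

offset = 9007199254740993               # larger than MAX_SAFE_INTEGER
payload_for_client = {"offset": str(offset)}      # safe to serialize for the UI
query_offset = int(payload_for_client["offset"])  # exact integer on the server
assert query_offset == offset
```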
Pier-Luc Caron St-Pierre 386ece44fc [AIRFLOW-XXX] Clarify documentation related to autodetect parameter in GCS_to_BQ Op (#5294) 2019-05-24 22:36:47 +01:00
Martijn van de Grift 3f276fd4fb [AIRFLOW-4471] Dataproc operator templated fields improvements (#5250) 2019-05-09 10:56:08 +02:00
Kaxil Naik 85899b3aee
[AIRFLOW-4334] Remove deprecated GCS features & Rename built-in params (#5087) 2019-04-18 15:38:41 +01:00
Kaxil Naik e26e340e7c [AIRFLOW-4313] Remove the Mesos executor (#5115)
* [AIRFLOW-4313] Remove the Mesos executor

* Update UPDATING.md
2019-04-17 18:28:58 +08:00
Fokko Driesprong c63ddccf8d [AIRFLOW-3934] Increase standard Dataproc PD size (#4749) 2019-04-15 19:02:47 +01:00
Kaxil Naik c6efd01264 [AIRFLOW-4255] Make GCS Hook Backwards compatible (#5089)
* [AIRFLOW-4255] Make GCS Hook Backwards compatible

* Update UPDATING.md

* Add option to stop warnings

* Update test_gcs_hook.py

* Add tests
2019-04-14 21:22:06 +02:00
Felix Uellendall 3eb2f547ac [AIRFLOW-3993] Add tests for salesforce hook (#4829)
- refactor code
- update docs
- change sign_in to get_conn
- add salesforce to devel_all packages
- add note to UPDATING.md

Co-Authored-By: mik-laj <mik-laj@users.noreply.github.com>
2019-04-14 21:07:43 +02:00
OmerJog 8ed8346c18 [AIRFLOW-2421] HTTPHook verifies HTTPS certificates by default (#4855)
Change the default value of verify from False to True
2019-04-11 13:33:33 +01:00
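With verification on by default, the old behaviour has to be requested explicitly per call; a hedged sketch assuming Airflow 1.10's `HttpHook` and its `extra_options` pass-through to `requests` (connection id and endpoint are illustrative):

```python
# Sketch: certificate verification is now the default; extra_options can opt a
# single call out of it. Assumes airflow.hooks.http_hook.HttpHook in Airflow 1.10.
from airflow.hooks.http_hook import HttpHook

hook = HttpHook(method="GET", http_conn_id="http_default")

verified = hook.run(endpoint="health")                                     # verify=True by default
unverified = hook.run(endpoint="health", extra_options={"verify": False})  # explicit opt-out
```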
Kaxil Naik ec7c67ff95
[AIRFLOW-4255] Replace Discovery based api with client based for GCS (#5054) 2019-04-09 19:46:00 +01:00
Ash Berlin-Taylor e8cd3e23e0 [AIRFLOW-XXX] CHANGELOG and UPDATING for 1.10.3 2019-04-06 10:04:23 +01:00
Felix Uellendall b93f2649ae [AIRFLOW-4220] Change CloudantHook to a new major version and add tests (#5023)
- upgrade cloudant version from `>=0.5.9,<2.0` to `>=2.0`
- remove the use of the `schema` attribute in the connection
- remove `db` function since the database object can also be retrieved by calling `cloudant_session['database_name']`
- update docs
- refactor code
2019-04-05 23:12:13 +01:00
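A hedged sketch of the new access pattern, assuming Airflow's contrib `CloudantHook` and the cloudant>=2.0 client (connection id and database name are illustrative):

```python
# Sketch: with cloudant>=2.0 the hook returns a client session, and a database is
# obtained by indexing the session instead of calling the removed db() helper.
from airflow.contrib.hooks.cloudant_hook import CloudantHook

hook = CloudantHook(cloudant_conn_id="cloudant_default")

with hook.get_conn() as cloudant_session:
    database = cloudant_session["my_database"]  # replaces the old db() helper
    for document in database:
        print(document)
```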
Kaxil Naik e732006fdd revert [AIRFLOW-4122] Remove chain function
Reverts 2 commits:
- ee71a8bb10
- 430efc9afb
2019-04-05 23:08:26 +01:00
Felix Uellendall 55aca52d1b [AIRFLOW-4014] Change DatastoreHook and add tests (#4842)
- update default used version for connecting to the Admin API from v1beta1 to v1
- move the establishment of the connection to the function calls instead of the hook init
- change get_conn signature to be able to pass an is_admin arg to set an admin connection
- rename GoogleCloudBaseHook._authorize function to GoogleCloudBaseHook.authorize
- rename the `partialKeys` argument of function `allocate_ids` to `partial_keys`.
- add tests
- update docs
- refactor code

Move version attribute from get_conn to __init__

- revert renaming of authorize function
- improve docs
- refactor code
2019-03-31 20:56:13 +02:00
saurabh gulati 06d2f53a32 [AIRFLOW-4172] Fix changes for driver class path option in Spark Subm… (#4992)
* [AIRFLOW-4172] Fix changes for driver class path option in Spark Submit Operator

* [AIRFLOW-4172] Fix changes for driver class path option in Spark Submit
2019-03-31 18:55:46 +02:00
Jiajie Zhong ffe1412d5e [AIRFLOW-4062] Improve docs on install extra package commands (#4966)
Some commands for installing extra packages, like
`pip install apache-airflow[devel]`, cause errors
in certain situations/shells. We should clarify them
by adding quotation marks, like
`pip install 'apache-airflow[devel]'`.
2019-03-25 12:14:43 +00:00
Ash Berlin-Taylor 1c43cde65c
[AIRFLOW-3743] Unify different methods of working out AIRFLOW_HOME (#4705)
There were a few ways of getting the AIRFLOW_HOME directory used
throughout the code base, giving possibly conflicting answers if they
weren't kept in sync:

- the AIRFLOW_HOME environment variable
- core/airflow_home from the config
- settings.AIRFLOW_HOME
- configuration.AIRFLOW_HOME

Since the home directory is used to compute the default path of the
config file to load, specifying the home directory again in the config
file didn't make any sense to me, and I have deprecated that.

This commit makes everything in the code base use
`settings.AIRFLOW_HOME` as the source of truth, and deprecates the
core/airflow_home config option.

There was an import cycle from settings -> logging_config ->
module_loading -> settings that needed to be broken on Python 2 - so I
have moved all adjusting of sys.path into the settings module.

(This issue caused me a problem where the RBAC UI wouldn't work as it
didn't find the right webserver_config.py)
2019-03-25 11:10:28 +00:00
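A minimal sketch of the intended usage after this change, treating `settings.AIRFLOW_HOME` as the single source of truth rather than re-reading the environment variable or the deprecated core/airflow_home option:

```python
# Sketch: resolve paths from settings.AIRFLOW_HOME instead of os.environ or the
# deprecated core/airflow_home config option.
import os

from airflow import settings

dags_dir = os.path.join(settings.AIRFLOW_HOME, "dags")
print("Airflow home:", settings.AIRFLOW_HOME)
print("DAGs folder:", dags_dir)
```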