FileSensor now takes a glob pattern, not just a filename. If the filename you are looking for contains `*`, `?`, or `[`, then you should replace these characters with `[*]`, `[?]`, and `[[]` respectively.
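For example, a minimal sketch of escaping the glob metacharacters (the DAG id, connection id and file paths below are illustrative, not part of these notes):

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.sensors.file_sensor import FileSensor

with DAG('file_sensor_glob_example', start_date=datetime(2019, 1, 1), schedule_interval=None) as dag:
    # Wait for any CSV file matching the glob pattern.
    wait_for_any_csv = FileSensor(
        task_id='wait_for_any_csv',
        fs_conn_id='fs_default',
        filepath='data/*.csv',
    )

    # Wait for the literal file "report[1].csv": the "[" has to be escaped as "[[]".
    wait_for_literal_file = FileSensor(
        task_id='wait_for_literal_file',
        fs_conn_id='fs_default',
        filepath='data/report[[]1].csv',
    )
```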
The Mesos Executor is removed from the code base as it was not widely used and not maintained. [Mailing List Discussion on deleting it](https://lists.apache.org/thread.html/daa9500026b820c6aaadeffd66166eae558282778091ebbc68819fb7@%3Cdev.airflow.apache.org%3E).
The `run_duration` option should no longer be used. It was previously meant for restarting the scheduler from time to time, but the scheduler is now more stable, so using this setting is considered bad practice and might cause an inconsistent state.
The ability to manipulate users from the command line has been changed. `airflow create_user`, `airflow delete_user` and `airflow list_users` have been grouped into a single command `airflow users` with optional flags `--create`, `--list` and `--delete`.
Example Usage:
To create a new user:
```bash
airflow users --create --username jondoe --lastname doe --firstname jon --email jdoe@apache.org --role Viewer --password test
```
The `do_xcom_push` flag (a switch to push the result of an operator to XCom or not) appeared in different incarnations in different operators. Its function has been unified under a common name (`do_xcom_push`) on `BaseOperator`. This also makes it easy to globally disable pushing results to XCom.
See [AIRFLOW-3249](https://jira.apache.org/jira/browse/AIRFLOW-3249) to check if your operator was affected.
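For example, a minimal sketch of disabling the XCom push on a single task (the task id and command are illustrative, and the snippet assumes an operator that has picked up the unified flag):

```python
from airflow.operators.bash_operator import BashOperator

# With do_xcom_push=False the last line of stdout is no longer pushed to XCom.
cleanup = BashOperator(
    task_id='cleanup',
    bash_command='rm -rf /tmp/scratch',
    do_xcom_push=False,
)
```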
Airflow 1.10 will be the last release series to support Python 2. Airflow 2.0.0 will only support Python 3.5 and up.
If you have a specific task that still requires Python 2 then you can use the PythonVirtualenvOperator for this.
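A minimal sketch of such a task (the callable, task id and version pin are illustrative):

```python
from airflow.operators.python_operator import PythonVirtualenvOperator


def legacy_callable():
    # This function runs inside a dedicated Python 2 virtualenv created for the task.
    print("still running on Python 2")


legacy_task = PythonVirtualenvOperator(
    task_id='legacy_task',
    python_callable=legacy_callable,
    python_version='2.7',
)
```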
### Changes to DatastoreHook
* removed the `version` argument from the `get_conn` function and added it to the hook's `__init__` function instead, renaming it to `api_version`
* renamed the `partialKeys` argument of function `allocate_ids` to `partial_keys`
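A minimal sketch of the new call pattern, assuming the hook import path used in recent 1.10 releases:

```python
from airflow.contrib.hooks.datastore_hook import DatastoreHook

# api_version is now passed to the constructor instead of get_conn()
hook = DatastoreHook(api_version='v1')
conn = hook.get_conn()

# the partialKeys argument has been renamed to partial_keys
keys = hook.allocate_ids(partial_keys=[])
```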
### Changes to GoogleCloudStorageHook
* the discovery-based API (`googleapiclient.discovery`) used in `GoogleCloudStorageHook` has been replaced by the recommended client-based API (`google-cloud-storage`). To understand the difference between the two libraries, read https://cloud.google.com/apis/docs/client-libraries-explained. PR: [#5054](https://github.com/apache/airflow/pull/5054)
* as a part of this replacement, the `multipart` & `num_retries` parameters for the `GoogleCloudStorageHook.upload` method have been deprecated.
The client library uses multipart upload automatically if the object/blob size is more than 8 MB - [source code](https://github.com/googleapis/google-cloud-python/blob/11c543ce7dd1d804688163bc7895cf592feb445f/storage/google/cloud/storage/blob.py#L989-L997). The client also handles retries automatically.
* the `generation` parameter is deprecated in `GoogleCloudStorageHook.delete` and `GoogleCloudStorageHook.insert_object_acl`.
Updating to `google-cloud-storage >= 1.16` changes the signature of the upstream `client.get_bucket()` method from `get_bucket(bucket_name: str)` to `get_bucket(bucket_or_name: Union[str, Bucket])`. This method is not directly exposed by the airflow hook, but any code accessing the connection directly (`GoogleCloudStorageHook().get_conn().get_bucket(...)` or similar) will need to be updated.
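A minimal sketch of code that accesses the connection directly (the bucket name is illustrative); the string form keeps working with the new signature:

```python
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook

hook = GoogleCloudStorageHook()
client = hook.get_conn()  # a google.cloud.storage.Client

# With google-cloud-storage >= 1.16 this accepts either a bucket name or a Bucket object.
bucket = client.get_bucket('my-example-bucket')
```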
### Changes in writing Logs to Elasticsearch
The `elasticsearch_` prefix has been removed from all config items under the `[elasticsearch]` section. For example `elasticsearch_host` is now just `host`.
### Removal of `non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
`non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
are removed in favor of a real pool, e.g. `default_pool`.
By default, tasks run in `default_pool`. `default_pool` is initialized with 128 slots and users can change the
number of slots through the UI/CLI. `default_pool` cannot be removed.
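A minimal sketch of what this means at the task level (the task ids and the custom pool name are illustrative):

```python
from airflow.operators.dummy_operator import DummyOperator

# Without an explicit pool, the task now runs in the built-in default_pool (128 slots by default).
uses_default_pool = DummyOperator(task_id='uses_default_pool')

# Explicitly assigned pools keep working as before.
uses_custom_pool = DummyOperator(task_id='uses_custom_pool', pool='reporting_pool')
```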
### `pool` config option in Celery section to support different Celery pool implementation
The new `pool` config option allows users to choose a different pool
implementation. The choices are "prefork" (the default), "eventlet", "gevent" and "solo". This may help users achieve better concurrency performance
in different scenarios.
For more details about Celery pool implementations, please refer to the Celery documentation on concurrency.
The behaviour of `Variable.get` changes if you previously explicitly provided `None` as a default value. If your code expects a `KeyError` to be thrown, then don't pass the `default_var` argument.
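A minimal sketch of the difference (the variable key is illustrative):

```python
from airflow.models import Variable

# Raises KeyError if "missing_key" is not defined, because no default_var is passed.
strict_value = Variable.get("missing_key")

# Returns None instead of raising, because an explicit default was supplied.
lenient_value = Variable.get("missing_key", default_var=None)
```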
It is necessary to rewrite calls to the `GCPTransferServiceHook.create_transfer_job` method. The new call looks like this:
```python
body = {
    'status': 'ENABLED',
    'projectId': project_id,
    'description': description,
    'transferSpec': transfer_spec,
    'schedule': schedule,
}
gct_hook.create_transfer_job(body)
```
The change results from the unification of all hooks and an adjustment to
[the official recommendations](https://lists.apache.org/thread.html/e8534d82be611ae7bcb21ba371546a4278aad117d5e50361fd8f14fe@%3Cdev.airflow.apache.org%3E)
for the Google Cloud Platform.
The signature of `wait_for_transfer_job` method in `GCPTransferServiceHook` has changed.
Extend and enhance the new Airflow RBAC UI to support DAG-level ACL. Each dag now has two permissions associated with it (one for write, one for read): `can_dag_edit` and `can_dag_read`.
The admin will create a new role, associate the dag permission with the target dag and assign that role to users. Such a user can only access / view the dags on the UI
that they have permissions on. If a new role should have access to all the dags, the admin can associate dag permissions on an artificial view (``all_dags``) with that role.
We also provide a new cli command (``sync_perm``) to allow admins to auto-sync permissions.
`ts_nodash` previously contained timezone information along with the execution date. For example: `20150101T000000+0000`. This is not user-friendly for file or folder names, which was a popular use case for `ts_nodash`. Hence this behavior has been changed: `ts_nodash` no longer contains timezone information, restoring the pre-1.10 behavior of this macro. A new macro, `ts_nodash_with_tz`, has been added, which can be used to get a string with the execution date and timezone info without dashes.
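A minimal sketch of the difference in a template (the task id and path are illustrative):

```python
from airflow.operators.bash_operator import BashOperator

# ts_nodash now renders e.g. 20150101T000000; use ts_nodash_with_tz to get 20150101T000000+0000.
make_dated_dir = BashOperator(
    task_id='make_dated_dir',
    bash_command='mkdir -p /tmp/run_{{ ts_nodash }}',
)
```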
This patch changes the `User.superuser` field from a hardcoded boolean to a `Boolean()` database column. `User.superuser` will default to `False`, which means that this privilege will have to be granted manually to any users that may require it.
### Support autodetected schemas to GoogleCloudStorageToBigQueryOperator
`GoogleCloudStorageToBigQueryOperator` now supports schema auto-detection when you load data into BigQuery. Unfortunately, changes may be required.
If BigQuery tables are created outside of airflow and the schema is not defined in the task, multiple options are available; one of them is enabling schema auto-detection on the operator, as sketched below.
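A minimal sketch of enabling auto-detection (the bucket, object and table names are illustrative):

```python
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

load_csv = GoogleCloudStorageToBigQueryOperator(
    task_id='gcs_to_bq_autodetect',
    bucket='my-example-bucket',
    source_objects=['data/example.csv'],
    destination_project_dataset_table='my_dataset.my_table',
    autodetect=True,  # let BigQuery infer the schema instead of supplying one in the task
)
```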
The `scheduler_heartbeat` metric has been changed from a gauge to a counter. Each loop of the scheduler will increment the counter by 1. This provides a higher degree of visibility and allows for better integration with Prometheus using the [StatsD Exporter](https://github.com/prometheus/statsd_exporter). The scheduler's activity status can be determined by graphing and alerting using a rate of change of the counter. If the scheduler goes down, the rate will drop to 0.
The current webserver UI uses the Flask-Admin extension. The new webserver UI uses the [Flask-AppBuilder (FAB)](https://github.com/dpgaspar/Flask-AppBuilder) extension. FAB has built-in authentication support and Role-Based Access Control (RBAC), which provides configurable roles and permissions for individual users.
To turn on this feature, in your airflow.cfg file (under [webserver]), set the configuration variable `rbac = True`, and then run the `airflow` command, which will generate the `webserver_config.py` file in your `$AIRFLOW_HOME`.
FAB has built-in authentication support for DB, OAuth, OpenID, LDAP, and REMOTE_USER. The default auth type is `AUTH_DB`.
For any other authentication type (OAuth, OpenID, LDAP, REMOTE_USER), see the [Authentication section of FAB docs](http://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-methods) for how to configure variables in webserver_config.py file.
There are five roles created for Airflow by default: Admin, User, Op, Viewer, and Public. To configure roles/permissions, go to the `Security` tab and click `List Roles` in the new UI.
- All ModelViews in Flask-AppBuilder follow a different pattern from Flask-Admin. The `/admin` part of the URL path will no longer exist. For example: `/admin/connection` becomes `/connection/list`, `/admin/connection/new` becomes `/connection/add`, `/admin/connection/edit` becomes `/connection/edit`, etc.
- Due to security concerns, the new webserver will no longer support the features in the `Data Profiling` menu of old UI, including `Ad Hoc Query`, `Charts`, and `Known Events`.
- **UTC is now the default timezone**: Either reconfigure your workflows scheduling in UTC or set `default_timezone` as explained in https://airflow.apache.org/timezone.html#default-time-zone
The `GoogleCloudStorageDownloadOperator` can either write to a supplied `filename` or return the content of a file via xcom through `store_to_xcom_key` - both options are mutually exclusive.
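A minimal sketch of the two mutually exclusive modes (the import path is the contrib one used in 1.10; bucket, object and task ids are illustrative):

```python
from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator

# Either write the object to a local file...
download_to_file = GoogleCloudStorageDownloadOperator(
    task_id='download_to_file',
    bucket='my-example-bucket',
    object='data/example.csv',
    filename='/tmp/example.csv',
)

# ...or push its content to XCom -- but not both on the same task.
download_to_xcom = GoogleCloudStorageDownloadOperator(
    task_id='download_to_xcom',
    bucket='my-example-bucket',
    object='data/example.csv',
    store_to_xcom_key='example_csv',
)
```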
The `bql` parameter passed to `BigQueryOperator` and `BigQueryBaseCursor.run_query` has been deprecated and renamed to `sql` for consistency purposes. Using `bql` will still work (and raise a `DeprecationWarning`), but is no longer
supported and will be removed entirely in Airflow 2.0.
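A minimal sketch of the renamed parameter (the query and task id are illustrative):

```python
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

# Use sql= going forward; bql= still works for now but raises a DeprecationWarning.
count_rows = BigQueryOperator(
    task_id='count_rows',
    sql='SELECT COUNT(*) FROM `my_dataset.my_table`',
    use_legacy_sql=False,
)
```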
With Airflow 1.9 or lower, there were two connection strings for the Google Cloud operators, both `google_cloud_storage_default` and `google_cloud_default`. This can be confusing and therefore the `google_cloud_storage_default` connection id has been replaced with `google_cloud_default` to make the connection id consistent across Airflow.
With Airflow 1.9 or lower, `FILENAME_TEMPLATE`, `PROCESSOR_FILENAME_TEMPLATE`, `LOG_ID_TEMPLATE`, `END_OF_LOG_MARK` were configured in `airflow_local_settings.py`. These have been moved into the configuration file, and hence if you were using a custom configuration file you need to add the new defaults to it.
The previous setting of `log_task_reader` is not needed in many cases now when using the default logging config with remote storage. (Previously it needed to be set to `s3.task` or similar. This is no longer needed with the default config.)
#### Change of per-task log path
With the change to make Airflow core timezone-aware, the default log path for task instances will now include timezone information. By default this means all previous task logs won't be found. You can get the old behaviour back by adjusting the `FILENAME_TEMPLATE` config option so that it does not include timezone information.
SSH Hook now uses the Paramiko library to create an ssh client connection, instead of the sub-process based ssh command execution used previously (<1.9.0), so this is backward incompatible.
The `airflow.hooks.S3_hook.S3Hook` has been switched to use `boto3` instead of the older `boto` (a.k.a. boto2). This results in a few backwards-incompatible changes to the `S3Hook`.
If you are using any of these in your DAGs and specify a connection ID, you will need to update the parameter name for the connection to `aws_conn_id`: `S3ToHiveTransfer`, `S3PrefixSensor`, `S3KeySensor`, `RedshiftToS3Transfer`.
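For example, a minimal sketch with one of the affected classes (the bucket and key are illustrative, and the import path assumes a recent 1.10 release):

```python
from airflow.sensors.s3_key_sensor import S3KeySensor

# The connection parameter is now called aws_conn_id.
wait_for_key = S3KeySensor(
    task_id='wait_for_key',
    bucket_name='my-example-bucket',
    bucket_key='data/example.csv',
    aws_conn_id='aws_default',
)
```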
A logger is the entry point into the logging system. Each logger is a named bucket to which messages can be written for processing. A logger is configured to have a log level. This log level describes the severity of the messages that the logger will handle. Python defines the following log levels: DEBUG, INFO, WARNING, ERROR or CRITICAL.
Each message that is written to the logger is a Log Record. Each log record contains a log level indicating the severity of that specific message. A log record can also contain useful metadata that describes the event that is being logged. This can include details such as a stack trace or an error code.
When a message is given to the logger, the log level of the message is compared to the log level of the logger. If the log level of the message meets or exceeds the log level of the logger itself, the message will undergo further processing. If it doesn’t, the message will be ignored.
Once a logger has determined that a message needs to be processed, it is passed to a Handler. This configuration is now more flexible and can easily be maintained in a single file.
Airflow's logging mechanism has been refactored to use Python's built-in `logging` module to perform logging of the application. By extending classes with the existing `LoggingMixin`, all the logging will go through a central logger. The `BaseHook` and `BaseOperator` already extend this class, so logging is readily available in hooks and operators.
The main benefit is easier configuration of the logging through a single centralized Python file. Disclaimer: there is still some inline configuration, but this will be removed eventually. The new logging class is defined by setting the dotted classpath in your `~/airflow/airflow.cfg` file:
```
# Logging class
# Specify the class that will specify the logging configuration
# This class has to be on the python classpath
# logging_config_class = my.path.default_local_settings.LOGGING_CONFIG
```
The logging configuration file needs to be on the `PYTHONPATH`, for example `$AIRFLOW_HOME/config`. This directory is loaded by default. Any directory may be added to the `PYTHONPATH`; this might be handy when the config is in another directory or when a volume is mounted in the case of Docker.
The config can be taken from `airflow/config_templates/airflow_local_settings.py` as a starting point. Copy the contents to `${AIRFLOW_HOME}/config/airflow_local_settings.py`, and alter the config as is preferred.
To customize the logging (for example, to use log rotation), define one or more of the logging handlers that [Python has to offer](https://docs.python.org/3/library/logging.handlers.html). For more details about Python logging, please refer to the [official logging documentation](https://docs.python.org/3/library/logging.html).
The `file_task_handler` logger has been made more flexible. The default format, `{dag_id}/{task_id}/{execution_date}/{try_number}.log`, can be changed by supplying Jinja templating in the `FILENAME_TEMPLATE` configuration variable. See the `file_task_handler` for more information.
If you are logging to Google cloud storage, please see the [Google cloud platform documentation](https://airflow.apache.org/integration.html#gcp-google-cloud-platform) for logging instructions.
If you are using S3, the instructions should be largely the same as the Google cloud platform instructions above. You will need a custom logging config. The `REMOTE_BASE_LOG_FOLDER` configuration key in your airflow config has been removed, therefore you will need to take the following steps:
- Copy the logging configuration from [`airflow/config_templates/airflow_local_settings.py`](https://github.com/apache/airflow/blob/master/airflow/config_templates/airflow_local_settings.py).
- Place it in a directory inside the Python import path `PYTHONPATH`. If you are using Python 2.7, ensure that any `__init__.py` files exist so that it is importable.
- Update the config by setting the path of `REMOTE_BASE_LOG_FOLDER` explicitly in the config. The `REMOTE_BASE_LOG_FOLDER` key is not used anymore.
- Set the `logging_config_class` to the filename and dict. For example, if you place `custom_logging_config.py` on the base of your `PYTHONPATH`, you will need to set `logging_config_class = custom_logging_config.LOGGING_CONFIG` in your config, just as in Airflow 1.8 (see the sketch after this list).
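A minimal sketch of such a file, assuming the shipped template exposes a `DEFAULT_LOGGING_CONFIG` dict (the module name `custom_logging_config.py` is the same hypothetical one used above):

```python
# custom_logging_config.py -- must live on the PYTHONPATH
from copy import deepcopy

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

# Start from the shipped defaults and adjust the handlers (e.g. the remote log
# location) here, since the REMOTE_BASE_LOG_FOLDER config key is no longer read.
LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
```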
- If you're using the `google_cloud_conn_id` or `dataproc_cluster` argument names explicitly in `contrib.operators.Dataproc{*}Operator`(s), be sure to rename them to `gcp_conn_id` or `cluster_name`, respectively. We've renamed these arguments for consistency. (AIRFLOW-1323)
- `contrib.hooks.gcp_dataflow_hook.DataFlowHook` starts to use `--runner=DataflowRunner` instead of `DataflowPipelineRunner`, which is removed from the package `google-cloud-dataflow-0.6.0`.
The frequency with which the scheduler should relist the contents of the DAG directory. If, while developing DAGs, they are not being picked up, have a look at this number and decrease it when necessary.