# Updating Airflow

This file documents any backwards-incompatible changes in Airflow and
assists users migrating to a new version.

## Airflow Master

### Default executor for SubDagOperator is changed to SequentialExecutor

### New Webserver UI with Role-Based Access Control

The current webserver UI uses the Flask-Admin extension. The new webserver UI uses the [Flask-AppBuilder (FAB)](https://github.com/dpgaspar/Flask-AppBuilder) extension. FAB has built-in authentication support and Role-Based Access Control (RBAC), which provides configurable roles and permissions for individual users.

To turn on this feature, set the configuration variable `rbac = True` in your `airflow.cfg` file (under `[webserver]`), and then run the `airflow` command, which will generate the `webserver_config.py` file in your `$AIRFLOW_HOME`.

#### Setting up Authentication

FAB has built-in authentication support for DB, OAuth, OpenID, LDAP, and REMOTE_USER. The default auth type is `AUTH_DB`.

For any other authentication type (OAuth, OpenID, LDAP, REMOTE_USER), see the [Authentication section of FAB docs](http://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-methods) for how to configure variables in the `webserver_config.py` file.

Once you modify your config file, run `airflow initdb` to generate new tables for RBAC support (these tables will have the prefix `ab_`).

#### Creating an Admin Account

Once configuration settings have been updated and new tables have been generated, create an admin account with the `airflow create_user` command.

#### Using your new UI

Run `airflow webserver` to start the new UI. This brings up a login page, where you enter the recently created admin username and password.

There are five roles created for Airflow by default: Admin, User, Op, Viewer, and Public. To configure roles/permissions, go to the `Security` tab and click `List Roles` in the new UI.

#### Breaking changes
- Users created and stored in the old users table will not be migrated automatically. FAB's built-in authentication support must be reconfigured.
- The Airflow DAG home page is now `/home` (instead of `/admin`).
- All ModelViews in Flask-AppBuilder follow a different pattern from Flask-Admin. The `/admin` part of the URL path will no longer exist. For example: `/admin/connection` becomes `/connection/list`, `/admin/connection/new` becomes `/connection/add`, `/admin/connection/edit` becomes `/connection/edit`, etc.
- Due to security concerns, the new webserver will no longer support the features in the `Data Profiling` menu of the old UI, including `Ad Hoc Query`, `Charts`, and `Known Events`.
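
The URL changes listed above can be sketched as a small translation helper, for example to update bookmarks or links in runbooks. This is a minimal sketch that covers only the patterns named in this section; the function name `translate_admin_url` is hypothetical, it ignores query strings, and other views may not follow the same scheme.

```python
# Hypothetical helper: maps old Flask-Admin paths to the new FAB paths,
# using only the patterns documented in this section.
_ACTION_MAP = {
    "": "list",      # /admin/connection      -> /connection/list
    "new": "add",    # /admin/connection/new  -> /connection/add
    "edit": "edit",  # /admin/connection/edit -> /connection/edit
}


def translate_admin_url(path):
    """Return the new-UI path for an old `/admin...` path."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "admin":
        return path  # not an old-style URL; leave it untouched
    if len(parts) == 1:
        return "/home"  # the DAG home page moved from /admin to /home
    view = parts[1]
    action = parts[2] if len(parts) > 2 else ""
    if action not in _ACTION_MAP:
        raise ValueError("No documented mapping for %r" % path)
    return "/%s/%s" % (view, _ACTION_MAP[action])


print(translate_admin_url("/admin/connection/new"))
```

Paths that do not start with `/admin` are returned unchanged, and anything outside the documented patterns raises an error rather than guessing.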

### airflow.contrib.sensors.hdfs_sensors renamed to airflow.contrib.sensors.hdfs_sensor

`airflow.contrib.sensors.hdfs_sensors` has been renamed to `airflow.contrib.sensors.hdfs_sensor` for consistency.
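
If the same DAG code has to run on both sides of the rename, one option is to try the new module name first and fall back to the old one. The helper below is a minimal sketch; `import_first_available` is a hypothetical name, not part of Airflow. Note that an `ImportError` raised by a dependency of the module also triggers the fallback.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in `module_names` that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "None of these modules could be imported: %s" % ", ".join(module_names))


# New name first, old name as a fallback for older installations.
HDFS_SENSOR_MODULES = [
    "airflow.contrib.sensors.hdfs_sensor",
    "airflow.contrib.sensors.hdfs_sensors",
]

# In a DAG file (requires Airflow to be installed):
# hdfs_sensor = import_first_available(HDFS_SENSOR_MODULES)
```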

### MySQL setting required

We now rely on stricter ANSI SQL settings for MySQL in order to have sane defaults. Make sure
to specify `explicit_defaults_for_timestamp=1` in your my.cnf under `[mysqld]`.
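
As a rough pre-upgrade check, the option file can be inspected with Python's `configparser`. This is only a sketch under assumptions: the function name is hypothetical, it reads the text of a single option file (it does not follow include directives or query the running server), and it treats a bare option name without a value as enabled.

```python
import configparser


def has_explicit_defaults_for_timestamp(my_cnf_text):
    """Return True if [mysqld] enables explicit_defaults_for_timestamp."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None)
    parser.read_string(my_cnf_text)
    if not parser.has_section("mysqld"):
        return False
    for key, value in parser.items("mysqld"):
        # Accept both dash and underscore spellings of the option name.
        if key.replace("-", "_") == "explicit_defaults_for_timestamp":
            # Assumption of this sketch: a bare option name counts as enabled.
            return value is None or value.strip().lower() in ("1", "on", "true")
    return False


sample = "[mysqld]\nexplicit_defaults_for_timestamp=1\n"
print(has_explicit_defaults_for_timestamp(sample))
```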

### Celery config

To make the config of Airflow compatible with Celery, some properties have been renamed:
```
celeryd_concurrency -> worker_concurrency
celery_result_backend -> result_backend
```
This results in the same config parameter names as Celery 4, with more transparency.
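
The renames above could be applied to an existing config file with a short script. The sketch below assumes the options live in a `[celery]` section of `airflow.cfg`; the function name is hypothetical. Note that `configparser` drops comments when writing, so review the output before replacing your file.

```python
import configparser
import io

# Old option name -> new option name, as listed above.
CELERY_RENAMES = {
    "celeryd_concurrency": "worker_concurrency",
    "celery_result_backend": "result_backend",
}


def migrate_celery_options(cfg_text, section="celery"):
    """Return `cfg_text` with the renamed Celery options applied."""
    # interpolation=None keeps '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if parser.has_section(section):
        for old, new in CELERY_RENAMES.items():
            if parser.has_option(section, old):
                parser.set(section, new, parser.get(section, old))
                parser.remove_option(section, old)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


print(migrate_celery_options("[celery]\nceleryd_concurrency = 16\n"))
```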

### GCP Dataflow Operators
Dataflow job labeling is now supported in Dataflow{Java,Python}Operator with a default
"airflow-version" label. Please upgrade your google-cloud-dataflow or apache-beam version
to 2.2.0 or greater.

### Redshift to S3 Operator
With Airflow 1.9 or lower, the unload operation always included a header row. Including a header row
requires turning off parallel unload, but unloading with all nodes is preferred because it is
faster for larger tables. Therefore, a parameter called `include_header` has been added, and its default is `False`.
The header row is added only if this parameter is set to `True`, in which case parallel unload is automatically turned off (`PARALLEL OFF`).

### Google cloud connection string

With Airflow 1.9 or lower, there were two connection strings for the Google Cloud operators, both `google_cloud_storage_default` and `google_cloud_default`. This can be confusing and therefore the `google_cloud_storage_default` connection id has been replaced with `google_cloud_default` to make the connection id consistent across Airflow.

## Airflow 1.9

### SSH Hook updates, along with new SSH Operator & SFTP Operator

SSH Hook now uses the Paramiko library to create an SSH client connection, instead of the sub-process based ssh command execution used previously (<1.9.0), so this is backward incompatible.
  - Update the SSHHook constructor.
  - Use the SSHOperator class in place of SSHExecuteOperator, which has been removed. Refer to test_ssh_operator.py for usage info.
  - SFTPOperator has been added to perform secure file transfer from serverA to serverB. Refer to test_sftp_operator.py for usage info.
  - No updates are required if you are using ftpHook; it will continue to work as is.

### S3Hook switched to use Boto3

The airflow.hooks.S3_hook.S3Hook has been switched to use boto3 instead of the older boto (a.k.a. boto2). This results in a few backwards incompatible changes to S3Hook:
  - the constructor no longer accepts `s3_conn_id`. It is now called `aws_conn_id`.
  - the default connection is now "aws_default" instead of "s3_default"
  - the return type of objects returned by `get_bucket` is now boto3.s3.Bucket
  - the return type of `get_key` and `get_wildcard_key` is now a boto3.S3.Object.

If you are using any of these in your DAGs and specify a connection ID you will need to update the parameter name for the connection to "aws_conn_id": S3ToHiveTransfer, S3PrefixSensor, S3KeySensor, RedshiftToS3Transfer.
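
If operator arguments are built as dictionaries in your DAG code, the parameter rename can be applied in one place. The helper below is a hypothetical sketch, not part of Airflow; the example keyword arguments are placeholders.

```python
def rename_s3_conn_id(kwargs):
    """Return a copy of `kwargs` with `s3_conn_id` renamed to `aws_conn_id`."""
    updated = dict(kwargs)
    if "s3_conn_id" in updated:
        if "aws_conn_id" in updated:
            raise ValueError("Both s3_conn_id and aws_conn_id were given")
        updated["aws_conn_id"] = updated.pop("s3_conn_id")
    return updated


# Placeholder arguments, as they might be passed to one of the classes above.
old_kwargs = {"task_id": "wait_for_file", "s3_conn_id": "my_s3"}
print(rename_s3_conn_id(old_kwargs))
```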

### Logging update

The logging structure of Airflow has been rewritten to make configuration easier and the logging system more transparent.

#### A quick recap about logging

A logger is the entry point into the logging system. Each logger is a named bucket to which messages can be written for processing. A logger is configured to have a log level. This log level describes the severity of the messages that the logger will handle. Python defines the following log levels: DEBUG, INFO, WARNING, ERROR or CRITICAL.

Each message that is written to the logger is a Log Record. Each log record contains a log level indicating the severity of that specific message. A log record can also contain useful metadata that describes the event that is being logged. This can include details such as a stack trace or an error code.

When a message is given to the logger, the log level of the message is compared to the log level of the logger. If the log level of the message meets or exceeds the log level of the logger itself, the message will undergo further processing. If it doesn’t, the message will be ignored.

Once a logger has determined that a message needs to be processed, it is passed to a Handler. This configuration is now more flexible and can easily be maintained in a single file.
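
The recap above can be illustrated with the standard library alone. This is a minimal sketch; the logger name `recap_example` is arbitrary and unrelated to Airflow's own loggers.

```python
import io
import logging

# A handler decides where processed records go; here, an in-memory stream.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

logger = logging.getLogger("recap_example")
logger.setLevel(logging.WARNING)  # the logger's own log level
logger.addHandler(handler)
logger.propagate = False  # keep the example isolated from the root logger

logger.info("below the logger's level, so it is ignored")
logger.error("meets the level, so it is passed to the handler")

print(stream.getvalue())
```

Only the `ERROR` record reaches the handler, because `INFO` is below the logger's `WARNING` level.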

#### Changes in Airflow Logging

Airflow's logging mechanism has been refactored to use Python’s built-in `logging` module. By extending classes with the existing `LoggingMixin`, all the logging will go through a central logger. `BaseHook` and `BaseOperator` already extend this class, so logging is readily available in hooks and operators.

The main benefit is easier configuration of the logging by setting a single centralized Python file. Disclaimer: there is still some inline configuration, but this will be removed eventually. The new logging class is defined by setting the dotted classpath in your `~/airflow/airflow.cfg` file:

```
# Logging class
# Specify the class that will specify the logging configuration
# This class has to be on the python classpath
logging_config_class = my.path.default_local_settings.LOGGING_CONFIG
```

The logging configuration file needs to be on the `PYTHONPATH`, for example `$AIRFLOW_HOME/config`. This directory is loaded by default. Any directory may be added to the `PYTHONPATH`, which might be handy when the config is in another directory or when a volume is mounted, as with Docker.

The config can be taken from `airflow/config_templates/airflow_local_settings.py` as a starting point. Copy the contents to `${AIRFLOW_HOME}/config/airflow_local_settings.py`, and alter the config as needed.

To customize the logging (for example, to use log rotation), define one or more of the logging handlers that [Python has to offer](https://docs.python.org/3/library/logging.handlers.html). For more details about Python logging, please refer to the [official logging documentation](https://docs.python.org/3/library/logging.html).
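
As an illustration of log rotation, the sketch below builds a `dictConfig`-style dictionary with a `RotatingFileHandler`. It is not a copy of Airflow's template: the logger name `rotation_example`, the handler and formatter names, the temporary log directory, and the size limits are all assumptions for the example. When adapting it, keep the logger and handler names that the template in `airflow/config_templates/airflow_local_settings.py` defines.

```python
import logging
import logging.config
import os
import tempfile

LOG_DIR = tempfile.mkdtemp()  # placeholder; point this at your log folder

LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {
            "format": "[%(asctime)s] %(levelname)s - %(message)s",
        },
    },
    "handlers": {
        "rotating_file": {
            "class": "logging.handlers.RotatingFileHandler",
            "formatter": "plain",
            "filename": os.path.join(LOG_DIR, "example.log"),
            "maxBytes": 10 * 1024 * 1024,  # rotate after roughly 10 MB
            "backupCount": 5,              # keep five rotated files
        },
    },
    "loggers": {
        "rotation_example": {
            "handlers": ["rotating_file"],
            "level": "INFO",
            "propagate": False,
        },
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger("rotation_example").info("written through the rotating handler")
```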

Furthermore, this change also simplifies logging within the DAG itself:

```
root@ae1bc863e815:/airflow# python
Python 3.6.2 (default, Sep 13 2017, 14:26:54)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from airflow.settings import *
>>>
>>> from datetime import datetime
>>> from airflow import DAG
>>> from airflow.operators.dummy_operator import DummyOperator
>>>
>>> dag = DAG('simple_dag', start_date=datetime(2017, 9, 1))
>>>
>>> task = DummyOperator(task_id='task_1', dag=dag)
>>>
>>> task.log.error('I want to say something..')
[2017-09-25 20:17:04,927] {<stdin>:1} ERROR - I want to say something..
```

#### Template path of the file_task_handler

The `file_task_handler` logger has been made more flexible. The default format, `{dag_id}/{task_id}/{execution_date}/{try_number}.log`, can be changed by supplying Jinja templating in the `FILENAME_TEMPLATE` configuration variable. See the `file_task_handler` for more information.
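
To see what the default layout produces, the format string can be filled in by hand. This is only an illustration of the default format shown above, not Airflow's rendering code; the function name is hypothetical, and rendering the execution date in ISO format is an assumption.

```python
from datetime import datetime

DEFAULT_TEMPLATE = "{dag_id}/{task_id}/{execution_date}/{try_number}.log"


def render_log_path(template, dag_id, task_id, execution_date, try_number):
    """Fill a Python-format-style filename template with task details."""
    return template.format(
        dag_id=dag_id,
        task_id=task_id,
        execution_date=execution_date.isoformat(),  # assumed date format
        try_number=try_number,
    )


print(render_log_path(DEFAULT_TEMPLATE, "simple_dag", "task_1", datetime(2017, 9, 1), 1))
# prints simple_dag/task_1/2017-09-01T00:00:00/1.log
```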

#### I'm using S3Log or GCSLogs, what do I do!?

If you are logging to Google cloud storage, please see the [Google cloud platform documentation](https://airflow.incubator.apache.org/integration.html#gcp-google-cloud-platform) for logging instructions.

If you are using S3, the instructions should be largely the same as the Google cloud platform instructions above. You will need a custom logging config. The `REMOTE_BASE_LOG_FOLDER` configuration key in your airflow config has been removed, therefore you will need to take the following steps:
 - Copy the logging configuration from [`airflow/config_templates/airflow_local_settings.py`](https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/airflow_local_settings.py).
 - Place it in a directory inside the Python import path `PYTHONPATH`. If you are using Python 2.7, ensure that any `__init__.py` files exist so that it is importable.
 - Update the config by setting the path of `REMOTE_BASE_LOG_FOLDER` explicitly in the config. The `REMOTE_BASE_LOG_FOLDER` key is not used anymore.
 - Set the `logging_config_class` to the filename and dict. For example, if you place `custom_logging_config.py` on the base of your pythonpath, you will need to set `logging_config_class = custom_logging_config.LOGGING_CONFIG` in your config as Airflow 1.8.

### New Features

#### Dask Executor

A new DaskExecutor allows Airflow tasks to be run in Dask Distributed clusters.

### Deprecated Features
These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer
supported and will be removed entirely in Airflow 2.0.
- If you're using the `google_cloud_conn_id` or `dataproc_cluster` argument names explicitly in `contrib.operators.Dataproc{*}Operator`(s), be sure to rename them to `gcp_conn_id` or `cluster_name`, respectively. We've renamed these arguments for consistency. (AIRFLOW-1323)

- `post_execute()` hooks now take two arguments, `context` and `result`
  (AIRFLOW-886)

  Previously, post_execute() only took one argument, `context`.

- `contrib.hooks.gcp_dataflow_hook.DataFlowHook` now uses `--runner=DataflowRunner` instead of `DataflowPipelineRunner`, which is removed from the package `google-cloud-dataflow-0.6.0`.

- The pickle type for XCom messages has been replaced by json to prevent RCE attacks.
  Note that JSON serialization is stricter than pickling, so if you want to e.g. pass
  raw bytes through XCom you must encode them using an encoding like base64.
  By default pickling is still enabled until Airflow 2.0. To disable it
  set enable_xcom_pickling = False in your Airflow config.
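
For the XCom change above, raw bytes can be wrapped in base64 so that the value survives JSON serialization. The sketch below uses only the standard library; the helper names are hypothetical, and the `json.dumps`/`json.loads` round trip stands in for whatever serialization XCom performs.

```python
import base64
import json


def encode_for_xcom(raw):
    """Turn raw bytes into an ASCII string that JSON can carry."""
    return base64.b64encode(raw).decode("ascii")


def decode_from_xcom(text):
    """Reverse `encode_for_xcom`."""
    return base64.b64decode(text.encode("ascii"))


payload = b"\x00\xffbinary data"
as_json = json.dumps({"blob": encode_for_xcom(payload)})  # would fail on raw bytes
restored = decode_from_xcom(json.loads(as_json)["blob"])
print(restored == payload)
```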

## Airflow 1.8.1

The Airflow package name was changed from `airflow` to `apache-airflow` during this release. You must uninstall
a previously installed version of Airflow before installing 1.8.1.

## Airflow 1.8

### Database
The database schema needs to be upgraded. Make sure to shut down Airflow and make a backup of your database. To
upgrade the schema, run `airflow upgradedb`.

### Upgrade systemd unit files
Systemd unit files have been updated. If you use systemd, please make sure to update these.

> Please note that the webserver does not detach properly. This will be fixed in a future version.

### Tasks not starting although dependencies are met due to stricter pool checking
Airflow 1.7.1 has an issue that allows a pool to be oversubscribed, i.e. more slots could be used than were
available. This is fixed in Airflow 1.8.0, but because of the past issue, jobs may fail to start after an upgrade
although their dependencies are met. To work around this, either temporarily increase the number of slots above
the number of queued tasks or use a new pool.

### Less forgiving scheduler on dynamic start_date
Using a dynamic start_date (e.g. `start_date = datetime.now()`) is not considered a best practice. The 1.8.0 scheduler
is less forgiving in this area. If you encounter DAGs not being scheduled you can try using a fixed start_date and
renaming your DAG. The last step is required to make sure you start with a clean slate, otherwise the old schedule can
interfere.

### New and updated scheduler options
Please read through the new scheduler options; defaults have changed since 1.7.1.

#### child_process_log_directory
In order to increase the robustness of the scheduler, DAGs are now processed in their own process. Therefore each
DAG has its own log file for the scheduler. These log files are placed in `child_process_log_directory`, which defaults to
`<AIRFLOW_HOME>/scheduler/latest`. You will need to make sure these log files are removed.

> DAG logs or processor logs ignore any command line settings for log file locations.
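
Since these log files are not cleaned up automatically, a periodic job can remove old ones. The function below is a minimal sketch using only the standard library; the name `remove_old_logs` and the 30-day default are assumptions, and you would pass it the directory configured as `child_process_log_directory`.

```python
import os
import time


def remove_old_logs(log_dir, max_age_days=30):
    """Delete files under `log_dir` last modified more than `max_age_days` ago."""
    cutoff = time.time() - max_age_days * 24 * 60 * 60
    removed = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

The function returns the list of deleted paths, which makes it easy to log what a cleanup run removed.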

#### run_duration
Previously the command line option `num_runs` was used to let the scheduler terminate after a certain amount of
loops. This is now time bound and defaults to `-1`, which means run continuously. See also num_runs.

#### num_runs
Previously `num_runs` was used to let the scheduler terminate after a certain amount of loops. Now num_runs specifies
the number of times to try to schedule each DAG file within `run_duration` time. Defaults to `-1`, which means try
indefinitely. This is only available on the command line.

#### min_file_process_interval
How long to wait before an updated DAG is picked up from the filesystem.

#### min_file_parsing_loop_time
How many seconds to wait between file-parsing loops to prevent the logs from being spammed.

#### dag_dir_list_interval
The frequency with which the scheduler should relist the contents of the DAG directory. If DAGs are not being picked up while you are developing them, have a look at this number and decrease it when necessary.

#### catchup_by_default
By default the scheduler will fill any missing interval DAG Runs between the last execution date and the current date.
This setting changes that behavior to only execute the latest interval. This can also be specified per DAG as
`catchup = False / True`. Command line backfills will still work.

### Faulty DAGs do not show an error in the Web UI

Due to changes in the way Airflow processes DAGs, the Web UI does not show an error when processing a faulty DAG. To
find processing errors, go to the `child_process_log_directory`, which defaults to `<AIRFLOW_HOME>/scheduler/latest`.

### New DAGs are paused by default

Previously, new DAGs would be scheduled immediately. To retain the old behavior, add this to airflow.cfg:

```
[core]
dags_are_paused_at_creation = False
```

### Airflow Context variables are passed to Hive config if conf is specified

If you specify a hive conf to the run_cli command of the HiveHook, Airflow adds some
convenience variables to the config. If you run a secure Hadoop setup, it might be
required to whitelist these variables by adding the following to your configuration:

```
<property>
     <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
     <value>airflow\.ctx\..*</value>
</property>
```

### Google Cloud Operator and Hook alignment

All Google Cloud Operators and Hooks are aligned and use the same client library. Now you have a single connection
type for all kinds of Google Cloud Operators.

If you experience problems connecting with your operator make sure you set the connection type "Google Cloud Platform".

Also the old P12 key file type is not supported anymore and only the new JSON key files are supported as a service
account.

### Deprecated Features
These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer
supported and will be removed entirely in Airflow 2.0.

- Hooks and operators must be imported from their respective submodules

  `airflow.operators.PigOperator` is no longer supported; `from airflow.operators.pig_operator import PigOperator` is.
  (AIRFLOW-31, AIRFLOW-200)

- Operators no longer accept arbitrary arguments

  Previously, `Operator.__init__()` accepted any arguments (either positional `*args` or keyword `**kwargs`) without
  complaint. Now, invalid arguments will be rejected. (https://github.com/apache/incubator-airflow/pull/1285)

- The config value secure_mode will default to True which will disable some insecure endpoints/features
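
One way to find code that still relies on deprecated behaviour is to record `DeprecationWarning`s while exercising it in a test. The sketch below uses a hypothetical stand-in class, `StrictInitExample`, to mimic an initializer that warns on unexpected arguments; it is not Airflow's implementation.

```python
import warnings


class StrictInitExample(object):
    """Hypothetical stand-in for an operator base class (not Airflow code)."""

    def __init__(self, task_id, *args, **kwargs):
        if args or kwargs:
            warnings.warn(
                "Invalid arguments were passed to {}: args={}, kwargs={}".format(
                    type(self).__name__, args, kwargs),
                category=DeprecationWarning,
                stacklevel=2,
            )
        self.task_id = task_id


# Record every warning, including ones that are hidden by default.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    StrictInitExample(task_id="task_1", not_a_real_option=True)

print(caught[0].category.__name__)
```

Replacing `simplefilter("always")` with `simplefilter("error", DeprecationWarning)` makes the same call raise instead, which can be useful in a test suite.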

### Known Issues
There is a report that the default of "-1" for num_runs creates an issue where errors are reported while parsing tasks.
It was not confirmed, but a workaround was found by changing the default back to `None`.

To do this, edit `cli.py` and find the following:

```
        'num_runs': Arg(
            ("-n", "--num_runs"),
            default=-1, type=int,
            help="Set the number of runs to execute before exiting"),
```

and change `default=-1` to `default=None`. If you have this issue please report it on the mailing list.

## Airflow 1.7.1.2

### Changes to Configuration

#### Email configuration change

To continue using the default smtp email backend, change the email_backend line in your config file from:

```
[email]
email_backend = airflow.utils.send_email_smtp
```
to:
```
[email]
email_backend = airflow.utils.email.send_email_smtp
```

#### S3 configuration change

To continue using S3 logging, update your config file so:

```
s3_log_folder = s3://my-airflow-log-bucket/logs
```
becomes:
```
remote_base_log_folder = s3://my-airflow-log-bucket/logs
remote_log_conn_id = <your desired s3 connection>
```
-												Set dags_are_paused_at_creation's default value to True

											
										
										
											2016-03-31 00:32:18 +03:00
+								# Updating Airflow
-												Deprecate *args and **kwargs in BaseOperator

BaseOperator silently accepts any arguments. This deprecates the
behavior with a warning that says it will be forbidden in Airflow 2.0.

This PR also turns on DeprecationWarnings by default, which in turn
revealed that inspect.getargspec is deprecated. Here it is replaced by
`inspect.signature` (Python 3) or `funcsigs.signature` (Python 2).

Lastly, this brought to attention that example_http_operator was
passing an illegal argument.


											
										
										
											2016-04-05 11:04:55 +03:00
+								This file documents any backwards-incompatible changes in Airflow and
-												[AIRFLOW-2350] Fix grammar in UPDATING.md

Closes #3248 from r39132/patch-1

											
										
										
											2018-04-21 09:34:16 +03:00
+								assists users migrating to a new version.
-												Set dags_are_paused_at_creation's default value to True

											
										
										
											2016-03-31 00:32:18 +03:00
-												[AIRFLOW-1895] Fix primary key integrity for mysql

sla_miss and task_instances cannot have NULL
execution_dates. The timezone
 migration scripts forgot to set this properly. In
addition to make sure
MySQL does not set "ON UPDATE CURRENT_TIMESTAMP"
or MariaDB "DEFAULT
0000-00-00 00:00:00" we now check if
explicit_defaults_for_timestamp is turned
on and otherwise fail an database upgrade.

Closes #2969, #2857

Closes #2979 from bolkedebruin/AIRFLOW-1895

											
										
										
											2018-01-27 11:01:10 +03:00
+								## Airflow Master
-												[AIRFLOW-74] SubdagOperators can consume all celeryd worker processes

Closes #3251 from feng-tao/airflow-74

											
										
										
											2018-04-24 20:13:25 +03:00
+								### Default executor for SubDagOperator is changed to SequentialExecutor
-												[AIRFLOW-1433][AIRFLOW-85] New Airflow Webserver UI with RBAC support

Closes #3015 from jgao54/rbac

											
										
										
											2018-03-23 11:18:48 +03:00
+								### New Webserver UI with Role-Based Access Control
-												[AIRFLOW-2350] Fix grammar in UPDATING.md

Closes #3248 from r39132/patch-1

											
										
										
											2018-04-21 09:34:16 +03:00
+								The current webserver UI uses the Flask-Admin extension. The new webserver UI uses the [Flask-AppBuilder (FAB)](https://github.com/dpgaspar/Flask-AppBuilder) extension. FAB has built-in authentication support and Role-Based Access Control (RBAC), which provides configurable roles and permissions for individual users.
-												[AIRFLOW-1433][AIRFLOW-85] New Airflow Webserver UI with RBAC support

Closes #3015 from jgao54/rbac

											
										
										
											2018-03-23 11:18:48 +03:00
-												[AIRFLOW-2350] Fix grammar in UPDATING.md

Closes #3248 from r39132/patch-1

											
										
										
											2018-04-21 09:34:16 +03:00
+								To turn on this feature, in your airflow.cfg file (under [webserver]), set the configuration variable `rbac = True`, and then run `airflow` command, which will generate the `webserver_config.py` file in your $AIRFLOW_HOME.
-												[AIRFLOW-1433][AIRFLOW-85] New Airflow Webserver UI with RBAC support

Closes #3015 from jgao54/rbac

											
										
										
											2018-03-23 11:18:48 +03:00
 								#### Setting up Authentication
 								FAB has built-in authentication support for DB, OAuth, OpenID, LDAP, and REMOTE_USER. The default auth type is `AUTH_DB`.
 								For any other authentication type (OAuth, OpenID, LDAP, REMOTE_USER), see the [Authentication section of FAB docs](http://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-methods) for how to configure variables in webserver_config.py file.
 								Once you modify your config file, run `airflow initdb` to generate new tables for RBAC support (these tables will have the prefix `ab_`).
 								#### Creating an Admin Account
-												[AIRFLOW-2350] Fix grammar in UPDATING.md

Closes #3248 from r39132/patch-1

											
										
										
											2018-04-21 09:34:16 +03:00
+								Once configuration settings have been updated and new tables have been generated, create an admin account with `airflow create_user` command.
-												[AIRFLOW-1433][AIRFLOW-85] New Airflow Webserver UI with RBAC support

Closes #3015 from jgao54/rbac

											
										
										
											2018-03-23 11:18:48 +03:00
 								#### Using your new UI
-												[AIRFLOW-2350] Fix grammar in UPDATING.md

Closes #3248 from r39132/patch-1

											
										
										
											2018-04-21 09:34:16 +03:00
+								Run `airflow webserver` to start the new UI. This will bring up a log in page, enter the recently created admin username and password.
-												[AIRFLOW-1433][AIRFLOW-85] New Airflow Webserver UI with RBAC support

Closes #3015 from jgao54/rbac

											
										
										
											2018-03-23 11:18:48 +03:00
 								There are five roles created for Airflow by default: Admin, User, Op, Viewer, and Public. To configure roles/permissions, go to the `Security` tab and click `List Roles` in the new UI.
 								#### Breaking changes
-												[AIRFLOW-2350] Fix grammar in UPDATING.md

Closes #3248 from r39132/patch-1

											
										
										
											2018-04-21 09:34:16 +03:00
+								- Users created and stored in the old users table will not be migrated automatically. FAB's built-in authentication support must be reconfigured.
-												[AIRFLOW-1433][AIRFLOW-85] New Airflow Webserver UI with RBAC support

Closes #3015 from jgao54/rbac

											
										
										
											2018-03-23 11:18:48 +03:00
+								- Airflow dag home page is now `/home` (instead of `/admin`).
 								- All ModelViews in Flask-AppBuilder follow a different pattern from Flask-Admin. The `/admin` part of the url path will no longer exist. For example: `/admin/connection` becomes `/connection/list`, `/admin/connection/new` becomes `/connection/add`, `/admin/connection/edit` becomes `/connection/edit`, etc.
 								- Due to security concerns, the new webserver will no longer support the features in the `Data Profiling` menu of old UI, including `Ad Hoc Query`, `Charts`, and `Known Events`.
-												[AIRFLOW-2233] Update updating.md to include the info of hdfs_sensors renaming

Closes #3145 from feng-tao/airflow-2233

											
										
										
											2018-03-31 12:16:46 +03:00
+								### airflow.contrib.sensors.hdfs_sensors renamed to airflow.contrib.sensors.hdfs_sensor
 								We now rename airflow.contrib.sensors.hdfs_sensors to airflow.contrib.sensors.hdfs_sensor for consistency purpose.
-												[AIRFLOW-1895] Fix primary key integrity for mysql

sla_miss and task_instances cannot have NULL
execution_dates. The timezone
 migration scripts forgot to set this properly. In
addition to make sure
MySQL does not set "ON UPDATE CURRENT_TIMESTAMP"
or MariaDB "DEFAULT
0000-00-00 00:00:00" we now check if
explicit_defaults_for_timestamp is turned
on and otherwise fail an database upgrade.

Closes #2969, #2857

Closes #2979 from bolkedebruin/AIRFLOW-1895

											
										
										
											2018-01-27 11:01:10 +03:00
+								### MySQL setting required
 								We now rely on more strict ANSI SQL settings for MySQL in order to have sane defaults. Make sure
 								to have specified `explicit_defaults_for_timestamp=1` in your my.cnf under `[mysqld]`
-												[AIRFLOW-1840] Make celery configuration congruent with Celery 4

Explicitly set the celery backend from the config
and align the config
with the celery config as this might be confusing.

Closes #2806 from Fokko/AIRFLOW-1840-Fix-celery-
config

											
										
										
											2017-12-11 20:56:29 +03:00
 								### Celery config
 								To make the config of Airflow compatible with Celery, some properties have been renamed:
 								```
 								celeryd_concurrency -> worker_concurrency
 								celery_result_backend -> result_backend
 								```
-												[AIRFLOW-2350] Fix grammar in UPDATING.md

Closes #3248 from r39132/patch-1

											
										
										
											2018-04-21 09:34:16 +03:00
+								Resulting in the same config parameters as Celery 4, with more transparency.
-												[AIRFLOW-1840] Make celery configuration congruent with Celery 4

Explicitly set the celery backend from the config
and align the config
with the celery config as this might be confusing.

Closes #2806 from Fokko/AIRFLOW-1840-Fix-celery-
config

											
										
										
											2017-12-11 20:56:29 +03:00
-												[AIRFLOW-1953] Add labels to dataflow operators

Closes #2913 from fenglu-g/master

											
										
										
											2018-01-03 22:16:39 +03:00
+								### GCP Dataflow Operators
 								Dataflow job labeling is now supported in Dataflow{Java,Python}Operator with a default
 								"airflow-version" label, please upgrade your google-cloud-dataflow or apache-beam version
 								to 2.2.0 or greater.
-												[AIRFLOW-2254] Put header as first row in unload

Currently, data is ordered by first column in
descending order
Header row comes as first only if the first column
is integer
This fix puts header as first row regardless of
first column data type

Closes #3180 from sathyaprakashg/AIRFLOW-2254

											
										
										
											2018-04-16 11:21:22 +03:00
+								### Redshift to S3 Operator
 								With Airflow 1.9 or lower, Unload operation always included header row. In order to include header row,
 								we need to turn off parallel unload. It is preferred to perform unload operation using all nodes so that it is
 								faster for larger tables. So, parameter called `include_header` is added and default is set to False.
 								Header row will be added only if this parameter is set True and also in that case parallel will be automatically turned off (`PARALLEL OFF`)
-												[AIRFLOW-2226] Rename google_cloud_storage_default to google_cloud_default

The Google cloud operators uses both
google_cloud_storage_default and
google_cloud_default as a default conn_id. This is
confusing and the
google_cloud_storage_default conn_id isnt
initialized by default in db.py
Therefore we rename the
google_cloud_storage_default to
google_cloud_default for simplicity and
convenience

Closes #3141 from Fokko/airflow-2226

											
										
										
											2018-03-20 00:02:10 +03:00
+								### Google cloud connection string
-												[AIRFLOW-2350] Fix grammar in UPDATING.md

Closes #3248 from r39132/patch-1

											
										
										
											2018-04-21 09:34:16 +03:00
+								With Airflow 1.9 or lower, there were two connection strings for the Google Cloud operators, both `google_cloud_storage_default` and `google_cloud_default`. This can be confusing and therefore the `google_cloud_storage_default` connection id has been replaced with `google_cloud_default` to make the connection id consistent across Airflow.
-												[AIRFLOW-2226] Rename google_cloud_storage_default to google_cloud_default

The Google cloud operators uses both
google_cloud_storage_default and
google_cloud_default as a default conn_id. This is
confusing and the
google_cloud_storage_default conn_id isnt
initialized by default in db.py
Therefore we rename the
google_cloud_storage_default to
google_cloud_default for simplicity and
convenience

Closes #3141 from Fokko/airflow-2226

											
										
										
											2018-03-20 00:02:10 +03:00
-												[AIRFLOW-1611] Customize logging

Change the configuration of the logging to make
use of the python
logging and make the configuration easy
configurable. Some of the
settings which are now not needed anymore since
they can easily
be implemented in the config file.

Closes #2631 from Fokko/AIRFLOW-1611-customize-
logging-in-airflow

											
										
										
											2017-10-02 18:14:01 +03:00
## Airflow 1.9

### SSH Hook updates, along with new SSH Operator & SFTP Operator

SSH Hook now uses the Paramiko library to create an SSH client connection, instead of the sub-process based ssh command execution used previously (<1.9.0), so this is backward incompatible.
  - Update the SSHHook constructor.
  - Use the SSHOperator class in place of SSHExecuteOperator, which has been removed. Refer to test_ssh_operator.py for usage info.
  - SFTPOperator is added to perform secure file transfer from serverA to serverB. Refer to test_sftp_operator.py for usage info.
  - No updates are required if you are using ftpHook; it will continue to work as is.
### S3Hook switched to use Boto3

The airflow.hooks.S3_hook.S3Hook has been switched to use boto3 instead of the older boto (a.k.a. boto2). This results in a few backwards incompatible changes to the S3Hook class:
  - the constructor no longer accepts `s3_conn_id`. It is now called `aws_conn_id`.
  - the default connection is now "aws_default" instead of "s3_default"
  - the return type of objects returned by `get_bucket` is now boto3.s3.Bucket
  - the return type of `get_key` and `get_wildcard_key` is now a boto3.S3.Object.

If you are using any of these in your DAGs and specify a connection ID, you will need to update the parameter name for the connection to "aws_conn_id": S3ToHiveTransfer, S3PrefixSensor, S3KeySensor, RedshiftToS3Transfer.
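
The parameter rename can be pictured with a small sketch that converts a dictionary of keyword arguments from the old name to the new one. The helper `rename_s3_conn_id` and the sample arguments are hypothetical and only illustrate the change; they are not part of Airflow.

```python
def rename_s3_conn_id(kwargs):
    """Return a copy of kwargs that uses aws_conn_id instead of s3_conn_id."""
    updated = dict(kwargs)
    if "s3_conn_id" in updated:
        old_value = updated.pop("s3_conn_id")
        # Keep an explicitly supplied aws_conn_id if both names are present.
        updated.setdefault("aws_conn_id", old_value)
    return updated


# Arguments as they might have been written for a pre-1.9 DAG (illustrative).
old_style = {"bucket_key": "s3://my-bucket/data/*.csv", "s3_conn_id": "my_s3"}
new_style = rename_s3_conn_id(old_style)
print(new_style)
```

If no connection ID was specified at all, nothing needs renaming, but note that the default connection is now "aws_default".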
### Logging update

The logging structure of Airflow has been rewritten to make configuration easier and the logging system more transparent.

#### A quick recap about logging

A logger is the entry point into the logging system. Each logger is a named bucket to which messages can be written for processing. A logger is configured to have a log level. This log level describes the severity of the messages that the logger will handle. Python defines the following log levels: DEBUG, INFO, WARNING, ERROR, and CRITICAL.

Each message that is written to the logger is a Log Record. Each log record contains a log level indicating the severity of that specific message. A log record can also contain useful metadata that describes the event that is being logged. This can include details such as a stack trace or an error code.

When a message is given to the logger, the log level of the message is compared to the log level of the logger. If the log level of the message meets or exceeds the log level of the logger itself, the message will undergo further processing. If it doesn’t, the message will be ignored.

Once a logger has determined that a message needs to be processed, it is passed to a Handler. This configuration is now more flexible and can easily be maintained in a single file.
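
The level comparison described above can be seen with Python's standard `logging` module alone. This is a minimal sketch; the logger name `recap_example` and the in-memory buffer are arbitrary choices for the example.

```python
import io
import logging

# A handler that writes to an in-memory buffer so the result can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

logger = logging.getLogger("recap_example")  # arbitrary example name
logger.setLevel(logging.WARNING)
logger.addHandler(handler)
logger.propagate = False  # keep the example isolated from the root logger

logger.info("below the logger's level, ignored")
logger.error("meets or exceeds the logger's level, handled")

print(buffer.getvalue())
```

Only the ERROR message reaches the handler, because INFO is below the logger's WARNING level.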
#### Changes in Airflow Logging

Airflow's logging mechanism has been refactored to use Python’s builtin `logging` module to perform logging of the application. By extending classes with the existing `LoggingMixin`, all the logging will go through a central logger. `BaseHook` and `BaseOperator` already extend this class, so logging is readily available in hooks and operators.

The main benefit is easier configuration of the logging through a single centralized Python file. Disclaimer: there is still some inline configuration, but this will be removed eventually. The new logging class is defined by setting the dotted classpath in your `~/airflow/airflow.cfg` file:

```
# Logging class
# Specify the class that will specify the logging configuration
# This class has to be on the python classpath
logging_config_class = my.path.default_local_settings.LOGGING_CONFIG
```

The logging configuration file needs to be on the `PYTHONPATH`, for example `$AIRFLOW_HOME/config`. This directory is loaded by default. Any directory may be added to the `PYTHONPATH`; this might be handy when the config is in another directory or, in the case of Docker, on a mounted volume.

The config can be taken from `airflow/config_templates/airflow_local_settings.py` as a starting point. Copy the contents to `${AIRFLOW_HOME}/config/airflow_local_settings.py`, and alter the config as preferred.
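
To give a feel for what such a configuration dict looks like, here is a minimal sketch in the format accepted by Python's `logging.config.dictConfig`. It is not the template shipped with Airflow: the formatter, handler, and logger names below are illustrative assumptions, and the real template contains more entries.

```python
import logging
import logging.config

# Hypothetical, minimal stand-in for a module-level LOGGING_CONFIG dict.
# The formatter, handler, and logger names are illustrative assumptions.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "example": {
            "format": "[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "example",
            "stream": "ext://sys.stdout",
        },
    },
    "loggers": {
        "example.task": {
            "handlers": ["console"],
            "level": "INFO",
            "propagate": False,
        },
    },
}

# Apply the configuration and log through the configured logger.
logging.config.dictConfig(LOGGING_CONFIG)
log = logging.getLogger("example.task")
log.info("configured from a single dict")
```

Keeping everything in one dict means that formatters, handlers, and levels can be changed in one place.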

To customize the logging (for example, to use log rotation), define one or more of the logging handlers that [Python has to offer](https://docs.python.org/3/library/logging.handlers.html). For more details about Python logging, please refer to the [official logging documentation](https://docs.python.org/3/library/logging.html).
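
As one possible example of such a customization, the sketch below wires the standard library's `RotatingFileHandler` into a dict-based configuration. The logger name, file name, and size limits are arbitrary values chosen for the example, and a temporary directory stands in for a real log folder.

```python
import logging
import logging.config
import os
import tempfile

log_dir = tempfile.mkdtemp()  # stand-in for a real log directory

ROTATING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "rotating_file": {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": os.path.join(log_dir, "example.log"),
            "maxBytes": 1024,   # roll over after roughly 1 KiB
            "backupCount": 3,   # keep at most three rotated files
        },
    },
    "loggers": {
        "example.rotating": {"handlers": ["rotating_file"], "level": "INFO"},
    },
}

logging.config.dictConfig(ROTATING_CONFIG)
rotating_log = logging.getLogger("example.rotating")
for i in range(200):
    rotating_log.info("message number %d", i)

# The directory now holds the active file plus the rotated backups.
print(sorted(os.listdir(log_dir)))
```

With `backupCount` set, the handler never keeps more than that many rotated files next to the active one.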

Furthermore, this change also simplifies logging within the DAG itself:

```
root@ae1bc863e815:/airflow# python
Python 3.6.2 (default, Sep 13 2017, 14:26:54)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from airflow.settings import *
>>>
>>> from datetime import datetime
>>> from airflow import DAG
>>> from airflow.operators.dummy_operator import DummyOperator
>>>
>>> dag = DAG('simple_dag', start_date=datetime(2017, 9, 1))
>>>
>>> task = DummyOperator(task_id='task_1', dag=dag)
>>>
>>> task.log.error('I want to say something..')
[2017-09-25 20:17:04,927] {<stdin>:1} ERROR - I want to say something..
```
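
The idea of a mixin that exposes a `log` attribute can be sketched with the standard library alone. The classes below are illustrative stand-ins with hypothetical names; they are not Airflow's `LoggingMixin` or its operators, and the real implementation may differ.

```python
import logging


class ExampleLoggingMixin:
    """Illustrative stand-in for a logging mixin; not Airflow's implementation."""

    @property
    def log(self):
        # Name the logger after the concrete class so messages can be traced.
        name = self.__class__.__module__ + "." + self.__class__.__name__
        return logging.getLogger(name)


class ExampleOperator(ExampleLoggingMixin):
    """Hypothetical operator-like class that gains a `log` attribute."""

    def execute(self):
        self.log.info("running %s", self.__class__.__name__)
        return "done"


logging.basicConfig(level=logging.INFO)
operator = ExampleOperator()
result = operator.execute()
```

Because `logging.getLogger` returns the same logger object for the same name, every instance of a class shares one logger.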
#### Template path of the file_task_handler

The `file_task_handler` logger has been made more flexible. The default format, `{dag_id}/{task_id}/{execution_date}/{try_number}.log`, can be changed by supplying Jinja templating in the `FILENAME_TEMPLATE` configuration variable. See the `file_task_handler` for more information.
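
To show what the default layout produces, the sketch below fills the brace-style placeholders with Python's `str.format`. The helper `render_log_path` is hypothetical, and formatting the execution date as an ISO string is an assumption made for the example; how Airflow renders the template internally, including Jinja templates, is not shown here.

```python
from datetime import datetime

# Default layout quoted in the text above.
DEFAULT_TEMPLATE = "{dag_id}/{task_id}/{execution_date}/{try_number}.log"


def render_log_path(template, dag_id, task_id, execution_date, try_number):
    """Fill the brace-style placeholders; an illustrative helper only."""
    return template.format(
        dag_id=dag_id,
        task_id=task_id,
        execution_date=execution_date.isoformat(),  # ISO format is an assumption
        try_number=try_number,
    )


path = render_log_path(
    DEFAULT_TEMPLATE, "simple_dag", "task_1", datetime(2017, 9, 1), 1
)
print(path)  # simple_dag/task_1/2017-09-01T00:00:00/1.log
```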
#### I'm using S3Log or GCSLogs, what do I do!?

If you are logging to Google cloud storage, please see the [Google cloud platform documentation](https://airflow.incubator.apache.org/integration.html#gcp-google-cloud-platform) for logging instructions.

If you are using S3, the instructions should be largely the same as the Google cloud platform instructions above. You will need a custom logging config. The `REMOTE_BASE_LOG_FOLDER` configuration key in your airflow config has been removed, therefore you will need to take the following steps:
 - Copy the logging configuration from [`airflow/config_templates/airflow_local_settings.py`](https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/airflow_local_settings.py).
 - Place it in a directory inside the Python import path `PYTHONPATH`. If you are using Python 2.7, ensure that any `__init__.py` files exist so that it is importable.
 - Update the config by setting the path of `REMOTE_BASE_LOG_FOLDER` explicitly in the config. The `REMOTE_BASE_LOG_FOLDER` key is not used anymore.
 - Set the `logging_config_class` to the filename and dict. For example, if you place `custom_logging_config.py` on the base of your pythonpath, you will need to set `logging_config_class = custom_logging_config.LOGGING_CONFIG` in your config as Airflow 1.8.
### New Features

#### Dask Executor

A new DaskExecutor allows Airflow tasks to be run in Dask Distributed clusters.
### Deprecated Features

These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer
supported and will be removed entirely in Airflow 2.0.

- If you're using the `google_cloud_conn_id` or `dataproc_cluster` argument names explicitly in `contrib.operators.Dataproc{*}Operator`(s), be sure to rename them to `gcp_conn_id` or `cluster_name`, respectively. We've renamed these arguments for consistency. (AIRFLOW-1323)

- `post_execute()` hooks now take two arguments, `context` and `result`
  (AIRFLOW-886)

  Previously, post_execute() only took one argument, `context`.

- `contrib.hooks.gcp_dataflow_hook.DataFlowHook` starts to use `--runner=DataflowRunner` instead of `DataflowPipelineRunner`, which is removed from the package `google-cloud-dataflow-0.6.0`.

- The pickle type for XCom messages has been replaced by JSON to prevent RCE attacks.
  Note that JSON serialization is stricter than pickling, so if you want to e.g. pass
  raw bytes through XCom you must encode them using an encoding like base64.
  By default pickling is still enabled until Airflow 2.0. To disable it,
  set `enable_xcom_pickling = False` in your Airflow config.
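
The note about raw bytes in the last item can be illustrated with the standard library. This sketch only shows the encoding round trip; how the resulting string is pushed to and pulled from XCom is left out, and the payload is an arbitrary example value.

```python
import base64
import json

raw = b"\x00\xffbinary payload"

# JSON cannot represent bytes directly, so encode them as ASCII text first.
encoded = base64.b64encode(raw).decode("ascii")
message = json.dumps({"payload": encoded})  # a JSON-serializable value

# On the receiving side, reverse both steps.
decoded = base64.b64decode(json.loads(message)["payload"])
print(decoded == raw)  # True
```

Passing `raw` to `json.dumps` directly would raise a `TypeError`, which is why the base64 step is needed.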
## Airflow 1.8.1

The Airflow package name was changed from `airflow` to `apache-airflow` during this release. You must uninstall
a previously installed version of Airflow before installing 1.8.1.
## Airflow 1.8

### Database

The database schema needs to be upgraded. Make sure to shut down Airflow and make a backup of your database. To
upgrade the schema, issue `airflow upgradedb`.

### Upgrade systemd unit files

Systemd unit files have been updated. If you use systemd, please make sure to update these.

> Please note that the webserver does not detach properly; this will be fixed in a future version.
### Tasks not starting although dependencies are met due to stricter pool checking

Airflow 1.7.1 has issues with being able to over subscribe to a pool, i.e. more slots could be used than were
available. This is fixed in Airflow 1.8.0, but due to past issues jobs may fail to start although their
dependencies are met after an upgrade. To work around this, either temporarily increase the number of slots above
the number of queued tasks or use a new pool.
### Less forgiving scheduler on dynamic start_date

Using a dynamic start_date (e.g. `start_date = datetime.now()`) is not considered a best practice. The 1.8.0 scheduler
is less forgiving in this area. If you encounter DAGs not being scheduled you can try using a fixed start_date and
renaming your DAG. The last step is required to make sure you start with a clean slate, otherwise the old schedule can
interfere.
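
One way to see why a dynamic start_date causes trouble is the simplified model below. It assumes that a run becomes due only once a full schedule interval has elapsed since the start date. The function `first_run_is_due` is hypothetical and is not the scheduler's actual logic.

```python
from datetime import datetime, timedelta

SCHEDULE_INTERVAL = timedelta(days=1)


def first_run_is_due(start_date, now):
    """A run is due once a full interval has elapsed since start_date.

    Simplified model for illustration; not the scheduler's actual logic.
    """
    return start_date + SCHEDULE_INTERVAL <= now


now = datetime(2017, 9, 10, 12, 0)  # pretend "current time" at parse

# Dynamic start date: re-evaluated at every parse, so it always equals "now".
print(first_run_is_due(start_date=now, now=now))  # False

# Fixed start date: the interval has long elapsed.
print(first_run_is_due(start_date=datetime(2017, 9, 1), now=now))  # True
```

Under this model, a start date that moves with the clock never lets a full interval elapse, whereas a fixed date does.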
### New and updated scheduler options

Please read through the new scheduler options; defaults have changed since 1.7.1.

#### child_process_log_directory

In order to increase the robustness of the scheduler, DAGs are now processed in their own process. Therefore each
DAG has its own log file for the scheduler. These log files are placed in `child_process_log_directory`, which defaults to
`<AIRFLOW_HOME>/scheduler/latest`. You will need to make sure these log files are removed.

> DAG logs or processor logs ignore any command line settings for log file locations.

#### run_duration

Previously the command line option `num_runs` was used to let the scheduler terminate after a certain number of
loops. This is now time bound and defaults to `-1`, which means run continuously. See also num_runs.
#### num_runs

Previously `num_runs` was used to let the scheduler terminate after a certain number of loops. Now num_runs specifies
the number of times to try to schedule each DAG file within `run_duration` time. Defaults to `-1`, which means try
indefinitely. This is only available on the command line.

#### min_file_process_interval

The amount of time after which an updated DAG should be picked up from the filesystem.

#### min_file_parsing_loop_time

How many seconds to wait between file-parsing loops to prevent the logs from being spammed.

#### dag_dir_list_interval

The frequency with which the scheduler should relist the contents of the DAG directory. If DAGs are not being picked up while you are developing them, have a look at this number and decrease it when necessary.

#### catchup_by_default

By default the scheduler will fill any missing interval DAG Runs between the last execution date and the current date.
This setting changes that behavior to only execute the latest interval. This can also be specified per DAG as
`catchup = False / True`. Command line backfills will still work.
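
The difference between the two behaviors can be illustrated with a toy model. This is a simplified sketch, not Airflow's scheduling code: `intervals_to_run` is a hypothetical function, and it assumes a fixed interval and that a run becomes due only once its interval has ended.

```python
from datetime import datetime, timedelta


def intervals_to_run(last_execution, now, interval, catchup=True):
    """Return the execution dates a scheduler would create in this toy model.

    Simplified illustration only; this is not Airflow's scheduling code.
    """
    pending = []
    next_date = last_execution + interval
    # A run for an interval is only due once that interval has ended.
    while next_date + interval <= now:
        pending.append(next_date)
        next_date += interval
    if catchup:
        return pending       # fill every missed interval
    return pending[-1:]      # only the most recent interval


last = datetime(2017, 1, 1)
now = datetime(2017, 1, 5)
one_day = timedelta(days=1)
print(intervals_to_run(last, now, one_day, catchup=True))   # Jan 2, Jan 3, Jan 4
print(intervals_to_run(last, now, one_day, catchup=False))  # Jan 4 only
```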

### Faulty DAGs do not show an error in the Web UI

Due to changes in the way Airflow processes DAGs, the Web UI does not show an error when processing a faulty DAG. To
find processing errors, go to the `child_process_log_directory`, which defaults to `<AIRFLOW_HOME>/scheduler/latest`.

### New DAGs are paused by default

Previously, new DAGs would be scheduled immediately. To retain the old behavior, add this to airflow.cfg:

```
[core]
dags_are_paused_at_creation = False
```

### Airflow context variables are passed to Hive config if conf is specified

If you specify a hive conf to the `run_cli` command of the HiveHook, Airflow adds some
convenience variables to the config. If you run a secure Hadoop setup, you might be
required to whitelist these variables by adding the following to your configuration:

```
<property>
     <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
     <value>airflow\.ctx\..*</value>
</property>
```
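
To see which keys the whitelist value above would let through, the pattern can be tried out locally. Hive evaluates the value as a Java regular expression; the pattern only uses syntax that behaves the same in Python's `re`, so a quick check like the following is a reasonable approximation. The key names in the list are illustrative assumptions, and the exact keys Airflow sets may differ by version.

```python
import re

# The whitelist value from the Hive property above.
WHITELIST_PATTERN = r"airflow\.ctx\..*"

# Illustrative names only; the exact keys Airflow sets may differ by version.
candidate_keys = [
    "airflow.ctx.dag_id",
    "airflow.ctx.task_id",
    "airflowXctx.dag_id",   # dots are escaped in the pattern, so this fails
    "mapred.job.name",
]

# fullmatch requires the whole key to match, not just a prefix.
allowed = [key for key in candidate_keys if re.fullmatch(WHITELIST_PATTERN, key)]
print(allowed)
```

Only the first two keys are printed, since the pattern requires the literal prefix `airflow.ctx.`.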

### Google Cloud Operator and Hook alignment

All Google Cloud Operators and Hooks are aligned and use the same client library. Now you have a single connection
type for all kinds of Google Cloud Operators.

If you experience problems connecting with your operator, make sure you set the connection type to "Google Cloud Platform".

Also, the old P12 key file type is no longer supported; only the new JSON key files are supported as a service
account.
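
Before updating a connection, it can help to confirm that the key you have is a JSON key rather than a P12 one. The helper below is a hypothetical pre-flight check, not an Airflow or Google API. It assumes that JSON service account keys carry a top-level `"type": "service_account"` field.

```python
import json


def is_json_service_account_key(text):
    """Return True if text parses as a JSON service account key.

    Hypothetical pre-flight check, not an Airflow or Google API.
    """
    try:
        data = json.loads(text)
    except ValueError:
        # P12 files are binary and will not parse as JSON.
        return False
    return isinstance(data, dict) and data.get("type") == "service_account"
```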

### Deprecated Features

These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer
supported and will be removed entirely in Airflow 2.0.

- Hooks and operators must be imported from their respective submodules

  `airflow.operators.PigOperator` is no longer supported; `from airflow.operators.pig_operator import PigOperator` is.
  (AIRFLOW-31, AIRFLOW-200)

- Operators no longer accept arbitrary arguments

  Previously, `Operator.__init__()` accepted any arguments (either positional `*args` or keyword `**kwargs`) without
  complaint. Now, invalid arguments will be rejected. (https://github.com/apache/incubator-airflow/pull/1285)

- The config value `secure_mode` will default to `True`, which will disable some insecure endpoints/features
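
The change to operator arguments most often surfaces as a misspelled keyword that used to be silently ignored. The toy classes below mimic the warning stage of that deprecation. `StrictBase` and `EchoOperator` are hypothetical stand-ins, not Airflow classes, and the warning text is made up for illustration.

```python
import warnings


class StrictBase:
    """Toy stand-in for an operator base class; not Airflow's BaseOperator."""

    def __init__(self, task_id, *args, **kwargs):
        if args or kwargs:
            # Mirror the deprecation described above: warn now, reject later.
            warnings.warn(
                "Invalid arguments were passed to {}: args={}, kwargs={}".format(
                    type(self).__name__, args, kwargs),
                category=DeprecationWarning,
                stacklevel=2,
            )
        self.task_id = task_id


class EchoOperator(StrictBase):
    def __init__(self, message, *args, **kwargs):
        # Unknown arguments are forwarded to the base class, which flags them.
        super().__init__(*args, **kwargs)
        self.message = message


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    EchoOperator(message="hi", task_id="ok")               # no warning
    EchoOperator(message="hi", task_id="typo", retrys=3)   # misspelled keyword

for warning in caught:
    print(warning.category.__name__, warning.message)
```

Running your own DAG files with deprecation warnings enabled (for example `python -W always::DeprecationWarning my_dag.py`, where `my_dag.py` is a placeholder) is one way to find such arguments before they become errors.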

### Known Issues

There is a report that the default of "-1" for num_runs creates an issue where errors are reported while parsing tasks.
It was not confirmed, but a workaround was found by changing the default back to `None`.

To do this, edit `cli.py` and find the following:

```
        'num_runs': Arg(
            ("-n", "--num_runs"),
            default=-1, type=int,
            help="Set the number of runs to execute before exiting"),
```

and change `default=-1` to `default=None`. If you have this issue, please report it on the mailing list.
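
The effect of the workaround can be seen in isolation with plain `argparse`. This is a standalone sketch, not Airflow's `cli.py`. It assumes the `Arg(...)` entry is eventually handed to `argparse`, and the `scheduler-sketch` program name is made up.

```python
import argparse

parser = argparse.ArgumentParser(prog="scheduler-sketch")
parser.add_argument(
    "-n", "--num_runs",
    default=None, type=int,   # the workaround: None instead of -1
    help="Set the number of runs to execute before exiting")

print(parser.parse_args([]).num_runs)           # option omitted: prints None
print(parser.parse_args(["-n", "5"]).num_runs)  # option given: prints 5
```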

## Airflow 1.7.1.2

### Changes to Configuration

#### Email configuration change

To continue using the default smtp email backend, change the email_backend line in your config file from:

```
[email]
email_backend = airflow.utils.send_email_smtp
```
to:
```
[email]
email_backend = airflow.utils.email.send_email_smtp
```
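
If you manage several config files, the edit above can be scripted. The following is a minimal sketch using the standard library's `configparser`; `migrate_email_backend` is a hypothetical helper, and rewriting a file this way drops any comments it contains.

```python
import configparser
import io

OLD = "airflow.utils.send_email_smtp"
NEW = "airflow.utils.email.send_email_smtp"


def migrate_email_backend(cfg_text):
    """Return cfg_text with the old default email_backend value replaced.

    Hypothetical helper; note that configparser drops comments when rewriting.
    """
    # interpolation=None keeps literal '%' characters in other values intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if (parser.has_option("email", "email_backend")
            and parser.get("email", "email_backend") == OLD):
        parser.set("email", "email_backend", NEW)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Custom backends are left untouched, because only the exact old default value is replaced.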

#### S3 configuration change

To continue using S3 logging, update your config file so:

```
s3_log_folder = s3://my-airflow-log-bucket/logs
```
becomes:
```
remote_base_log_folder = s3://my-airflow-log-bucket/logs
remote_log_conn_id = <your desired s3 connection>
```
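
The key rename can be scripted in a similar way. This sketch makes no assumption about which section holds `s3_log_folder` and simply renames it wherever it is found; `migrate_s3_logging` is a hypothetical helper, and `conn_id` must be the ID of an S3 connection you have already defined.

```python
import configparser
import io


def migrate_s3_logging(cfg_text, conn_id):
    """Rename s3_log_folder to remote_base_log_folder wherever it appears.

    Hypothetical helper; conn_id is whatever S3 connection you have defined.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    for section in parser.sections():
        if parser.has_option(section, "s3_log_folder"):
            folder = parser.get(section, "s3_log_folder")
            # Replace the old key with the two new ones, keeping the folder value.
            parser.remove_option(section, "s3_log_folder")
            parser.set(section, "remote_base_log_folder", folder)
            parser.set(section, "remote_log_conn_id", conn_id)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```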