[AIRFLOW-XXX] Fix backtick issues in .rst files & Add Precommit hook (#6162)
This commit is contained in:
Parent: 123479cd6a
Commit: cfd8d605d9
@@ -163,6 +163,10 @@ repos:
     - id: mixed-line-ending
     - id: check-executables-have-shebangs
     - id: check-xml
+  - repo: https://github.com/pre-commit/pygrep-hooks
+    rev: v1.4.1
+    hooks:
+      - id: rst-backticks
   - repo: local
     hooks:
       - id: yamllint
@@ -376,7 +376,7 @@ DAGs/tasks:
 .. image:: img/task_manual_vs_scheduled.png

 The DAGs/tasks with a black border are scheduled runs, whereas the non-bordered
-DAGs/tasks are manually triggered, i.e. by `airflow dags trigger`.
+DAGs/tasks are manually triggered, i.e. by ``airflow dags trigger``.

 Workflows
 =========
@@ -792,9 +792,9 @@ detailing the list of tasks that missed their SLA. The event is also recorded
 in the database and made available in the web UI under ``Browse->SLA Misses``
 where events can be analyzed and documented.

-SLAs can be configured for scheduled tasks by using the `sla` parameter.
-In addition to sending alerts to the addresses specified in a task's `email` parameter,
-the `sla_miss_callback` specifies an additional `Callable`
+SLAs can be configured for scheduled tasks by using the ``sla`` parameter.
+In addition to sending alerts to the addresses specified in a task's ``email`` parameter,
+the ``sla_miss_callback`` specifies an additional ``Callable``
 object to be invoked when the SLA is not met.

 Email Configuration
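For reference, a minimal sketch of the SLA pattern described in the hunk above. The DAG, task, callback, and email address are illustrative, and the callback signature is kept generic on purpose.

.. code:: python

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator


    def notify_sla_miss(*args, **kwargs):
        # Invoked by the scheduler when an SLA is missed; wire this up to
        # your own alerting channel as needed.
        print("SLA miss detected")


    dag = DAG(
        dag_id="sla_example",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
        sla_miss_callback=notify_sla_miss,
    )

    slow_task = BashOperator(
        task_id="slow_task",
        bash_command="sleep 30",
        sla=timedelta(minutes=10),      # alert if not finished 10 minutes after the scheduled time
        email=["alerts@example.com"],   # addresses that receive the SLA-miss email
        dag=dag,
    )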
@@ -70,5 +70,5 @@ Some caveats:

 - Make sure to use a database backed result backend
 - Make sure to set a visibility timeout in [celery_broker_transport_options] that exceeds the ETA of your longest running task
-- Tasks can consume resources. Make sure your worker has enough resources to run `worker_concurrency` tasks
+- Tasks can consume resources. Make sure your worker has enough resources to run ``worker_concurrency`` tasks
 - Queue names are limited to 256 characters, but each broker backend might have its own restrictions
@@ -27,15 +27,15 @@ Authenticating to gRPC

 There are several ways to connect to gRPC service using Airflow.

-1. Using `NO_AUTH` mode, simply setup an insecure channel of connection.
-2. Using `SSL` or `TLS` mode, supply a credential pem file for the connection id,
+1. Using ``NO_AUTH`` mode, simply setup an insecure channel of connection.
+2. Using ``SSL`` or ``TLS`` mode, supply a credential pem file for the connection id,
 this will setup SSL or TLS secured connection with gRPC service.
-3. Using `JWT_GOOGLE` mode. It is using google auth default credentials by default,
+3. Using ``JWT_GOOGLE`` mode. It is using google auth default credentials by default,
 further use case of getting credentials from service account can be add later on.
-4. Using `OATH_GOOGLE` mode. Scopes are required in the extra field, can be setup in the UI.
+4. Using ``OATH_GOOGLE`` mode. Scopes are required in the extra field, can be setup in the UI.
 It is using google auth default credentials by default,
 further use case of getting credentials from service account can be add later on.
-5. Using `CUSTOM` mode. For this type of connection, you can pass in a connection
+5. Using ``CUSTOM`` mode. For this type of connection, you can pass in a connection
 function takes in the connection object and return a gRPC channel and supply whatever
 authentication type you want.

@@ -59,17 +59,17 @@ Port (Optional)

 Auth Type
 Authentication type of the gRPC connection.
-`NO_AUTH` by default, possible values are
-`NO_AUTH`, `SSL`, `TLS`, `JWT_GOOGLE`,
-`OATH_GOOGLE`, `CUSTOM`
+``NO_AUTH`` by default, possible values are
+``NO_AUTH``, ``SSL``, ``TLS``, ``JWT_GOOGLE``,
+``OATH_GOOGLE``, ``CUSTOM``

 Credential Pem File (Optional)
 Pem file that contains credentials for
-`SSL` and `TLS` type auth
+``SSL`` and ``TLS`` type auth
 Not required for other types.

 Scopes (comma separated) (Optional)
 A list of comma-separated `Google Cloud scopes
 <https://developers.google.com/identity/protocols/googlescopes>`_ to
 authenticate with.
-Only for `OATH_GOOGLE` type connection
+Only for ``OATH_GOOGLE`` type connection
@@ -78,7 +78,7 @@ Airflow assumes the value returned from the environment variable to be in a URI
 format (e.g. ``postgres://user:password@localhost:5432/master`` or
 ``s3://accesskey:secretkey@S3``). The underscore character is not allowed
 in the scheme part of URI, so it must be changed to a hyphen character
-(e.g. `google-compute-platform` if `conn_type` is `google_compute_platform`).
+(e.g. ``google-compute-platform`` if ``conn_type`` is ``google_compute_platform``).
 Query parameters are parsed to one-dimensional dict and then used to fill extra.

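As a hedged illustration of the hyphen rule and query-parameter handling described above (the connection id and the single query parameter are made up for illustration):

.. code:: python

    import os

    # conn_type ``google_compute_platform`` must appear as ``google-compute-platform``
    # in the URI scheme; the query parameter ends up in the connection's extra dict.
    os.environ["AIRFLOW_CONN_MY_GCP"] = "google-compute-platform://?project=my-project"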
@@ -46,18 +46,18 @@ Extra (optional)
 connection. The following parameters are supported:

 * **encoding** - The encoding to use for regular database strings. If not specified,
-the environment variable `NLS_LANG` is used. If the environment variable `NLS_LANG`
-is not set, `ASCII` is used.
+the environment variable ``NLS_LANG`` is used. If the environment variable ``NLS_LANG``
+is not set, ``ASCII`` is used.
 * **nencoding** - The encoding to use for national character set database strings.
-If not specified, the environment variable `NLS_NCHAR` is used. If the environment
-variable `NLS_NCHAR` is not used, the environment variable `NLS_LANG` is used instead,
-and if the environment variable `NLS_LANG` is not set, `ASCII` is used.
+If not specified, the environment variable ``NLS_NCHAR`` is used. If the environment
+variable ``NLS_NCHAR`` is not used, the environment variable ``NLS_LANG`` is used instead,
+and if the environment variable ``NLS_LANG`` is not set, ``ASCII`` is used.
 * **threaded** - Whether or not Oracle should wrap accesses to connections with a mutex.
 Default value is False.
 * **events** - Whether or not to initialize Oracle in events mode.
-* **mode** - one of `sysdba`, `sysasm`, `sysoper`, `sysbkp`, `sysdgd`, `syskmt` or `sysrac`
+* **mode** - one of ``sysdba``, ``sysasm``, ``sysoper``, ``sysbkp``, ``sysdgd``, ``syskmt`` or ``sysrac``
 which are defined at the module level, Default mode is connecting.
-* **purity** - one of `new`, `self`, `default`. Specify the session acquired from the pool.
+* **purity** - one of ``new``, ``self``, ``default``. Specify the session acquired from the pool.
 configuration parameter.

 More details on all Oracle connect parameters supported can be found in
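A sketch of an ``Extra`` value combining the parameters listed above; the concrete values are illustrative, not recommendations:

.. code:: python

    import json

    oracle_extra = json.dumps(
        {
            "encoding": "UTF-8",    # falls back to NLS_LANG, then ASCII, when omitted
            "nencoding": "UTF-8",
            "threaded": True,       # wrap connection access in a mutex
            "events": False,
            "mode": "sysdba",
            "purity": "new",
        }
    )
    print(oracle_extra)  # paste the printed JSON into the connection's Extra field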
@@ -43,7 +43,7 @@ Extra (optional)
 * **key_file** - Full Path of the private SSH Key file that will be used to connect to the remote_host.
 * **private_key** - Content of the private key used to connect to the remote_host.
 * **timeout** - An optional timeout (in seconds) for the TCP connect. Default is ``10``.
-* **compress** - ``true`` to ask the remote client/server to compress traffic; `false` to refuse compression. Default is ``true``.
+* **compress** - ``true`` to ask the remote client/server to compress traffic; ``false`` to refuse compression. Default is ``true``.
 * **no_host_key_check** - Set to ``false`` to restrict connecting to hosts with no entries in ``~/.ssh/known_hosts`` (Hosts file). This provides maximum protection against trojan horse attacks, but can be troublesome when the ``/etc/ssh/ssh_known_hosts`` file is poorly maintained or connections to new hosts are frequently made. This option forces the user to manually add all new hosts. Default is ``true``, ssh will automatically add new host keys to the user known hosts files.
 * **allow_host_key_change** - Set to ``true`` if you want to allow connecting to hosts that has host key changed or when you get 'REMOTE HOST IDENTIFICATION HAS CHANGED' error. This wont protect against Man-In-The-Middle attacks. Other possible solution is to remove the host entry from ``~/.ssh/known_hosts`` file. Default is ``false``.

@@ -28,8 +28,8 @@ library, you should be able to use any database backend supported as a
 SqlAlchemy backend. We recommend using **MySQL** or **Postgres**.

 .. note:: We rely on more strict ANSI SQL settings for MySQL in order to have
-sane defaults. Make sure to have specified `explicit_defaults_for_timestamp=1`
-in your my.cnf under `[mysqld]`
+sane defaults. Make sure to have specified ``explicit_defaults_for_timestamp=1``
+in your my.cnf under ``[mysqld]``

 .. note:: If you decide to use **Postgres**, we recommend using the ``psycopg2``
 driver and specifying it in your SqlAlchemy connection string. (I.e.,
@@ -56,7 +56,7 @@ template to it, which will fail.
 t2 = BashOperator(
 task_id='bash_example',

-# This fails with `Jinja template not found` error
+# This fails with 'Jinja template not found' error
 # bash_command="/home/batcher/test.sh",

 # This works (has a space after)
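A hedged, self-contained version of the workaround that hunk demonstrates; the script path comes from the example itself and the dag_id is illustrative:

.. code:: python

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(dag_id="bash_trailing_space_example", start_date=datetime(2019, 1, 1))

    t2 = BashOperator(
        task_id="bash_example",
        # The trailing space stops Jinja from treating the string as a template file path.
        bash_command="/home/batcher/test.sh ",
        dag=dag,
    )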
@@ -183,7 +183,7 @@ it means that your service account does not have the correct Cloud IAM permissio
 2. Grant the user the Cloud IAM Service Account User role on the Cloud Functions runtime
 service account.

-The typical way of assigning Cloud IAM permissions with `gcloud` is
+The typical way of assigning Cloud IAM permissions with ``gcloud`` is
 shown below. Just replace PROJECT_ID with ID of your Google Cloud Platform project
 and SERVICE_ACCOUNT_EMAIL with the email ID of your service account.

@@ -72,7 +72,7 @@ The following Operator would copy all the multiples files (i.e. using wildcard).
 Move files
 ----------

-Using the `move_object` parameter allows you to move the files. After copying the file to Google Drive,
+Using the ``move_object`` parameter allows you to move the files. After copying the file to Google Drive,
 the original file from the bucket is deleted.

 .. exampleinclude:: ../../../../airflow/contrib/example_dags/example_gcs_to_gdrive.py
@@ -138,7 +138,7 @@ existing database.
 You can optionally specify an operation_id parameter which simplifies determining whether
 the statements were executed in case the update_database call is replayed
 (idempotency check). The operation_id should be unique within the database, and must be
-a valid identifier: `[a-z][a-z0-9_]*`. More information can be found in
+a valid identifier: ``[a-z][a-z0-9_]*``. More information can be found in
 `the documentation of updateDdl API <https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instances.databases/updateDdl>`_

 For parameter definition take a look at
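To make the identifier rule concrete, a small hypothetical helper that checks a candidate ``operation_id`` against the quoted pattern:

.. code:: python

    import re

    OPERATION_ID_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")


    def is_valid_operation_id(operation_id):
        # True only when the id matches [a-z][a-z0-9_]* in full.
        return OPERATION_ID_PATTERN.match(operation_id) is not None


    print(is_valid_operation_id("update_users_ddl_001"))  # True
    print(is_valid_operation_id("1_invalid_id"))          # False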
@@ -575,7 +575,7 @@ dynamically as needed by the operator.
 There is a *gcpcloudsql://* connection type that you should use to define what
 kind of connectivity you want the operator to use. The connection is a "meta"
 type of connection. It is not used to make an actual connectivity on its own, but it
-determines whether Cloud SQL Proxy should be started by `CloudSqlDatabaseHook`
+determines whether Cloud SQL Proxy should be started by ``CloudSqlDatabaseHook``
 and what kind of database connection (Postgres or MySQL) should be created
 dynamically to connect to Cloud SQL via public IP address or via the proxy.
 The 'CloudSqlDatabaseHook` uses
@@ -585,7 +585,7 @@ Proxy lifecycle (each task has its own Cloud SQL Proxy)
 When you build connection, you should use connection parameters as described in
 :class:`~airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook`. You can see
 examples of connections below for all the possible types of connectivity. Such connection
-can be reused between different tasks (instances of `CloudSqlQueryOperator`). Each
+can be reused between different tasks (instances of ``CloudSqlQueryOperator``). Each
 task will get their own proxy started if needed with their own TCP or UNIX socket.

 For parameter definition, take a look at
@@ -599,7 +599,7 @@ used to create tables in an idempotent way.
 Arguments
 """""""""

-If you define connection via `AIRFLOW_CONN_*` URL defined in an environment
+If you define connection via ``AIRFLOW_CONN_*`` URL defined in an environment
 variable, make sure the URL components in the URL are URL-encoded.
 See examples below for details.

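A hedged sketch of the URL-encoding advice above; the connection id, credentials, host, and database name are made up:

.. code:: python

    import os
    from urllib.parse import quote_plus

    password = quote_plus("p@ss/word")  # special characters must be URL-encoded

    os.environ["AIRFLOW_CONN_MY_GCPCLOUDSQL_DB"] = (
        "gcpcloudsql://user:" + password + "@127.0.0.1:5432/mydb"
    )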
@@ -627,7 +627,7 @@ Using the operator
 """"""""""""""""""

 Example operators below are using all connectivity options. Note connection id
-from the operator matches the `AIRFLOW_CONN_*` postfix uppercase. This is
+from the operator matches the ``AIRFLOW_CONN_*`` postfix uppercase. This is
 standard AIRFLOW notation for defining connection via environment variables):

 .. exampleinclude:: ../../../../airflow/gcp/example_dags/example_cloud_sql_query.py
@@ -197,9 +197,9 @@ Creates and returns a new product resource.

 Possible errors regarding the :code:`Product` object provided:

-- Returns INVALID_ARGUMENT if `display_name` is missing or longer than 4096 characters.
-- Returns INVALID_ARGUMENT if `description` is longer than 4096 characters.
-- Returns INVALID_ARGUMENT if `product_category` is missing or invalid.
+- Returns INVALID_ARGUMENT if ``display_name`` is missing or longer than 4096 characters.
+- Returns INVALID_ARGUMENT if ``description`` is longer than 4096 characters.
+- Returns INVALID_ARGUMENT if ``product_category`` is missing or invalid.

 For parameter definition, take a look at
 :class:`~airflow.gcp.operators.vision.CloudVisionProductCreateOperator`
@@ -347,7 +347,7 @@ Gets information associated with a :code:`Product`.

 Possible errors:

-- Returns NOT_FOUND if the `Product` does not exist.
+- Returns NOT_FOUND if the ``Product`` does not exist.

 For parameter definition, take a look at
 :class:`~airflow.gcp.operators.vision.CloudVisionProductGetOperator`
@@ -606,16 +606,16 @@ CloudVisionProductSetUpdateOperator
 Makes changes to a :code:`ProductSet` resource. Only :code:`display_name` can be updated
 currently.

-.. note:: To locate the `ProductSet` resource, its `name` in the form
+.. note:: To locate the ``ProductSet`` resource, its ``name`` in the form
 ``projects/PROJECT_ID/locations/LOC_ID/productSets/PRODUCT_SET_ID`` is necessary.

-You can provide the `name` directly as an attribute of the `product_set` object.
-However, you can leave it blank and provide `location` and `product_set_id` instead (and
-optionally `project_id` - if not present, the connection default will be used) and the
-`name` will be created by the operator itself.
+You can provide the ``name`` directly as an attribute of the ``product_set`` object.
+However, you can leave it blank and provide ``location`` and ``product_set_id`` instead (and
+optionally ``project_id`` - if not present, the connection default will be used) and the
+``name`` will be created by the operator itself.

-This mechanism exists for your convenience, to allow leaving the `project_id` empty and
-having Airflow use the connection default `project_id`.
+This mechanism exists for your convenience, to allow leaving the ``project_id`` empty and
+having Airflow use the connection default ``project_id``.

 For parameter definition, take a look at
 :class:`~airflow.gcp.operators.vision.CloudVisionProductSetUpdateOperator`
@@ -693,25 +693,25 @@ Makes changes to a :code:`Product` resource. Only the :code:`display_name`,
 If labels are updated, the change will not be reflected in queries until the next index
 time.

-.. note:: To locate the `Product` resource, its `name` in the form
+.. note:: To locate the ``Product`` resource, its ``name`` in the form
 ``projects/PROJECT_ID/locations/LOC_ID/products/PRODUCT_ID`` is necessary.

-You can provide the `name` directly as an attribute of the `product` object. However, you
-can leave it blank and provide `location` and `product_id` instead (and optionally
-`project_id` - if not present, the connection default will be used) and the `name` will
+You can provide the ``name`` directly as an attribute of the ``product`` object. However, you
+can leave it blank and provide ``location`` and ``product_id`` instead (and optionally
+``project_id`` - if not present, the connection default will be used) and the ``name`` will
 be created by the operator itself.

-This mechanism exists for your convenience, to allow leaving the `project_id` empty and
-having Airflow use the connection default `project_id`.
+This mechanism exists for your convenience, to allow leaving the ``project_id`` empty and
+having Airflow use the connection default ``project_id``.

 Possible errors:

-- Returns NOT_FOUND if the `Product` does not exist.
-- Returns INVALID_ARGUMENT if `display_name` is present in `update_mask` but is missing
+- Returns NOT_FOUND if the ``Product`` does not exist.
+- Returns INVALID_ARGUMENT if ``display_name`` is present in ``update_mask`` but is missing
 from the request or longer than 4096 characters.
-- Returns INVALID_ARGUMENT if `description` is present in `update_mask` but is longer than
+- Returns INVALID_ARGUMENT if ``description`` is present in ``update_mask`` but is longer than
 4096 characters.
-- Returns INVALID_ARGUMENT if `product_category` is present in `update_mask`.
+- Returns INVALID_ARGUMENT if ``product_category`` is present in ``update_mask``.

 For parameter definition, take a look at
 :class:`~airflow.gcp.operators.vision.CloudVisionProductUpdateOperator`
@@ -42,7 +42,7 @@ with input parameters in order to overwrite the values in parameters. If no cell
 tagged with parameters the injected cell will be inserted at the top of the notebook.

 Note that Jupyter notebook has out of the box support for tags but you need to install
-the celltags extension for Jupyter Lab: `jupyter labextension install @jupyterlab/celltags`
+the celltags extension for Jupyter Lab: ``jupyter labextension install @jupyterlab/celltags``

 Make sure that you save your notebook somewhere so that Airflow can access it. Papermill
 supports S3, GCS, Azure and Local. HDFS is *not* supported.
@@ -29,11 +29,11 @@ For example, you can configure your reverse proxy to get:

 https://lab.mycompany.com/myorg/airflow/

-To do so, you need to set the following setting in your `airflow.cfg`::
+To do so, you need to set the following setting in your ``airflow.cfg``::

 base_url = http://my_host/myorg/airflow

-Additionally if you use Celery Executor, you can get Flower in `/myorg/flower` with::
+Additionally if you use Celery Executor, you can get Flower in ``/myorg/flower`` with::

 flower_url_prefix = /myorg/flower

@@ -74,11 +74,11 @@ Your reverse proxy (ex: nginx) should be configured as follow:

 To ensure that Airflow generates URLs with the correct scheme when
 running behind a TLS-terminating proxy, you should configure the proxy
-to set the `X-Forwarded-Proto` header, and enable the `ProxyFix`
-middleware in your `airflow.cfg`::
+to set the ``X-Forwarded-Proto`` header, and enable the ``ProxyFix``
+middleware in your ``airflow.cfg``::

 enable_proxy_fix = True

 .. note::
-You should only enable the `ProxyFix` middleware when running
+You should only enable the ``ProxyFix`` middleware when running
 Airflow behind a trusted proxy (AWS ELB, nginx, etc.).
@@ -21,7 +21,7 @@ Running Airflow with systemd
 ============================

 Airflow can integrate with systemd based systems. This makes watching your
-daemons easy as `systemd` can take care of restarting a daemon on failures.
+daemons easy as ``systemd`` can take care of restarting a daemon on failures.

 In the ``scripts/systemd`` directory, you can find unit files that
 have been tested on Redhat based systems. These files can be used as-is by copying them over to
@@ -29,7 +29,7 @@ have been tested on Redhat based systems. These files can be used as-is by copyi

 The following **assumptions** have been made while creating these unit files:

-#. Airflow runs as the following `user:group` ``airflow:airflow``.
+#. Airflow runs as the following ``user:group`` ``airflow:airflow``.
 #. Airflow runs on a Redhat based system.

 If this is not the case, appropriate changes will need to be made.
@@ -28,7 +28,7 @@ You can find sample upstart job files in the ``scripts/upstart`` directory.

 The following assumptions have been used while creating these unit files:

-1. Airflow will run as the following `user:group` ``airflow:airflow``.
+1. Airflow will run as the following ``user:group`` ``airflow:airflow``.
 Change ``setuid`` and ``setgid`` appropriately in ``*.conf`` if airflow runs as a different user or group
 2. These files have been tested on **Ubuntu 14.04 LTS**
 You may have to adjust ``start on`` and ``stop on`` stanzas to make it work on other upstart systems.
@@ -38,7 +38,7 @@ You can still enable encryption for passwords within connections by following be
 fernet_key= Fernet.generate_key()
 print(fernet_key.decode()) # your fernet_key, keep it in secured place!

-#. Replace ``airflow.cfg`` fernet_key value with the one from `Step 2`. *Alternatively,* you can store your fernet_key in OS environment variable - You do not need to change ``airflow.cfg`` in this case as Airflow will use environment variable over the value in ``airflow.cfg``:
+#. Replace ``airflow.cfg`` fernet_key value with the one from ``Step 2``. *Alternatively,* you can store your fernet_key in OS environment variable - You do not need to change ``airflow.cfg`` in this case as Airflow will use environment variable over the value in ``airflow.cfg``:

 .. code-block:: bash

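A self-contained version of the key-generation snippet shown in that hunk, with the import it needs; the environment-variable alternative follows Airflow's ``AIRFLOW__<SECTION>__<KEY>`` naming convention:

.. code:: python

    from cryptography.fernet import Fernet

    fernet_key = Fernet.generate_key()
    print(fernet_key.decode())  # your fernet_key, keep it in a secured place!

    # Instead of editing airflow.cfg you can export the key, e.g. in your shell:
    #   export AIRFLOW__CORE__FERNET_KEY=<the printed key>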
@@ -76,9 +76,9 @@ Airflow can be configured to read and write task logs in Azure Blob Storage.

 Follow the steps below to enable Azure Blob Storage logging:

-#. Airflow's logging system requires a custom `.py` file to be located in the ``PYTHONPATH``, so that it's importable from Airflow. Start by creating a directory to store the config file, ``$AIRFLOW_HOME/config`` is recommended.
+#. Airflow's logging system requires a custom ``.py`` file to be located in the ``PYTHONPATH``, so that it's importable from Airflow. Start by creating a directory to store the config file, ``$AIRFLOW_HOME/config`` is recommended.
 #. Create empty files called ``$AIRFLOW_HOME/config/log_config.py`` and ``$AIRFLOW_HOME/config/__init__.py``.
-#. Copy the contents of ``airflow/config_templates/airflow_local_settings.py`` into the ``log_config.py`` file created in `Step 2`.
+#. Copy the contents of ``airflow/config_templates/airflow_local_settings.py`` into the ``log_config.py`` file created in ``Step 2``.
 #. Customize the following portions of the template:

 .. code-block:: bash
@@ -190,7 +190,7 @@ To output task logs to stdout in JSON format, the following config could be used
 Writing Logs to Elasticsearch over TLS
 ----------------------------------------

-To add custom configurations to ElasticSearch (e.g. turning on ssl_verify, adding a custom self-signed cert, etc.) use the `elasticsearch_configs` setting in your airfow.cfg
+To add custom configurations to ElasticSearch (e.g. turning on ssl_verify, adding a custom self-signed cert, etc.) use the ``elasticsearch_configs`` setting in your airfow.cfg

 .. code-block:: bash

@@ -66,7 +66,7 @@ works.
 run_this.set_downstream(run_this_last)


-Tasks take the parameters `inlets` and `outlets`.
+Tasks take the parameters ``inlets`` and ``outlets``.

 Inlets can be manually defined by the following options:

@@ -83,19 +83,19 @@ the context when the task is being executed.

 .. note:: Operators can add inlets and outlets automatically if the operator supports it.

-In the example DAG task `run_me_first` is a BashOperator that takes 3 inlets: `CAT1`, `CAT2`, `CAT3`, that are
-generated from a list. Note that `execution_date` is a templated field and will be rendered when the task is running.
+In the example DAG task ``run_me_first`` is a BashOperator that takes 3 inlets: ``CAT1``, ``CAT2``, ``CAT3``, that are
+generated from a list. Note that ``execution_date`` is a templated field and will be rendered when the task is running.

-.. note:: Behind the scenes Airflow prepares the lineage metadata as part of the `pre_execute` method of a task. When the task
-has finished execution `post_execute` is called and lineage metadata is pushed into XCOM. Thus if you are creating
-your own operators that override this method make sure to decorate your method with `prepare_lineage` and `apply_lineage`
+.. note:: Behind the scenes Airflow prepares the lineage metadata as part of the ``pre_execute`` method of a task. When the task
+has finished execution ``post_execute`` is called and lineage metadata is pushed into XCOM. Thus if you are creating
+your own operators that override this method make sure to decorate your method with ``prepare_lineage`` and ``apply_lineage``
 respectively.


 Apache Atlas
 ------------

-Airflow can send its lineage metadata to Apache Atlas. You need to enable the `atlas` backend and configure it
+Airflow can send its lineage metadata to Apache Atlas. You need to enable the ``atlas`` backend and configure it
 properly, e.g. in your ``airflow.cfg``:

 .. code:: python
@@ -110,4 +110,4 @@ properly, e.g. in your ``airflow.cfg``:
 port = 21000


-Please make sure to have the `atlasclient` package installed.
+Please make sure to have the ``atlasclient`` package installed.
@@ -258,11 +258,11 @@ your plugin using an entrypoint in your package. If the package is installed, ai
 will automatically load the registered plugins from the entrypoint list.

 .. note::
-Neither the entrypoint name (eg, `my_plugin`) nor the name of the
+Neither the entrypoint name (eg, ``my_plugin``) nor the name of the
 plugin class will contribute towards the module and class name of the plugin
 itself. The structure is determined by
-`airflow.plugins_manager.AirflowPlugin.name` and the class name of the plugin
-component with the pattern `airflow.{component}.{name}.{component_class_name}`.
+``airflow.plugins_manager.AirflowPlugin.name`` and the class name of the plugin
+component with the pattern ``airflow.{component}.{name}.{component_class_name}``.

 .. code-block:: python

@@ -299,5 +299,5 @@ will automatically load the registered plugins from the entrypoint list.


 This will create a hook, and an operator accessible at:
-- `airflow.hooks.my_namespace.MyHook`
-- `airflow.operators.my_namespace.MyOperator`
+- ``airflow.hooks.my_namespace.MyHook``
+- ``airflow.operators.my_namespace.MyOperator``
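A hedged sketch of a ``setup.py`` that registers such a plugin through an entrypoint; the package, module, and class names are illustrative, and the ``airflow.plugins`` group name is assumed from the entrypoint mechanism the hunks above describe:

.. code:: python

    from setuptools import setup

    setup(
        name="my-airflow-plugin",
        version="0.1.0",
        packages=["my_package"],
        entry_points={
            "airflow.plugins": [
                # entrypoint name -> module path : plugin class
                "my_plugin = my_package.my_plugin:MyAirflowPlugin",
            ],
        },
    )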
@@ -255,7 +255,7 @@ and in your DAG, when initializing the HiveOperator, specify:

 run_as_owner=True

-To use kerberos authentication, you must install Airflow with the `kerberos` extras group:
+To use kerberos authentication, you must install Airflow with the ``kerberos`` extras group:

 .. code-block:: bash

@@ -288,7 +288,7 @@ to only members of those teams.
 .. note:: If you do not specify a team whitelist, anyone with a valid account on
 your GHE installation will be able to login to Airflow.

-To use GHE authentication, you must install Airflow with the `github_enterprise` extras group:
+To use GHE authentication, you must install Airflow with the ``github_enterprise`` extras group:

 .. code-block:: bash

@@ -336,7 +336,7 @@ login, separated with a comma, to only members of those domains.
 oauth_callback_route = /oauth2callback
 domain = example1.com,example2.com

-To use Google authentication, you must install Airflow with the `google_auth` extras group:
+To use Google authentication, you must install Airflow with the ``google_auth`` extras group:

 .. code-block:: bash

@@ -395,10 +395,10 @@ Impersonation
 Airflow has the ability to impersonate a unix user while running task
 instances based on the task's ``run_as_user`` parameter, which takes a user's name.

-**NOTE:** For impersonations to work, Airflow must be run with `sudo` as subtasks are run
-with `sudo -u` and permissions of files are changed. Furthermore, the unix user needs to
+**NOTE:** For impersonations to work, Airflow must be run with ``sudo`` as subtasks are run
+with ``sudo -u`` and permissions of files are changed. Furthermore, the unix user needs to
 exist on the worker. Here is what a simple sudoers file entry could look like to achieve
-this, assuming as airflow is running as the `airflow` user. Note that this means that
+this, assuming as airflow is running as the ``airflow`` user. Note that this means that
 the airflow user must be trusted and treated the same way as the root user.

 .. code-block:: none
@@ -411,8 +411,8 @@ log to will have permissions changed such that only the unix user can write to i

 Default Impersonation
 '''''''''''''''''''''
-To prevent tasks that don't use impersonation to be run with `sudo` privileges, you can set the
-``core:default_impersonation`` config which sets a default user impersonate if `run_as_user` is
+To prevent tasks that don't use impersonation to be run with ``sudo`` privileges, you can set the
+``core:default_impersonation`` config which sets a default user impersonate if ``run_as_user`` is
 not set.

 .. code-block:: bash
@@ -37,7 +37,7 @@ for a simple DAG, but it’s a problem if you are in, for example, financial ser
 deadlines to meet.

 The time zone is set in ``airflow.cfg``. By default it is set to utc, but you change it to use the system’s settings or
-an arbitrary IANA time zone, e.g. `Europe/Amsterdam`. It is dependent on `pendulum`, which is more accurate than `pytz`.
+an arbitrary IANA time zone, e.g. ``Europe/Amsterdam``. It is dependent on ``pendulum``, which is more accurate than ``pytz``.
 Pendulum is installed when you install Airflow.

 Please note that the Web UI currently only runs in UTC.
@@ -66,12 +66,12 @@ Because Airflow uses time-zone-aware datetime objects. If your code creates date
 Interpretation of naive datetime objects
 ''''''''''''''''''''''''''''''''''''''''

-Although Airflow operates fully time zone aware, it still accepts naive date time objects for `start_dates`
-and `end_dates` in your DAG definitions. This is mostly in order to preserve backwards compatibility. In
-case a naive `start_date` or `end_date` is encountered the default time zone is applied. It is applied
+Although Airflow operates fully time zone aware, it still accepts naive date time objects for ``start_dates``
+and ``end_dates`` in your DAG definitions. This is mostly in order to preserve backwards compatibility. In
+case a naive ``start_date`` or ``end_date`` is encountered the default time zone is applied. It is applied
 in such a way that it is assumed that the naive date time is already in the default time zone. In other
-words if you have a default time zone setting of `Europe/Amsterdam` and create a naive datetime `start_date` of
-`datetime(2017,1,1)` it is assumed to be a `start_date` of Jan 1, 2017 Amsterdam time.
+words if you have a default time zone setting of ``Europe/Amsterdam`` and create a naive datetime ``start_date`` of
+``datetime(2017,1,1)`` it is assumed to be a ``start_date`` of Jan 1, 2017 Amsterdam time.

 .. code:: python

@@ -96,9 +96,9 @@ created in application code is the current time, and timezone.utcnow() automatic
 Default time zone
 '''''''''''''''''

-The default time zone is the time zone defined by the `default_timezone` setting under `[core]`. If
-you just installed Airflow it will be set to `utc`, which is recommended. You can also set it to
-`system` or an IANA time zone (e.g.`Europe/Amsterdam`). DAGs are also evaluated on Airflow workers,
+The default time zone is the time zone defined by the ``default_timezone`` setting under ``[core]``. If
+you just installed Airflow it will be set to ``utc``, which is recommended. You can also set it to
+`system` or an IANA time zone (e.g.``Europe/Amsterdam``). DAGs are also evaluated on Airflow workers,
 it is therefore important to make sure this setting is equal on all Airflow nodes.


@@ -111,8 +111,8 @@ it is therefore important to make sure this setting is equal on all Airflow node
 Time zone aware DAGs
 --------------------

-Creating a time zone aware DAG is quite simple. Just make sure to supply a time zone aware `start_date`
-using `pendulum`.
+Creating a time zone aware DAG is quite simple. Just make sure to supply a time zone aware ``start_date``
+using ``pendulum``.

 .. code:: python

@@ -129,7 +129,7 @@ using `pendulum`.
 op = DummyOperator(task_id='dummy', dag=dag)
 print(dag.timezone) # <Timezone [Europe/Amsterdam]>

-Please note that while it is possible to set a `start_date` and `end_date` for Tasks always the DAG timezone
+Please note that while it is possible to set a ``start_date`` and ``end_date`` for Tasks always the DAG timezone
 or global timezone (in that order) will be used to calculate the next execution date. Upon first encounter
 the start date or end date will be converted to UTC using the timezone associated with start_date or end_date,
 then for calculations this timezone information will be disregarded.
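For reference, a standalone sketch of the pattern those hunks describe, a DAG with a pendulum-supplied, time zone aware ``start_date``; the dag_id is illustrative:

.. code:: python

    from datetime import datetime

    import pendulum
    from airflow import DAG

    local_tz = pendulum.timezone("Europe/Amsterdam")

    dag = DAG(
        dag_id="tz_aware_example",
        schedule_interval="@daily",
        start_date=datetime(2019, 1, 1, tzinfo=local_tz),
    )

    print(dag.timezone)  # <Timezone [Europe/Amsterdam]>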
@@ -266,7 +266,7 @@ in templates, make sure to read through the :doc:`macros-ref`

 Setting up Dependencies
 -----------------------
-We have tasks `t1`, `t2` and `t3` that do not depend on each other. Here's a few ways
+We have tasks ``t1``, ``t2`` and ``t3`` that do not depend on each other. Here's a few ways
 you can define dependencies between them:

 .. code:: python
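For reference, a minimal sketch of the dependency-setting styles that section goes on to show; the dag_id is illustrative and DummyOperator is used only as a stand-in for real tasks:

.. code:: python

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG(dag_id="dependencies_example", start_date=datetime(2019, 1, 1))

    t1 = DummyOperator(task_id="t1", dag=dag)
    t2 = DummyOperator(task_id="t2", dag=dag)
    t3 = DummyOperator(task_id="t3", dag=dag)

    t1.set_downstream(t2)  # explicit setter style: t1 runs before t2
    t1 >> t3               # bit-shift style: t1 runs before t3
    # t3.set_upstream(t1) would declare the same edge from the downstream side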