[AIRFLOW-2523] Add how-to for managing GCP connections

I'd like to have how-to guides for all connection
types, or at least for the different categories of
connection types. I found it difficult to figure
out how to manage a GCP connection, so this commit
adds a how-to guide for this.

Also, since creating and editing connections
really aren't all that different, the PR renames
the "creating connections" how-to to "managing
connections".

Closes #3419 from tswast/howto
Tim Swast 2018-05-25 09:37:29 +01:00, committed by Kaxil Naik
Parent 66f00bbf7b
Commit 4c0d67f0d0
9 changed files: 152 additions and 31 deletions


@@ -308,6 +308,8 @@ UI. As slots free up, queued tasks start running based on the
Note that by default tasks aren't assigned to any pool and their
execution parallelism is only limited to the executor's setting.
.. _concepts-connections:
Connections
===========
@@ -324,16 +326,12 @@ from ``BaseHook``, Airflow will choose one connection randomly, allowing
for some basic load balancing and fault tolerance when used in conjunction
with retries.
Airflow also has the ability to reference connections via environment
variables from the operating system. The environment variable needs to be
prefixed with ``AIRFLOW_CONN_`` to be considered a connection. When
referencing the connection in the Airflow pipeline, the ``conn_id`` should
be the name of the variable without the prefix. For example, if the ``conn_id``
is named ``postgres_master`` the environment variable should be named
``AIRFLOW_CONN_POSTGRES_MASTER`` (note that the environment variable must be
all uppercase). Airflow assumes the value returned from the environment
variable to be in a URI format (e.g.
``postgres://user:password@localhost:5432/master`` or ``s3://accesskey:secretkey@S3``).
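For example, the ``postgres_master`` connection described above could be
defined with a single shell export (the connection ID and credentials here
are hypothetical, used only for illustration):

.. code:: bash

    # The variable name is AIRFLOW_CONN_ + the upper-cased conn_id;
    # the value is the connection expressed as a URI.
    export AIRFLOW_CONN_POSTGRES_MASTER='postgres://user:password@localhost:5432/master'

A task in a pipeline can then reference it with ``conn_id='postgres_master'``.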
Many hooks have a default ``conn_id``, where operators using that hook do not
need to supply an explicit connection ID. For example, the default
``conn_id`` for the :class:`~airflow.hooks.postgres_hook.PostgresHook` is
``postgres_default``.
See :doc:`howto/manage-connections` for how to create and manage connections.
Queues
======
@@ -410,7 +408,7 @@ Variables
Variables are a generic way to store and retrieve arbitrary content or
settings as a simple key value store within Airflow. Variables can be
listed, created, updated and deleted from the UI (``Admin -> Variables``),
code or CLI. In addition, json settings files can be bulk uploaded through
the UI. While your pipeline code definition and most of your constants
and variables should be defined in code and stored in source control,
it can be useful to have some variables or configuration items
@@ -427,18 +425,18 @@ The second call assumes ``json`` content and will be deserialized into
``bar``. Note that ``Variable`` is a sqlalchemy model and can be used
as such.
You can use a variable from a jinja template with the syntax:
.. code:: bash
echo {{ var.value.<variable_name> }}
or if you need to deserialize a json object from the variable:
.. code:: bash
echo {{ var.json.<variable_name> }}
Branching
=========


@@ -1,8 +0,0 @@
Creating a Connection
=====================
Connections in Airflow pipelines can be created using environment variables.
The environment variable needs to have a prefix of ``AIRFLOW_CONN_`` for
Airflow with the value in a URI format to use the connection properly. Please
see the :doc:`../../concepts` documentation for more information on environment
variables and connections.


@@ -12,8 +12,8 @@ configuring an Airflow environment.
set-config
initialize-database
manage-connections
secure-connections
create-connection
write-logs
executor/use-celery
executor/use-dask


@@ -0,0 +1,135 @@
Managing Connections
=====================
Airflow needs to know how to connect to your environment. Information
such as hostname, port, login and passwords to other systems and services is
handled in the ``Admin->Connection`` section of the UI. The pipeline code you
will author will reference the 'conn_id' of the Connection objects.
.. image:: ../img/connections.png
Connections can be created and managed using either the UI or environment
variables.
See the :ref:`Connections Concepts <concepts-connections>` documentation for
more information.
Creating a Connection with the UI
---------------------------------
Open the ``Admin->Connection`` section of the UI. Click the ``Create`` link
to create a new connection.
.. image:: ../img/connection_create.png
1. Fill in the ``Conn Id`` field with the desired connection ID. It is
recommended that you use lower-case characters and separate words with
underscores.
2. Choose the connection type with the ``Conn Type`` field.
3. Fill in the remaining fields. See
:ref:`manage-connections-connection-types` for a description of the fields
belonging to the different connection types.
4. Click the ``Save`` button to create the connection.
Editing a Connection with the UI
--------------------------------
Open the ``Admin->Connection`` section of the UI. Click the pencil icon next
to the connection you wish to edit in the connection list.
.. image:: ../img/connection_edit.png
Modify the connection properties and click the ``Save`` button to save your
changes.
Creating a Connection with Environment Variables
------------------------------------------------
Connections in Airflow pipelines can be created using environment variables.
The environment variable needs to have a prefix of ``AIRFLOW_CONN_`` for
Airflow with the value in a URI format to use the connection properly.
When referencing the connection in the Airflow pipeline, the ``conn_id``
should be the name of the variable without the prefix. For example, if the
``conn_id`` is named ``postgres_master`` the environment variable should be
named ``AIRFLOW_CONN_POSTGRES_MASTER`` (note that the environment variable
must be all uppercase). Airflow assumes the value returned from the
environment variable to be in a URI format (e.g.
``postgres://user:password@localhost:5432/master`` or
``s3://accesskey:secretkey@S3``).
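Because the value is parsed as a URI, credentials containing URI-reserved
characters need to be percent-encoded before being embedded in it. A minimal
sketch (the ``my_db`` connection ID and the credentials are hypothetical):

.. code:: bash

    # Percent-encode the raw password before building the URI.
    python3 -c "from urllib.parse import quote; print(quote('p@ss/word', safe=''))"
    # p%40ss%2Fword

    # Use the encoded form inside the connection URI.
    export AIRFLOW_CONN_MY_DB='postgres://user:p%40ss%2Fword@localhost:5432/mydb'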
.. _manage-connections-connection-types:
Connection Types
----------------
.. _connection-type-GCP:
Google Cloud Platform
~~~~~~~~~~~~~~~~~~~~~
The Google Cloud Platform connection type enables the :ref:`GCP Integrations
<GCP>`.
Authenticating to GCP
'''''''''''''''''''''
There are two ways to connect to GCP using Airflow.
1. Use `Application Default Credentials
<https://google-auth.readthedocs.io/en/latest/reference/google.auth.html#google.auth.default>`_,
such as via the metadata server when running on Google Compute Engine.
2. Use a `service account
<https://cloud.google.com/docs/authentication/#service_accounts>`_ key
file (JSON format) on disk.
Default Connection IDs
''''''''''''''''''''''
The following connection IDs are used by default.
``bigquery_default``
Used by the :class:`~airflow.contrib.hooks.bigquery_hook.BigQueryHook`
hook.
``google_cloud_datastore_default``
Used by the :class:`~airflow.contrib.hooks.datastore_hook.DatastoreHook`
hook.
``google_cloud_default``
Used by the
:class:`~airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook`,
:class:`~airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook`,
:class:`~airflow.contrib.hooks.gcp_dataproc_hook.DataProcHook`,
:class:`~airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook`, and
:class:`~airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook` hooks.
Configuring the Connection
''''''''''''''''''''''''''
Project Id (required)
The Google Cloud project ID to connect to.
Keyfile Path
Path to a `service account
<https://cloud.google.com/docs/authentication/#service_accounts>`_ key
file (JSON format) on disk.
Not required if using application default credentials.
Keyfile JSON
Contents of a `service account
<https://cloud.google.com/docs/authentication/#service_accounts>`_ key
file (JSON format). It is recommended to :doc:`Secure your connections
<secure-connections>` if using this method to authenticate.
Not required if using application default credentials.
Scopes (comma separated)
A list of comma-separated `Google Cloud scopes
<https://developers.google.com/identity/protocols/googlescopes>`_ to
authenticate with.
.. note::
Scopes are ignored when using application default credentials. See
issue `AIRFLOW-2522
<https://issues.apache.org/jira/browse/AIRFLOW-2522>`_.
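The fields above can also be supplied through an environment variable, since
non-standard fields travel in the URI query string. The sketch below is an
assumption to be verified against your Airflow version: the
``extra__google_cloud_platform__project`` / ``__key_path`` parameter names,
the ``my-project`` project ID, and the keyfile path are all illustrative.

.. code:: bash

    # Hypothetical sketch: define the default GCP connection via an
    # environment variable. The extras parameter names are assumptions;
    # the key path must be percent-encoded inside the URI (/keys/sa.json).
    export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://?extra__google_cloud_platform__project=my-project&extra__google_cloud_platform__key_path=%2Fkeys%2Fsa.json'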


@@ -1,13 +1,6 @@
Securing Connections
====================
Airflow needs to know how to connect to your environment. Information
such as hostname, port, login and passwords to other systems and services is
handled in the ``Admin->Connection`` section of the UI. The pipeline code you
will author will reference the 'conn_id' of the Connection objects.
.. image:: ../img/connections.png
By default, Airflow will save the passwords for the connection in plain text
within the metadata database. The ``crypto`` package is highly recommended
during installation. The ``crypto`` package does require that your operating

docs/img/connection_create.png: new binary file (41 KiB, not shown)

docs/img/connection_edit.png: new binary file (52 KiB, not shown)

docs/img/connections.png: binary file changed (before: 91 KiB, after: 47 KiB; not shown)


@@ -316,6 +316,9 @@ Airflow has extensive support for the Google Cloud Platform. But note that most
Operators are in the contrib section, meaning that they have a *beta* status
and can have breaking changes between minor releases.
See the :ref:`GCP connection type <connection-type-GCP>` documentation to
configure connections to GCP.
Logging
'''''''