[AIRFLOW-2523] Add how-to for managing GCP connections
I'd like to have how-to guides for all connection types, or at least for the different categories of connection types. I found it difficult to figure out how to manage a GCP connection, so this commit adds a how-to guide for it. Also, since creating and editing connections aren't all that different, the PR renames the "creating connections" how-to to "managing connections". Closes #3419 from tswast/howto
Parent
66f00bbf7b
Commit
4c0d67f0d0
|
@ -308,6 +308,8 @@ UI. As slots free up, queued tasks start running based on the
|
|||
Note that by default tasks aren't assigned to any pool and their
|
||||
execution parallelism is only limited to the executor's setting.
|
||||
|
||||
.. _concepts-connections:
|
||||
|
||||
Connections
|
||||
===========
|
||||
|
||||
|
@ -324,16 +326,12 @@ from ``BaseHook``, Airflow will choose one connection randomly, allowing
|
|||
for some basic load balancing and fault tolerance when used in conjunction
|
||||
with retries.
|
||||
|
||||
Airflow also has the ability to reference connections via environment
|
||||
variables from the operating system. The environment variable needs to be
|
||||
prefixed with ``AIRFLOW_CONN_`` to be considered a connection. When
|
||||
referencing the connection in the Airflow pipeline, the ``conn_id`` should
|
||||
be the name of the variable without the prefix. For example, if the ``conn_id``
|
||||
is named ``postgres_master`` the environment variable should be named
|
||||
``AIRFLOW_CONN_POSTGRES_MASTER`` (note that the environment variable must be
|
||||
all uppercase). Airflow assumes the value returned from the environment
|
||||
variable to be in a URI format (e.g.
|
||||
``postgres://user:password@localhost:5432/master`` or ``s3://accesskey:secretkey@S3``).
|
||||
Many hooks have a default ``conn_id``, where operators using that hook do not
|
||||
need to supply an explicit connection ID. For example, the default
|
||||
``conn_id`` for the :class:`~airflow.hooks.postgres_hook.PostgresHook` is
|
||||
``postgres_default``.
|
||||
|
||||
See :doc:`howto/manage-connections` for how to create and manage connections.
|
||||
|
||||
Queues
|
||||
======
|
||||
|
@ -410,7 +408,7 @@ Variables
|
|||
Variables are a generic way to store and retrieve arbitrary content or
|
||||
settings as a simple key value store within Airflow. Variables can be
|
||||
listed, created, updated and deleted from the UI (``Admin -> Variables``),
|
||||
code or CLI. In addition, json settings files can be bulk uploaded through
|
||||
the UI. While your pipeline code definition and most of your constants
|
||||
and variables should be defined in code and stored in source control,
|
||||
it can be useful to have some variables or configuration items
|
||||
|
@ -427,18 +425,18 @@ The second call assumes ``json`` content and will be deserialized into
|
|||
``bar``. Note that ``Variable`` is a sqlalchemy model and can be used
|
||||
as such.
|
||||
|
||||
You can use a variable from a Jinja template with the syntax:
|
||||
|
||||
.. code:: bash
|
||||
|
||||
echo {{ var.value.<variable_name> }}
|
||||
|
||||
or if you need to deserialize a JSON object from the variable:
|
||||
|
||||
.. code:: bash
|
||||
|
||||
echo {{ var.json.<variable_name> }}
|
||||
|
||||
|
||||
|
||||
Branching
|
||||
=========
|
||||
|
|
|
@ -1,8 +0,0 @@
|
|||
Creating a Connection
|
||||
=====================
|
||||
|
||||
Connections in Airflow pipelines can be created using environment variables.
|
||||
The environment variable needs to have a prefix of ``AIRFLOW_CONN_`` for
|
||||
Airflow with the value in a URI format to use the connection properly. Please
|
||||
see the :doc:`../../concepts` documentation for more information on environment
|
||||
variables and connections.
|
|
@ -12,8 +12,8 @@ configuring an Airflow environment.
|
|||
|
||||
set-config
|
||||
initialize-database
|
||||
manage-connections
|
||||
secure-connections
|
||||
create-connection
|
||||
write-logs
|
||||
executor/use-celery
|
||||
executor/use-dask
|
||||
|
|
|
@ -0,0 +1,135 @@
|
|||
Managing Connections
|
||||
=====================
|
||||
|
||||
Airflow needs to know how to connect to your environment. Information
|
||||
such as hostname, port, login and passwords to other systems and services is
|
||||
handled in the ``Admin->Connection`` section of the UI. The pipeline code you
|
||||
author references the ``conn_id`` of the Connection objects.
|
||||
|
||||
.. image:: ../img/connections.png
|
||||
|
||||
Connections can be created and managed using either the UI or environment
|
||||
variables.
|
||||
|
||||
See the :ref:`Connections Concepts <concepts-connections>` documentation for
|
||||
more information.
|
||||
|
||||
Creating a Connection with the UI
|
||||
---------------------------------
|
||||
|
||||
Open the ``Admin->Connection`` section of the UI. Click the ``Create`` link
|
||||
to create a new connection.
|
||||
|
||||
.. image:: ../img/connection_create.png
|
||||
|
||||
1. Fill in the ``Conn Id`` field with the desired connection ID. It is
|
||||
recommended that you use lower-case characters and separate words with
|
||||
underscores.
|
||||
2. Choose the connection type with the ``Conn Type`` field.
|
||||
3. Fill in the remaining fields. See
|
||||
:ref:`manage-connections-connection-types` for a description of the fields
|
||||
belonging to the different connection types.
|
||||
4. Click the ``Save`` button to create the connection.
|
||||
|
||||
Editing a Connection with the UI
|
||||
--------------------------------
|
||||
|
||||
Open the ``Admin->Connection`` section of the UI. Click the pencil icon next
|
||||
to the connection you wish to edit in the connection list.
|
||||
|
||||
.. image:: ../img/connection_edit.png
|
||||
|
||||
Modify the connection properties and click the ``Save`` button to save your
|
||||
changes.
|
||||
|
||||
Creating a Connection with Environment Variables
|
||||
------------------------------------------------
|
||||
|
||||
Connections in Airflow pipelines can be created using environment variables.
|
||||
The environment variable name must start with the ``AIRFLOW_CONN_`` prefix, and
|
||||
its value must be in a URI format for Airflow to use the connection properly.
|
||||
|
||||
When referencing the connection in the Airflow pipeline, the ``conn_id``
|
||||
should be the name of the variable without the prefix. For example, if the
|
||||
``conn_id`` is named ``postgres_master`` the environment variable should be
|
||||
named ``AIRFLOW_CONN_POSTGRES_MASTER`` (note that the environment variable
|
||||
must be all uppercase). Airflow assumes the value returned from the
|
||||
environment variable to be in a URI format (e.g.
|
||||
``postgres://user:password@localhost:5432/master`` or
|
||||
``s3://accesskey:secretkey@S3``).
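
The naming rule and URI format described above can be sketched in plain
Python. This is an illustrative sketch that uses only the standard library; it
is not Airflow's own parsing code. The helper names ``env_var_for`` and
``describe_uri`` are hypothetical, and the dictionary keys are assumed labels
chosen to resemble the connection form fields.

```python
import os
from urllib.parse import urlparse

CONN_ENV_PREFIX = "AIRFLOW_CONN_"  # prefix named in the text above


def env_var_for(conn_id):
    """Return the environment variable name for a conn_id (hypothetical helper)."""
    # The variable name must be all uppercase.
    return CONN_ENV_PREFIX + conn_id.upper()


def describe_uri(uri):
    """Split a connection URI into its parts using the standard library."""
    parts = urlparse(uri)
    return {
        "conn_type": parts.scheme,
        "login": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "schema": parts.path.lstrip("/"),  # assumed label for the URI path
    }


# Set the variable for this process and anything it launches.
name = env_var_for("postgres_master")
os.environ[name] = "postgres://user:password@localhost:5432/master"
print(name, describe_uri(os.environ[name]))
```

In practice the variable would normally be exported by the shell or service
manager that starts the Airflow processes, so that it is already set when
Airflow looks up the ``conn_id``.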
|
||||
|
||||
.. _manage-connections-connection-types:
|
||||
|
||||
Connection Types
|
||||
----------------
|
||||
|
||||
.. _connection-type-GCP:
|
||||
|
||||
Google Cloud Platform
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The Google Cloud Platform connection type enables the :ref:`GCP Integrations
|
||||
<GCP>`.
|
||||
|
||||
Authenticating to GCP
|
||||
'''''''''''''''''''''
|
||||
|
||||
There are two ways to connect to GCP using Airflow.
|
||||
|
||||
1. Use `Application Default Credentials
|
||||
<https://google-auth.readthedocs.io/en/latest/reference/google.auth.html#google.auth.default>`_,
|
||||
such as via the metadata server when running on Google Compute Engine.
|
||||
2. Use a `service account
|
||||
<https://cloud.google.com/docs/authentication/#service_accounts>`_ key
|
||||
file (JSON format) on disk.
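
The two methods above map onto two entry points of the ``google-auth``
library. The following is a minimal sketch, assuming ``google-auth`` is
installed; it is not the code path Airflow's hooks use. The helper names
``auth_method`` and ``load_credentials`` and the key path are hypothetical.

```python
def auth_method(key_path=None):
    """Name which of the two methods applies (hypothetical helper)."""
    return "service_account_key_file" if key_path else "application_default"


def load_credentials(key_path=None):
    """Load credentials with google-auth, imported lazily.

    Illustrative only; this is not the code path Airflow's hooks use.
    """
    import google.auth
    from google.oauth2 import service_account

    if key_path:
        # Method 2: a service account key file (JSON format) on disk.
        return service_account.Credentials.from_service_account_file(key_path)
    # Method 1: Application Default Credentials, e.g. the metadata server.
    credentials, _project_id = google.auth.default()
    return credentials


print(auth_method())
print(auth_method("/path/to/key.json"))  # placeholder path
```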
|
||||
|
||||
Default Connection IDs
|
||||
''''''''''''''''''''''
|
||||
|
||||
The following connection IDs are used by default.
|
||||
|
||||
``bigquery_default``
|
||||
Used by the :class:`~airflow.contrib.hooks.bigquery_hook.BigQueryHook`
|
||||
hook.
|
||||
|
||||
``google_cloud_datastore_default``
|
||||
Used by the :class:`~airflow.contrib.hooks.datastore_hook.DatastoreHook`
|
||||
hook.
|
||||
|
||||
``google_cloud_default``
|
||||
Used by the
|
||||
:class:`~airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook`,
|
||||
:class:`~airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook`,
|
||||
:class:`~airflow.contrib.hooks.gcp_dataproc_hook.DataProcHook`,
|
||||
:class:`~airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook`, and
|
||||
:class:`~airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook` hooks.
|
||||
|
||||
Configuring the Connection
|
||||
''''''''''''''''''''''''''
|
||||
|
||||
Project Id (required)
|
||||
The Google Cloud project ID to connect to.
|
||||
|
||||
Keyfile Path
|
||||
Path to a `service account
|
||||
<https://cloud.google.com/docs/authentication/#service_accounts>`_ key
|
||||
file (JSON format) on disk.
|
||||
|
||||
Not required if using application default credentials.
|
||||
|
||||
Keyfile JSON
|
||||
Contents of a `service account
|
||||
<https://cloud.google.com/docs/authentication/#service_accounts>`_ key
|
||||
file (JSON format) on disk. It is recommended to :doc:`Secure your connections <secure-connections>` if using this method to authenticate.
|
||||
|
||||
Not required if using application default credentials.
|
||||
|
||||
Scopes (comma separated)
|
||||
A list of comma-separated `Google Cloud scopes
|
||||
<https://developers.google.com/identity/protocols/googlescopes>`_ to
|
||||
authenticate with.
|
||||
|
||||
.. note::
|
||||
Scopes are ignored when using application default credentials. See
|
||||
issue `AIRFLOW-2522
|
||||
<https://issues.apache.org/jira/browse/AIRFLOW-2522>`_.
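
As a rough illustration of the fields above, the sketch below checks that a
project ID is present, that any pasted key is valid JSON, and splits the
comma-separated scopes into a list. The helpers ``parse_scopes`` and
``check_gcp_fields`` are hypothetical and are not part of Airflow; the project
ID and key path are placeholders.

```python
import json


def parse_scopes(raw):
    """Split the comma-separated Scopes field into a list of scope URLs."""
    return [scope.strip() for scope in raw.split(",") if scope.strip()]


def check_gcp_fields(project_id, keyfile_path=None, keyfile_json=None, scopes=""):
    """Validate the form fields described above (hypothetical helper)."""
    if not project_id:
        raise ValueError("Project Id is required")
    if keyfile_json:
        json.loads(keyfile_json)  # raises ValueError if the key is not valid JSON
    return {
        "project_id": project_id,
        "keyfile_path": keyfile_path,
        "has_keyfile_json": bool(keyfile_json),
        "scopes": parse_scopes(scopes),
    }


fields = check_gcp_fields(
    "my-project",  # placeholder project ID
    keyfile_path="/path/to/key.json",  # placeholder path
    scopes="https://www.googleapis.com/auth/cloud-platform, "
           "https://www.googleapis.com/auth/bigquery",
)
print(fields["scopes"])
```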
|
|
@ -1,13 +1,6 @@
|
|||
Securing Connections
|
||||
====================
|
||||
|
||||
Airflow needs to know how to connect to your environment. Information
|
||||
such as hostname, port, login and passwords to other systems and services is
|
||||
handled in the ``Admin->Connection`` section of the UI. The pipeline code you
|
||||
will author will reference the 'conn_id' of the Connection objects.
|
||||
|
||||
.. image:: ../img/connections.png
|
||||
|
||||
By default, Airflow will save the passwords for the connection in plain text
|
||||
within the metadata database. The ``crypto`` package is highly recommended
|
||||
during installation. The ``crypto`` package does require that your operating
|
||||
|
|
Binary file not shown.
After Width: | Height: | Size: 41 KiB |
Binary file not shown.
After Width: | Height: | Size: 52 KiB |
Binary data
docs/img/connections.png
Binary file not shown.
Before Width: | Height: | Size: 91 KiB After Width: | Height: | Size: 47 KiB |
|
@ -316,6 +316,9 @@ Airflow has extensive support for the Google Cloud Platform. But note that most
|
|||
Operators are in the contrib section, which means that they have a *beta* status and
|
||||
can have breaking changes between minor releases.
|
||||
|
||||
See the :ref:`GCP connection type <connection-type-GCP>` documentation to
|
||||
configure connections to GCP.
|
||||
|
||||
Logging
|
||||
'''''''
|
||||
|
||||
|
|