Installation
------------

Setting up the sandbox from the :doc:`start` section was easy; now
working towards a production-grade environment is a bit more work.

Note that Airflow is only tested under Python 2.7.*, as many of our
dependencies don't support Python 3 (as of 2015-06).

Extra Packages
''''''''''''''
The ``airflow`` PyPI basic package only installs what's needed to get started.
Subpackages can be installed depending on what will be useful in your
environment. For instance, if you don't need connectivity with Postgres,
you won't have to go through the trouble of installing the ``postgres-devel``
yum package, or whatever the equivalent is on the distribution you are using.

Behind the scenes, we do conditional imports on operators that require
these extra dependencies.

Here's the list of the subpackages and what they enable:

+-------------+------------------------------------+---------------------------------------+
| subpackage  | install command                    | enables                               |
+=============+====================================+=======================================+
| mysql       | ``pip install airflow[mysql]``     | MySQL operators and hook, support as  |
|             |                                    | an Airflow backend                    |
+-------------+------------------------------------+---------------------------------------+
| postgres    | ``pip install airflow[postgres]``  | Postgres operators and hook, support  |
|             |                                    | as an Airflow backend                 |
+-------------+------------------------------------+---------------------------------------+
| samba       | ``pip install airflow[samba]``     | ``Hive2SambaOperator``                |
+-------------+------------------------------------+---------------------------------------+
| s3          | ``pip install airflow[s3]``        | ``S3KeySensor``, ``S3PrefixSensor``   |
+-------------+------------------------------------+---------------------------------------+
| all         | ``pip install airflow[all]``       | All Airflow features known to man     |
+-------------+------------------------------------+---------------------------------------+
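
Extras can also be combined in a single install. For example, to pull in
both the MySQL support and the S3 sensors in one go (a sketch; pick the
extras that match your environment):

.. code-block:: bash

    # quote the argument so shells such as zsh don't expand the brackets
    pip install "airflow[mysql,s3]"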

Setting up a Backend
''''''''''''''''''''
If you want to take a real test drive of Airflow, you should consider
setting up a real database backend and switching to the LocalExecutor.

As Airflow was built to interact with its metadata using the great SqlAlchemy
library, you should be able to use any database backend supported as a
SqlAlchemy backend. We recommend using **MySQL** or **Postgres**.

Once you've set up your database to host Airflow, you'll need to alter the
SqlAlchemy connection string located in your configuration file
``$AIRFLOW_HOME/airflow.cfg``. You should then also change the "executor"
setting to use "LocalExecutor", an executor that can parallelize task
instances locally.
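
As a sketch, the relevant lines in ``airflow.cfg`` might end up looking
like the following; this assumes the connection string setting is named
``sql_alchemy_conn`` as in the sample configuration, and the MySQL URL
and credentials are purely illustrative, so substitute your own:

.. code-block:: bash

    # check the current values; both live in $AIRFLOW_HOME/airflow.cfg
    grep -E '^(executor|sql_alchemy_conn)' "$AIRFLOW_HOME/airflow.cfg"

    # then edit them to something like:
    #   executor = LocalExecutor
    #   sql_alchemy_conn = mysql://airflow:airflow@localhost:3306/airflow

With the configuration in place, initialize the metadata database: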

.. code-block:: bash

    # initialize the database
    airflow initdb

Connections
'''''''''''
Airflow needs to know how to connect to your environment. Information
such as hostname, port, login and password to other systems and services is
handled in the ``Admin->Connection`` section of the UI. The pipeline code you
will author will reference the 'conn_id' of the Connection objects.

.. image:: img/connections.png

Scaling Out
'''''''''''
CeleryExecutor is the way you can scale out the number of workers. For this
to work, you need to set up a Celery backend (**RabbitMQ**, **Redis**, ...) and
change your ``airflow.cfg`` to point the executor parameter to
CeleryExecutor and provide the related Celery settings.
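
As a sketch, those settings might look like the following; the setting
names here follow the sample configuration (check the sample
``airflow.cfg`` shipped with your version), and the broker URL assumes a
local RabbitMQ instance with default credentials, so adjust both:

.. code-block:: bash

    # illustrative excerpt from $AIRFLOW_HOME/airflow.cfg
    #   executor = CeleryExecutor
    #   broker_url = amqp://guest:guest@localhost:5672//
    #   celery_result_backend = amqp://guest:guest@localhost:5672//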

To kick off a worker, you need to set up Airflow and kick off the worker
subcommand:

.. code-block:: bash

    airflow worker

Your worker should start picking up tasks as soon as they get fired in
its direction.

Note that you can also run "Celery Flower", a web UI built on top of Celery,
to monitor your workers.
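
A minimal sketch of launching it, assuming you install the ``flower``
package yourself and point it at the same broker your workers use (the
URL below is illustrative; the exact invocation may vary across flower
versions):

.. code-block:: bash

    pip install flower
    # the broker URL must match the broker_url your workers are configured with
    flower --broker=amqp://guest:guest@localhost:5672//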

Web Authentication
''''''''''''''''''
By default, all gates are open. An easy way to restrict access
to the web application is to do it at the network level, or by using
SSH tunnels.

However, it is possible to switch on
authentication and define exactly how your users should log in
to your Airflow environment. Airflow uses ``flask_login`` and
exposes a set of hooks in the ``airflow.default_login`` module. You can
alter the content of this module by overriding it as an ``airflow_login``
module. To do this, you would typically copy/paste ``airflow.default_login``
into an ``airflow_login.py`` file and put it directly on your ``PYTHONPATH``.
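
A minimal sketch of that copy step, assuming ``~/src`` is a directory that
is already on your ``PYTHONPATH`` (both paths below are illustrative):

.. code-block:: bash

    # find where the default login module lives in your installation
    python -c 'import airflow.default_login as m; print(m.__file__)'

    # copy it under the name Airflow looks for, somewhere importable
    cp /path/printed/above/default_login.py ~/src/airflow_login.py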