Installation
------------

Setting up the sandbox from the :doc:`start` section was easy; working
towards a production grade environment is a bit more work.

Extra Packages
''''''''''''''

The `airflow` PyPI basic package only installs what's needed to get started.
Subpackages can be installed depending on what will be useful in your
environment. For instance, if you don't need connectivity with Postgres,
you won't have to go through the trouble of installing the `postgres-devel`
yum package, or whatever the equivalent is on the distribution you are using.

Behind the scenes, we conditionally import the operators that require
these extra dependencies.

Here's the list of the subpackages and what they enable:


+-------------+------------------------------------+---------------------------------------+
| subpackage  | install command                    | enables                               |
+=============+====================================+=======================================+
| mysql       | pip install airflow[mysql]         | MySQL operators and hook, support as  |
|             |                                    | an Airflow backend                    |
+-------------+------------------------------------+---------------------------------------+
| postgres    | pip install airflow[postgres]      | Postgres operators and hook, support  |
|             |                                    | as an Airflow backend                 |
+-------------+------------------------------------+---------------------------------------+
| samba       | pip install airflow[samba]         | Hive2SambaOperator                    |
+-------------+------------------------------------+---------------------------------------+
| s3          | pip install airflow[s3]            | S3KeySensor, S3PrefixSensor           |
+-------------+------------------------------------+---------------------------------------+
| all         | pip install airflow[all]           | All Airflow features known to man     |
+-------------+------------------------------------+---------------------------------------+
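
Since these are regular pip extras, several of them can be combined in a
single install command; for example, pulling in both the MySQL and S3
dependencies from the table above (the quotes keep shells like zsh from
interpreting the square brackets):

.. code-block:: bash

    pip install "airflow[mysql,s3]"
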

Setting up a Backend
''''''''''''''''''''

If you want to take a real test drive of Airflow, you should consider
setting up a real database backend and switching to the LocalExecutor.

As Airflow was built to interact with its metadata using the great SQLAlchemy
library, you should be able to use any database backend supported as a
SQLAlchemy backend. We recommend using **MySQL** or **Postgres**.


Once you've set up your database to host Airflow, you'll need to alter the
SQLAlchemy connection string located in your configuration file
``$AIRFLOW_HOME/airflow.cfg``. You should then also change the "executor"
setting to use "LocalExecutor", an executor that can parallelize task
instances locally.
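
As a minimal sketch, both settings live in the ``[core]`` section of
``airflow.cfg``; verify the key names against the file Airflow generates for
you, and note that the MySQL URI below is only a placeholder:

.. code-block:: ini

    [core]
    # placeholder URI; any database URI that SQLAlchemy supports should work
    sql_alchemy_conn = mysql://airflow:airflow@localhost:3306/airflow
    executor = LocalExecutor
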

.. code-block:: bash

    # initialize the database
    airflow initdb

Connections
'''''''''''

Airflow needs to know how to connect to your environment. Information
such as hostname, port, login and password to other systems and services is
handled in the ``Admin->Connection`` section of the UI. The pipeline code you
will author will reference the 'conn_id' of the Connection objects.

.. image:: img/connections.png

Scaling Out
'''''''''''

CeleryExecutor is the way you can scale out the number of workers. For this
to work, you need to set up a Celery backend (**RabbitMQ**, **Redis**, ...) and
change your ``airflow.cfg`` to point the executor parameter to
CeleryExecutor and provide the related Celery settings.
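
A rough sketch of what that can look like; the section and key names are
assumptions to check against the ``airflow.cfg`` your version generates, and
the RabbitMQ URL is only a placeholder:

.. code-block:: ini

    [core]
    executor = CeleryExecutor

    [celery]
    # placeholder broker and result backend, here a local RabbitMQ instance
    broker_url = amqp://guest:guest@localhost:5672//
    celery_result_backend = amqp://guest:guest@localhost:5672//
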

To kick off a worker, you need to set up Airflow and kick off the worker
subcommand:

.. code-block:: bash

    airflow worker

Your worker should start picking up tasks as soon as they get fired up in
its direction.

Note that you can also run "Celery Flower", a web UI built on top of Celery,
to monitor your workers.
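
A minimal sketch, assuming the ``airflow`` CLI in your version exposes Flower
as a subcommand (check ``airflow --help``); if it does not, Flower can be
started on its own and pointed at the same Celery broker:

.. code-block:: bash

    # start the Flower web UI (it listens on port 5555 by default)
    airflow flower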