Clarified installation docs around worker reqs

Maxime Beauchemin 2016-02-21 07:50:33 -08:00
Parent 43df15c5f3
Commit d428cebfc6
1 changed file with 30 additions and 7 deletions


@@ -154,14 +154,31 @@ variables and connections.
Scaling Out with Celery
'''''''''''''''''''''''
``CeleryExecutor`` is one of the ways you can scale out the number of workers. For this
to work, you need to set up a Celery backend (**RabbitMQ**, **Redis**, ...) and
change your ``airflow.cfg`` to point the executor parameter to
``CeleryExecutor`` and provide the related Celery settings.
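For example, a minimal sketch of the relevant settings, assuming the era's default
``airflow.cfg`` layout (the ``broker_url`` and ``celery_result_backend`` key names
and the connection strings below are illustrative and may differ in your version):

.. code-block:: bash

    # Illustrative airflow.cfg values, shown here as comments rather than a command:
    #   [core]
    #   executor = CeleryExecutor
    #   [celery]
    #   broker_url = redis://localhost:6379/0
    #   celery_result_backend = db+mysql://airflow:airflow@localhost/airflow
    # Quick check of what is currently configured on this box:
    grep -nE "^(executor|broker_url|celery_result_backend)" "$AIRFLOW_HOME/airflow.cfg"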
For more information about setting up a Celery broker, refer to the
exhaustive `Celery documentation on the topic <http://docs.celeryproject.org/en/latest/getting-started/brokers/index.html>`_.
Here are a few imperative requirements for your workers (see the sketch after this list):
- ``airflow`` needs to be installed, and the CLI needs to be in the path
- Airflow configuration settings should be homogeneous across the cluster
- Operators that are executed on the worker need to have their dependencies
met in that context. For example, if you use the ``HiveOperator``,
the hive CLI needs to be installed on that box, or if you use the
``MySqlOperator``, the required Python library needs to be available in
the ``PYTHONPATH`` somehow
- The worker needs to have access to its ``DAGS_FOLDER``, and you need to
synchronize the filesystems by your own means. A common setup would be to
store your ``DAGS_FOLDER`` in a Git repository and sync it across machines using
Chef, Puppet, Ansible, or whatever you use to configure machines in your
environment. If all your boxes have a common mount point, having your
pipeline files shared there should work as well
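Putting the requirements above together, a worker bootstrap might look roughly like
the sketch below (the ``celery`` extra and the Git URL are illustrative assumptions):

.. code-block:: bash

    # Install airflow with the Celery extra, plus any extras your operators need
    # (e.g. mysql, hive), so their dependencies are met on this box.
    pip install "airflow[celery]"

    # Sync the DAGs folder from a shared Git repository (URL is hypothetical).
    git clone https://example.com/yourorg/dags.git "$AIRFLOW_HOME/dags"

    # Sanity check: the CLI is on the PATH, reads the same configuration,
    # and can see the synced DAGs.
    airflow list_dags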
To kick off a worker, you need to set up Airflow and kick off the worker
subcommand
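For example, assuming the configuration above is in place on the worker box:

.. code-block:: bash

    # Start a Celery worker on this machine; it begins picking up tasks
    # queued by the scheduler through the configured broker.
    airflow worker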
@@ -173,13 +190,19 @@ Your worker should start picking up tasks as soon as they get fired in
its direction.
Note that you can also run "Celery Flower", a web UI built on top of Celery,
to monitor your workers. You can use the shortcut command ``airflow flower``
to start a Flower web server.
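For example (Flower listens on port 5555 by default):

.. code-block:: bash

    # Start the Flower web UI to monitor queues, workers and task states.
    airflow flower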
Logs
''''
Users can specify a logs folder in ``airflow.cfg``. By default, it is in
the ``AIRFLOW_HOME`` directory.
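As a quick sketch, assuming the default layout (the ``base_log_folder`` key name is
taken from the default configuration of this era and may differ in your version):

.. code-block:: bash

    # Show the configured log location and list what has been written so far.
    grep base_log_folder "$AIRFLOW_HOME/airflow.cfg"
    ls "$AIRFLOW_HOME/logs"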
In addition, users can supply an S3 location for storing log backups. If
logs are not found in the local filesystem (for example, if a worker is
lost or reset), the S3 logs will be displayed in the Airflow UI. Note that
logs are only sent to S3 once a task completes (including failure).
.. code-block:: bash
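
    # Hedged sketch only: the s3_log_folder key under [core] and the bucket below
    # are assumptions for illustration, not confirmed settings for your version.
    #   [core]
    #   s3_log_folder = s3://my-airflow-logs/prod
    # After a task completes (or fails), its log should appear under that prefix:
    aws s3 ls s3://my-airflow-logs/prod/ --recursive | head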
@@ -189,11 +212,11 @@ In addition, users can supply an S3 location for storing log backups. If logs ar
Scaling Out on Mesos (community contributed)
''''''''''''''''''''''''''''''''''''''''''''
``MesosExecutor`` allows you to schedule airflow tasks on a Mesos cluster.
For this to work, you need a running Mesos cluster and you must perform the
following steps:
1. Install Airflow on a machine where the web server and scheduler will run,
let's refer to this as the "Airflow server".
2. On the Airflow server, install the Mesos Python eggs from `mesos downloads <http://open.mesosphere.com/downloads/mesos/>`_.
3. On the Airflow server, use a database (such as mysql) which can be accessed from mesos