Clarified installation docs around worker reqs

This commit is contained in:
Maxime Beauchemin 2016-02-21 07:50:33 -08:00
Parent 43df15c5f3
Commit d428cebfc6
1 changed file with 30 additions and 7 deletions


@@ -154,14 +154,31 @@ variables and connections.
 Scaling Out with Celery
 '''''''''''''''''''''''

-CeleryExecutor is the way you can scale out the number of workers. For this
+``CeleryExecutor`` is one of the ways you can scale out the number of workers. For this
 to work, you need to setup a Celery backend (**RabbitMQ**, **Redis**, ...) and
 change your ``airflow.cfg`` to point the executor parameter to
-CeleryExecutor and provide the related Celery settings.
+``CeleryExecutor`` and provide the related Celery settings.

 For more information about setting up a Celery broker, refer to the
 exhaustive `Celery documentation on the topic <http://docs.celeryproject.org/en/latest/getting-started/brokers/index.html>`_.

+Here are a few imperative requirements for your workers:
+
+- ``airflow`` needs to be installed, and the CLI needs to be in the path
+- Airflow configuration settings should be homogeneous across the cluster
+- Operators that are executed on the worker need to have their dependencies
+  met in that context. For example, if you use the ``HiveOperator``,
+  the hive CLI needs to be installed on that box, or if you use the
+  ``MySqlOperator``, the required Python library needs to be available in
+  the ``PYTHONPATH`` somehow
+- The worker needs to have access to its ``DAGS_FOLDER``, and you need to
+  synchronize the filesystems by your own means. A common setup would be to
+  store your ``DAGS_FOLDER`` in a Git repository and sync it across machines using
+  Chef, Puppet, Ansible, or whatever you use to configure machines in your
+  environment. If all your boxes have a common mount point, having your
+  pipeline files shared there should work as well
+
 To kick off a worker, you need to setup Airflow and kick off the worker
 subcommand
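
As a minimal sketch of the setup the hunk above describes, assuming a Redis broker and illustrative extras names, a worker box might be prepared and started as follows; the ``airflow.cfg`` keys quoted in the comments are assumptions and may differ between Airflow versions, so check the config reference that ships with your release.

.. code-block:: bash

    # Install airflow plus the extras your operators need on every worker box
    # (extras names here are illustrative)
    pip install "airflow[celery,hive,mysql]"

    # airflow.cfg should be identical across the cluster and contain roughly:
    #   [core]
    #   executor = CeleryExecutor
    #   [celery]
    #   broker_url = redis://your-broker-host:6379/0
    # (key names are assumptions; verify them for your version)

    # start a Celery worker that consumes tasks sent by the scheduler
    airflow worker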
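The new requirements list also calls for a synchronized ``DAGS_FOLDER``. One hedged way to do that with the Git approach it mentions, where the repository URL and paths below are purely illustrative, is a periodic pull on every machine:

.. code-block:: bash

    # Illustrative only: clone the DAG repository into the configured DAGS_FOLDER
    git clone https://example.com/your-org/airflow-dags.git "$AIRFLOW_HOME/dags"

    # then refresh it periodically (via cron, Chef, Puppet, Ansible, ...)
    cd "$AIRFLOW_HOME/dags" && git pull --quiet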
@@ -173,13 +190,19 @@ Your worker should start picking up tasks as soon as they get fired in
 its direction.

 Note that you can also run "Celery Flower", a web UI built on top of Celery,
-to monitor your workers.
+to monitor your workers. You can use the shortcut command ``airflow flower``
+to start a Flower web server.

 Logs
 ''''
-Users can specify a logs folder in ``airflow.cfg``. By default, it is in the ``AIRFLOW_HOME`` directory.
+Users can specify a logs folder in ``airflow.cfg``. By default, it is in
+the ``AIRFLOW_HOME`` directory.

-In addition, users can supply an S3 location for storing log backups. If logs are not found in the local filesystem (for example, if a worker is lost or reset), the S3 logs will be displayed in the Airflow UI. Note that logs are only sent to S3 once a task completes (including failure).
+In addition, users can supply an S3 location for storing log backups. If
+logs are not found in the local filesystem (for example, if a worker is
+lost or reset), the S3 logs will be displayed in the Airflow UI. Note that
+logs are only sent to S3 once a task completes (including failure).

 .. code-block:: bash
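
A short illustration of the ``airflow flower`` shortcut mentioned in the hunk above; the port noted in the comment is an assumption, so pass your own options if your deployment differs.

.. code-block:: bash

    # start Celery Flower, a web UI for monitoring your Celery workers
    # (it commonly listens on port 5555; this is an assumption, check your version)
    airflow flower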
@@ -189,11 +212,11 @@ In addition, users can supply an S3 location for storing log backups. If logs ar

 Scaling Out on Mesos (community contributed)
 ''''''''''''''''''''''''''''''''''''''''''''
-MesosExecutor allows you to schedule airflow tasks on a Mesos cluster.
+``MesosExecutor`` allows you to schedule airflow tasks on a Mesos cluster.
 For this to work, you need a running mesos cluster and you must perform the following
 steps -

-1. Install airflow on a machine where webserver and scheduler will run,
+1. Install airflow on a machine where web server and scheduler will run,
    let's refer to this as the "Airflow server".
 2. On the Airflow server, install mesos python eggs from `mesos downloads <http://open.mesosphere.com/downloads/mesos/>`_.
 3. On the Airflow server, use a database (such as mysql) which can be accessed from mesos