Clarified installation docs around worker reqs
This commit is contained in:
Parent: 43df15c5f3
Commit: d428cebfc6

@@ -154,14 +154,31 @@ variables and connections.

Scaling Out with Celery
'''''''''''''''''''''''
``CeleryExecutor`` is one of the ways you can scale out the number of workers. For this
to work, you need to set up a Celery backend (**RabbitMQ**, **Redis**, ...) and
change your ``airflow.cfg`` to point the executor parameter to
``CeleryExecutor`` and provide the related Celery settings.
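
As a minimal sketch (assuming a default ``airflow.cfg`` layout; the exact key
names and any broker details are assumptions to verify against your version),
the change amounts to something like:

.. code-block:: bash

    # Point the executor parameter at CeleryExecutor (illustrative sed edit;
    # editing $AIRFLOW_HOME/airflow.cfg by hand works just as well).
    sed -i 's/^executor = .*/executor = CeleryExecutor/' $AIRFLOW_HOME/airflow.cfg

    # The related Celery settings (broker URL, result backend, ...) live in the
    # same file; check the Celery-related section of the generated config for
    # the exact key names in your Airflow version.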

For more information about setting up a Celery broker, refer to the
exhaustive `Celery documentation on the topic <http://docs.celeryproject.org/en/latest/getting-started/brokers/index.html>`_.

Here are a few imperative requirements for your workers (a sketch of a typical
worker bootstrap follows the list):

- ``airflow`` needs to be installed, and the CLI needs to be in the path
- Airflow configuration settings should be homogeneous across the cluster
- Operators that are executed on the worker need to have their dependencies
  met in that context. For example, if you use the ``HiveOperator``,
  the hive CLI needs to be installed on that box, or if you use the
  ``MySqlOperator``, the required Python library needs to be available in
  the ``PYTHONPATH`` somehow
- The worker needs to have access to its ``DAGS_FOLDER``, and you need to
  synchronize the filesystems by your own means. A common setup would be to
  store your ``DAGS_FOLDER`` in a Git repository and sync it across machines using
  Chef, Puppet, Ansible, or whatever you use to configure machines in your
  environment. If all your boxes have a common mount point, having your
  pipeline files shared there should work as well
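
A hypothetical bootstrap for a single worker box covering the points above
(package name, extras, host names, and paths are assumptions to adapt to your
environment):

.. code-block:: bash

    # Install airflow with the Celery extra so the CLI ends up on the path,
    # plus extras for the operators you actually run (e.g. mysql, hive)
    pip install 'airflow[celery,mysql]'

    # Keep the configuration homogeneous with the rest of the cluster
    scp airflow-server:/path/to/airflow.cfg $AIRFLOW_HOME/airflow.cfg

    # One way to synchronize the DAGS_FOLDER: pull it from a shared Git repository
    git clone https://example.com/yourorg/dags.git $AIRFLOW_HOME/dags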

To kick off a worker, you need to set up Airflow and kick off the worker
subcommand
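
For instance (a sketch; run this on each worker box once Airflow is configured
there):

.. code-block:: bash

    # Start a Celery worker process; it begins consuming task messages from
    # the configured broker.
    airflow worker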

@@ -173,13 +190,19 @@ Your worker should start picking up tasks as soon as they get fired in
its direction.

Note that you can also run "Celery Flower", a web UI built on top of Celery,
to monitor your workers. You can use the shortcut command ``airflow flower``
to start a Flower web server.
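
For example (Flower's usual default port is assumed here):

.. code-block:: bash

    # Launch the Flower web UI; by default Flower listens on port 5555.
    airflow flower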

Logs
''''
Users can specify a logs folder in ``airflow.cfg``. By default, it is in
the ``AIRFLOW_HOME`` directory.

In addition, users can supply an S3 location for storing log backups. If
logs are not found in the local filesystem (for example, if a worker is
lost or reset), the S3 logs will be displayed in the Airflow UI. Note that
logs are only sent to S3 once a task completes (including failure).

.. code-block:: bash

@@ -189,11 +212,11 @@ In addition, users can supply an S3 location for storing log backups. If logs ar

Scaling Out on Mesos (community contributed)
''''''''''''''''''''''''''''''''''''''''''''
``MesosExecutor`` allows you to schedule airflow tasks on a Mesos cluster.
For this to work, you need a running mesos cluster and you must perform the following
steps:

1. Install airflow on a machine where web server and scheduler will run,
   let's refer to this as the "Airflow server".
2. On the Airflow server, install mesos python eggs from `mesos downloads <http://open.mesosphere.com/downloads/mesos/>`_.
3. On the Airflow server, use a database (such as mysql) which can be accessed from mesos