Clarified installation docs around worker reqs
Parent: 43df15c5f3
Commit: d428cebfc6
@@ -154,14 +154,31 @@ variables and connections.
Scaling Out with Celery
'''''''''''''''''''''''
-CeleryExecutor is the way you can scale out the number of workers. For this
+``CeleryExecutor`` is one of the ways you can scale out the number of workers. For this
to work, you need to set up a Celery backend (**RabbitMQ**, **Redis**, ...) and
change your ``airflow.cfg`` to point the executor parameter to
-CeleryExecutor and provide the related Celery settings.
+``CeleryExecutor`` and provide the related Celery settings.

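As a rough sketch, assuming a Redis broker and a MySQL result backend (the
option names can differ between Airflow releases, so verify them against your
own ``airflow.cfg``), the resulting settings can be checked like this:

.. code-block:: bash

    # Confirm the executor and Celery settings after editing airflow.cfg.
    # The option names below are illustrative, not authoritative.
    grep -E '^(executor|broker_url|celery_result_backend)' \
        "${AIRFLOW_HOME:-$HOME/airflow}/airflow.cfg"
    # Example of the kind of output to expect:
    #   executor = CeleryExecutor
    #   broker_url = redis://localhost:6379/0
    #   celery_result_backend = db+mysql://airflow:airflow@localhost/airflow
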
For more information about setting up a Celery broker, refer to the
exhaustive `Celery documentation on the topic <http://docs.celeryproject.org/en/latest/getting-started/brokers/index.html>`_.
+
+Here are a few imperative requirements for your workers:
+
+- ``airflow`` needs to be installed, and the CLI needs to be in the path
+- Airflow configuration settings should be homogeneous across the cluster
+- Operators that are executed on the worker need to have their dependencies
+  met in that context. For example, if you use the ``HiveOperator``,
+  the hive CLI needs to be installed on that box, or if you use the
+  ``MySqlOperator``, the required Python library needs to be available in
+  the ``PYTHONPATH`` somehow
+- The worker needs to have access to its ``DAGS_FOLDER``, and you need to
+  synchronize the filesystems by your own means. A common setup would be to
+  store your DAGS_FOLDER in a Git repository and sync it across machines using
+  Chef, Puppet, Ansible, or whatever you use to configure machines in your
+  environment. If all your boxes have a common mount point, having your
+  pipeline files shared there should work as well
+

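As a rough way to sanity-check these requirements on a worker box, something
along these lines can help (the paths and the MySQL client module are
assumptions; adjust them to your own setup):

.. code-block:: bash

    # Is the airflow CLI installed and on the PATH?
    command -v airflow || echo "airflow CLI not found on PATH"

    # Is the DAGS_FOLDER reachable from this machine?
    test -d "${AIRFLOW_HOME:-$HOME/airflow}/dags" || echo "DAGS_FOLDER not reachable"

    # Are operator dependencies satisfied here? For example:
    command -v hive || echo "hive CLI missing (needed for HiveOperator)"
    python -c "import MySQLdb" 2>/dev/null || echo "MySQL client library missing (needed for MySqlOperator)"
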
To kick off a worker, you need to set up Airflow and run the worker
subcommand
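For example, on each machine that should act as a worker (the available flags
differ between Airflow versions, so treat this as illustrative):

.. code-block:: bash

    airflow worker
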
@@ -173,13 +190,19 @@ Your worker should start picking up tasks as soon as they get fired in
its direction.

Note that you can also run "Celery Flower", a web UI built on top of Celery,
-to monitor your workers.
+to monitor your workers. You can use the shortcut command ``airflow flower``
+to start a Flower web server.

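For instance, to bring up the monitoring UI (Flower serves on its own default
port, 5555, unless configured otherwise):

.. code-block:: bash

    # Start the Flower UI against the same Celery setup the workers use
    airflow flower
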

Logs
''''
-Users can specify a logs folder in ``airflow.cfg``. By default, it is in the ``AIRFLOW_HOME`` directory.
+Users can specify a logs folder in ``airflow.cfg``. By default, it is in
+the ``AIRFLOW_HOME`` directory.
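A quick way to see where logs are being written (``base_log_folder`` is the
usual option name, but confirm it against your own ``airflow.cfg``):

.. code-block:: bash

    grep '^base_log_folder' "${AIRFLOW_HOME:-$HOME/airflow}/airflow.cfg"
    ls "${AIRFLOW_HOME:-$HOME/airflow}/logs"
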

-In addition, users can supply an S3 location for storing log backups. If logs are not found in the local filesystem (for example, if a worker is lost or reset), the S3 logs will be displayed in the Airflow UI. Note that logs are only sent to S3 once a task completes (including failure).
+In addition, users can supply an S3 location for storing log backups. If
+logs are not found in the local filesystem (for example, if a worker is
+lost or reset), the S3 logs will be displayed in the Airflow UI. Note that
+logs are only sent to S3 once a task completes (including failure).

.. code-block:: bash

@@ -189,11 +212,11 @@ In addition, users can supply an S3 location for storing log backups. If logs ar

Scaling Out on Mesos (community contributed)
''''''''''''''''''''''''''''''''''''''''''''
-MesosExecutor allows you to schedule airflow tasks on a Mesos cluster.
+``MesosExecutor`` allows you to schedule airflow tasks on a Mesos cluster.
For this to work, you need a running mesos cluster and you must perform the following
steps:

-1. Install airflow on a machine where webserver and scheduler will run,
+1. Install airflow on a machine where web server and scheduler will run,
   let's refer to this as the "Airflow server".
2. On the Airflow server, install mesos python eggs from `mesos downloads <http://open.mesosphere.com/downloads/mesos/>`_.
3. On the Airflow server, use a database (such as mysql) which can be accessed from mesos