Bug 1330702 - Improve the Pulse ingestion docs (#3377)

Cameron Dawson, 2018-04-03 17:07:07 -07:00 (committed via GitHub)
Parent: 82f1200e35
Commit: 454cde322d
2 changed files: 96 additions and 27 deletions

Loading Pulse data
==================

For ingestion from **Pulse** exchanges, on your local machine, you can choose
to ingest from any exchange you like. Some exchanges will be registered in
``settings.py`` for use by the Treeherder servers. You can use those to get the
same data as Treeherder. Or you can specify your own and experiment with
posting your own data.

The Simple Case
---------------

If you just want to get the same data that Treeherder gets, then you have five
steps:

1. Create a user on `Pulse Guardian`_ if you don't already have one
2. Create your ``PULSE_DATA_INGESTION_CONFIG`` string
3. Open a Vagrant terminal to read Pushes
4. Open a Vagrant terminal to read Jobs
5. Open a Vagrant terminal to run **Celery**

1. Pulse Guardian
~~~~~~~~~~~~~~~~~

Visit `Pulse Guardian`_, sign in, and create a **Pulse User**. It will ask you to set a
username and password. Remember these as you'll use them in the next step.
Unfortunately, **Pulse** doesn't support creating queues with a guest account, so
this step is necessary.

2. Environment Variable
~~~~~~~~~~~~~~~~~~~~~~~

If your **Pulse User** was username: ``foo`` and password: ``bar``, your config
string would be::

    PULSE_DATA_INGESTION_CONFIG="amqp://foo:bar@pulse.mozilla.org:5671/?ssl=1"
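
As a sanity check, the connection URL can be assembled from the username and
password. The helper below is illustrative only (it is not part of Treeherder);
it percent-encodes the credentials so that special characters in a password
don't break the URL::

    from urllib.parse import quote

    def pulse_config(user, password, host="pulse.mozilla.org", port=5671):
        # Percent-encode credentials so characters like '@' or '/' in a
        # password survive URL parsing. Illustrative helper only; not
        # part of Treeherder itself.
        user = quote(user, safe="")
        password = quote(password, safe="")
        return f"amqp://{user}:{password}@{host}:{port}/?ssl=1"

    print(pulse_config("foo", "bar"))
    # amqp://foo:bar@pulse.mozilla.org:5671/?ssl=1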

3. Read Pushes
~~~~~~~~~~~~~~

.. note:: Be sure your Vagrant environment is up-to-date. Reload it and run ``vagrant provision`` if you're not sure.
``ssh`` into Vagrant, then set your config environment variable::

    export PULSE_DATA_INGESTION_CONFIG="amqp://foo:bar@pulse.mozilla.org:5671/?ssl=1"
Next, run the Treeherder management command to read Pushes from the default **Pulse**
exchange::

    ./manage.py read_pulse_pushes
You will see a list of the exchanges it has mounted to and a message for each
push as it is read. This process does not ingest the push into Treeherder. It
adds that push message to a local **Celery** queue for ingestion. The queued
messages will be ingested in step 5.
4. Read Jobs
~~~~~~~~~~~~
As in step 3, open a Vagrant terminal and export your ``PULSE_DATA_INGESTION_CONFIG``
variable. Then run the following management command::

    ./manage.py read_pulse_jobs
You will again see the list of exchanges that your queue is now mounted to and
a message for each Job as it is read into your local **Celery** queue.
5. Celery
~~~~~~~~~
Open your next Vagrant terminal. You don't need to set your environment variable
in this one. Just run **Celery**::

    celery -A treeherder worker -B --concurrency 5
That's it! With those processes running, you will begin ingesting Treeherder
data. To see the data, you will need to run the Treeherder UI.
See :ref:`unminified_ui` for more info.
Advanced Configuration
----------------------
Changing which data to ingest
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you don't want all the sources provided by default in ``settings.py``, you
can specify the exchange(s) to listen to for jobs by modifying
``PULSE_DATA_INGESTION_SOURCES``. For instance, you could specify the projects
as only ``try`` and ``mozilla-central`` by setting::

    export PULSE_DATA_INGESTION_SOURCES='[{"exchange": "exchange/taskcluster-treeherder/v1/jobs", "destinations": ["#"], "projects": ["try", "mozilla-central"]}]'
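
The variable's value is a JSON list of source dictionaries. As a stand-alone
sketch (this is not Treeherder's actual settings loader), it can be read and
inspected like so::

    import json
    import os

    # Hypothetical value; in practice this comes from the `export` above.
    os.environ["PULSE_DATA_INGESTION_SOURCES"] = (
        '[{"exchange": "exchange/taskcluster-treeherder/v1/jobs", '
        '"destinations": ["#"], "projects": ["try", "mozilla-central"]}]'
    )

    sources = json.loads(os.environ["PULSE_DATA_INGESTION_SOURCES"])
    for source in sources:
        print(source["exchange"], "->", source["projects"])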
To change which exchanges you listen to for pushes, you would modify
``PULSE_PUSH_SOURCES``. For instance, to get only **GitHub** pushes for Bugzilla,
you would set::

    export PULSE_PUSH_SOURCES='[{"exchange": "exchange/taskcluster-github/v1/push","routing_keys": ["bugzilla#"]}]'
Advanced Celery options
~~~~~~~~~~~~~~~~~~~~~~~
If you only want to ingest the Pushes and Jobs, but don't care about log parsing
and all the other processing Treeherder does, then you can minimize the **Celery**
task. You will need::

    celery -A treeherder worker -B -Q pushlog,store_pulse_jobs,store_pulse_resultsets --concurrency 5
* The ``pushlog`` queue loads up to the last 10 Mercurial pushes that exist.
* The ``store_pulse_resultsets`` queue will ingest all the pushes from the exchanges
  specified in ``PULSE_PUSH_SOURCES``. These can be Mercurial and GitHub pushes.
* The ``store_pulse_jobs`` queue will ingest all the jobs from the exchanges
specified in ``PULSE_DATA_INGESTION_SOURCES``.
.. note:: Any job that comes from **Pulse** that does not have an associated push will be skipped.
.. note:: It is slightly confusing to see ``store_pulse_resultsets`` there. It is there for legacy reasons and will change to ``store_pulse_pushes`` at some point.
Posting Data
------------
To post data to your own **Pulse** exchange, you can use the ``publish_to_pulse``
management command. This command takes the ``routing_key``, ``connection_url``
and ``payload_file``. The payload file must be a ``JSON`` representation of
a job as specified in the `YML Schema`_.
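
For illustration, here is a sketch of assembling a ``publish_to_pulse``
invocation from Python. The positional argument order follows the description
above but is an assumption; check ``./manage.py publish_to_pulse --help`` for
the real interface. The payload shown is a placeholder, not a schema-valid
job::

    import json
    import tempfile

    # Placeholder payload -- a real one must match the job YML Schema.
    payload = {"project": "try", "job": {"state": "completed"}}

    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(payload, f)
        payload_file = f.name

    # Argument order (routing_key, connection_url, payload_file) is taken
    # from the prose above; verify against the command's --help output.
    cmd = [
        "./manage.py", "publish_to_pulse",
        "my-routing-key",
        "amqp://foo:bar@pulse.mozilla.org:5671/?ssl=1",
        payload_file,
    ]
    print(" ".join(cmd))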

This will run the unminified UI using ``<url>`` as the service domain.
.. _unminified_ui:
Running the unminified UI with Vagrant
--------------------------------------
You may also run the unminified UI using the full treeherder Vagrant project.