From 454cde322def26cfd6d18167c5c3c505bf75ab10 Mon Sep 17 00:00:00 2001
From: Cameron Dawson
Date: Tue, 3 Apr 2018 17:07:07 -0700
Subject: [PATCH] Bug 1330702 - Improve the Pulse ingestion docs (#3377)

---
 docs/pulseload.rst       | 121 ++++++++++++++++++++++++++++++---------
 docs/ui/installation.rst |   2 +
 2 files changed, 96 insertions(+), 27 deletions(-)

diff --git a/docs/pulseload.rst b/docs/pulseload.rst
index c3ba53eb0..47d440293 100644
--- a/docs/pulseload.rst
+++ b/docs/pulseload.rst
@@ -1,57 +1,124 @@
 Loading Pulse data
 ==================
 
-For ingestion from Pulse exchanges, on your local machine, you can choose
+For ingestion from **Pulse** exchanges, on your local machine, you can choose
 to ingest from any exchange you like. Some exchanges will be registered in
 ``settings.py`` for use by the Treeherder servers. You can use those to get the
 same data as Treeherder. Or you can specify your own and experiment with
 posting your own data.
 
-Configuration
--------------
+The Simple Case
+---------------
 
-If you don't want all the sources provided by default in ``settings.py``, you can specify the exchange, the projects, or destinations to read from using an environment variable in your vagrant shell. A working example::
+If you just want to get the same data that Treeherder gets, then you have 5 steps:
 
-    export PULSE_DATA_INGESTION_SOURCES='[{"exchange": "exchange/taskcluster-treeherder/v1/jobs", "destinations": ["#"], "projects": ["#"]}]'
+  1. Create a user on `Pulse Guardian`_ if you don't already have one
+  2. Create your ``PULSE_DATA_INGESTION_CONFIG`` string
+  3. Open a Vagrant terminal to read Pushes
+  4. Open a Vagrant terminal to read Jobs
+  5. Open a Vagrant terminal to run **Celery**
 
-To be able to ingest from exchanges, you need to create a Pulse user with
-`Pulse Guardian`_, so
-Treeherder can create your Queues for listening to the Pulse exchanges. For
-this, you must specify the connection URL in the ``PULSE_DATA_INGESTION_CONFIG``
-environment variable. e.g.::
 
-    export PULSE_DATA_INGESTION_CONFIG="amqp://mypulseuserid:mypassword@pulse.mozilla.org:5671/?ssl=1"
+1. Pulse Guardian
+~~~~~~~~~~~~~~~~~
 
-Ingesting Data
---------------
+Visit `Pulse Guardian`_, sign in, and create a **Pulse User**. It will ask you to set a
+username and password. Remember these as you'll use them in the next step.
+Unfortunately, **Pulse** doesn't support creating queues with a guest account, so
+this step is necessary.
 
-First, you need to begin the *Celery* queue processing.
-Then to get those jobs loaded into Treeherder, start the periodic tasks with
-*Celery*. At the minimum, you will need::
+2. Environment Variable
+~~~~~~~~~~~~~~~~~~~~~~~
 
-    celery -A treeherder worker -B -Q pushlog,store_pulse_jobs --concurrency 5
+If your **Pulse User** was username: ``foo`` and password: ``bar``, your config
+string would be::
 
-.. note:: It is important to run the ``pushlog`` queue processing as well as ``store_pulse_jobs`` because jobs that come in from pulse for which Treeherder does not already have a push will be skipped.
+    PULSE_DATA_INGESTION_CONFIG="amqp://foo:bar@pulse.mozilla.org:5671/?ssl=1"
 
-If you want to just run all the Treeherder *Celery* tasks to enable things like
-log parsing, etc, then don't specify the ``-Q`` param and it will default to
-all::
+3. Read Pushes
+~~~~~~~~~~~~~~
 
-    celery -A treeherder worker -B --concurrency 5
+.. note:: Be sure your Vagrant environment is up-to-date. Reload it and run ``vagrant provision`` if you're not sure.
 
-To begin listening to the Pulse exchanges specified above, run this management
-command::
+``ssh`` into Vagrant, then set your config environment variable::
+
+    export PULSE_DATA_INGESTION_CONFIG="amqp://foo:bar@pulse.mozilla.org:5671/?ssl=1"
+
+Next, run the Treeherder management command to read Pushes from the default **Pulse**
+exchange::
+
+    ./manage.py read_pulse_pushes
+
+You will see a list of the exchanges it has mounted to and a message for each
+push as it is read. This process does not ingest the push into Treeherder. It
+adds that Push message to a local **Celery** queue for ingestion. They will be
+ingested in step 5.
+
+4. Read Jobs
+~~~~~~~~~~~~
+
+As in step 3, open a Vagrant terminal and export your ``PULSE_DATA_INGESTION_CONFIG``
+variable. Then run the following management command::
 
     ./manage.py read_pulse_jobs
 
-Once that is running, you will see jobs start to appear from the Pulse
-exchanges.
+You will again see the list of exchanges that your queue is now mounted to and
+a message for each Job as it is read into your local **Celery** queue.
+
+5. Celery
+~~~~~~~~~
+
+Open your next Vagrant terminal. You don't need to set your environment variable
+in this one. Just run **Celery**::
+
+    celery -A treeherder worker -B --concurrency 5
+
+That's it! With those processes running, you will begin ingesting Treeherder
+data. To see the data, you will need to run the Treeherder UI.
+See :ref:`unminified_ui` for more info.
+
+Advanced Configuration
+----------------------
+
+Changing which data to ingest
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you don't want all the sources provided by default in ``settings.py``, you
+can specify the exchange(s) to listen to for jobs by modifying
+``PULSE_DATA_INGESTION_SOURCES``. For instance, you could specify the projects
+as only ``try`` and ``mozilla-central`` by setting::
+
+    export PULSE_DATA_INGESTION_SOURCES='[{"exchange": "exchange/taskcluster-treeherder/v1/jobs", "destinations": ["#"], "projects": ["try", "mozilla-central"]}]'
+
+To change which exchanges you listen to for pushes, you would modify
+``PULSE_PUSH_SOURCES``. For instance, to get only **GitHub** pushes for Bugzilla,
+you would set::
+
+    export PULSE_PUSH_SOURCES='[{"exchange": "exchange/taskcluster-github/v1/push","routing_keys": ["bugzilla#"]}]'
+
+Advanced Celery options
+~~~~~~~~~~~~~~~~~~~~~~~
+
+If you only want to ingest the Pushes and Jobs, but don't care about log parsing
+and all the other processing Treeherder does, then you can minimize the **Celery**
+task. You will need::
+
+    celery -A treeherder worker -B -Q pushlog,store_pulse_jobs,store_pulse_resultsets --concurrency 5
+
+* The ``pushlog`` queue loads up to the last 10 Mercurial pushes that exist.
+* The ``store_pulse_resultsets`` queue will ingest all the pushes from the exchanges
+  specified in ``PULSE_PUSH_SOURCES``. This can be Mercurial and GitHub.
+* The ``store_pulse_jobs`` queue will ingest all the jobs from the exchanges
+  specified in ``PULSE_DATA_INGESTION_SOURCES``.
+
+.. note:: Any job that comes from **Pulse** that does not have an associated push will be skipped.
+.. note:: It is slightly confusing to see ``store_pulse_resultsets`` there. It is there for legacy reasons and will change to ``store_pulse_pushes`` at some point.
 
 Posting Data
 ------------
 
-To post data to your own pulse exchange, you can use the ``publish_to_pulse``
+To post data to your own **Pulse** exchange, you can use the ``publish_to_pulse``
 management command. This command takes the ``routing_key``, ``connection_url``
 and ``payload_file``. The payload file must be a ``JSON`` representation of a
 job as specified in the `YML Schema`_.
 
diff --git a/docs/ui/installation.rst b/docs/ui/installation.rst
index 7218b8686..e08c18500 100644
--- a/docs/ui/installation.rst
+++ b/docs/ui/installation.rst
@@ -40,6 +40,8 @@ production site. You do not need to set up the Vagrant VM, but login will be una
 
 This will run the unminified UI using ```` as the service domain.
 
+.. _unminified_ui:
+
 Running the unminified UI with Vagrant
 --------------------------------------
 You may also run the unminified UI using the full treeherder Vagrant project.