# Loading Pulse data

When ingesting from **Pulse** exchanges on your local machine, you can choose
to ingest from any exchange you like. Some exchanges are registered in
`settings.py` for use by the Treeherder servers. You can use those to get the
same data as Treeherder, or you can specify your own and experiment with
posting your own data.

## The Simple Case

If you just want to get the same data that Treeherder gets, then you have 5 steps:

1. Create a user on [Pulse Guardian] if you don't already have one
2. Create your `PULSE_URL` string
3. Run a backend Docker container to read Pushes
4. Run a backend Docker container to read Jobs
5. Run a backend Docker container for **Celery**

### 1. Pulse Guardian

Visit [Pulse Guardian], sign in, and create a **Pulse User**. It will ask you to set a
username and password. Remember these as you'll use them in the next step.
Unfortunately, **Pulse** doesn't support creating queues with a guest account, so
this step is necessary.

### 2. Environment Variable

If your **Pulse User** has the username `foo` and the password `bar`, your
`PULSE_URL` config string would be:

`amqp://foo:bar@pulse.mozilla.org:5671/?ssl=1`
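
If your credentials contain characters such as `@` or `/`, building the string by hand is easy to get wrong. The following is a minimal sketch of assembling the string in Python. The helper name `build_pulse_url` is hypothetical and not part of Treeherder, and percent-encoding the credentials is an assumption based on general URL rules rather than anything this guide specifies.

```python
from urllib.parse import quote


def build_pulse_url(username, password, host="pulse.mozilla.org", port=5671):
    """Assemble a PULSE_URL string from Pulse User credentials."""
    # Percent-encode the credentials so that characters like "@" or "/"
    # cannot be mistaken for URL delimiters (an assumption; plain
    # alphanumeric credentials pass through unchanged).
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"amqp://{user}:{secret}@{host}:{port}/?ssl=1"


print(build_pulse_url("foo", "bar"))
```

For the `foo` / `bar` example this yields the same string as shown above.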

### 3. Read Pushes

On the **host machine**, set your Pulse config environment variable, so that it's available
for docker-compose to use:

```bash
export PULSE_URL="amqp://foo:bar@pulse.mozilla.org:5671/?ssl=1"
```

Next, run the Treeherder management command to read Pushes from the default **Pulse**
exchange:

```bash
docker-compose run -e PULSE_URL backend ./manage.py pulse_listener_pushes
```

You will see a list of the exchanges it has mounted to and a message for each
push as it is read. This process does not ingest the push into Treeherder. It
only adds each Push message to a local **Celery** queue for ingestion. The queued
messages will be ingested in step 5.
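
The split between reading and ingesting can be pictured with a toy example. This is only an illustration built on Python's standard `queue` module, not Treeherder or Celery code; `listener` and `worker` are hypothetical names chosen for the sketch.

```python
import queue


def listener(messages, task_queue):
    """Stand-in for the listener: it only enqueues, it never ingests."""
    for message in messages:
        task_queue.put(message)


def worker(task_queue):
    """Stand-in for the Celery worker: drains the queue and 'ingests'."""
    ingested = []
    while not task_queue.empty():
        ingested.append(task_queue.get())
    return ingested


task_queue = queue.Queue()
listener(["push-1", "push-2"], task_queue)
# Nothing has been ingested yet; the messages are only waiting in the queue.
print(task_queue.qsize())
print(worker(task_queue))
```

Until something plays the role of `worker`, the messages simply accumulate, which is why the listener alone does not make data appear in Treeherder.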

### 4. Read Jobs

As in step 3, open a new terminal and export your `PULSE_URL` variable.

Then run the management command for listening to jobs:

```bash
docker-compose run -e PULSE_URL backend ./manage.py pulse_listener_jobs
```

You will again see the list of exchanges that your queue is now mounted to and
a message for each Job as it is read into your local **Celery** queue.

### 5. Celery

Open your next terminal. You don't need to set your environment variable
in this one. Just run **Celery**:

```bash
docker-compose run backend celery -A treeherder worker -B --concurrency 5
```

That's it! With those processes running, you will begin ingesting Treeherder
data. To see the data, you will need to run the Treeherder UI and API.
See [Starting a local Treeherder instance] for more info.

[starting a local treeherder instance]: installation.md#starting-a-local-treeherder-instance

## Advanced Configuration

### Changing which Data to Ingest

`treeherder.services.pulse.sources` provides default sources for both Jobs and Pushes.

#### Pushes

`push_sources` defines a list of exchanges with routing keys.
It's rare that you'll need to change this, so it's not configurable via the environment.
However, if you wanted to, say, only get pushes to GitHub, you would edit the list to look like this:

```python
push_sources = [
    "exchange/taskcluster-github/v1/push.#",
]
```

#### Jobs

Job Exchanges and Projects are defined in `job_sources`, but they can
also be configured in the environment like so:

`PULSE_JOB_SOURCES` defines a list of exchanges with projects.

```bash
export PULSE_JOB_SOURCES="exchange/taskcluster-treeherder/v1/jobs.mozilla-central:mozilla-inbound"
```

In this example we've defined one exchange:

- `exchange/taskcluster-treeherder/v1/jobs`

The taskcluster-treeherder exchange defines two projects:

- `mozilla-central`
- `mozilla-inbound`

When Jobs are read from Pulse and added to Treeherder's Celery queue, we generate a routing key by prepending `#.` to each project key.
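
The format described above can be sketched as a small parser. This is not Treeherder's own code: the function name `parse_job_sources` is hypothetical, and the separators (a comma between sources, the first `.` between exchange and projects, `:` between projects) are assumptions inferred from the example string.

```python
def parse_job_sources(value):
    """Split a PULSE_JOB_SOURCES-style string into exchanges and routing keys."""
    sources = []
    # Assumption: multiple sources are separated by commas.
    for source in filter(None, (part.strip() for part in value.split(","))):
        # Assumption: the first "." separates the exchange from its projects,
        # and ":" separates the projects from each other.
        exchange, _, projects = source.partition(".")
        # Prepend "#." to each project key to form its routing key.
        routing_keys = [f"#.{project}" for project in projects.split(":") if project]
        sources.append({"exchange": exchange, "routing_keys": routing_keys})
    return sources


example = "exchange/taskcluster-treeherder/v1/jobs.mozilla-central:mozilla-inbound"
for source in parse_job_sources(example):
    print(source["exchange"], source["routing_keys"])
```

Under these assumptions, the example string produces one exchange with the routing keys `#.mozilla-central` and `#.mozilla-inbound`.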

### Advanced Celery options

If you only want to ingest the Pushes and Jobs, but don't care about log parsing
and all the other processing Treeherder does, then you can limit the **Celery**
worker to the queues that handle ingestion. You will need:

```bash
celery -A treeherder worker -B -Q pushlog,store_pulse_jobs,store_pulse_pushes --concurrency 5
```

- The `pushlog` queue loads up to the last 10 Mercurial pushes that exist.
- The `store_pulse_pushes` queue will ingest all the pushes from the exchanges
  specified in `push_sources`. This can be Mercurial and GitHub.
- The `store_pulse_jobs` queue will ingest all the jobs from the exchanges
  specified in `job_sources` (or `PULSE_JOB_SOURCES`).

<!-- prettier-ignore -->
!!! note
    Any job that comes from Pulse that does not have an associated push will be skipped.

## Posting Data

To post data to your own **Pulse** exchange, you can use the `publish_to_pulse`
management command. This command takes the `routing_key`, `connection_url`
and `payload_file`. The payload file must be a `JSON` representation of
a job as specified in the [YML Schema].

Here is a set of example parameters that could be used to run it:

```bash
./manage.py publish_to_pulse mozilla-inbound.staging amqp://treeherder-test:mypassword@pulse.mozilla.org:5672/ ./scratch/test_job.json
```
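
Before publishing, it can help to confirm that the payload file at least parses as JSON. The following is a minimal sketch; `load_payload` is a hypothetical helper, it does not validate against the [YML Schema], and the check for a top-level JSON object is an assumption about what a job payload looks like.

```python
import json


def load_payload(path):
    """Read a payload file and confirm that it contains a JSON object."""
    with open(path, encoding="utf-8") as handle:
        payload = json.load(handle)  # raises json.JSONDecodeError on bad JSON
    # Assumption: a job payload is a JSON object, not a list or a scalar.
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")
    return payload
```

Calling `load_payload("./scratch/test_job.json")` would then fail early with a clear error if the file is malformed, rather than after the message has been sent.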

You can use the handy [Pulse Inspector] to view messages in your exchange to
test that they are arriving at Pulse the way you expect.

[pulse guardian]: https://pulseguardian.mozilla.org/whats_pulse
[pulse inspector]: https://tools.taskcluster.net/pulse-inspector/
[yml schema]: https://github.com/mozilla/treeherder/blob/master/schemas/pulse-job.yml