Improvements to ingesting data locally - doc and docker changes (#7134)

* Update docker-shared-user for pulse_url and add PROJECTS_TO_INGEST to backend container
* Update docs to make them clearer
* Fix exception caught in pytest.raises
Sarah Clements 2021-05-13 14:20:07 -07:00 committed by GitHub
Parent 84b2974f5b
Commit 0b4602c8a3
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
4 changed files: 32 additions and 38 deletions

View file

@@ -26,6 +26,7 @@ services:
- SITE_URL=http://backend:8000/
- TREEHERDER_DEBUG=True
- NEW_RELIC_INSIGHTS_API_KEY=${NEW_RELIC_INSIGHTS_API_KEY:-}
- PROJECTS_TO_INGEST=${PROJECTS_TO_INGEST:-autoland,try}
entrypoint: './docker/entrypoint.sh'
# We *ONLY* initialize the data when we're running the backend
command: './initialize_data.sh ./manage.py runserver 0.0.0.0:8000'
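The `${PROJECTS_TO_INGEST:-autoland,try}` form in the compose file is standard shell parameter expansion: the variable's value when it is set and non-empty, otherwise the default after `:-`. A quick illustration:

```bash
# ${VAR:-default} expands to $VAR when it is set and non-empty,
# otherwise to the fallback after ":-".
unset PROJECTS_TO_INGEST
echo "${PROJECTS_TO_INGEST:-autoland,try}"   # prints the default: autoland,try

PROJECTS_TO_INGEST=mozilla-central
echo "${PROJECTS_TO_INGEST:-autoland,try}"   # prints: mozilla-central
```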
@@ -102,7 +103,7 @@ services:
context: .
dockerfile: docker/dev.Dockerfile
environment:
- PULSE_URL=${PULSE_URL:-amqp://docker-shared-user:8r5VFxpJHtJahTVV5bYutykgDsXhGF@pulse.mozilla.org:5671/?ssl=1}
- PULSE_URL=${PULSE_URL:-amqp://docker-shared-user:oGv7P5%H94@pulse.mozilla.org:5671/?ssl=1}
- LOGGING_LEVEL=INFO
- PULSE_AUTO_DELETE_QUEUES=True
- DATABASE_URL=mysql://root@mysql:3306/treeherder
@@ -123,7 +124,7 @@ services:
environment:
- CELERY_BROKER_URL=amqp://guest:guest@rabbitmq:5672//
- DATABASE_URL=mysql://root@mysql:3306/treeherder
- PROJECTS_TO_INGEST=${PROJECTS_TO_INGEST:-autoland}
- PROJECTS_TO_INGEST=${PROJECTS_TO_INGEST:-autoland,try}
entrypoint: './docker/entrypoint.sh'
command: celery worker -A treeherder --uid=nobody --gid=nogroup --without-gossip --without-mingle --without-heartbeat -Q store_pulse_pushes,store_pulse_tasks --concurrency=1 --loglevel=WARNING
volumes:

View file

@@ -67,6 +67,9 @@ To get started:
### Starting a local Treeherder instance
By default, data will be ingested from the `autoland` and `try` repositories using a shared Pulse account. If you want to use your own Pulse account or change which repositories are ingested, you need to
export the corresponding environment variables in the shell first, or set them inline with the command below. See [Pulse Ingestion Configuration](pulseload.md#pulse-ingestion-configuration) for more details.
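For example, either form works (the Pulse credentials shown are placeholders, not real ones):

```bash
# Option 1: export in the shell before starting the stack
export PULSE_URL="amqp://myuser:mypassword@pulse.mozilla.org:5671/?ssl=1"
export PROJECTS_TO_INGEST=autoland
docker-compose up

# Option 2: set the variables inline for a single invocation
PROJECTS_TO_INGEST=autoland docker-compose up
```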
- Open a shell, cd into the root of the Treeherder repository, and type:
```bash
@@ -78,10 +81,7 @@ To get started:
- Visit <http://localhost:5000> in your browser (NB: not port 8000).
Both Django's runserver and webpack-dev-server will automatically refresh every time there's a change in the code.
<!-- prettier-ignore -->
!!! note
There will be no data to display until the ingestion tasks are run.
Proceed to [Running the ingestion tasks](#running-the-ingestion-tasks) to get data.
### Using the minified UI
@@ -106,6 +106,8 @@ production UI from `.build/` instead. In addition to being minified and using the
non-debug versions of React, the assets are served with the same `Content-Security-Policy`
header as production.
Proceed to [Running the ingestion tasks](#running-the-ingestion-tasks) to get data.
### Running full stack with a custom DB setting
If you want to develop both the frontend and backend, but have the database pointing to
@@ -121,7 +123,7 @@ Alternatively, you can `export` that value in your terminal prior to executing
`docker-compose up` or just specify it on the command line as you execute:
```bash
DATABASE_URL=mysql://user:password@hostname/treeherder docker-compose up
DATABASE_URL=mysql://user:password@hostname/treeherder SKIP_INGESTION=True docker-compose up
```
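A `DATABASE_URL` of this shape is an ordinary URL, so its parts break down predictably. A minimal sketch with plain stdlib parsing and placeholder credentials (only the variable's format is taken from the docs):

```python
from urllib.parse import urlparse

# Placeholder credentials matching the docs' mysql://user:password@hostname/db shape.
url = urlparse("mysql://user:password@hostname/treeherder")

print(url.scheme)            # mysql
print(url.username)          # user
print(url.password)          # password
print(url.hostname)          # hostname
print(url.path.lstrip("/"))  # treeherder  (the database name)
```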
<!-- prettier-ignore -->
@@ -143,19 +145,28 @@ docker volume rm treeherder_mysql_data
### Running the ingestion tasks
Ingestion tasks populate the database with version control push logs, queued/running/completed jobs & output from log parsing, as well as maintain a cache of intermittent failure bugs. To run these:
Celery tasks include storing pushes and tasks, parsing logs (which provides failure lines and performance data), and generating alerts (for Perfherder). You can either run all the queues or only specific ones.
- Start up a celery worker to process async tasks:
Open a new shell tab. To run all the queues type:
```bash
docker-compose run backend celery -A treeherder worker --concurrency 1
```
```bash
docker-compose run -e PROJECTS_TO_INGEST=autoland backend celery -A treeherder worker --concurrency 1
```
- Then in a new terminal window, run `docker-compose run backend bash`, and follow the steps from the [loading pulse data](pulseload.md) page.
<!-- prettier-ignore -->
!!! note
If you omit the `PROJECTS_TO_INGEST` flag, this command will store pushes and tasks from all repositories, and data will take a very long time to load. You can also change it to ingest from other repositories if you wish.
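Conceptually, the worker only stores pushes and tasks whose repository appears in that comma-separated list. A hedged sketch of that filtering (the helper names and logic here are illustrative, not Treeherder's actual code):

```python
import os

def projects_to_ingest():
    # Hypothetical helper: parse the comma-separated env var,
    # falling back to the documented default.
    raw = os.environ.get("PROJECTS_TO_INGEST", "autoland,try")
    return {name.strip() for name in raw.split(",") if name.strip()}

def should_store(repository: str) -> bool:
    # Illustrative filter: only ingest data for listed repositories.
    return repository in projects_to_ingest()

os.environ["PROJECTS_TO_INGEST"] = "autoland"
print(should_store("autoland"))         # True
print(should_store("mozilla-central"))  # False
```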
You can find the list of Celery queues in the `CELERY_TASK_QUEUES` variable in the [settings] file. For instance, to
only store tasks and pushes and skip log parsing:
```bash
docker-compose run -e PROJECTS_TO_INGEST=autoland backend celery -A treeherder worker -Q store_pulse_tasks,store_pulse_pushes --concurrency 1
```
### Manual ingestion
`NOTE`; You have to include `--root-url https://community-tc.services.mozilla.com` in order to ingest from the [Taskcluster Community instance](https://community-tc.services.mozilla.com), otherwise, it will default to the Firefox CI.
`NOTE`: You have to include `--root-url https://community-tc.services.mozilla.com` in order to ingest from the [Taskcluster Community instance](https://community-tc.services.mozilla.com), otherwise, it will default to the Firefox CI.
Open a terminal window and run `docker-compose up`. All of the following sections assume this step.
@@ -216,3 +227,4 @@ Continue to **Working with the Server** section after looking at the [Code Style
[yarn]: https://yarnpkg.com/en/docs/install
[package.json]: https://github.com/mozilla/treeherder/blob/master/package.json
[eslint]: https://eslint.org
[settings]: https://github.com/mozilla/treeherder/blob/master/treeherder/config/settings.py#L318

View file

@@ -1,7 +1,7 @@
# Loading Pulse data
# Pulse Ingestion Configuration
By default, running the Docker container with `docker-compose up` will ingest data
from the `autoland` repo using a shared [Pulse Guardian] user. You can configure this the following ways:
from the `autoland` and `try` repositories using a shared [Pulse Guardian] user. You can configure this in the following ways:
1. Specify a custom set of repositories for which to ingest data
2. Create a custom **Pulse User** on [Pulse Guardian]
@@ -42,26 +42,6 @@ See [Starting a local Treeherder instance] for more info.
[starting a local treeherder instance]: installation.md#starting-a-local-treeherder-instance
## Advanced Celery Configuration
If you only want to ingest the Pushes and Tasks, then the default will do that for you.
But if you want to do other processing like parsing logs, etc, then you can specify the other queues
you would like to process.
Open a new terminal window. To run all the queues do:
```bash
docker-compose run backend celery -A treeherder worker --concurrency 1
```
You will see a list of activated queues. If you wanted to narrow that down, then note
which queues you'd like to run and add them to a comma-separated list. For instance, to
only do Log Parsing for sheriffed trees (autoland, mozilla-*):
```bash
docker-compose run backend celery -A treeherder worker -Q log_parser,log_parser_fail_raw_sheriffed,log_parser_fail_json_sheriffed --concurrency 1
```
## Posting Data
To post data to your own **Pulse** exchange, you can use the `publish_to_pulse`
@@ -82,3 +62,4 @@ ex: <https://community-tc.services.mozilla.com/pulse-messages/>
[pulse guardian]: https://pulseguardian.mozilla.org/whats_pulse
[yml schema]: https://github.com/mozilla/treeherder/blob/master/schemas/pulse-job.yml
[settings]: https://github.com/mozilla/treeherder/blob/master/treeherder/config/settings.py#L318

View file

@@ -5,10 +5,10 @@ import pytest
import responses
import slugid
from treeherder.etl.exceptions import MissingPushException
from treeherder.etl.job_loader import JobLoader
from treeherder.etl.taskcluster_pulse.handler import handleMessage
from treeherder.model.models import Job, JobLog, TaskclusterMetadata
from django.core.exceptions import ObjectDoesNotExist
@pytest.fixture
@@ -233,7 +233,7 @@ def test_ingest_pulse_jobs_with_missing_push(pulse_jobs):
status=200,
)
with pytest.raises(MissingPushException):
with pytest.raises(ObjectDoesNotExist):
for pulse_job in pulse_jobs:
jl.process_job(pulse_job, 'https://firefox-ci-tc.services.mozilla.com')
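The test change above swaps the expected exception from `MissingPushException` to Django's `ObjectDoesNotExist`. A self-contained sketch of the same pattern with stand-in classes (these are not the real Treeherder or Django classes):

```python
# Stand-ins for django.core.exceptions.ObjectDoesNotExist and the old
# treeherder.etl.exceptions.MissingPushException.
class ObjectDoesNotExist(Exception):
    pass

class MissingPushException(Exception):
    pass

def process_job(pulse_job):
    # After the change, a job referencing an unknown push surfaces
    # as ObjectDoesNotExist rather than MissingPushException.
    raise ObjectDoesNotExist("no push found for revision")

caught = None
try:
    process_job({"origin": {"revision": "abc123"}})
except ObjectDoesNotExist as exc:
    caught = exc

print(type(caught).__name__)  # ObjectDoesNotExist
```

A `pytest.raises(ObjectDoesNotExist)` block asserts exactly this: the wrapped code must raise that exception type (or a subclass), otherwise the test fails.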