* Make changes to docs
* Add cert for prototype connections
* Add TLS_CERT_PATH variable to docker yaml file
* Change troubleshooting and database sections of docs
Sarah Clements 2021-06-22 16:49:41 -07:00, committed by GitHub
Parent c8154f8a85
Commit 091d422ca0
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
7 changed files: 71 additions and 7 deletions


@@ -0,0 +1,21 @@
-----BEGIN CERTIFICATE-----
MIIDfzCCAmegAwIBAgIBADANBgkqhkiG9w0BAQsFADB3MS0wKwYDVQQuEyQwNzZm
MWMzNS0xNGM0LTRiYjktYjMzMS0wYTg4ZGU3YWMwYjQxIzAhBgNVBAMTGkdvb2ds
ZSBDbG91ZCBTUUwgU2VydmVyIENBMRQwEgYDVQQKEwtHb29nbGUsIEluYzELMAkG
A1UEBhMCVVMwHhcNMjEwNDIxMTU1NjI0WhcNMzEwNDE5MTU1NzI0WjB3MS0wKwYD
VQQuEyQwNzZmMWMzNS0xNGM0LTRiYjktYjMzMS0wYTg4ZGU3YWMwYjQxIzAhBgNV
BAMTGkdvb2dsZSBDbG91ZCBTUUwgU2VydmVyIENBMRQwEgYDVQQKEwtHb29nbGUs
IEluYzELMAkGA1UEBhMCVVMwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIB
AQCsaU0vLnfh6hKWATYzRO0ucyE/KzWyRdRZJPMCM0Ol1HoVruiZVVCh8z7C6dMj
pAXjkrYxe7mYafQaTivqOqVsCg85mpClD/8k6Q///tM0E0qjISlw31fFYc2Q8U3c
1+4fflB2mfItHghWJkLxoBdWy9TKL2nslApHBzo5CL/fdLjpB5USDREi1UcF5OSx
VcHYaCHaTzMU+dQQ8BHdzdebxAi9MqxzgQCXrWk2U5PYLf9jbHMV4p0m/p6rp4vm
lKIuR4e4fnAbPJTuUoYrAh7NWKzfg2oAxh7VrnxZoZRXgCcFRNVkbTTy/YKLJ/wt
ktafU+tOjhXW0v/dbpJolbjXAgMBAAGjFjAUMBIGA1UdEwEB/wQIMAYBAf8CAQAw
DQYJKoZIhvcNAQELBQADggEBAISQCe48Jn+d8OtGpK3gtEmSJoqMJJGCuIJfadAM
J4SDCPeFll8tBsohHHps6ABdcIgk9UmqX1xEfirt/Ea/eTnjDc4T8rZ4qdHtcEbb
tRJf6ECNrab2ZBcJ5b9ooLXOHktuUc3FHSwISoDToUHUm3oLwFoICuhSZJSW/AC4
jdSl52ACG5NHis6LhCXhq2C1SbZNMWj0nICK+i7PRbeb5373DWkrUS642Nfa4cSX
MhMzYQdjQ1S0zCA8KMZc89F/I5+U597VBczxMqg92etWvfiFDeRMGCnXuiB660O+
k936C7hSGNuFDi8c0ZSNH9sbYFST1rkGOncmK/D+W2ZJfvc=
-----END CERTIFICATE-----


@@ -27,6 +27,7 @@ services:
- TREEHERDER_DEBUG=True
- NEW_RELIC_INSIGHTS_API_KEY=${NEW_RELIC_INSIGHTS_API_KEY:-}
- PROJECTS_TO_INGEST=${PROJECTS_TO_INGEST:-autoland,try}
- TLS_CERT_PATH=${TLS_CERT_PATH}
entrypoint: './docker/entrypoint.sh'
# We *ONLY* initialize the data when we're running the backend
command: './initialize_data.sh ./manage.py runserver 0.0.0.0:8000'


@@ -105,11 +105,13 @@ other data sets available there.
## Direct database access
If the use-cases above aren't sufficient or you're working on a fullstack Perfherder bug,
we can provide read-only access to Treeherder's production MySQL replica.
Please [file a bug] requesting that someone from the cloudOps team [grant access to the read-only replica].
we can provide read-only access to Treeherder's stage MySQL replica.
Please [file a bug] requesting that someone from the cloudOps team grant access to the read-only stage replica.
Be sure to follow the instructions for [connecting to the databases](#connecting-to-databases) if you're using it
outside of the docker container.
If you have permission to access the prototype database locally, export `TLS_CERT_PATH='deployment/gcp/ca-cert-prototype.pem'`, or pass it as an argument along with `DATABASE_URL`, so that SSL connections from the docker container use the appropriate certificate.
<!-- prettier-ignore -->
!!! note
You won't be able to log in when using a read-only replica like the above.
@@ -124,11 +126,13 @@ settings for each database, speeding up future use and reducing the chance of fo
to enable TLS.
When setting up a connection make sure to change the "Use SSL" option to `require` and set
the "SSL CA File" option to point at the AWS public CA certificate, which for convenience can
be used [directly from the Treeherder repository][gcp-cert].
the "SSL CA File" option to point at the public CA certificate, which for convenience can
be used directly from the Treeherder repository [here][gcp-cert] for the stage replica or
[here][gcp-prototype-cert] for prototype.
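As a rough illustration of the two certificate choices, the sketch below maps a database target to the CA file named above and builds an `ssl` option in the same shape Treeherder's settings use. The `CA_CERTS` mapping and the `ssl_options_for` helper are hypothetical names introduced for this example; they are not part of Treeherder.

```python
# Hypothetical helper: pick the CA certificate for a given database target.
# The two paths are the files referenced in this document; the mapping and
# the function name are illustrative assumptions, not Treeherder code.
CA_CERTS = {
    "stage": "deployment/gcp/ca-cert.pem",
    "prototype": "deployment/gcp/ca-cert-prototype.pem",
}


def ssl_options_for(target: str) -> dict:
    """Return an `ssl` options dict naming the CA file for `target`."""
    try:
        return {"ssl": {"ca": CA_CERTS[target]}}
    except KeyError:
        # Fail loudly rather than silently falling back to the wrong cert.
        raise ValueError(f"unknown database target: {target!r}") from None


if __name__ == "__main__":
    print(ssl_options_for("prototype"))
```

Raising on an unknown target is a deliberate choice in this sketch: connecting with the wrong CA file would fail later with a less obvious TLS error.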
[MySQL workbench]: https://www.MySQL.com/products/workbench/
[gcp-cert]: https://github.com/mozilla/treeherder/blob/master/deployment/gcp/ca-cert.pem
[gcp-prototype-cert]: https://github.com/mozilla/treeherder/blob/master/deployment/gcp/ca-cert-prototype.pem
## Import performance data from upstream


@@ -7,7 +7,6 @@ that makes working on the docs locally much easier.
```console
% pip install poetry
% poetry install --extras "docs"
% poetry run mkdocs serve
```


@@ -36,7 +36,7 @@ These are managed by cloudOps and any deletion of data or access to replica or p
### Granting access to the read-only replica
One of the ways in which we allow users to [access Treeherder data](../accessing_data.md)
is via direct access to our read-only MySQL replica. Mozilla's
is via direct access to our read-only MySQL stage replica (only a restricted group of performance test team members and Treeherder maintainers should have access to the prototype database). Mozilla's
ReDash instance uses this approach. Only cloudOps can grant access.
Generate the password like this:


@@ -34,3 +34,40 @@ For less urgent issues or general support, you can file a bug with [cloudOps](ht
- [all other deployments](https://console.cloud.google.com/kubernetes/list?project=moz-fx-treeherde-nonprod-34ec)
Most useful information can be found by clicking the workload tab and then clicking on any "pod", which could be a cron job, a Celery task
or the application. Select any one of those to access the container logs (select Container logs).
## Scenarios
A general approach to troubleshooting is to check the errors tab for treeherder-production in New Relic and the logs in the GCP console. For specific data ingestion issues, follow the steps below:
### Celery queue backlogs
If push, task or log parsing is slow or has stopped, it could indicate a backlog with any of the associated workers or it could
indicate some other error.
1. A cloudOps team member should check the CloudAMQP "RabbitMQ Manager" dashboard for the per-queue breakdown
of incoming and delivery message rates.
2. Check New Relic's "Error Analytics" section, in case tasks are failing and being
retried due to a Python exception.
3. In New Relic's "Transactions" section, switch to the "Non-web" transactions view
(or use the direct links above), and click the relevant Celery task to see if there
has been a change in either throughput or time per task.
4. Depending on the information discovered above, you may want to try scaling resources or fixing any errors
causing the backlogged queues.
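Step 1 compares incoming and delivery message rates per queue. A minimal sketch of that comparison follows, assuming the rates have already been read off the dashboard into plain numbers. The `QueueRates` type and the `backlogged_queues` helper are hypothetical and do not reflect the CloudAMQP or RabbitMQ API.

```python
from dataclasses import dataclass


@dataclass
class QueueRates:
    """Rates for one queue, entered by hand from the dashboard (assumed shape)."""
    name: str
    incoming_per_sec: float  # messages published to the queue
    delivery_per_sec: float  # messages delivered to workers


def backlogged_queues(queues, tolerance=0.0):
    """Return the names of queues whose incoming rate exceeds the delivery rate.

    `tolerance` is an illustrative knob for ignoring small differences.
    """
    return [
        q.name
        for q in queues
        if q.incoming_per_sec - q.delivery_per_sec > tolerance
    ]
```

A queue flagged this way is only a starting point; the remaining steps help determine whether the cause is failing tasks or insufficient capacity.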
### New pushes or jobs not appearing
If new pushes or CI job results are not appearing in Treeherder's UI:
1. Follow the steps in [Celery queue backlogs](#celery-queue-backlogs) to rule out
task backlogs/Python exceptions.
2. Check the upstream Pulse queues [using Pulse Guardian] (you must be a co-owner of
the Treeherder queues to see them listed). If there is a Pulse queue backlog,
it suggests that Treeherder's `pulse_listener_{pushes,jobs}` workers have stopped
consuming Pulse events and a cloudOps team member will need to investigate if the
cause is infrastructure-related.
3. Failing that, it's possible the issue might lie in the services that send events to
those Pulse exchanges, such as `taskcluster-github` or
the Taskcluster systems upstream of those. Ask for help in the IRC channel
`#taskcluster`.
[using pulse guardian]: https://pulseguardian.mozilla.org/queues


@@ -159,7 +159,9 @@ for alias in DATABASES:
'init_command': "SET sql_mode='STRICT_TRANS_TABLES'",
}
if connection_should_use_tls(DATABASES[alias]['HOST']):
# Use TLS when connecting to RDS.
# The default cert is for access to the stage replica; for accessing
# prototype this variable will need to reference deployment/gcp/ca-cert-prototype.pem.
# See treeherder docs for more details.
DATABASES[alias]['OPTIONS']['ssl'] = {
'ca': env("TLS_CERT_PATH", default='deployment/gcp/ca-cert.pem'),
}
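
The fallback behaviour of the `env(...)` call above can be sketched with only the standard library. This is a minimal illustration, assuming `TLS_CERT_PATH` is read from the process environment; `resolve_tls_cert_path` and `DEFAULT_CA_CERT` are hypothetical names, and the settings shown in the diff use an `env` helper rather than `os.environ` directly.

```python
import os

# Default taken from the settings shown above: the stage replica certificate.
DEFAULT_CA_CERT = "deployment/gcp/ca-cert.pem"


def resolve_tls_cert_path(environ=None) -> str:
    """Return TLS_CERT_PATH if it is set, else the default stage certificate."""
    environ = os.environ if environ is None else environ
    return environ.get("TLS_CERT_PATH", DEFAULT_CA_CERT)


if __name__ == "__main__":
    # With TLS_CERT_PATH exported to the prototype certificate, that path is
    # printed; with the variable absent, the stage default is printed.
    print(resolve_tls_cert_path())
```

Passing a mapping explicitly keeps the helper easy to exercise without touching the real environment.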