treeherder/misc/compare_pushes.py


78 lines
2.7 KiB
Python

Bug 1395254 - Consume Taskcluster Pulse messages from standard queue exchanges

Currently, Treeherder consumes Pulse messages from an intermediary service called `taskcluster-treeherder`. That service needs to be shut down and its functionality imported into Treeherder. To do this we need to switch to the standard Taskcluster exchanges defined here: https://docs.taskcluster.net/docs/reference/platform/queue/exchanges

On a first pass we are only including the code from `taskcluster-treeherder` without changing much of Treeherder's code. The code is translated from JavaScript to Python, with only minor changes, to make the port easier and avoid introducing bugs.

Internally, on this first pass, we will still have an intermediary data structure representing what `taskcluster-treeherder` emits. However, we will stop consuming its messages and will be able to shut it down.

Instead of consuming from a single exchange we will be consuming from several, each one representing a different kind of task (e.g. pending vs running).

To test this change you need to open 5 terminal windows and follow these steps:

* On the first window run `docker-compose up`
* On the next three windows `export PULSE_URL="amqp://foo:bar@pulse.mozilla.org:5671/?ssl=1"` and run the following commands, one per window:
  * `docker-compose run -e PULSE_URL backend ./manage.py pulse_listener_jobs`
  * `docker-compose run -e PULSE_URL backend ./manage.py pulse_listener_tasks`
  * `docker-compose run -e PULSE_URL backend ./manage.py pulse_listener_pushes`
* On the last window run `docker-compose run backend celery -A treeherder worker -B --concurrency 5`
* Open `http://localhost:5000` in your browser

This is just a summary from [the docs](https://treeherder.readthedocs.io/pulseload.html).

= ETL management commands =

This change also introduces two ETL management commands that can be executed like this:

== Ingest push and tasks ==

This script can ingest into Treeherder all tasks associated with a push. It uses Python's asyncio to speed up the ingestion of tasks.

```bash
./manage.py ingest_push_and_tasks
```

== Update Pulse test fixtures ==

```bash
./manage.py update_pulse_test_fixtures
```

This command will read 100 Taskcluster Pulse messages, process them and store them as test fixtures under these two files: `tests/sample_data/pulse_consumer/taskcluster_{jobs,metadata}.json`

The follow-up to this work would be to get rid of the intermediary job representation ([bug 1560596](https://bugzilla.mozilla.org/show_bug.cgi?id=1560596)), which will clean up some of the code and some of the old tests.

= Extra script =

Script that permits comparing pushes from two different Treeherder instances.

```
usage: Compare a push from a Treeherder instance to the production instance.
       [-h] [--host HOST] --revision REVISION [--project PROJECT]

optional arguments:
  -h, --help           show this help message and exit
  --host HOST          Host to compare. It defaults to localhost
  --revision REVISION  Revision to compare
  --project PROJECT    Project to compare. It defaults to mozilla-central
```

= Other changes =

Other changes included:

* Import `taskcluster-treeherder`'s validation to ensure we're not fed garbage.
* Change `yaml.load(f)` to `yaml.load(f, Loader=yaml.FullLoader)`. Read [this](https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation) for details.
* Introduce `taskcluster` and `taskcluster-urls` as dependencies.
* The test `test_retry_missing_revision_never_succeeds` makes no sense because we perform JSON validation on the Pulse message.
2019-06-06 17:24:32 +03:00
#!/usr/bin/env python
""" Script to compare pushes from a Treeherder instance against production.
This is useful to compare if pushes between two different instances have been
ingested differently.
"""
import argparse
import logging
from deepdiff import DeepDiff
from thclient import TreeherderClient
logging.basicConfig()
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
HOSTS = {
    "localhost": "http://localhost:8000",
    "stage": "https://treeherder.allizom.org",
    "production": "https://treeherder.mozilla.org",
}
def main(args):
    compare_to_client = TreeherderClient(server_url=HOSTS[args.host])
    production_client = TreeherderClient(server_url=HOSTS["production"])

    # Support comma separated projects
    projects = args.projects.split(",")
    for _project in projects:
        logger.info(f"Comparing {_project} against production.")
        # Keep the sorted lists so both sides can be compared position by position
        pushes = sorted(
            compare_to_client.get_pushes(_project, count=50),
            key=lambda push: push["revision"],
        )
        production_pushes = sorted(
            production_client.get_pushes(_project, count=50),
            key=lambda push: push["revision"],
        )

        # Remove properties that are irrelevant for the comparison
        for _push in pushes + production_pushes:
            del _push["id"]
            for _rev in _push["revisions"]:
                del _rev["result_set_id"]

        for index in range(0, len(pushes)):
            # Both instances are expected to hold the same set of revisions
            assert pushes[index]["revision"] == production_pushes[index]["revision"]
            difference = DeepDiff(pushes[index], production_pushes[index])
            if difference:
                logger.info(difference.to_json())
                # Log a link to the push on each instance for manual inspection
                logger.info(
                    "{}/#/jobs?repo={}&revision={}".format(
                        compare_to_client.server_url, _project, pushes[index]["revision"]
                    )
                )
                logger.info(
                    "{}/#/jobs?repo={}&revision={}".format(
                        production_client.server_url, _project, production_pushes[index]["revision"]
                    )
                )


def get_args():
    parser = argparse.ArgumentParser(
        "Compare a push from a Treeherder instance to the production instance."
    )
    parser.add_argument("--host", default="stage", help="Host to compare. It defaults to stage")
    parser.add_argument(
        "--projects",
        default="android-components,fenix",
        help="Projects (comma separated) to compare. It defaults to android-components & fenix",
    )
    args = parser.parse_args()
    return args
if __name__ == "__main__":
    main(get_args())
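
The normalise-then-compare idea in `main` can be tried without a running Treeherder instance. The following is a minimal sketch using only the standard library, whereas the script itself relies on `DeepDiff` and `TreeherderClient`. The helper names `strip_ids` and `diff_pushes` and the sample pushes are hypothetical, and the sample data only mimics the fields the script touches (`id`, `revision`, `revisions`, `result_set_id`). Unlike the script, it pairs pushes by revision rather than by list position.

```python
import copy


def strip_ids(push):
    """Return a copy of a push without instance-specific identifiers."""
    cleaned = copy.deepcopy(push)
    cleaned.pop("id", None)
    for rev in cleaned.get("revisions", []):
        rev.pop("result_set_id", None)
    return cleaned


def diff_pushes(ours, theirs):
    """Map each shared revision to the top-level keys whose values differ."""
    ours_by_rev = {p["revision"]: strip_ids(p) for p in ours}
    theirs_by_rev = {p["revision"]: strip_ids(p) for p in theirs}
    differences = {}
    for revision in sorted(ours_by_rev.keys() & theirs_by_rev.keys()):
        a, b = ours_by_rev[revision], theirs_by_rev[revision]
        changed = sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
        if changed:
            differences[revision] = changed
    return differences


# Hypothetical sample data; real pushes carry more fields than shown here.
local = [{"id": 1, "revision": "abc", "author": "a@example.com",
          "revisions": [{"result_set_id": 1, "comments": "fix"}]}]
production = [{"id": 99, "revision": "abc", "author": "b@example.com",
               "revisions": [{"result_set_id": 99, "comments": "fix"}]}]

print(diff_pushes(local, production))  # {'abc': ['author']}
```

Keying by revision avoids the index-based assertion: a push missing from one instance is simply skipped instead of shifting every later comparison.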