Commit graph

131 commits

Author SHA1 Message Date
Marco Castelluccio 65bf1b4604 Only run integration test after data retrieval and training tasks are done 2019-11-02 17:49:36 +01:00
Marco Castelluccio 4b48bccab5 Make apt-get be quiet in the integration test 2019-11-02 17:08:18 +01:00
Boris Feld 807ecaca85 Misc fixes to enable integration tests at release time (#987)
Fixes #985 and fixes #329
2019-10-24 20:09:32 +02:00
Marco Castelluccio a8866bb562 GDBM doesn't add '.db' at the end of the path
I had tested locally with NDBM, which adds it
2019-10-23 12:01:46 +01:00
Marco Castelluccio 4a642f215f Add the past failures support DB to the artifacts list of the test scheduling history retrieval task 2019-10-22 17:46:15 +01:00
Marco Castelluccio f01badfb11 Add the version file of the test scheduling history DB to the artifacts list of the retrieval task 2019-10-22 17:46:15 +01:00
Marco Castelluccio 898d911013 Fix path in Taskcluster worker to the test scheduling history DB 2019-10-22 17:46:15 +01:00
Marco Castelluccio 0cfacecb57 Fix push_data.json.zst artifact path 2019-10-20 14:04:20 +01:00
Marco Castelluccio 940e97cdcf Be quiet when installing bugbug package in the test scheduling history push data retrieval task 2019-10-19 21:22:42 +01:00
Marco Castelluccio 86a6d0a6b9 Fix dependency name
Regressed by dc3c3b83da
2019-10-18 14:20:57 +01:00
Marco Castelluccio 5713425500 Use relman-svc compute for the ADR task
Since the tasks were split with dc3c3b83da,
the ADR task is not bounded by performance yet.
2019-10-18 13:38:13 +01:00
Marco Castelluccio dc3c3b83da Split test scheduling history retriever task into two 2019-10-18 13:33:53 +01:00
Marco Castelluccio 7f8e08c20d Add a task to train the test selection model 2019-10-12 17:31:28 +01:00
Marco Castelluccio 2cfd8fc01a Try using relman-svc-compute for the test scheduling history retrieval task 2019-10-10 18:52:08 +01:00
Marco Castelluccio 6ace4d78bc Use relman-svc-memory for the test scheduling history retriever task 2019-10-10 11:08:45 +01:00
Marco Castelluccio 251c2712ea Train a more interpretable regressor model 2019-09-30 15:22:02 +02:00
Marco Castelluccio 16e36cb54b Fix similarity model path
It's 25, not 52...

Fixes #939
2019-09-28 18:15:28 +02:00
Marco Castelluccio 2add7ecc21 Temporarily disable integration test
Until #985 is fixed
2019-09-27 15:33:42 +02:00
Boris Feld 5aa036c06d Ensure the integration tests are green before deploying a new HTTP service (#979)
Fixes #949
2019-09-26 15:20:28 +02:00
Marco Castelluccio 0412d894de Offer dataset files from the regressor model as artifacts 2019-09-25 16:56:52 +02:00
Marco Castelluccio 2054e93a1c Generate a DB of past test runs, with their failure history
Also move the artifacts to be in relative directories rather than absolute
2019-09-18 19:58:13 +02:00
Marco Castelluccio 53603d4a4b Don't use bash l option 2019-09-12 14:17:20 +02:00
Marco Castelluccio 7e084dd91a The script is not downloaded in scripts/ 2019-09-12 09:53:23 +02:00
Marco Castelluccio 833c56bb7b Get the raw file from GitHub, not the HTML view 2019-09-12 01:45:46 +02:00
Marco Castelluccio 2492ed58b4 Add a task to retrieve test scheduling history 2019-09-11 21:16:19 +02:00
Marco d65ba69ff3 Add a Dockerfile for tools using bugbug nlp stuff (#934)
* Add a Dockerfile for tools using bugbug nlp stuff

* Use the bugbug-base-nlp image for the similarity training task

Fixes #933
2019-09-07 00:45:47 +02:00
Boris Feld 1b4c47407b Bump training tasks timeout (#932)
We want to keep tasks and metrics artifacts around so we can monitor their
evolution, but we don't want to keep the models for too long, in order
to reduce storage usage.
2019-09-05 11:54:15 +02:00
Ayush Shridhar 59ed555325 Add a task to train a similarity model (the BM25 one) (#874) 2019-09-04 14:38:33 +02:00
Marco Castelluccio cc21b76c51 Use relman-svc instead of releng-svc
Also fix typo in 'svc'
2019-08-09 15:57:17 +02:00
Boris Feld 8b4cfd2dc4 Check metrics evolution (#836)
Fixes #360 and fixes #641.
2019-08-05 10:22:55 +02:00
Marco 8aac03002a Use relman-* workers instead of releng-svc (#842)
Fixes #324
2019-08-03 00:40:38 +02:00
Boris Feld afd67402e2 Fix copy-paste typo with the new indexing schema (#801) 2019-07-28 20:38:05 +02:00
Boris Feld a43ad03b2a Add a new indexing schema for training tasks (#795)
In order to efficiently solve #614, we need a new indexing schema
so getting all metrics following a given date is easy.
2019-07-26 18:28:04 +02:00
Marco Castelluccio a614d34735 Move download of bugs linked to commits in the bug-retriever script
Also, make the bug-retriever task depend on the commit-retriever one, making the
download of bugs linked to commits actually work :)
2019-07-25 01:05:25 +02:00
Marco Castelluccio 66367584cd Revert "Enable feature importance calculation for the defect/enhancement/task model"
This reverts commit d9cdcdc238.

It's running out of memory on releng-svc-compute workers (c5.4xlarge), so we need to temporarily disable it.
2019-07-15 15:49:28 +02:00
Anurag Aggarwal 656d6e844b Remove bugs_retrieval image and use the base image instead in its place (#691)
* Fixes #633
2019-07-12 14:17:41 +02:00
Marco Castelluccio d9cdcdc238 Enable feature importance calculation for the defect/enhancement/task model 2019-07-11 20:44:07 +02:00
Marco Castelluccio 17b027c767 Enable feature importance calculation at training time for the regressor model 2019-07-10 16:25:38 +02:00
Boris Feld e7add98563 Update task-boot to 0.1.9 (#675) 2019-07-05 15:36:16 +02:00
Marco Castelluccio 6ce18762de 'payload.command' should be an array 2019-07-02 13:26:46 +02:00
Marco Castelluccio d12a25f644 Upload feature visualization image as an artifact of the training tasks 2019-07-01 13:10:39 +02:00
Boris Feld 7459f79317 Use the base image for training models (#656)
Fixes #350
2019-06-29 00:01:51 +02:00
Boris Feld d24993d0ac Remove dependency on rollbacktest in docker build. (#653)
Fixes #651
2019-06-28 15:32:39 +02:00
Boris Feld 54e41d1497 Use taskboot 0.1.8 (#645)
The new taskboot release solves the double build on non-tag commits and
allows the heroku deploy to be fully atomic.
2019-06-28 11:11:48 +02:00
x249wang ab28e8ace2 Use zstandard instead of xz (#524)
Fixes #461.
2019-06-24 13:16:44 +02:00
Boris Feld 9834053a36 Start tracking training metrics as Taskcluster artifacts (#604)
Fixes #342
2019-06-22 14:18:08 -07:00
Boris Feld 27f9104fb5 Make sure the Docker build task uses the tagged code (#610)
If not, new master code might get released and conflict with the code in the
bugbug images.

Fixes #609
2019-06-21 08:20:08 -07:00
Boris Feld c06db28442 Bump taskboot to version 1.0.7 (#583)
Now that https://github.com/mozilla/task-boot/issues/39 is fixed, let's update
task-boot version to use it.

Also add missing tags and cache option when building Docker images in
data-pipeline.yml
2019-06-12 20:11:34 +02:00
Marco Castelluccio 89b37b96ae Upload version file too in the bugs retrieval task 2019-06-09 00:13:20 +02:00
Marco Castelluccio 353d21d01b Clone repository quietly 2019-06-08 11:19:01 +02:00
Marco Castelluccio 4a991ac6ef Fix download of bugs DB in the rollback test 2019-06-08 11:17:15 +02:00
Marco Castelluccio 9de91456f6 Update to taskboot 0.1.6 2019-06-07 22:03:00 +02:00
Boris Feld a8faa48d8a Support classifying batches of bugs with a background worker (#321) 2019-06-07 21:22:14 +02:00
Marco Castelluccio 82d9c0ece0 Update to taskboot 0.1.5 2019-06-07 16:47:28 +02:00
Boris Feld 2e05e57be2 Build docker images data pipeline tag (#566)
* Build the HTTP Docker image with the right tag

* Ensure the built Docker image has the right parent image
2019-06-07 16:46:05 +02:00
Boris Feld 2988700028 Use tagged index urls for pushing artifacts (#561)
* Use tagged index urls for pushing artifacts

Also replace previous code that updated Docker image tag to use JSON-e
templating instead.
2019-06-07 12:52:29 +02:00
Boris Feld 7906380e6f Bump version of taskboot to use latest version of img tool (#562)
It is necessary to support multi-tag Docker image building
2019-06-07 12:21:09 +02:00
Marco Castelluccio 44e26ff0e8 Add a training task for the Regressor model 2019-06-03 22:15:18 +02:00
Marco Castelluccio 4ce438a35a Fix typo in artifact name for the commits retrieval task 2019-06-03 21:37:39 +02:00
Marco d8b84ca798 Support retrieving commits in steps (#536)
* Support retrieving commits in steps

* Store component mapping ETag to actually avoid downloading it again when not needed

* Store a version file alongside the DBs

* Export the commits DB version file and the experiences values as artifacts of the commit-retriever task
2019-06-03 19:29:08 +02:00
Marco Castelluccio e62dd6f37d Make rollback-test task verbose 2019-06-03 11:06:32 +02:00
Ayush Shridhar 9d71677667 Add a training task for the Duplicate model (#525) 2019-05-31 17:05:58 +02:00
Marco Castelluccio bd3e4c7900 Increase the maximum runtime for the commits retrieval task 2019-05-30 13:27:23 +02:00
Marco Castelluccio 42d2ff2db8 Add a training task for the Backout model 2019-05-30 13:27:06 +02:00
Boris Feld 6ee9fb57f0 Fix Docker build by downloading the models inside the image. Fix #504 (#516)
The data pipeline failed before because it tried downloading the model from
outside the Docker image and didn't have bugbug installed.

The clean way of solving this would be to build a base http service image on
release and build another one where we simply download the models but let's
fix it this way for now.
2019-05-29 20:43:58 +02:00
Boris Feld 1bae5834ab Implement deployment to Heroku (#458) 2019-05-23 20:39:02 +02:00
Ayush Shridhar b41170baa5 Add training task for the StepsToReproduce model (#441) 2019-05-22 21:43:11 +02:00
Ayush Shridhar 91bf939fb7 Add training task for the RegressionRange model (#466) 2019-05-22 18:58:47 +02:00
Boris Feld d3c3bcbece Bump version of taskboot used in taskcluster and data pipeline (#446) 2019-05-16 13:02:58 +02:00
Marco Castelluccio ff9ea35ed0 Reduce deadlines to maximum of 5 days
Taskcluster only allows up to 5 days
2019-05-14 20:39:00 +02:00
Marco 9223954520 Remove training tasks' unneeded dependencies on commit retrieval task (#407)
Fixes #390
2019-05-14 15:22:44 +02:00
Marco c4bd01278e Add 'expires' to all tasks to avoid them expiring in a too long time (#393)
Fixes #391.
2019-05-12 21:46:58 +02:00
Marco e3230ca999 Increase deadline of data pipeline tasks (#389)
Fixes #388.
2019-05-10 16:12:46 +02:00
Marco 6f09488573 Rename mozilla/bugbug-train-defect image to mozilla/bugbug-train-defectenhancementtask (#375)
Fixes #364.
2019-05-09 23:36:38 +02:00
Marco Castelluccio c3f55e682a Rename train-defect to train-defectenhancementtask 2019-05-07 13:16:22 +02:00
Marco Castelluccio 2eaf90be20 Add a cache to the commit retrieval task
Fixes #347
2019-05-07 11:38:02 +02:00
Boris Feld 6937e0e5e8 Add the rollback test in the data pipeline (#337)
Add the rollback test in the data pipeline and move the bug snapshot test to a pytest test
2019-05-03 14:20:43 +02:00
Marco 9995b8c236 Make training code more generic to make it possible to train on other kinds of objects (e.g. commits) (#335)
* Move feature cleanup functions in a separate module

As they can be shared for different objectives, e.g. both training on bugs and on commits.

* Make Model more generic to make it possible to train on different objects

Introduce BugModel and CommitModel, as base classes for models training on bugs and on commits.

Update all models to use BugModel and to use the new feature_cleanup module.

Fixes #306.

* Update ID and description of the defect/enhancement/task Taskcluster task definition

* Add a module to extract features from commit data

* Add an example model training on commits to predict commits which will be backed out

* Update defect model name, and add possibility to train backout model
2019-05-03 11:57:48 +02:00
Boris Feld 297963e4ce Skip checking models while building the http service image, and only push it as part of the pipeline (#331)
* Add a way to skip checking models while building the http service image

* Don't push the http service on release

It isn't built with the real models on release

* Use taskboot 0.1.1
2019-05-02 23:18:51 +02:00
Boris Feld 369b44ea02 Update the index URLs in bugbug (#328)
* Update the index URLs in bugbug

* Split the http service Docker image in two

This way we can both:
- Build the first half (code + dependencies) in the usual CI.
- Build the second half at the end of the data pipeline with updated models.

Taskboot build-compose doesn't support building all services except a
specific one and it might be cumbersome to add this feature so move the second
half of the Docker image to a separate docker-compose file.
2019-05-02 17:00:32 +02:00
Boris Feld 6e7ca892cd Introduce a new Docker image for data-pipeline spawning (#320) 2019-05-02 14:36:50 +02:00