Boris Feld
afd67402e2
Fix copy-paste typo with the new indexing schema ( #801 )
2019-07-28 20:38:05 +02:00
Boris Feld
a43ad03b2a
Add a new indexing schema for training tasks ( #795 )
...
In order to efficiently solve #614 , we need a new indexing schema
so getting all metrics following a given date is easy.
2019-07-26 18:28:04 +02:00
Marco Castelluccio
a614d34735
Move download of bugs linked to commits into the bug-retriever script
...
Also, make the bug-retriever task depend on the commit-retriever one, making the
download of bugs linked to commits actually work :)
2019-07-25 01:05:25 +02:00
Marco Castelluccio
66367584cd
Revert "Enable feature importance calculation for the defect/enhancement/task model"
...
This reverts commit d9cdcdc238.
It's running out of memory on releng-svc-compute workers (c5.4xlarge), so we need to temporarily disable it.
2019-07-15 15:49:28 +02:00
Anurag Aggarwal
656d6e844b
Remove bugs_retrieval image and use the base image instead in its place ( #691 )
...
* Fixes #633
2019-07-12 14:17:41 +02:00
Marco Castelluccio
d9cdcdc238
Enable feature importance calculation for the defect/enhancement/task model
2019-07-11 20:44:07 +02:00
Marco Castelluccio
17b027c767
Enable feature importance calculation at training time for the regressor model
2019-07-10 16:25:38 +02:00
Boris Feld
e7add98563
Update task-boot to 0.1.9 ( #675 )
2019-07-05 15:36:16 +02:00
Marco Castelluccio
6ce18762de
'payload.command' should be an array
2019-07-02 13:26:46 +02:00
Marco Castelluccio
d12a25f644
Upload feature visualization image as an artifact of the training tasks
2019-07-01 13:10:39 +02:00
Boris Feld
7459f79317
Use the base image for training models ( #656 )
...
Fixes #350
2019-06-29 00:01:51 +02:00
Boris Feld
d24993d0ac
Remove dependency on rollbacktest in docker build. ( #653 )
...
Fixes #651
2019-06-28 15:32:39 +02:00
Boris Feld
54e41d1497
Use taskboot 0.1.8 ( #645 )
...
The new taskboot release solves the double build on non-tag commits and
allows the heroku deploy to be fully atomic.
2019-06-28 11:11:48 +02:00
x249wang
ab28e8ace2
Use zstandard instead of xz ( #524 )
...
Fixes #461 .
2019-06-24 13:16:44 +02:00
Boris Feld
9834053a36
Start tracking training metrics as Taskcluster artifacts ( #604 )
...
Fixes #342
2019-06-22 14:18:08 -07:00
Boris Feld
27f9104fb5
Make sure the Docker build task uses the tagged code ( #610 )
...
If not, new master code might get released and conflict with the code in the
bugbug images.
Fixes #609
2019-06-21 08:20:08 -07:00
Boris Feld
c06db28442
Bump taskboot to version 1.0.7 ( #583 )
...
Now that https://github.com/mozilla/task-boot/issues/39 is fixed, let's update
task-boot version to use it.
Also add missing tags and cache option when building Docker images in
data-pipeline.yml
2019-06-12 20:11:34 +02:00
Marco Castelluccio
89b37b96ae
Upload version file too in the bugs retrieval task
2019-06-09 00:13:20 +02:00
Marco Castelluccio
353d21d01b
Clone repository quietly
2019-06-08 11:19:01 +02:00
Marco Castelluccio
4a991ac6ef
Fix download of bugs DB in the rollback test
2019-06-08 11:17:15 +02:00
Marco Castelluccio
9de91456f6
Update to taskboot 0.1.6
2019-06-07 22:03:00 +02:00
Boris Feld
a8faa48d8a
Support classifying batches of bugs with a background worker ( #321 )
2019-06-07 21:22:14 +02:00
Marco Castelluccio
82d9c0ece0
Update to taskboot 0.1.5
2019-06-07 16:47:28 +02:00
Boris Feld
2e05e57be2
Build docker images data pipeline tag ( #566 )
...
* Build the HTTP Docker image with the right tag
* Ensure the built Docker image has the right parent image
2019-06-07 16:46:05 +02:00
Boris Feld
2988700028
Use tagged index urls for pushing artifacts ( #561 )
...
* Use tagged index urls for pushing artifacts
Also replace previous code that updated Docker image tag to use JSON-e
templating instead.
2019-06-07 12:52:29 +02:00
Boris Feld
7906380e6f
Bump version of taskboot to use latest version of img tool ( #562 )
...
It is necessary to support multi-tag Docker image building
2019-06-07 12:21:09 +02:00
Marco Castelluccio
44e26ff0e8
Add a training task for the Regressor model
2019-06-03 22:15:18 +02:00
Marco Castelluccio
4ce438a35a
Fix typo in artifact name for the commits retrieval task
2019-06-03 21:37:39 +02:00
Marco
d8b84ca798
Support retrieving commits in steps ( #536 )
...
* Support retrieving commits in steps
* Store component mapping ETag to actually avoid downloading it again when not needed
* Store a version file alongside the DBs
* Export the commits DB version file and the experiences values as artifacts of the commit-retriever task
2019-06-03 19:29:08 +02:00
Marco Castelluccio
e62dd6f37d
Make rollback-test task verbose
2019-06-03 11:06:32 +02:00
Ayush Shridhar
9d71677667
Add a training task for the Duplicate model ( #525 )
2019-05-31 17:05:58 +02:00
Marco Castelluccio
bd3e4c7900
Increase the maximum runtime for the commits retrieval task
2019-05-30 13:27:23 +02:00
Marco Castelluccio
42d2ff2db8
Add a training task for the Backout model
2019-05-30 13:27:06 +02:00
Boris Feld
6ee9fb57f0
Fix Docker build by downloading the models inside the image. Fix #504 ( #516 )
...
The data pipeline failed before because it tried downloading the model from
outside the Docker image and didn't have bugbug installed.
The clean way of solving this would be to build a base http service image on
release and build another one where we simply download the models but let's
fix it this way for now.
2019-05-29 20:43:58 +02:00
Boris Feld
1bae5834ab
Implement deployment to Heroku ( #458 )
2019-05-23 20:39:02 +02:00
Ayush Shridhar
b41170baa5
Add training task for the StepsToReproduce model ( #441 )
2019-05-22 21:43:11 +02:00
Ayush Shridhar
91bf939fb7
Add training task for the RegressionRange model ( #466 )
2019-05-22 18:58:47 +02:00
Boris Feld
d3c3bcbece
Bump version of taskboot used in taskcluster and data pipeline ( #446 )
2019-05-16 13:02:58 +02:00
Marco Castelluccio
ff9ea35ed0
Reduce deadlines to maximum of 5 days
...
Taskcluster only allows up to 5 days
2019-05-14 20:39:00 +02:00
Marco
9223954520
Remove training tasks' unneeded dependencies on commit retrieval task ( #407 )
...
Fixes #390
2019-05-14 15:22:44 +02:00
Marco
c4bd01278e
Add 'expires' to all tasks so they don't take too long to expire ( #393 )
...
Fixes #391 .
2019-05-12 21:46:58 +02:00
Marco
e3230ca999
Increase deadline of data pipeline tasks ( #389 )
...
Fixes #388 .
2019-05-10 16:12:46 +02:00
Marco
6f09488573
Rename mozilla/bugbug-train-defect image to mozilla/bugbug-train-defectenhancementtask ( #375 )
...
Fixes #364 .
2019-05-09 23:36:38 +02:00
Marco Castelluccio
c3f55e682a
Rename train-defect to train-defectenhancementtask
2019-05-07 13:16:22 +02:00
Marco Castelluccio
2eaf90be20
Add a cache to the commit retrieval task
...
Fixes #347
2019-05-07 11:38:02 +02:00
Boris Feld
6937e0e5e8
Add the rollback test in the data pipeline ( #337 )
...
Add the rollback test in the data pipeline and move the bug snapshot test to a pytest test
2019-05-03 14:20:43 +02:00
Marco
9995b8c236
Make training code more generic to make it possible to train on other kinds of objects (e.g. commits) ( #335 )
...
* Move feature cleanup functions in a separate module
As they can be shared for different objectives, e.g. both training on bugs and on commits.
* Make Model more generic to make it possible to train on different objects
Introduce BugModel and CommitModel, as base classes for models training on bugs and on commits.
Update all models to use BugModel and to use the new feature_cleanup module.
Fixes #306 .
* Update ID and description of the defect/enhancement/task Taskcluster task definition
* Add a module to extract features from commit data
* Add an example model training on commits to predict commits which will be backed out
* Update defect model name, and add possibility to train backout model
2019-05-03 11:57:48 +02:00
Boris Feld
297963e4ce
Skip checking models while building the http service image, and only push it as part of the pipeline ( #331 )
...
* Add a way to skip checking models while building the http service image
* Don't push the http service on release
It isn't built with the real models on release
* Use taskboot 0.1.1
2019-05-02 23:18:51 +02:00
Boris Feld
369b44ea02
Update the index URLs in bugbug ( #328 )
...
* Update the index URLs in bugbug
* Split the http service Docker image in two
This way we can both:
- Build the first half (code + dependencies) in the usual CI.
- Build the second half at the end of the data pipeline with updated models.
Taskboot build-compose doesn't support building all services except a
specific one and it might be cumbersome to add this feature so move the second
half of the Docker image to a separate docker-compose file.
2019-05-02 17:00:32 +02:00
Boris Feld
6e7ca892cd
Introduce a new Docker image for data-pipeline spawning ( #320 )
2019-05-02 14:36:50 +02:00