Commit graph

970 Commits

Author SHA1 Message Date
Boris Feld 7906380e6f Bump version of taskboot to use latest version of img tool (#562)
It is necessary to support multi-tag Docker image building
2019-06-07 12:21:09 +02:00
Ayush Shridhar 6e39b0a5a5 Change timedelta to 21 days in the script to generate Duplicate model results (#563) 2019-06-07 12:05:07 +02:00
Sladyn 860bb69c10 Add a basic test for the StepsToReproduce model (#503) 2019-06-07 11:11:39 +02:00
Boris Feld e0accae208 Move string formatting to f-string in spawn_data_pipeline (#559) 2019-06-07 11:04:33 +02:00
pyup.io bot c590278bff Update pyyaml from 5.1 to 5.1.1 (#560) 2019-06-07 10:56:26 +02:00
Marco Castelluccio f3caa72d54 Version 0.0.44 2019-06-06 19:15:37 +02:00
Boris Feld 5a31c99ac9 Add support for specific Docker tag in spawn_data_pipeline.py (#553)
* Revert "Revert "Add support for specific Docker tag in spawn_data_pipeline.py (#489)" (#499)"

This reverts commit 249ed40eb6.

* Ignore task with a tagged docker image

* Restrict Docker tag update to bugbug related images
2019-06-06 19:14:27 +02:00
Marco ee935d6e5b Download previous commits DB and experiences, and only mine data for new commits landed since then (#546)
* Download previous commits DB and experiences, and only mine data for new commits landed since then

Fixes #537

* Simplify db methods

* Add an option to return mined commits
2019-06-06 18:55:17 +02:00
Boris Feld 32f56a3962 Add a script to update the hook definition with the TAG during release (#507)
Fixes #501, fixed relanding of #491.
2019-06-06 18:11:59 +02:00
Boris Feld 08e36a7d8a Build tagged Docker images (#554) 2019-06-06 18:06:16 +02:00
pyup.io bot 75045dac44 Update pre-commit from 1.16.1 to 1.17.0 (#555) 2019-06-06 18:00:14 +02:00
Ayush Shridhar f21b4ee9d8 Add a script to generate duplicate classifier results (#548) 2019-06-06 16:30:23 +02:00
Marco Castelluccio b8f0a14f4e Store ETag when downloading DBs and only redownload if necessary 2019-06-05 14:54:36 +02:00
Marco Castelluccio 47eb2da7ec Don't overwrite the first_pushdate value
Follow up to bea28a17f6
2019-06-05 01:07:12 +02:00
Marco Castelluccio f5951ad63a Support retrieving some label files at runtime, and do it for the regressor labels 2019-06-05 00:37:26 +02:00
Marco 5165524b62 Iterate over bugs only once during training (#527)
Fixes #515
2019-06-04 18:45:55 +02:00
Marco Castelluccio b0207e2448 Version 0.0.43 2019-06-04 16:34:36 +02:00
Marco Castelluccio 1e6cb79573 Avoid all types of weirdness with ', ' in 'added' or 'removed' 2019-06-04 16:34:18 +02:00
Ayush Shridhar 551af5ff1c Remove sampler from Duplicate model (#543) 2019-06-04 16:22:58 +02:00
Ayush Shridhar 3f2b1d4efa Randomly choose non-duplicate bugs for Duplicate model training (#542) 2019-06-04 15:51:14 +02:00
Marco Castelluccio 218e100b3e Version 0.0.42 2019-06-04 13:49:15 +02:00
Marco Castelluccio 7790f5e3d5 Use raw CSV file, not GitHub's HTML page 2019-06-04 13:08:24 +02:00
Marco Castelluccio d57177f1e4 Fix destination path of the regressor.csv label file 2019-06-04 13:07:59 +02:00
Marco Castelluccio b1ddef742a Download bugs DB when the model is a BugCoupleModel too 2019-06-04 12:56:52 +02:00
Marco Castelluccio dfbe7a5ed4 Add functions to download and extract DB support files
For example, the version file.
2019-06-04 12:55:03 +02:00
Marco Castelluccio bea28a17f6 Get first pushdate from hg log on following runs of the repository mining script
Otherwise we'd use the pushdate of the first new commit as the first pushdate.
2019-06-04 12:52:08 +02:00
Marco Castelluccio 36d7d4449e Store first_commit_time dict too in the experiences file
Otherwise on following runs of calculate_experiences we'd have wrong seniority.
2019-06-04 12:51:29 +02:00
Marco Castelluccio 86babf8222 Transform results should be available when merge_data is False too 2019-06-04 11:33:52 +02:00
Marco Castelluccio 089f8dd7ca Don't rollback the same bug multiple times in case of bug couples 2019-06-04 11:33:15 +02:00
Marco Castelluccio 9afa655651 Increase number of duplicate and non-duplicate bugs to consider 2019-06-04 01:36:43 +02:00
Marco Castelluccio 9357b91c16 Build set of all IDs in one go 2019-06-04 01:36:03 +02:00
Marco Castelluccio 6cfe6fe8e1 Remove duplicate duplicate IDs 2019-06-04 01:34:56 +02:00
Marco Castelluccio 8e3f2d58eb Limit the number of duplicates to consider, without leaking duplicates into non-duplicates
We stopped iterating over bugs once we reached the number of duplicates we wanted.
This meant that we were considering some duplicate bugs to be non-duplicate.
2019-06-04 01:33:23 +02:00
Marco Castelluccio 17626e16d1 No need to declare non_duplicate_ids as empty list 2019-06-04 01:01:14 +02:00
Marco Castelluccio b441709a26 Print number of labels consistently in the Duplicate model 2019-06-04 01:00:41 +02:00
Marco Castelluccio baf8650399 Use a set for storing all IDs, and calculate non-duplicate IDs as the difference between sets of all bugs and of duplicate bugs 2019-06-04 01:00:03 +02:00
Marco Castelluccio 0a7ce5b763 No need to limit the overall number of bug IDs to consider, as long as we limit the number of duplicate bugs to consider 2019-06-04 00:57:16 +02:00
Marco Castelluccio 967a038018 Misc cleanup for the label calculation of the Duplicate model 2019-06-04 00:56:25 +02:00
Marco Castelluccio 4f77ac82e3 Don't fail when the 'cf_has_str' field is not available for a given bug 2019-06-03 23:31:53 +02:00
pyup.io bot 78c5fd7edb Update pytest from 4.6.1 to 4.6.2 (#538) 2019-06-03 22:40:30 +02:00
Marco Castelluccio 9e1f32f03f Version 0.0.41 2019-06-03 22:29:45 +02:00
Marco Castelluccio 44e26ff0e8 Add a training task for the Regressor model 2019-06-03 22:15:18 +02:00
Marco Castelluccio 2804436357 Download regressor labels from marco-c/mozilla-central-regressors repository in the train_regressor Docker image 2019-06-03 22:14:47 +02:00
Marco Castelluccio 72ddfea2e3 Add a Docker image for the task to train the Regressor model 2019-06-03 21:46:35 +02:00
Marco Castelluccio 6b99570349 Sort models by name 2019-06-03 21:46:07 +02:00
Marco Castelluccio ab39f26c2a Add a model to predict patches more likely to cause regressions 2019-06-03 21:45:13 +02:00
Marco Castelluccio 4ce438a35a Fix typo in artifact name for the commits retrieval task 2019-06-03 21:37:39 +02:00
Marco Castelluccio f397033f77 Version 0.0.40 2019-06-03 19:43:25 +02:00
Marco Castelluccio d993ae0d15 Add more defect/enhancement/task labels gathered from changes made by users on Bugzilla 2019-06-03 19:35:28 +02:00
Marco d8b84ca798 Support retrieving commits in steps (#536)
* Support retrieving commits in steps

* Store component mapping ETag to actually avoid downloading it again when not needed

* Store a version file alongside the DBs

* Export the commits DB version file and the experiences values as artifacts of the commit-retriever task
2019-06-03 19:29:08 +02:00