751 Commits

Author SHA1 Message Date
Ayush Shridhar 3f2b1d4efa Randomly choose non-duplicate bugs for Duplicate model training (#542) 2019-06-04 15:51:14 +02:00
Marco Castelluccio 218e100b3e Version 0.0.42 2019-06-04 13:49:15 +02:00
Marco Castelluccio 7790f5e3d5 Use raw CSV file, not GitHub's HTML page 2019-06-04 13:08:24 +02:00
Marco Castelluccio d57177f1e4 Fix destination path of the regressor.csv label file 2019-06-04 13:07:59 +02:00
Marco Castelluccio b1ddef742a Download bugs DB when the model is a BugCoupleModel too 2019-06-04 12:56:52 +02:00
Marco Castelluccio dfbe7a5ed4 Add functions to download and extract DB support files
For example, the version file.
2019-06-04 12:55:03 +02:00
Marco Castelluccio bea28a17f6 Get first pushdate from hg log on following runs of the repository mining script
Otherwise we'd use the pushdate of the first new commit as the first pushdate.
2019-06-04 12:52:08 +02:00
Marco Castelluccio 36d7d4449e Store first_commit_time dict too in the experiences file
Otherwise on following runs of calculate_experiences we'd have wrong seniority.
2019-06-04 12:51:29 +02:00
Marco Castelluccio 86babf8222 Transform results should be available when merge_data is False too 2019-06-04 11:33:52 +02:00
Marco Castelluccio 089f8dd7ca Don't rollback the same bug multiple times in case of bug couples 2019-06-04 11:33:15 +02:00
Marco Castelluccio 9afa655651 Increase number of duplicate and non-duplicate bugs to consider 2019-06-04 01:36:43 +02:00
Marco Castelluccio 9357b91c16 Build set of all IDs in one go 2019-06-04 01:36:03 +02:00
Marco Castelluccio 6cfe6fe8e1 Remove duplicate duplicate IDs 2019-06-04 01:34:56 +02:00
Marco Castelluccio 8e3f2d58eb Limit the number of duplicates to consider, without leaking duplicates into non-duplicates
We were stopping to iterate bugs when we reached the number of duplicates we wanted.
This meant that we were considering some duplicate bugs to be non-duplicate.
2019-06-04 01:33:23 +02:00
Marco Castelluccio 17626e16d1 No need to declare non_duplicate_ids as empty list 2019-06-04 01:01:14 +02:00
Marco Castelluccio b441709a26 Print number of labels consistently in the Duplicate model 2019-06-04 01:00:41 +02:00
Marco Castelluccio baf8650399 Use a set for storing all IDs, and calculate non-duplicate IDs as the difference between sets of all bugs and of duplicate bugs 2019-06-04 01:00:03 +02:00
Marco Castelluccio 0a7ce5b763 No need to limit the overall number of bug IDs to consider, as long as we limit the number of duplicate bugs to consider 2019-06-04 00:57:16 +02:00
Marco Castelluccio 967a038018 Misc cleanup for the label calculation of the Duplicate model 2019-06-04 00:56:25 +02:00
Marco Castelluccio 4f77ac82e3 Don't fail when the 'cf_has_str' field is not available for a given bug 2019-06-03 23:31:53 +02:00
pyup.io bot 78c5fd7edb Update pytest from 4.6.1 to 4.6.2 (#538) 2019-06-03 22:40:30 +02:00
Marco Castelluccio 9e1f32f03f Version 0.0.41 2019-06-03 22:29:45 +02:00
Marco Castelluccio 44e26ff0e8 Add a training task for the Regressor model 2019-06-03 22:15:18 +02:00
Marco Castelluccio 2804436357 Download regressor labels from marco-c/mozilla-central-regressors repository in the train_regressor Docker image 2019-06-03 22:14:47 +02:00
Marco Castelluccio 72ddfea2e3 Add a Docker image for the task to train the Regressor model 2019-06-03 21:46:35 +02:00
Marco Castelluccio 6b99570349 Sort models by name 2019-06-03 21:46:07 +02:00
Marco Castelluccio ab39f26c2a Add a model to predict patches more likely to cause regressions 2019-06-03 21:45:13 +02:00
Marco Castelluccio 4ce438a35a Fix typo in artifact name for the commits retrieval task 2019-06-03 21:37:39 +02:00
Marco Castelluccio f397033f77 Version 0.0.40 2019-06-03 19:43:25 +02:00
Marco Castelluccio d993ae0d15 Add more defect/enhancement/task labels gathered from changed made by users on Bugzilla 2019-06-03 19:35:28 +02:00
Marco d8b84ca798 Support retrieving commits in steps (#536)
* Support retrieving commits in steps

* Store component mapping ETag to actually avoid downloading it again when not needed

* Store a version file alongside the DBs

* Export the commits DB version file and the experiences values as artifacts of the commit-retriever task
2019-06-03 19:29:08 +02:00
Marco Castelluccio 32f024b9e3 Add a rev_start parameter to repository.get_revs and test it 2019-06-03 16:02:01 +02:00
Marco Castelluccio e3b4d3fcc8 Add a test for repository.download_commits
Fixes #379
2019-06-03 15:52:22 +02:00
Marco Castelluccio 203f144781 Download new component mapping only when necessary 2019-06-03 15:48:29 +02:00
Ibraheem Moosa 2c41533c52 Clean the test data for cleanup functions (#530) 2019-06-03 14:37:46 +02:00
Ayush Shridhar a17f7c9ed4 Ignore bugs with the "dupeme" keyword in the Duplicate model (#534) 2019-06-03 11:51:41 +02:00
Marco Castelluccio e62dd6f37d Make rollback-test task verbose 2019-06-03 11:06:32 +02:00
Marco Castelluccio ac6ab827e1 History changes might contain extraneous ', ' at the beginning or the end, ignore them 2019-06-03 11:06:32 +02:00
Marco Castelluccio 7a12860a11 Use tqdm to show progress when rollbacking all bugs 2019-06-03 11:06:32 +02:00
Marco Castelluccio 220e36b2cc Add more expected inconsistent changes while rollbacking 2019-06-03 11:06:32 +02:00
Ayush Shridhar df5c56503f Ignore some reporters (automated tests related) in the Duplicate model (#526) 2019-06-03 11:01:01 +02:00
pyup.io bot 3d7e46fa05 Update pytest from 4.6.0 to 4.6.1 (#533) 2019-06-03 10:22:56 +02:00
Marco Castelluccio ad98a3f911 Add tests for repository.hg_log
Fixes #385
2019-06-03 10:22:08 +02:00
pyup.io bot 5d26382804 Update pytest from 4.5.0 to 4.6.0 (#531) 2019-06-03 01:39:11 +02:00
Marco Castelluccio 221bcff260 Version 0.0.39 2019-05-31 18:09:33 +02:00
Marco Castelluccio aebc3c4414 Add bugbug-train-duplicate to docker-compose.yml 2019-05-31 18:03:03 +02:00
Marco f600350548 Use hgdate template formatting for date and pushdate (#528)
The pushdate template parameter is affected by a bug on machines with UTC timezones,
which only manifests when not using a formatting.
2019-05-31 18:00:16 +02:00
Ayush Shridhar 9d71677667 Add a training task for the Duplicate model (#525) 2019-05-31 17:05:58 +02:00
Marco Castelluccio 98b03e588a Version 0.0.38 2019-05-30 22:02:01 +02:00
Ayush Shridhar b5a473c760 Add a classifier for duplicate bugs (#484) 2019-05-30 21:57:59 +02:00