Ayush Shridhar
3f2b1d4efa
Randomly choose non-duplicate bugs for Duplicate model training ( #542 )
2019-06-04 15:51:14 +02:00
Marco Castelluccio
218e100b3e
Version 0.0.42
2019-06-04 13:49:15 +02:00
Marco Castelluccio
7790f5e3d5
Use raw CSV file, not GitHub's HTML page
2019-06-04 13:08:24 +02:00
Marco Castelluccio
d57177f1e4
Fix destination path of the regressor.csv label file
2019-06-04 13:07:59 +02:00
Marco Castelluccio
b1ddef742a
Download bugs DB when the model is a BugCoupleModel too
2019-06-04 12:56:52 +02:00
Marco Castelluccio
dfbe7a5ed4
Add functions to download and extract DB support files
...
For example, the version file.
2019-06-04 12:55:03 +02:00
Marco Castelluccio
bea28a17f6
Get first pushdate from hg log on subsequent runs of the repository mining script
...
Otherwise we'd use the pushdate of the first new commit as the first pushdate.
2019-06-04 12:52:08 +02:00
Marco Castelluccio
36d7d4449e
Store first_commit_time dict too in the experiences file
...
Otherwise, on subsequent runs of calculate_experiences, we'd compute the wrong seniority.
2019-06-04 12:51:29 +02:00
Marco Castelluccio
86babf8222
Transform results should be available when merge_data is False too
2019-06-04 11:33:52 +02:00
Marco Castelluccio
089f8dd7ca
Don't rollback the same bug multiple times in case of bug couples
2019-06-04 11:33:15 +02:00
Marco Castelluccio
9afa655651
Increase number of duplicate and non-duplicate bugs to consider
2019-06-04 01:36:43 +02:00
Marco Castelluccio
9357b91c16
Build set of all IDs in one go
2019-06-04 01:36:03 +02:00
Marco Castelluccio
6cfe6fe8e1
Remove repeated IDs from the list of duplicate bug IDs
2019-06-04 01:34:56 +02:00
Marco Castelluccio
8e3f2d58eb
Limit the number of duplicates to consider, without leaking duplicates into non-duplicates
...
We stopped iterating over bugs as soon as we reached the number of duplicates we wanted.
This meant that we were considering some duplicate bugs to be non-duplicate.
2019-06-04 01:33:23 +02:00
Marco Castelluccio
17626e16d1
No need to declare non_duplicate_ids as empty list
2019-06-04 01:01:14 +02:00
Marco Castelluccio
b441709a26
Print number of labels consistently in the Duplicate model
2019-06-04 01:00:41 +02:00
Marco Castelluccio
baf8650399
Use a set for storing all IDs, and calculate non-duplicate IDs as the difference between sets of all bugs and of duplicate bugs
2019-06-04 01:00:03 +02:00
Marco Castelluccio
0a7ce5b763
No need to limit the overall number of bug IDs to consider, as long as we limit the number of duplicate bugs to consider
2019-06-04 00:57:16 +02:00
Marco Castelluccio
967a038018
Misc cleanup for the label calculation of the Duplicate model
2019-06-04 00:56:25 +02:00
Marco Castelluccio
4f77ac82e3
Don't fail when the 'cf_has_str' field is not available for a given bug
2019-06-03 23:31:53 +02:00
pyup.io bot
78c5fd7edb
Update pytest from 4.6.1 to 4.6.2 ( #538 )
2019-06-03 22:40:30 +02:00
Marco Castelluccio
9e1f32f03f
Version 0.0.41
2019-06-03 22:29:45 +02:00
Marco Castelluccio
44e26ff0e8
Add a training task for the Regressor model
2019-06-03 22:15:18 +02:00
Marco Castelluccio
2804436357
Download regressor labels from marco-c/mozilla-central-regressors repository in the train_regressor Docker image
2019-06-03 22:14:47 +02:00
Marco Castelluccio
72ddfea2e3
Add a Docker image for the task to train the Regressor model
2019-06-03 21:46:35 +02:00
Marco Castelluccio
6b99570349
Sort models by name
2019-06-03 21:46:07 +02:00
Marco Castelluccio
ab39f26c2a
Add a model to predict patches more likely to cause regressions
2019-06-03 21:45:13 +02:00
Marco Castelluccio
4ce438a35a
Fix typo in artifact name for the commits retrieval task
2019-06-03 21:37:39 +02:00
Marco Castelluccio
f397033f77
Version 0.0.40
2019-06-03 19:43:25 +02:00
Marco Castelluccio
d993ae0d15
Add more defect/enhancement/task labels gathered from changes made by users on Bugzilla
2019-06-03 19:35:28 +02:00
Marco
d8b84ca798
Support retrieving commits in steps ( #536 )
...
* Support retrieving commits in steps
* Store component mapping ETag to actually avoid downloading it again when not needed
* Store a version file alongside the DBs
* Export the commits DB version file and the experiences values as artifacts of the commit-retriever task
2019-06-03 19:29:08 +02:00
Marco Castelluccio
32f024b9e3
Add a rev_start parameter to repository.get_revs and test it
2019-06-03 16:02:01 +02:00
Marco Castelluccio
e3b4d3fcc8
Add a test for repository.download_commits
...
Fixes #379
2019-06-03 15:52:22 +02:00
Marco Castelluccio
203f144781
Download new component mapping only when necessary
2019-06-03 15:48:29 +02:00
Ibraheem Moosa
2c41533c52
Clean the test data for cleanup functions ( #530 )
2019-06-03 14:37:46 +02:00
Ayush Shridhar
a17f7c9ed4
Ignore bugs with the "dupeme" keyword in the Duplicate model ( #534 )
2019-06-03 11:51:41 +02:00
Marco Castelluccio
e62dd6f37d
Make rollback-test task verbose
2019-06-03 11:06:32 +02:00
Marco Castelluccio
ac6ab827e1
History changes might contain an extraneous ', ' at the beginning or the end; ignore it
2019-06-03 11:06:32 +02:00
Marco Castelluccio
7a12860a11
Use tqdm to show progress when rolling back all bugs
2019-06-03 11:06:32 +02:00
Marco Castelluccio
220e36b2cc
Add more expected inconsistent changes when rolling back
2019-06-03 11:06:32 +02:00
Ayush Shridhar
df5c56503f
Ignore some reporters (automated tests related) in the Duplicate model ( #526 )
2019-06-03 11:01:01 +02:00
pyup.io bot
3d7e46fa05
Update pytest from 4.6.0 to 4.6.1 ( #533 )
2019-06-03 10:22:56 +02:00
Marco Castelluccio
ad98a3f911
Add tests for repository.hg_log
...
Fixes #385
2019-06-03 10:22:08 +02:00
pyup.io bot
5d26382804
Update pytest from 4.5.0 to 4.6.0 ( #531 )
2019-06-03 01:39:11 +02:00
Marco Castelluccio
221bcff260
Version 0.0.39
2019-05-31 18:09:33 +02:00
Marco Castelluccio
aebc3c4414
Add bugbug-train-duplicate to docker-compose.yml
2019-05-31 18:03:03 +02:00
Marco
f600350548
Use hgdate template formatting for date and pushdate ( #528 )
...
The pushdate template parameter is affected by a bug on machines with UTC timezones,
which only manifests when no formatting is applied.
2019-05-31 18:00:16 +02:00
Ayush Shridhar
9d71677667
Add a training task for the Duplicate model ( #525 )
2019-05-31 17:05:58 +02:00
Marco Castelluccio
98b03e588a
Version 0.0.38
2019-05-30 22:02:01 +02:00
Ayush Shridhar
b5a473c760
Add a classifier for duplicate bugs ( #484 )
2019-05-30 21:57:59 +02:00