doc: Document the regressor model (#1582)

Martin Monperrus 2020-05-18 14:58:11 +02:00 committed by GitHub
Parent 2ae01d0173
Commit 35d741b01a
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
3 changed files with 26 additions and 1 deletion


@@ -30,7 +30,7 @@ https://hacks.mozilla.org/2019/04/teaching-machines-to-triage-firefox-bugs/
- **regressionrange** - The aim of this classifier is to detect regression bugs that have a regression range vs those that don't.
-- **regressor** - The aim of this classifier is to detect patches which are more likely to cause regressions. It could be used to make riskier patches undergo more scrutiny.
+- [**regressor**](docs/models/regressor.md) - The aim of this classifier is to detect patches which are more likely to cause regressions. It could be used to make riskier patches undergo more scrutiny.
- **spam** - The aim of this classifier is to detect bugs which are spam.

4
docs/README.md Normal file

@@ -0,0 +1,4 @@
Detailed documentation per model
* [Regressor model for predicting risky commits](models/regressor.md)

21
docs/models/regressor.md Normal file

@@ -0,0 +1,21 @@
Supported languages
------------------
The regressor model supports all languages supported by rust-code-analysis: https://github.com/mozilla/rust-code-analysis#supported-languages.

Training the model for another project
--------------------------------------
Reproducing the results on another project takes several steps, and they depend on the processes followed by that project. Here is the current pipeline, which depends on Mozilla's processes. Some steps might not be necessary for other projects (and some projects might require additional steps).
1. Gather bugs from the project's Bugzilla;
1. Mine commits from the repository;
1. Create a list of commits to ignore (formatting changes and so on, which surely can't have introduced regressions);
1. Classify bugs into actual bugs and feature requests (we recently introduced a new "type" field in Bugzilla that developers fill in, so precision is high in this step; for old bugs where the type field is absent, we use the "defect" model to classify the bug);
1. Use SZZ to find the commits which introduced the bugs identified in step 4 (making git blame ignore and skip over the commits from step 3);
1. Now we have a dataset of commits which introduced bugs and commits which did not introduce bugs, so we can actually train the regressor model.
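
The blame-skipping part of step 5 can be illustrated with a minimal sketch. It assumes a plain git checkout and git 2.23 or later, where `git blame` accepts `--ignore-rev`; the function name `build_blame_command` and its parameters are hypothetical and are not taken from bugbug, whose own implementation may differ.

```python
from typing import Iterable, List


def build_blame_command(
    path: str, fix_revision: str, ignored_revs: Iterable[str]
) -> List[str]:
    """Build a `git blame` command that skips over the given commits.

    Hypothetical helper for illustration only; not part of bugbug.
    """
    cmd = ["git", "blame", "--porcelain"]
    for rev in ignored_revs:
        # --ignore-rev makes blame attribute lines changed by `rev`
        # to an earlier commit instead (requires git >= 2.23).
        cmd += ["--ignore-rev", rev]
    # SZZ blames the file as it was just before the bug-fixing commit,
    # hence the parent revision (`<fix>^`).
    cmd += [f"{fix_revision}^", "--", path]
    return cmd
```

The resulting list could be passed to `subprocess.run`. For a long ignore list, git's `--ignore-revs-file` option takes a file of revisions instead of repeated flags.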
* Step 1 is in scripts/bug_retriever.py and bugbug/bugzilla.py;
* Step 2 is in scripts/commit_retriever.py and bugbug/repository.py;
* Steps 3, 4 and 5 are in scripts/regressor_finder.py;
* Step 6 is the actual "regressor" model, in bugbug/models/regressor.py.
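
As a rough illustration of how the outputs of steps 2, 3 and 5 combine into the training set used in step 6, here is a minimal sketch. The function `label_commits` and its arguments are hypothetical names chosen for this example; they do not come from the bugbug code base, and the real scripts store and process this data differently.

```python
from typing import Iterable, List, Set, Tuple


def label_commits(
    all_commits: Iterable[str],
    ignored: Set[str],
    bug_introducing: Set[str],
) -> List[Tuple[str, int]]:
    """Return (commit, label) pairs: 1 = introduced a bug, 0 = did not.

    Hypothetical helper for illustration only; not part of bugbug.
    """
    dataset = []
    for commit in all_commits:
        # Commits on the ignore list (step 3) are left out entirely.
        if commit in ignored:
            continue
        # Commits found by SZZ (step 5) are the positive examples.
        dataset.append((commit, 1 if commit in bug_introducing else 0))
    return dataset
```

For example, `label_commits(["a", "b", "c"], {"b"}, {"c"})` drops the ignored commit `b`, labels `a` as 0, and labels `c` as 1.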