* Skip commits after calculating experience
Fixes #502
* Skip commits that have 'ignore-this-changeset' in their description
Fixes #496
* Don't increase experience for commits to ignore
* Test skipping a commit when calculating experiences
* Add test for repository.get_commits_to_ignore
* Add implementations of __eq__ and __hash__ in the Commit class
* Calculate experience for backed-out commits too
Fixes #373
* Add a backed-out commit to the list to make sure it doesn't affect normal experiences
* Add backout experience features
* Store experiences directly in the commit object
This way we can avoid the huge dict of dicts of dicts mapping commit hashes to experiences
* Use tqdm to see progress during seniority calculation
* Make 'first_commit_time' a simple dict
* Store seniority directly in the commit object
* Refactor things to avoid multiple sums and set updates
* Store max and min values too for experiences
* Store touched files and directories too as part of the commit data
* Remove useless default value for files_modified_num
* Use f-string instead of string concatenation for feature names
* Add more features about experiences (average, maximum, minimum, number of elements)
Fixes #370
* Remove useless 'pass'
* Remove all_inconsistencies and verbose options
* Add a 'platform' field mapping
* Don't assert during training or during evaluation
Unfortunately, these kinds of assertions can trigger very frequently, so we only enable them when needed.
That is:
1) During CI tests;
2) During the rollback test, after downloading a new set of bugs (which we want to make sure is "clean enough").
Fixes #450
* Move all inconsistencies checks into separate functions, to make rollback code cleaner
* Fix assertion/log message
* Add a central place where the models are defined
Also add some helpers to load a model.
* Add missing tensorflow dependency in extra-nn-requirements.txt
* Test repository.get_revs function
* Test repository.get_directories function
* Split the _hg_log function called by ProcessPoolExecutor in two to make it easier to test
* Add logging when downloading file->component mapping
* Move the experience calculation code into a separate function
* Don't break experiences when there are days without commits
* Add tests for repository.calculate_experiences
Fixes #382
* When a commit changes multiple files in the same component, don't overcount the experience
The commit itself was being considered as a previous commit touching the same components
* Move feature cleanup functions into a separate module
They can be shared across different objectives, e.g. training on bugs and training on commits.
* Make Model more generic to make it possible to train on different objects
Introduce BugModel and CommitModel as base classes for models training on bugs and on commits.
Update all models to use BugModel and to use the new feature_cleanup module.
Fixes #306
* Update ID and description of the defect/enhancement/task Taskcluster task definition
* Add a module to extract features from commit data
* Add an example model training on commits to predict commits which will be backed out
* Update defect model name, and add the possibility to train the backout model
* Rename test function in test_bug to reflect reality, and add more assertions
* Add a mock bugs DB for tests
* Don't download bugs DB anymore for running tests
* Add a test for run.py basic functionality
* Remove training test task, as the test is now a pytest
* Add pre-commit configuration
Add auto-formatting configuration using the https://pre-commit.com/ project.
Having auto-formatting set up and automatically enforced helps speed up
the development and review process.
* Apply the auto-formatting to all files in the repository
* Remove flake8-quotes, as it conflicts with Black formatting
* Disable some Flake8 rules
Disable Flake8 rules that are handled by Black. The list comes from
https://github.com/ambv/black/issues/429#issuecomment-472687803.