* Download previous commits DB and experiences, and only mine data for new commits landed since then
Fixes #537
* Simplify db methods
* Add an option to return mined commits
* Support retrieving commits in steps
* Store component mapping ETag to actually avoid downloading it again when not needed
* Store a version file alongside the DBs
* Export the commits DB version file and the experiences values as artifacts of the commit-retriever task
* Skip commits after calculating experience
Fixes #502
* Skip commits that have 'ignore-this-changeset' in their description
Fixes #496
* Don't increase experience for commits to ignore
* Test skipping a commit when calculating experiences
* Add test for repository.get_commits_to_ignore
* Add implementations of __eq__ and __hash__ in the Commit class
* Calculate experience for backed-out commits too
Fixes #373
* Add a backed-out commit to the list to make sure it doesn't affect normal experiences
* Add backout experience features
* Store experiences directly in the commit object
This way we can avoid the huge dict of dicts of dicts mapping commit hashes to experiences
* Use tqdm to see progress during seniority calculation
* Make 'first_commit_time' a simple dict
* Store seniority directly in the commit object
* Refactor things to avoid multiple sums and set updates
* Store max and min values too for experiences
* Store touched files and directories too as part of the commit data
* Remove useless default value for files_modified_num
* Use f-string instead of string concatenation for feature names
* Add more features about experiences (average, maximum, minimum, number of elements)
Fixes #370
* Remove useless 'pass'
* Remove all_inconsistencies and verbose options
* Add a 'platform' field mapping
* Don't assert during training or during evaluation
These kinds of assertion failures can unfortunately be very frequent, so we only enable the assertions when needed.
That is:
1) During CI tests;
2) During the rollback test, after downloading a new set of bugs (which we want to make sure is "clean enough").
Fixes #450
* Move all inconsistency checks into separate functions, to make rollback code cleaner
* Fix assertion/log message
* Add a central place where the models are defined
Also add some helpers to load a model.
* Add missing tensorflow dependency in extra-nn-requirements.txt
* Test repository.get_revs function
* Test repository.get_directories function
* Split the _hg_log function called by ProcessPoolExecutor in two to make it easier to test
* Add logging when downloading file->component mapping
* Move the experience calculation code into a separate function
* Don't break experiences when there are days without commits
* Add tests for repository.calculate_experiences
Fixes #382
* When a commit changes multiple files in the same component, don't overcount the experience
The commit itself was being considered as a previous commit touching the same components
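
A rough sketch of the overcounting fix described in the last item, not the repository's actual code. The function name `component_experience`, the `file_to_component` mapping and the input shape are assumptions made for illustration. The point is to read the set of previous commits for each touched component before recording the current commit, and to deduplicate components so that several files in one component count once.

```python
from collections import defaultdict


def component_experience(commits, file_to_component):
    """Count, for each commit, the earlier commits touching the same components.

    `commits` is an iterable of (commit_id, files) pairs in chronological order.
    `file_to_component` maps a file path to its component (hypothetical shape).
    """
    # component -> ids of commits seen so far that touched it
    seen = defaultdict(set)
    experience = {}

    for commit_id, files in commits:
        # Deduplicate: several files in one component yield one entry.
        components = {file_to_component[path] for path in files}

        # Collect previous commits first, so the current commit
        # is never counted as its own predecessor.
        previous = set()
        for component in components:
            previous |= seen[component]
        experience[commit_id] = len(previous)

        # Only now record the current commit for later ones.
        for component in components:
            seen[component].add(commit_id)

    return experience
```

With a first commit touching two files in the same component, the result for that commit is 0 rather than 1, which is the behaviour the fix is after.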