
53 commits

Author SHA1 Message Date
Marco Castelluccio ad98a3f911 Add tests for repository.hg_log
Fixes #385
2019-06-03 10:22:08 +02:00
Marco Castelluccio 0e08d04903 Fix subclass detection in trainer script 2019-05-30 18:55:22 +02:00
Marco Castelluccio e729d68d82 Assert get_all_bug_ids returns only numbers 2019-05-30 16:15:54 +02:00
Marco Castelluccio 51e7297a89 Test type of component and backout models 2019-05-30 13:25:26 +02:00
Marco 62b348d46c Don't skip commits when calculating experiences (#510)
* Skip commits after calculating experience

Fixes #502

* Skip commits that have 'ignore-this-changeset' in their description

Fixes #496

* Don't increase experience for commits to ignore

* Test skipping a commit when calculating experiences

* Add test for repository.get_commits_to_ignore

* Add implementations of __eq__ and __hash__ in the Commit class
2019-05-29 11:24:11 +02:00
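Commit #510 adds `__eq__` and `__hash__` to the `Commit` class and skips commits whose description contains 'ignore-this-changeset'. A minimal sketch of why both methods matter, assuming a hypothetical, simplified `Commit` with only a `node` (changeset hash) and `desc` field, where identity is the hash alone. `find_ignored_commits` is a hypothetical helper, not bugbug's `repository.get_commits_to_ignore`, whose real logic may differ.

```python
from typing import Iterable, Set


class Commit:
    """Hypothetical, simplified commit record (not bugbug's Commit class).

    Assumption: two commits are the same commit iff their changeset hashes match.
    """

    def __init__(self, node: str, desc: str) -> None:
        self.node = node  # changeset hash
        self.desc = desc  # commit message

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Commit):
            return NotImplemented
        return self.node == other.node

    def __hash__(self) -> int:
        # Consistent with __eq__: equal commits produce equal hashes.
        return hash(self.node)


def find_ignored_commits(commits: Iterable[Commit]) -> Set[Commit]:
    """Hypothetical helper: commits that opt out via 'ignore-this-changeset'."""
    return {c for c in commits if "ignore-this-changeset" in c.desc}
```

Because equality and hashing both use only the hash, a set lookup such as `commit in ignored` works even when two `Commit` objects are built separately for the same changeset.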
x249wang a5c584afa2 Add basic tests for the BugType model (#513) 2019-05-29 10:39:33 +02:00
Marco 837788428a Calculate experiences of backed-out commits too (#497)
* Calculate experience for backed-out commits too

Fixes #373

* Add a backed-out commit to the list to make sure it doesn't affect normal experiences

* Add backout experience features
2019-05-28 12:59:55 +02:00
Marco 3f731dad1b Optimize experience calculation by having lists specific to items instead of carrying over the entire history all the time (#488)
* Stop getting a subset of revs

* Instead of carrying over all history day-by-day, use a deque with experience specific to a single item
2019-05-24 15:33:57 +02:00
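The idea in #488 is to keep, for each item (for example a file or directory), only the history relevant to that item instead of carrying the whole history over day by day. A minimal sketch of that idea, assuming "experience" means the number of earlier commits touching the same item within a time window and that commits arrive sorted by timestamp; the function name, window parameter and input shape are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def calculate_experiences(
    commits: List[Tuple[int, List[str]]], window: int
) -> List[int]:
    """Hypothetical sketch: for each (timestamp, items) commit, sum over its
    distinct items the number of earlier commits touching that item within
    `window` time units. Commits must be sorted by timestamp."""
    history: Dict[str, Deque[int]] = {}  # one deque of timestamps per item
    experiences = []
    for timestamp, items in commits:
        unique_items = set(items)
        total = 0
        for item in unique_items:
            q = history.setdefault(item, deque())
            # Evict timestamps that fell out of the window; the deque stays sorted.
            while q and q[0] < timestamp - window:
                q.popleft()
            total += len(q)
        experiences.append(total)
        # Record this commit only after counting, so it never counts itself.
        for item in unique_items:
            history[item].append(timestamp)
    return experiences
```

Each deque only ever holds timestamps for one item, so the per-commit cost depends on that item's recent activity rather than on the full repository history.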
Ayush Shridhar ec25e5126b Add tests for seniority calculation (#480) 2019-05-22 00:53:11 +02:00
Marco 1e18cd3fac Store experiences directly in the commit object (#477)
* Store experiences directly in the commit object

This way we can avoid the huge dict of dicts of dicts mapping commit hashes to experiences

* Use tqdm to see progress during seniority calculation

* Make 'first_commit_time' a simple dict

* Store seniority directly in the commit object
2019-05-21 16:20:46 +02:00
Marco Castelluccio 463c174508 Assert the list of bug IDs returned by get_all_bug_ids is longer than 0 2019-05-21 01:27:45 +02:00
Marco 72dab1a6d8 Gather more features about experiences (#467)
* Refactor things to avoid multiple sums and set updates

* Store max and min values too for experiences

* Store touched files and directories too as part of the commit data

* Remove useless default value for files_modified_num

* Use f-string instead of string concatenation for feature names

* Add more features about experiences (average, maximum, minimum, number of elements)

Fixes #370
2019-05-20 13:14:18 +02:00
Marco 4e26a1e64f Make bug snapshotting code fail silently when really necessary (#457)
* Remove useless 'pass'

* Remove all_inconsistencies and verbose options

* Add a 'platform' field mapping

* Don't assert during training or during evaluation

These kinds of assertions can be very frequent unfortunately, so we only enable them when needed.
That is:
1) During CI tests;
2) During rollback test, after downloading a new set of bugs (which we want to make sure is "clean enough").

Fixes #450

* Move all inconsistencies checks into separate functions, to make rollback code cleaner

* Fix assertion/log message
2019-05-17 20:06:57 +02:00
Boris Feld 0a5e37439d Add a central place where the models are defined (#398)
* Add a central place where the models are defined

Also add some helpers to load a model.

* Add missing tensorflow dependency in extra-nn-requirements.txt
2019-05-16 15:34:38 +02:00
Marco Castelluccio 3c91af9c60 Exclude previous commits since the beginning
Better implementation of 837169ebce
2019-05-16 01:51:28 +02:00
Ayush Shridhar add9a937b3 Multilabel classifier for detecting type of bug (#395) 2019-05-14 12:17:53 +02:00
Marco ed992d57ec Initial tests for repository module (#377)
* Test repository.get_revs function

* Test repository.get_directories function

* Split _hg_log function called by ProcessPoolExecutor in two to make it more easily testable

* Add logging when downloading file->component mapping

* Move the experience calculation code in a separate function

* Don't break experiences when there are days without commits

* Add tests for repository.calculate_experiences

Fixes #382

* When a commit changes multiple files in the same component, don't overcount the experience

The commit itself was being considered as a previous commit touching the same components
2019-05-14 11:30:56 +02:00
Ayush Shridhar c440db7315 Use re.compile to speedup feature cleanups (#351)
Fixes #338.
2019-05-09 15:09:26 +02:00
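#351 speeds up feature cleanups by compiling regular expressions once rather than on every call. A minimal sketch of that pattern, assuming a hypothetical cleanup that replaces hexadecimal numbers with a `__HEX_NUMBER__` token; the regex and token name are illustrative and not taken from bugbug.

```python
import re

# Compiled once at import time and reused on every call.
HEX_NUMBER = re.compile(r"\b0[xX][0-9a-fA-F]+\b")


def cleanup_hex(text: str) -> str:
    """Replace hexadecimal literals with a placeholder token (hypothetical token name)."""
    return HEX_NUMBER.sub("__HEX_NUMBER__", text)
```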
Marco a779560d37 Remove versioning support, as we are not really using it (#359) 2019-05-09 11:12:30 +02:00
Boris Feld 6937e0e5e8 Add the rollback test in the data pipeline (#337)
Add the rollback test in the data pipeline and move the bug snapshot test to a pytest test
2019-05-03 14:20:43 +02:00
Marco 9995b8c236 Make training code more generic to make it possible to train on other kinds of objects (e.g. commits) (#335)
* Move feature cleanup functions in a separate module

As they can be shared for different objectives, e.g. both training on bugs and on commits.

* Make Model more generic to make it possible to train on different objects

Introduce BugModel and CommitModel, as base classes for models training on bugs and on commits.

Update all models to use BugModel and to use the new feature_cleanup module.

Fixes #306.

* Update ID and description of the defect/enhancement/task Taskcluster task definition

* Add a module to extract features from commit data

* Add an example model training on commits to predict commits which will be backed out

* Update defect model name, and add possibility to train backout model
2019-05-03 11:57:48 +02:00
Marco d27d5f8b2c Add a mock DB for tests to avoid downloading the full DB (#273)
* Rename test function in test_bug to reflect reality, and add more assertions

* Add a mock bugs DB for tests

* Don't download bugs DB anymore for running tests

* Add a test for run.py basic functionality

* Remove training test task, as the test is now a pytest
2019-04-18 14:01:25 +02:00
Boris Feld bad6a50d8b Pre commit setup (#252)
* Add pre-commit configuration

Add auto-formatting configuration using the https://pre-commit.com/ project.
Having auto-formatting setup and automatically enforced helps speeding up
development and review process.

* Apply the auto-formatting on all files in the repository

* Removes flake8-quotes as it conflicts with Black formatting

* Disable some Flake8 rules

Disable Flake8 rules that are handled by Black. The list comes from
https://github.com/ambv/black/issues/429#issuecomment-472687803.
2019-04-09 15:57:29 +02:00
Marco Castelluccio 3ba7b5bf2b Remove unneeded debugging print statement from test 2019-03-11 21:04:06 +01:00
Marco Castelluccio 94408de363 Add tests for some error cases 2019-03-11 21:03:58 +01:00
Marco Castelluccio f6df1e574f Add support for other serialization formats 2019-03-11 20:58:51 +01:00
Marco Castelluccio 5182b234b5 Use pytest parametrize instead of duplicating test code 2019-03-11 18:54:37 +01:00
Marco Castelluccio 9ed8dabdc6 Use a pytest fixture to register a DB, to avoid some code duplication 2019-03-11 18:54:37 +01:00
Marco Castelluccio 40f3832496 Support databases compressed with zstandard 2019-03-11 18:12:14 +01:00
Marco Castelluccio 1f40a366ed Tests for some of the db methods 2019-03-07 23:15:41 +01:00
Lakshya A Agrawal f3a49fcd7a Differentiate between Firefox and external DLLs during cleanup (#199) 2019-02-28 19:16:57 +01:00
Ayush Shridhar a309cd965c Add rollback to QANeeded model (#189) 2019-02-28 16:45:13 +01:00
Marco Castelluccio 9cfbcaa1d4 Rename bug_has_cve_in_alias feature to has_cve_in_alias to match the naming conventions 2019-02-05 14:17:05 +01:00
Saurabh Daalia 30655e2b46 Add bug reporter as a feature (#124) 2019-01-30 18:49:17 +01:00
Aftaab Zia 03a30daa40 Add 'CVE' in bug alias as feature (#130) 2019-01-30 15:44:22 +01:00
Puneet Saini 5da534ecc9 Add number and length of comments as a feature (#131) 2019-01-30 11:23:45 +01:00
Assiya Khuzyakhmetova 65fb7b04c6 Add number of blocked bugs as a feature (#125) 2019-01-29 10:05:05 +01:00
Ayush Shridhar 62a83fb598 Consider all possible whiteboard values (#119) 2019-01-28 21:04:42 +01:00
dbxnr 8ef3e3da87 Expand cleanup_dll regex to match .so and .dylib (#96) 2019-01-23 00:24:56 +01:00
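#96 extends the DLL cleanup so that shared-library names on Linux (`.so`) and macOS (`.dylib`) are treated like Windows `.dll` files. A minimal sketch, assuming the cleanup replaces library file names with a hypothetical `__DLL_NAME__` token; `cleanup_library_names` is a hypothetical stand-in, and neither bugbug's actual regex nor the Firefox/external split from #199 is reproduced here.

```python
import re

# Hypothetical pattern: a file name ending in .dll, .so or .dylib (any case).
LIBRARY_NAME = re.compile(r"\b[\w.-]+\.(?:dll|so|dylib)\b", re.IGNORECASE)


def cleanup_library_names(text: str) -> str:
    """Replace shared-library file names with a placeholder token."""
    return LIBRARY_NAME.sub("__DLL_NAME__", text)
```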
poojan124 cc55ed22a6 Add test for 'Is Mozillian' feature (#95) 2019-01-21 22:18:51 +01:00
Ayush Shridhar 65149bb217 Add tests for bug feature extractors (#80) 2019-01-20 18:31:20 +01:00
Ayush Shridhar acc501c67e Add a function to replace crash stats references with a token (#77) 2019-01-18 20:36:06 +01:00
Ayush Shridhar 513c16729d Cleanup responses from comments (#52) 2019-01-17 18:31:01 +01:00
Ayush Shridhar 17a648f6e6 Substitute hex numbers in title and comments by a token (#56) 2019-01-17 16:50:46 +01:00
Ayush Shridhar f0bc25ddd6 Cleanup .dll words (#65) 2019-01-17 14:52:23 +01:00
Ayush Shridhar f988216df1 Add underscores around tokens (#50) 2019-01-05 15:27:29 +01:00
Marco Castelluccio b4dd6acb75 Add some more synonyms and ignore case when replacing synonyms 2019-01-02 16:47:54 +01:00
Marco Castelluccio 859cd9fd8b Add function to cleanup synonyms, and clean 'safe mode' synonyms 2019-01-02 16:02:02 +01:00
Marco Castelluccio a6f442b7b2 Move label gathering code in the models themselves 2019-01-02 15:26:43 +01:00
Ayush Shridhar b2046ae777 Replace code reference URLs with a CODE_REFERENCE_URL token (#46) 2018-12-19 15:24:10 +01:00