Commit graph

336 commits

Author SHA1 Message Date
Marco Castelluccio 16ece06f64 Retry git operations multiple times 2019-06-27 10:25:38 +02:00
Marco Castelluccio a3933a48a4 Don't fail if there's an error while pulling from the repo 2019-06-27 10:25:38 +02:00
Marco Castelluccio 56f224b9dc Generate microannotate repository for mozilla-central 2019-06-26 18:57:36 +02:00
Ayush Shridhar 6788b2e33a Make similarity script more generic and add nearest neighbors similarity with tf-idf encoding (#628) 2019-06-26 13:42:23 +02:00
Marco Castelluccio bd118c58ab Use with statement for hg.open 2019-06-26 11:45:02 +02:00
x249wang ab28e8ace2 Use zstandard instead of xz (#524)
Fixes #461.
2019-06-24 13:16:44 +02:00
Boris Feld 9834053a36 Start tracking training metrics as Taskcluster artifacts (#604)
Fixes #342
2019-06-22 14:18:08 -07:00
cklyyung f4145b4eca Use 'everchanged' operator instead of 'changedafter' operator with 1970 (#598) 2019-06-18 22:01:15 -07:00
AK.py f6289a4468 Don't try to find inconsistencies in all bugs multiple times (#595) 2019-06-18 13:37:01 -07:00
Marco Castelluccio 938eb29bbf Support getting new specific type labels from Bugzilla 2019-06-12 01:35:53 +02:00
Marco Castelluccio 735fccc4a9 In the retrieval task, download only new or changed bugs
To support it, refactor bugzilla methods:
- adding methods to get IDs given a query and given a time period;
- renaming the internal _download method to get, since it's used externally;
- changing delete to be more flexible, allowing a lambda to be used to choose which bugs to delete.

Fixes #440.
2019-06-09 00:32:23 +02:00
Marco Castelluccio 36f9a7c8d9 Move comment_level_labeler script in the scripts directory 2019-06-08 20:36:13 +02:00
Boris Feld 5f9be450cf Ensure we download data from INDEX URL containing bugbug version (#564) 2019-06-07 16:14:58 +02:00
Ayush Shridhar 6e39b0a5a5 Change timedelta to 21 days in the script to generate Duplicate model results (#563) 2019-06-07 12:05:07 +02:00
Marco ee935d6e5b Download previous commits DB and experiences, and only mine data for new commits landed since then (#546)
* Download previous commits DB and experiences, and only mine data for new commits landed since then

Fixes #537

* Simplify db methods

* Add an option to return mined commits
2019-06-06 18:55:17 +02:00
Ayush Shridhar f21b4ee9d8 Add a script to generate duplicate classifier results (#548) 2019-06-06 16:30:23 +02:00
Marco Castelluccio b1ddef742a Download bugs DB when the model is a BugCoupleModel too 2019-06-04 12:56:52 +02:00
Marco d8b84ca798 Support retrieving commits in steps (#536)
* Support retrieving commits in steps

* Store component mapping ETag to actually avoid downloading it again when not needed

* Store a version file alongside the DBs

* Export the commits DB version file and the experiences values as artifacts of the commit-retriever task
2019-06-03 19:29:08 +02:00
Marco Castelluccio e465286df1 Add another missing f-string in the trainer script 2019-05-30 21:02:46 +02:00
Marco Castelluccio f3c76ccb1a Add missing 'f' in f-string in trainer script 2019-05-30 18:55:40 +02:00
Marco Castelluccio 0e08d04903 Fix subclass detection in trainer script 2019-05-30 18:55:22 +02:00
Marco Castelluccio 7f0a6555f2 Add support for downloading different DBs according to model requirements to trainer script 2019-05-30 13:26:48 +02:00
Marco 025e3f4da2 Fix command to train defect/enhancement/task model (#476)
* Fix command to train defect/enhancement/task model

Fixes #475

* Add more logging in the trainer script, and assert the model is generated
2019-05-21 14:46:57 +02:00
Boris Feld dd00d7b9ec Add support for downloading the model before checking it (#452)
Also put the right configuration in the check pipeline
2019-05-17 11:45:42 +02:00
Boris Feld 0a5e37439d Add a central place where the models are defined (#398)
* Add a central place where the models are defined

Also add some helpers to load a model.

* Add missing tensorflow dependency in extra-nn-requirements.txt
2019-05-16 15:34:38 +02:00
Marco 2d249793e2 Try regenerating the pushlog using pull and update (#444) 2019-05-16 11:33:14 +02:00
Marco 9223954520 Remove training tasks' unneeded dependencies on commit retrieval task (#407)
Fixes #390
2019-05-14 15:22:44 +02:00
Boris Feld f4b2b938be Add basic check method and check script (#341)
* Add basic check method and check script

* Ensure the check of component will correctly use super result

* Add required infra to schedule model checks

* Add scheduling bits for the model checks

* Remove the filtering on classification

* Extract counting bugs to a new function in bugzilla.py

* Also check conflated components

* Fix new hook id

* Call bugzilla with the count_only param to speed up the check

* Fix the new hook scope to match the hook id

* Fix component model check after previous refactoring

* Fix component model check method

* Use a bugzilla report for even faster component model check

* Clarify get_product_component_count docstring

We are already filtering out full components with 0 bugs

* Update conflated components mapping check

A conflated component could also be part of the conflated components mapping

* Distinguish between non-existing full components and empty full components

* Remove the filter on resolution and unnecessary url params

* Update component check method

Keep checks as separate as possible for clarity; we could merge them or make
them faster later

* Dynamically generate the CSV report URL

* Fix Docker image name in the hook

* Implement component check number 5

Get the meaningful components for the last 6 months

* Handle review comments

* Remove extraneous print

* Remove TODO

* Use a different threshold ratio when checking for new meaningful components

As we are only checking new bugs for 6 months, adjust the threshold ratio to
be less sensitive to occasional bursts of bugs for a given component.

* Reduce the threshold ratio

As we check on a disjoint time window, reduce the chance of false positives

* Handle review nits

* Fix last nits
2019-05-10 12:20:23 +02:00
Boris Feld 369b44ea02 Update the index URLs in bugbug (#328)
* Update the index URLs in bugbug

* Split the http service Docker image in two

This way we can both:
- Build the first half (code + dependencies) in the usual CI.
- Build the second half at the end of the data pipeline with updated models.

Taskboot build-compose doesn't support building all services except a
specific one, and adding this feature might be cumbersome, so move the second
half of the Docker image to a separate docker-compose file.
2019-05-02 17:00:32 +02:00
Marco Castelluccio 3105acef95 Add script to gather defect/enhancement/task labels 2019-04-24 14:15:40 +02:00
Boris Feld 4b55b7f4f3 Add support to get secrets from taskcluster (#294) 2019-04-19 16:49:07 +02:00
Boris Feld 6af6e8b927 Import Trainer class from release-services repository (#254)
* Import Trainer class from release-services repository

This basically imports the `trainer.py` file from the `release-services`
repository at hash 77cdddd. I removed imports and references to cli-common
helpers that will likely need to be reimplemented, like the raven support.

Also define 4 Docker images, one per model to train.

* Remove unused imports
2019-04-09 17:49:56 +02:00
Boris Feld b651744b18 Import retriever services and add Docker image definition (#251)
* Import Retriever class from release-services repository

This basically imports the `retriever.py` file from the `release-services`
repository at hash 77cdddd. I removed imports and references to cli-common
helpers that will likely need to be reimplemented, like the raven support.

The next commit will define some Dockerfiles that will use the imported file.

* Add docker image definition

Build three Docker images. The first one is for bugbug itself; it just
installs bugbug and its dependencies.

The second one is for retrieving information from the mozilla-central
Mercurial repository; it depends on the first one and installs the right
Mercurial version.

The last one is for retrieving information from the Bugzilla instance; it
depends on the first one and needs a valid Bugzilla token.

* Separate the two tasks into separate script files

They share almost no code at all, so they don't need to be in the same file.

* Apply Black on the scripts to make Flake8 happy
2019-04-09 16:30:09 +02:00
Boris Feld bad6a50d8b Pre commit setup (#252)
* Add pre-commit configuration

Add auto-formatting configuration using the https://pre-commit.com/ project.
Having auto-formatting set up and automatically enforced helps speed up the
development and review process.

* Apply the auto-formatting on all files in the repository

* Remove flake8-quotes as it conflicts with Black formatting

* Disable some Flake8 rules

Disable Flake8 rules that are handled by Black. The list comes from
https://github.com/ambv/black/issues/429#issuecomment-472687803.
2019-04-09 15:57:29 +02:00
Marco Castelluccio 41f1aa3b1e Calculate important components based on their past occurrences rather than having a hardcoded list
Fixes #220
2019-03-18 20:18:25 +01:00
John Giannelos d29621b84d Add script to compute success rate for component models (#190) 2019-02-26 15:16:39 +01:00