To support it, refactor bugzilla methods:
- adding methods to get IDs given a query and given a time period;
- renaming the internal _download method to get, since it's used externally;
- changing delete to be more flexible, allowing a lambda to be used to choose which bugs to delete.
Fixes #440.
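A minimal sketch of a predicate-based delete, assuming the DB is a JSON-lines file with one bug per line. The names `delete`, `read_jsonl` and `write_jsonl` are hypothetical and are not bugbug's actual API; the point is only that the caller passes a lambda deciding which bugs go away:

```python
import json
import os
import tempfile
from typing import Callable, Dict, List

Bug = Dict[str, object]


def read_jsonl(path: str) -> List[Bug]:
    # Hypothetical storage format: one JSON object per non-empty line.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(path: str, items: List[Bug]) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")


def delete(path: str, match: Callable[[Bug], bool]) -> int:
    """Remove every bug for which `match` returns True; return how many were removed."""
    bugs = read_jsonl(path)
    kept = [bug for bug in bugs if not match(bug)]
    write_jsonl(path, kept)
    return len(bugs) - len(kept)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        path = tmp.name
    write_jsonl(path, [{"id": 1}, {"id": 2}, {"id": 3}])
    # The caller decides which bugs to drop with a lambda.
    removed = delete(path, lambda bug: bug["id"] in {2, 3})
    print(removed, read_jsonl(path))
    os.remove(path)
```

Passing a callable keeps the storage layer generic: the same function serves "delete bugs with these IDs" and "delete bugs older than X" without new methods.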
* Download the previous commits DB and experiences, and only mine data for commits that landed since then
Fixes #537
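The incremental idea can be sketched as follows, assuming the commit history is available as a list of hashes ordered oldest to newest and that the downloaded DB records the last commit it mined. The function name `commits_to_mine` and both inputs are hypothetical:

```python
from typing import List, Optional


def commits_to_mine(all_commits: List[str], last_mined: Optional[str]) -> List[str]:
    """Return the commits that landed after `last_mined`.

    `all_commits` is assumed to be ordered from oldest to newest.
    """
    if last_mined is None:
        # No previous DB was downloaded: mine the whole history.
        return list(all_commits)
    try:
        index = all_commits.index(last_mined)
    except ValueError:
        # The previous DB does not match this history: fall back to a full mine.
        return list(all_commits)
    return all_commits[index + 1:]
```

Falling back to a full mine when the last commit is unknown trades speed for safety; an alternative would be to fail loudly.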
* Simplify db methods
* Add an option to return mined commits
* Support retrieving commits in steps
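One generic way to process commits in steps is to split them into fixed-size chunks. This is a hypothetical helper (`in_steps` and the step size are assumptions), not necessarily how bugbug batches its Mercurial queries:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_steps(items: Iterable[T], step: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `step` items."""
    if step <= 0:
        raise ValueError("step must be positive")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, step))
        if not chunk:
            return
        yield chunk
```

Because it consumes an iterator lazily, memory use stays bounded by one chunk even for a long history.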
* Store the component mapping ETag so the mapping is not downloaded again when it has not changed
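Storing the ETag pays off when the next download is a conditional request. A minimal sketch using the standard library, assuming the server sends an `ETag` header and honors `If-None-Match`; the function names are hypothetical, and where the ETag is persisted is left to the caller:

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def build_request(url: str, stored_etag: Optional[str]) -> urllib.request.Request:
    """Build a GET request asking the server to skip the body if nothing changed."""
    request = urllib.request.Request(url)
    if stored_etag is not None:
        request.add_header("If-None-Match", stored_etag)
    return request


def download_if_changed(
    url: str, stored_etag: Optional[str]
) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, new_etag), or (None, stored_etag) if the server answered 304."""
    try:
        with urllib.request.urlopen(build_request(url, stored_etag)) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        # urllib surfaces 304 Not Modified as an HTTPError.
        if error.code == 304:
            return None, stored_etag
        raise
```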
* Store a version file alongside the DBs
* Export the commits DB version file and the experiences values as artifacts of the commit-retriever task
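A version file next to each DB lets a later run decide whether a downloaded DB can be reused. A sketch assuming a `<db>.version` naming convention and an integer version; both the convention and the function names are hypothetical:

```python
import os
import tempfile
from typing import Optional


def version_path(db_path: str) -> str:
    # Hypothetical convention: the version file sits next to the DB file.
    return f"{db_path}.version"


def write_version(db_path: str, version: int) -> None:
    with open(version_path(db_path), "w", encoding="utf-8") as f:
        f.write(str(version))


def read_version(db_path: str) -> Optional[int]:
    try:
        with open(version_path(db_path), encoding="utf-8") as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None


def is_up_to_date(db_path: str, expected_version: int) -> bool:
    """A DB with no version file, or a different version, must be regenerated."""
    return read_version(db_path) == expected_version


if __name__ == "__main__":
    db = os.path.join(tempfile.mkdtemp(), "commits.json")
    write_version(db, 2)
    print(is_up_to_date(db, 2), is_up_to_date(db, 3))
```

Exporting the version file as an artifact means a consumer can check compatibility before fetching the much larger DB itself.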
* Add a central place where the models are defined
Also add some helpers to load a model.
* Add missing tensorflow dependency in extra-nn-requirements.txt
* Add basic check method and check script
* Ensure the component check correctly uses the result of super()
* Add required infra to schedule model checks
* Add scheduling bits for the model checks
* Remove the filtering on classification
* Extract counting bugs to a new function in bugzilla.py
* Also check conflated components
* Fix new hook id
* Call bugzilla with the count_only param to speed up the check
* Fix the new hook scope to match the hook id
* Fix component model check after previous refactoring
* Fix component model check method
* Use a bugzilla report for even faster component model check
* Clarify get_product_component_count docstring
We are already filtering out full components with 0 bugs
* Update conflated components mapping check
A conflated component could also be part of the conflated components mapping
* Distinguish between non-existing full components and empty full components
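The distinction can be sketched as below, assuming one set of every full component Bugzilla knows about and a separate bug-count mapping that, as noted above, omits components with 0 bugs. The function name and return labels are hypothetical:

```python
from typing import Dict, Set


def classify_full_component(
    full_component: str, existing: Set[str], bug_counts: Dict[str, int]
) -> str:
    """Classify a "Product::Component" string used by the model.

    `existing` holds every full component known to Bugzilla; `bug_counts`
    may omit components with 0 bugs.
    """
    if full_component not in existing:
        return "non-existing"
    if bug_counts.get(full_component, 0) == 0:
        return "empty"
    return "ok"
```

Without the separate `existing` set, a missing key in `bug_counts` would be ambiguous between "renamed or removed" and "simply has no bugs".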
* Remove the filter on resolution and unnecessary url params
* Update component check method
Keep checks as separate as possible for clarity; we could merge them or make
them faster later
* Dynamically generate the CSV report URL
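A sketch of building such a URL with `urlencode`. The `report.cgi` parameters below are assumptions about Bugzilla's tabular report form, and the creation-date filter through the `f1`/`o1`/`v1` custom-search fields is a hypothetical example; verify all of them against the target Bugzilla instance:

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust for the Bugzilla instance in use.
BUGZILLA_REPORT_URL = "https://bugzilla.mozilla.org/report.cgi"


def component_report_url(since: str) -> str:
    """Build a product x component CSV report URL for bugs created after `since`."""
    params = {
        "x_axis_field": "product",
        "y_axis_field": "component",
        "format": "table",
        "action": "wrap",
        "ctype": "csv",
        # Hypothetical filter: only bugs created after the given date.
        "f1": "creation_ts",
        "o1": "greaterthan",
        "v1": since,
    }
    return f"{BUGZILLA_REPORT_URL}?{urlencode(params)}"
```

Generating the URL from a dict keeps the time window a parameter instead of a hard-coded string.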
* Fix the Docker image name in the hook
* Implement component check number 5
Get the meaningful components for the last 6 months
* Handle review comments
* Remove extraneous print
* Remove TODO
* Use a different threshold ratio when checking for new meaningful components
As we are only checking new bugs for 6 months, adjust the threshold ratio to
be less sensitive to occasional bursts of bugs for a given component.
* Reduce the threshold ratio
As we check on a disjoint time window, reduce the chance of false positives
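A ratio-based notion of "meaningful component" can be sketched as follows, assuming a component counts as meaningful when its share of all bugs in the window reaches the threshold. The function name and the example values are hypothetical; the ratio is a parameter so the 6-month check can use a different value from the full-history check:

```python
from typing import Dict, Set


def meaningful_components(counts: Dict[str, int], threshold_ratio: float) -> Set[str]:
    """Return the components whose share of all bugs is at least `threshold_ratio`."""
    if not 0 < threshold_ratio <= 1:
        raise ValueError("threshold_ratio must be in (0, 1]")
    total = sum(counts.values())
    if total == 0:
        return set()
    return {name for name, count in counts.items() if count / total >= threshold_ratio}
```

For example, with counts `{"A": 50, "B": 30, "C": 20}` a ratio of 0.25 keeps `A` and `B`, while 0.1 keeps all three.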
* Handle review nits
* Fix last nits
* Update the index URLs in bugbug
* Split the http service Docker image in two
This way we can both:
- Build the first half (code + dependencies) in the usual CI.
- Build the second half at the end of the data pipeline with updated models.
Taskboot build-compose doesn't support building all services except a
specific one, and it might be cumbersome to add this feature, so move the second
half of the Docker image to a separate docker-compose file.
* Import Trainer class from release-services repository
This basically imports the `trainer.py` file from the `release-services`
repository at hash 77cdddd. I removed imports of and references to cli-common
helpers that will likely need to be reimplemented, like the raven support.
It also defines 4 Docker images, one per model to train.
* Remove unused imports
* Import Retriever class from release-services repository
This basically imports the `retriever.py` file from the `release-services`
repository at hash 77cdddd. I removed imports of and references to cli-common
helpers that will likely need to be reimplemented, like the raven support.
The next commit will define some Dockerfiles that use the imported file.
* Add Docker image definitions
Build three Docker images. The first one is for bugbug itself; it just installs
bugbug and its dependencies.
The second one is for retrieving information from the mozilla-central Mercurial
repository; it depends on the first one and installs the right Mercurial
version.
The last one is for retrieving information from the Bugzilla instance; it
depends on the first one and needs a valid Bugzilla token.
* Separate the two tasks into separate script files
They share almost no code, so they don't need to be in the same file
* Apply Black to the scripts to make Flake8 happy
* Add pre-commit configuration
Add auto-formatting configuration using the https://pre-commit.com/ project.
Having auto-formatting set up and automatically enforced helps speed up
the development and review process.
* Apply the auto-formatting to all files in the repository
* Remove flake8-quotes as it conflicts with Black formatting
* Disable some Flake8 rules
Disable Flake8 rules that are handled by Black. The list comes from
https://github.com/ambv/black/issues/429#issuecomment-472687803.