Platform for Machine Learning projects on Software Engineering
Marco Castelluccio e11bac35b9 Add more labelled bugs
Former-commit-id: d63848e639
2018-10-11 14:04:41 +02:00
.gitignore First commit 2018-03-11 20:12:35 +00:00
.isort.cfg Enable several flake8 checkers 2018-09-21 16:45:04 +02:00
.travis.yml Add Travis CI configuration 2018-09-22 02:01:35 +02:00
LICENSE First commit 2018-03-11 20:12:35 +00:00
README.md Update README 2018-09-24 23:13:29 +01:00
bug_features.py Add a ML-based classifier 2018-09-22 02:10:21 +02:00
bugbug.py Fix flake8 issues in bugbug.py 2018-09-22 02:00:36 +02:00
classes.csv First commit 2018-03-11 20:12:35 +00:00
classes_more.csv Add more labelled bugs 2018-10-11 14:04:41 +02:00
get_bugs.py Small cleanup by making get_labels directly return a dict 2018-09-24 14:14:15 +01:00
handwritten_rules_run.py Small cleanup by making get_labels directly return a dict 2018-09-24 14:14:15 +01:00
requirements.txt Add optional lemmatization using spaCy 2018-10-01 02:40:06 +02:00
run.py Add doc2vec to the list of things to try 2018-10-01 02:41:34 +02:00
setup.cfg Enable several flake8 checkers 2018-09-21 16:45:04 +02:00
test-requirements.txt Enable several flake8 checkers 2018-09-21 16:45:04 +02:00
utils.py Add a ML-based classifier 2018-09-22 02:10:21 +02:00

README.md

bugbug - Classify Bugzilla bugs as actual bugs or bugs that aren't bugs

Bugs on Bugzilla aren't always bugs. Sometimes they are feature requests, refactorings, and so on. The aim of this project is to distinguish between bugs that are actually bugs and bugs that aren't.

The dataset currently contains 1913 bugs. The current classifier reaches an accuracy of ~92% (precision ~97%, recall ~93%).
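For readers unfamiliar with how the three figures relate, here is a minimal sketch that derives accuracy, precision and recall from confusion-matrix counts. It assumes that "actual bug" is the positive class. The function name `classification_metrics` and the counts are hypothetical, chosen only for illustration. They are not the project's code or results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    tp: bugs correctly labelled as bugs
    fp: non-bugs wrongly labelled as bugs
    tn: non-bugs correctly labelled as non-bugs
    fn: bugs wrongly labelled as non-bugs
    """
    total = tp + fp + tn + fn
    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, for illustration only (not the project's results).
acc, prec, rec = classification_metrics(tp=90, fp=5, tn=40, fn=10)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

High precision with lower recall, as reported above, would mean that bugs flagged as real bugs are rarely wrong, while some real bugs are still missed.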

Setup

  1. Run `pip install -r requirements.txt` and `pip install -r test-requirements.txt`
  2. Install MongoDB
  3. Run `mongo bugbug --eval "db.bugs.drop()"`
  4. Run `cat data/bugs.json.xz.part* | unxz > data/bugs.json`
  5. Run `mongoimport --db bugbug --collection bugs --file data/bugs.json`
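On systems without `cat` or `unxz`, step 4 could be approximated in Python. The following is a sketch only, not part of the project: the function name `join_and_decompress` is hypothetical, and it assumes that the parts form a single xz stream when read in sorted filename order.

```python
import glob
import lzma


def join_and_decompress(part_pattern, out_path, chunk_size=1 << 20):
    """Concatenate split xz parts in name order and write the decompressed data.

    Assumes the parts together form one xz stream; any data after the end of
    that stream makes the decompressor raise EOFError.
    """
    parts = sorted(glob.glob(part_pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {part_pattern!r}")

    # One decompressor is shared across all parts, so the stream can span files.
    decompressor = lzma.LZMADecompressor()
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        break
                    out.write(decompressor.decompress(chunk))
    return parts
```

Calling `join_and_decompress("data/bugs.json.xz.part*", "data/bugs.json")` would then play the role of the shell pipeline in step 4.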

If you update the bugs database, run:

  1. `mongoexport -d bugbug -c bugs -o data/bugs.json`
  2. `cat data/bugs.json | xz -v1 - | split -d -b 20MB - data/bugs.json.xz.part`
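The compress-and-split step can also be sketched in Python, for example where `xz` or GNU `split` is unavailable. This is an illustration under assumptions, not the project's tooling: the name `compress_and_split` is hypothetical, the 20,000,000-byte part size follows GNU `split`'s decimal meaning of `20MB`, and the compressed bytes may differ from what the `xz` command line tool produces even at the same preset.

```python
import lzma


def compress_and_split(src_path, part_prefix, part_size=20_000_000,
                       preset=1, chunk_size=1 << 20):
    """xz-compress src_path and write the stream as numbered parts.

    Parts are named part_prefix + "00", "01", ... (like `split -d`) and each
    holds at most part_size bytes. Returns the list of part paths.
    """
    compressor = lzma.LZMACompressor(preset=preset)
    part_paths = []
    out = None
    written = 0  # bytes written to the current part

    def write(data):
        nonlocal out, written
        view = memoryview(data)
        while len(view):
            # Open the next part when there is none yet or the current one is full.
            if out is None or written == part_size:
                if out is not None:
                    out.close()
                path = f"{part_prefix}{len(part_paths):02d}"
                out = open(path, "wb")
                part_paths.append(path)
                written = 0
            take = min(part_size - written, len(view))
            out.write(view[:take])
            written += take
            view = view[take:]

    try:
        with open(src_path, "rb") as src:
            for chunk in iter(lambda: src.read(chunk_size), b""):
                write(compressor.compress(chunk))
        # flush() emits the remaining compressed data and the stream footer.
        write(compressor.flush())
    finally:
        if out is not None:
            out.close()
    return part_paths
```

A call such as `compress_and_split("data/bugs.json", "data/bugs.json.xz.part")` would produce files that the setup step above can concatenate and decompress again.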