Marco Castelluccio
73633e2e00
Make DB version handling saner to use
...
Fixing a problem where the retriever tasks would always write a version file
containing the version of the old DB.
2019-07-26 23:37:46 +02:00
Marco Castelluccio
60a979be9d
Store commits to ignore in a bugbug DB and generate them progressively
...
In the future, we will be able to get commits to ignore directly from the normal commits DB
generated by bugbug/repository.py.
2019-07-26 18:14:57 +02:00
Marco Castelluccio
15fe853f58
Remove unneeded parentheses
2019-07-26 15:23:14 +02:00
Marco Castelluccio
633a2c4d3c
Ignore commits that are not in the VCS map
...
This means they are older than "Free the lizard" (3b56a9af51519d2e77e05efa672a13e6be2e9ebc).
2019-07-26 11:57:26 +02:00
Marco Castelluccio
c42b1cc7bb
Don't skip commits with no mirror in the tokenized repo when analyzing the normal repo
2019-07-25 18:53:23 +02:00
Marco Castelluccio
99263864fd
Revert "Increase version number of the DBs, as we changed the format"
...
This reverts commit cb5c96fd48.
2019-07-25 18:09:13 +02:00
Marco Castelluccio
cb5c96fd48
Increase version number of the DBs, as we changed the format
2019-07-25 16:08:29 +02:00
Marco Castelluccio
f0da3b5b21
Clone tokenized git repository too
2019-07-25 10:38:06 +02:00
Marco Castelluccio
de212602fc
Add a method to evaluate the SZZ results comparing them with the regressed-by information
...
Fixes #772
2019-07-25 01:49:20 +02:00
Marco Castelluccio
d0786502d3
Download bugs which caused regressions fixed by commits too
...
Required for #772
2019-07-25 01:07:08 +02:00
Marco Castelluccio
a614d34735
Move download of bugs linked to commits in the bug-retriever script
...
Also, make the bug-retriever task depend on the commit-retriever one, making the
download of bugs linked to commits actually work :)
2019-07-25 01:05:25 +02:00
Marco Castelluccio
2dc1a17a75
Remove outdated comment
2019-07-24 22:16:22 +02:00
Marco Castelluccio
697a5c8189
Only store mercurial revisions
...
Users of the resulting DBs will take care of converting to git if they need to.
2019-07-24 22:15:04 +02:00
Marco Castelluccio
22d73e3637
Apply regressor finder also on the microannotated repository with comments removed
...
Fixes #627
2019-07-24 22:15:04 +02:00
Marco Castelluccio
839ebf8fcf
Make git repo URL a parameter, so we can find regressors using different git repositories
2019-07-24 21:01:53 +02:00
Marco Castelluccio
f972646819
Introduce a RegressorFinder class, and ignore mercurial<->git mapping errors
2019-07-24 21:01:53 +02:00
cklyyung
4ace4ef2fb
Add option to disable URL cleanup in similarity models (#728)
...
Fixes #725
2019-07-24 13:25:26 +02:00
Marco Castelluccio
dd96b6dd74
Transform 20000 commits at a time, as 40000 are too many
2019-07-24 12:28:43 +02:00
Marco Castelluccio
3e7c2016cd
Use feature legend in the commit classifier
2019-07-23 12:09:28 +02:00
Harshit chittora
25bb130abc
Support feature importance visualization for multiclass problems (#662)
...
Fixes #162
2019-07-23 11:27:52 +02:00
Marco Castelluccio
6c074fd4b5
Transform 40000 commits at a time
2019-07-23 10:50:11 +02:00
Marco Castelluccio
ab048e0a6b
Support generating mirror repositories with comments removed
2019-07-23 02:14:22 +02:00
Marco Castelluccio
fbaef0661d
Store regressor finder results in bugbug DBs and make it run only on commits which haven't been analyzed yet
2019-07-23 02:14:22 +02:00
Marco Castelluccio
413da2b87e
More logging in regressor finder script
2019-07-22 23:20:01 +02:00
Marco Castelluccio
9fd44dd19f
Use os.cpu_count() instead of multiprocessing.cpu_count()
2019-07-22 23:20:01 +02:00
Marco Castelluccio
8151b85873
Add missing f in f-string
2019-07-22 15:54:57 +02:00
Marco
77ec8b529d
Add a WIP script to find bug-introducing commits (#748)
...
* Install depot_tools in the commit retrieval image
* Add a WIP script to find bug-introducing commits
* Add a task which runs the bug-introducing commits finder script
2019-07-22 14:41:34 +02:00
Anurag Aggarwal
51e6a712ef
Use db.download in trainer.py instead of manually reimplementing download and decompression (#739)
...
Fixes #733
2019-07-22 11:42:55 +02:00
Marco Castelluccio
331aa50f1f
Use repository.clone function to clone mozilla-central instead of re-implementing the code in the script
2019-07-19 17:25:11 +02:00
Marco Castelluccio
ada890df21
Convert to string before writing to file
2019-07-15 17:43:42 +02:00
Marco
f877420959
Retrigger microannotate hook if the generation process is not fully done (#700)
...
* Generate an artifact specifying if the microannotate generation is fully done
* Retrigger microannotate hook if the generation process is not fully done
Fixes #652
* Update to microannotate 0.0.2
2019-07-15 14:01:56 +02:00
Marco Castelluccio
e12f4cf040
Download bugs DB when CommitModel's bug_data is set to True
...
Fixes #690
2019-07-12 10:18:55 +02:00
Marco Castelluccio
d7b7ccee74
Store most important features in a JSON file too
2019-07-10 14:57:16 +02:00
Marco Castelluccio
481917afac
Use human readable feature names when classifying too
2019-07-09 21:22:37 +02:00
Ayush Shridhar
bc6467da41
Add word2vec similarity option to evaluation script (#678)
2019-07-08 12:58:49 +02:00
Marco Castelluccio
22246b6a50
Download model from Taskcluster before running it
2019-07-02 21:04:23 +02:00
Marco Castelluccio
f515afef45
Support classifying a Phabricator diff
2019-07-02 19:39:59 +02:00
Ayush Shridhar
ad7fca9b65
Add neighbors_tfidf and neighbors_tfidf_bigrams to similarity evaluation script's choices (#665)
2019-07-02 17:01:17 +02:00
Marco Castelluccio
96ebe8777a
Add a script to classify a patch
2019-07-02 12:21:35 +02:00
Marco Castelluccio
f9f59ef863
Add functions to clone and clean mozilla-central
2019-07-02 12:21:35 +02:00
Ayush Shridhar
1793c48c0b
Support using bigrams for NearestNeighbors similarity (#654)
2019-07-01 18:33:49 +02:00
Boris Feld
f999b3ffdf
Track imbalance report metrics too (#639)
...
Fixes #619
2019-06-27 18:42:51 +02:00
Boris Feld
89bba8efca
Move log messages to stderr (#635)
...
As the retrieve script can output the metrics on the standard output, log
messages would pollute the output and complicate scripts that would want to
parse it. Use logging instead of passing stderr to the print statements as
it's mostly the same amount of code.
2019-06-27 10:58:07 +02:00
Marco Castelluccio
16ece06f64
Retry git operations multiple times
2019-06-27 10:25:38 +02:00
Marco Castelluccio
a3933a48a4
Don't fail if there's an error while pulling from the repo
2019-06-27 10:25:38 +02:00
Marco Castelluccio
56f224b9dc
Generate microannotate repository for mozilla-central
2019-06-26 18:57:36 +02:00
Ayush Shridhar
6788b2e33a
Make similarity script more generic and add nearest neighbors similarity with tf-idf encoding (#628)
2019-06-26 13:42:23 +02:00
Marco Castelluccio
bd118c58ab
Use with statement for hg.open
2019-06-26 11:45:02 +02:00
x249wang
ab28e8ace2
Use zstandard instead of xz (#524)
...
Fixes #461.
2019-06-24 13:16:44 +02:00
Boris Feld
9834053a36
Start tracking training metrics as Taskcluster artifacts (#604)
...
Fixes #342
2019-06-22 14:18:08 -07:00