Commit Graph

79 Commits

Author SHA1 Message Date
Marco Castelluccio 73633e2e00 Make DB version handling saner to use
Fixing a problem where the retriever tasks would always write a version file
containing the version of the old DB.
2019-07-26 23:37:46 +02:00
Marco Castelluccio 60a979be9d Store commits to ignore in a bugbug DB and generate them progressively
In the future, we will be able to get commits to ignore directly from the normal commits DB
generated by bugbug/repository.py.
2019-07-26 18:14:57 +02:00
Marco Castelluccio 15fe853f58 Remove unneeded parentheses 2019-07-26 15:23:14 +02:00
Marco Castelluccio 633a2c4d3c Ignore commits that are not in the VCS map
This means they are older than "Free the lizard" (3b56a9af51519d2e77e05efa672a13e6be2e9ebc).
2019-07-26 11:57:26 +02:00
Marco Castelluccio c42b1cc7bb Don't skip commits with no mirror in the tokenized repo when analyzing the normal repo 2019-07-25 18:53:23 +02:00
Marco Castelluccio 99263864fd Revert "Increase version number of the DBs, as we changed the format"
This reverts commit cb5c96fd48.
2019-07-25 18:09:13 +02:00
Marco Castelluccio cb5c96fd48 Increase version number of the DBs, as we changed the format 2019-07-25 16:08:29 +02:00
Marco Castelluccio f0da3b5b21 Clone tokenized git repository too 2019-07-25 10:38:06 +02:00
Marco Castelluccio de212602fc Add a method to evaluate the SZZ results comparing them with the regressed-by information
Fixes #772
2019-07-25 01:49:20 +02:00
Marco Castelluccio d0786502d3 Download bugs which caused regressions fixed by commits too
Required for #772
2019-07-25 01:07:08 +02:00
Marco Castelluccio a614d34735 Move download of bugs linked to commits in the bug-retriever script
Also, make the bug-retriever task depend on the commit-retriever one, making the
download of bugs linked to commits actually work :)
2019-07-25 01:05:25 +02:00
Marco Castelluccio 2dc1a17a75 Remove outdated comment 2019-07-24 22:16:22 +02:00
Marco Castelluccio 697a5c8189 Only store mercurial revisions
Users of the resulting DBs will take care of converting to git if they need
2019-07-24 22:15:04 +02:00
Marco Castelluccio 22d73e3637 Apply regressor finder also on the microannotated repository with comments removed
Fixes #627
2019-07-24 22:15:04 +02:00
Marco Castelluccio 839ebf8fcf Make git repo URL a parameter, so we can find regressors using different git repositories 2019-07-24 21:01:53 +02:00
Marco Castelluccio f972646819 Introduce a RegressorFinder class, and ignore mercurial<->git mapping errors 2019-07-24 21:01:53 +02:00
cklyyung 4ace4ef2fb Add option to disable URL cleanup in similarity models (#728)
Fixes #725
2019-07-24 13:25:26 +02:00
Marco Castelluccio dd96b6dd74 Transform 20000 commits at a time, as 40000 are too many 2019-07-24 12:28:43 +02:00
Marco Castelluccio 3e7c2016cd Use feature legend in the commit classifier 2019-07-23 12:09:28 +02:00
Harshit chittora 25bb130abc Support feature importance visualization for multiclass problems (#662)
Fixes #162
2019-07-23 11:27:52 +02:00
Marco Castelluccio 6c074fd4b5 Transform 40000 commits at a time 2019-07-23 10:50:11 +02:00
Marco Castelluccio ab048e0a6b Support generating mirror repositories with comments removed 2019-07-23 02:14:22 +02:00
Marco Castelluccio fbaef0661d Store regressor finder results in bugbug DBs and make it run only on commits which haven't been analyzed yet 2019-07-23 02:14:22 +02:00
Marco Castelluccio 413da2b87e More logging in regressor finder script 2019-07-22 23:20:01 +02:00
Marco Castelluccio 9fd44dd19f Use os.cpu_count() instead of multiprocessing.cpu_count() 2019-07-22 23:20:01 +02:00
Marco Castelluccio 8151b85873 Add missing f in f-string 2019-07-22 15:54:57 +02:00
Marco 77ec8b529d Add a WIP script to find bug-introducing commits (#748)
* Install depot_tools in the commit retrieval image

* Add a WIP script to find bug-introducing commits

* Add a task which runs the bug-introducing commits finder script
2019-07-22 14:41:34 +02:00
Anurag Aggarwal 51e6a712ef Use db.download in trainer.py instead of manually reimplementing download and decompression (#739)
Fixes #733
2019-07-22 11:42:55 +02:00
Marco Castelluccio 331aa50f1f Use repository.clone function to clone mozilla-central instead of re-implementing the code in the script 2019-07-19 17:25:11 +02:00
Marco Castelluccio ada890df21 Convert to string before writing to file 2019-07-15 17:43:42 +02:00
Marco f877420959 Retrigger microannotate hook if the generation process is not fully done (#700)
* Generate an artifact specifying if the microannotate generation is fully done

* Retrigger microannotate hook if the generation process is not fully done

Fixes #652

* Update to microannotate 0.0.2
2019-07-15 14:01:56 +02:00
Marco Castelluccio e12f4cf040 Download bugs DB when CommitModel's bug_data is set to True
Fixes #690
2019-07-12 10:18:55 +02:00
Marco Castelluccio d7b7ccee74 Store most important features in a JSON file too 2019-07-10 14:57:16 +02:00
Marco Castelluccio 481917afac Use human readable feature names when classifying too 2019-07-09 21:22:37 +02:00
Ayush Shridhar bc6467da41 Add word2vec similarity option to evaluation script (#678) 2019-07-08 12:58:49 +02:00
Marco Castelluccio 22246b6a50 Download model from Taskcluster before running it 2019-07-02 21:04:23 +02:00
Marco Castelluccio f515afef45 Support classifying a Phabricator diff 2019-07-02 19:39:59 +02:00
Ayush Shridhar ad7fca9b65 Add neighbors_tfidf and neighbors_tfidf_bigrams to similarity evaluation script's choices (#665) 2019-07-02 17:01:17 +02:00
Marco Castelluccio 96ebe8777a Add a script to classify a patch 2019-07-02 12:21:35 +02:00
Marco Castelluccio f9f59ef863 Add functions to clone and clean mozilla-central 2019-07-02 12:21:35 +02:00
Ayush Shridhar 1793c48c0b Support using bigrams for NearestNeighbors similarity (#654) 2019-07-01 18:33:49 +02:00
Boris Feld f999b3ffdf Track imbalance report metrics too (#639)
Fixes #619
2019-06-27 18:42:51 +02:00
Boris Feld 89bba8efca Move log messages to stderr (#635)
As the retrieve script can output the metrics on the standard output, log
messages would pollute the output and complicate scripts that would want to
parse it. Use logging instead of passing stderr to the print statements as
it's mostly the same amount of code.
2019-06-27 10:58:07 +02:00
Marco Castelluccio 16ece06f64 Retry git operations multiple times 2019-06-27 10:25:38 +02:00
Marco Castelluccio a3933a48a4 Don't fail if there's an error while pulling from the repo 2019-06-27 10:25:38 +02:00
Marco Castelluccio 56f224b9dc Generate microannotate repository for mozilla-central 2019-06-26 18:57:36 +02:00
Ayush Shridhar 6788b2e33a Make similarity script more generic and add nearest neighbors similarity with tf-idf encoding (#628) 2019-06-26 13:42:23 +02:00
Marco Castelluccio bd118c58ab Use with statement for hg.open 2019-06-26 11:45:02 +02:00
x249wang ab28e8ace2 Use zstandard instead of xz (#524)
Fixes #461.
2019-06-24 13:16:44 +02:00
Boris Feld 9834053a36 Start tracking training metrics as Taskcluster artifacts (#604)
Fixes #342
2019-06-22 14:18:08 -07:00
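
The rationale given in commit 89bba8efca (Move log messages to stderr) can be illustrated with a minimal sketch: metrics go to standard output as parseable JSON, while log messages go through the logging module to standard error. This is not the code from that commit. The names build_logger and report_metrics, the logger name, and the metric values are hypothetical and chosen only for illustration.

```python
import json
import logging
import sys


def build_logger(name="retrieve_example", stream=None):
    """Return a logger that writes to stderr (or to the given stream).

    Hypothetical helper: the logger name and format are assumptions.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Drop handlers from earlier calls so messages are not duplicated.
    logger.handlers.clear()
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    # Keep messages away from any root-logger handlers.
    logger.propagate = False
    return logger


def report_metrics(metrics, logger, out=None):
    """Write metrics as one JSON line to stdout; log progress separately."""
    out = out if out is not None else sys.stdout
    logger.info("Reporting %d metrics", len(metrics))
    json.dump(metrics, out, sort_keys=True)
    out.write("\n")


if __name__ == "__main__":
    # Illustrative values only, not real training results.
    report_metrics({"accuracy": 0.9, "recall": 0.8}, build_logger())
```

With this split, a caller can pipe standard output into a JSON parser without filtering out log lines first.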