Marco Castelluccio | 85912d8c95 | Add tests for the 'labels' module | 2018-11-20 11:53:41 +01:00 |
Marco Castelluccio | 5cbd9d0846 | Add scipy to requirements.txt | 2018-11-20 11:32:23 +01:00 |
Marco Castelluccio | f4a2fe9e54 | Automatically extract compressed databases | 2018-11-20 11:11:22 +01:00 |
Marco Castelluccio | b80123cbb8 | Remove data files from the repo and host them on another service; Former-commit-id: db43a17445659d664fc41c014e9fe2d61c98b4ba | 2018-11-20 10:36:04 +01:00 |
Marco Castelluccio | 8de0dc56d5 | Move package creation and installation test to the end; Former-commit-id: 8fa8998f570b4a35566d529c328fb0f2999047ff | 2018-11-20 02:22:24 +01:00 |
Marco Castelluccio | a971892f50 | Actually respect the 'augmentation' parameter; Former-commit-id: b990893f4729f139bcc1aed75f70ba2468c0845f | 2018-11-20 02:22:24 +01:00 |
Marco Castelluccio | 64d5a0c5ec | Rename get_labels to get_bugbug_labels since labels can now include multiple kinds of labels; Former-commit-id: db485d1d25c50438d47932e25d94baeb1e90323b | 2018-11-20 01:20:38 +01:00 |
Marco Castelluccio | 5974f3f8f3 | Move labels into a subdirectory; Former-commit-id: cfb0bbe417113fcca05afed45322702c182f1d35 | 2018-11-20 00:10:56 +01:00 |
Marco | 00b53f350c | Add setup.py and test building bugbug package (#1); Former-commit-id: 0bea3bc5f1a3dff3dd3813f9ed7de6958ca99f7f | 2018-11-19 22:53:17 +01:00 |
Marco Castelluccio | 8961da1d9f | Fix imbalanced-learn requirement; Former-commit-id: 178f85712b | 2018-11-19 22:31:01 +01:00 |
Marco Castelluccio | 13eed862f6 | Move Python modules to a 'bugbug' subdirectory; Former-commit-id: d1db546fb0 | 2018-11-19 22:02:31 +01:00 |
Marco Castelluccio | e25e83ec13 | Add function to get labels for tracked/non-tracked bugs; Former-commit-id: 4c2747b56c | 2018-11-16 17:58:54 +01:00 |
Marco Castelluccio | 9d451b4502 | Updated commits data; Former-commit-id: faabd81347 | 2018-11-16 17:58:30 +01:00 |
Marco Castelluccio | bcc33779b9 | Add commit data to bugs, but don't use it yet (doesn't improve results); Former-commit-id: 554ae35320 | 2018-11-12 17:55:41 +01:00 |
Marco Castelluccio | d72d47e604 | Store commits in order from oldest to newest; Former-commit-id: 5341c28c61 | 2018-11-12 17:55:08 +01:00 |
Marco Castelluccio | 0bb6403ee6 | Only store information we are using; Former-commit-id: 0606d9371c | 2018-11-12 17:54:43 +01:00 |
Marco Castelluccio | 7d7af120dc | Parse bug ID from the commit message and store it with the commit data; Former-commit-id: fcbb5dd362 | 2018-11-12 17:54:14 +01:00 |
Marco Castelluccio | 2c3bc74546 | Update requests requirement to 2.20.1; Former-commit-id: 703471476d | 2018-11-12 12:57:48 +01:00 |
Marco Castelluccio | cec52f4b32 | Add more labelled bugs; Former-commit-id: ef6b2edd34 | 2018-11-12 12:57:34 +01:00 |
Marco Castelluccio | a0fb4753ab | Ignore extracted bugs.json and commits.json files; Former-commit-id: e546c6e13e | 2018-11-12 12:56:09 +01:00 |
Marco Castelluccio | 7401f84714 | Store commit data in the repo; Former-commit-id: 74758f0ca1 | 2018-11-12 12:55:43 +01:00 |
Marco Castelluccio | 1c5d89bc80 | Add script to retrieve data from a Mercurial repository; Former-commit-id: 121b42a9a3 | 2018-11-12 12:55:27 +01:00 |
Marco Castelluccio | f741d77717 | Refactor get_bugs code into multiple modules; Former-commit-id: 466aa8446b | 2018-10-12 00:46:14 +02:00 |
Marco Castelluccio | bfec0f94c6 | Add another TODO; Former-commit-id: 335cd747a8 | 2018-10-11 22:43:21 +02:00 |
Marco Castelluccio | 69ac23fe50 | Read bugs iteratively; Former-commit-id: 581f1bc8b3 | 2018-10-11 22:42:34 +02:00 |
Marco Castelluccio | 34682521a0 | Update accuracy numbers; Former-commit-id: b21257ad66 | 2018-10-11 20:21:57 +02:00 |
Marco Castelluccio | 84a2b08381 | No need to retrieve both keys and values to generate y; Former-commit-id: 3a20373872 | 2018-10-11 20:13:21 +02:00 |
Marco Castelluccio | 887514cafa | Perform augmentation directly when retrieving labels; Former-commit-id: 8d0d63403b | 2018-10-11 20:12:43 +02:00 |
Marco Castelluccio | 8492939b08 | Skip labels for which we have no bug data directly in get_labels; Former-commit-id: c5075c7e84 | 2018-10-11 20:04:26 +02:00 |
Marco Castelluccio | 7462bbea6b | Split downloading of bugs and retrieval of bugs for training; Former-commit-id: 67c300263f | 2018-10-11 19:40:53 +02:00 |
Marco Castelluccio | d733796ea7 | Update number of bugs in the dataset; Former-commit-id: 5bb22f8fc9 | 2018-10-11 17:29:20 +02:00 |
Marco Castelluccio | a58d187d0d | Use higher compression ratio; Former-commit-id: 6a247f0ec6 | 2018-10-11 17:25:29 +02:00 |
Marco Castelluccio | b174cec6a4 | Don't use MongoDB; Former-commit-id: d25ee7891f | 2018-10-11 15:43:44 +02:00 |
Marco Castelluccio | e11bac35b9 | Add more labelled bugs; Former-commit-id: d63848e639 | 2018-10-11 14:04:41 +02:00 |
Marco Castelluccio | 31a487cb16 | Add doc2vec to the list of things to try; Former-commit-id: a1a99b8ea4 | 2018-10-01 02:41:34 +02:00 |
Marco Castelluccio | 0ce2382213 | Add TODO about text cleanup; Former-commit-id: 58383767fc | 2018-10-01 02:41:06 +02:00 |
Marco Castelluccio | 5f8d64aa7f | Add optional lemmatization using spaCy; Former-commit-id: 5262e12663 | 2018-10-01 02:40:06 +02:00 |
Marco Castelluccio | f50971f7f9 | Fix flake8 issues; Former-commit-id: 3eb73e0948 | 2018-10-01 01:22:59 +02:00 |
Marco Castelluccio | 7dee06e724 | Refactor code in a function; Former-commit-id: bb7931dade | 2018-10-01 01:20:26 +02:00 |
Marco Castelluccio | 0301ca6376 | Add more labelled bugs; Former-commit-id: 535f4c66a1 | 2018-09-30 21:18:58 +02:00 |
Marco Castelluccio | 191a25ec5d | Add more labelled bugs; Former-commit-id: ca5db78213 | 2018-09-26 16:01:27 +01:00 |
Marco Castelluccio | 19f9a7f942 | Add more labelled bugs; Former-commit-id: d56b34face | 2018-09-25 00:01:32 +01:00 |
Marco Castelluccio | 1364627cdc | Update README; Former-commit-id: bbd594e67b | 2018-09-24 23:13:29 +01:00 |
Marco Castelluccio | 186f91c31c | Add more labelled bugs; Former-commit-id: 1b312094c7 | 2018-09-24 23:09:12 +01:00 |
Marco Castelluccio | 2df82f0a1d | Set a fixed seed for the random under-sampler, so we get consistent results; Former-commit-id: da8d6fc7b5 | 2018-09-24 23:08:49 +01:00 |
Marco Castelluccio | 400b04858a | Small cleanup by making get_labels directly return a dict; Former-commit-id: 1995e0ccb0 | 2018-09-24 14:14:15 +01:00 |
Marco Castelluccio | 540b7ebaa7 | Perform under-sampling of the majority class; Former-commit-id: 8d3c7c3ba4 | 2018-09-24 00:11:49 +01:00 |
Marco Castelluccio | f9c03d0f8f | Add missing requirements; Former-commit-id: 884f75adfd | 2018-09-24 00:11:28 +01:00 |
Marco Castelluccio | 2e8cb9d59c | Print confusion matrix; Former-commit-id: 1c2cdbda42 | 2018-09-24 00:10:43 +01:00 |
Marco Castelluccio | ab5795a834 | Add more labelled bugs; Former-commit-id: 882d941044 | 2018-09-23 20:53:02 +01:00 |