Commit Graph

1768 Commits

Author SHA1 Message Date
Marco Castelluccio 85912d8c95 Add tests for the 'labels' module 2018-11-20 11:53:41 +01:00
Marco Castelluccio 5cbd9d0846 Add scipy to requirements.txt 2018-11-20 11:32:23 +01:00
Marco Castelluccio f4a2fe9e54 Automatically extract compressed databases 2018-11-20 11:11:22 +01:00
Marco Castelluccio b80123cbb8 Remove data files from the repo and host them on another service
Former-commit-id: db43a17445659d664fc41c014e9fe2d61c98b4ba
2018-11-20 10:36:04 +01:00
Marco Castelluccio 8de0dc56d5 Move package creation and installation test to the end
Former-commit-id: 8fa8998f570b4a35566d529c328fb0f2999047ff
2018-11-20 02:22:24 +01:00
Marco Castelluccio a971892f50 Actually respect the 'augmentation' parameter
Former-commit-id: b990893f4729f139bcc1aed75f70ba2468c0845f
2018-11-20 02:22:24 +01:00
Marco Castelluccio 64d5a0c5ec Rename get_labels to get_bugbug_labels since labels can now include multiple kinds of labels
Former-commit-id: db485d1d25c50438d47932e25d94baeb1e90323b
2018-11-20 01:20:38 +01:00
Marco Castelluccio 5974f3f8f3 Move labels into a subdirectory
Former-commit-id: cfb0bbe417113fcca05afed45322702c182f1d35
2018-11-20 00:10:56 +01:00
Marco 00b53f350c Add setup.py and test building bugbug package (#1)
Former-commit-id: 0bea3bc5f1a3dff3dd3813f9ed7de6958ca99f7f
2018-11-19 22:53:17 +01:00
Marco Castelluccio 8961da1d9f Fix imbalanced-learn requirement
Former-commit-id: 178f85712b
2018-11-19 22:31:01 +01:00
Marco Castelluccio 13eed862f6 Move Python modules to a 'bugbug' subdirectory
Former-commit-id: d1db546fb0
2018-11-19 22:02:31 +01:00
Marco Castelluccio e25e83ec13 Add function to get labels for tracked/non-tracked bugs
Former-commit-id: 4c2747b56c
2018-11-16 17:58:54 +01:00
Marco Castelluccio 9d451b4502 Updated commits data
Former-commit-id: faabd81347
2018-11-16 17:58:30 +01:00
Marco Castelluccio bcc33779b9 Add commit data to bugs, but don't use it yet (doesn't improve results)
Former-commit-id: 554ae35320
2018-11-12 17:55:41 +01:00
Marco Castelluccio d72d47e604 Store commits in order from oldest to newest
Former-commit-id: 5341c28c61
2018-11-12 17:55:08 +01:00
Marco Castelluccio 0bb6403ee6 Only store information we are using
Former-commit-id: 0606d9371c
2018-11-12 17:54:43 +01:00
Marco Castelluccio 7d7af120dc Parse bug ID from the commit message and store it with the commit data
Former-commit-id: fcbb5dd362
2018-11-12 17:54:14 +01:00
Marco Castelluccio 2c3bc74546 Update requests requirement to 2.20.1
Former-commit-id: 703471476d
2018-11-12 12:57:48 +01:00
Marco Castelluccio cec52f4b32 Add more labelled bugs
Former-commit-id: ef6b2edd34
2018-11-12 12:57:34 +01:00
Marco Castelluccio a0fb4753ab Ignore extracted bugs.json and commits.json files
Former-commit-id: e546c6e13e
2018-11-12 12:56:09 +01:00
Marco Castelluccio 7401f84714 Store commit data in the repo
Former-commit-id: 74758f0ca1
2018-11-12 12:55:43 +01:00
Marco Castelluccio 1c5d89bc80 Add script to retrieve data from a Mercurial repository
Former-commit-id: 121b42a9a3
2018-11-12 12:55:27 +01:00
Marco Castelluccio f741d77717 Refactor get_bugs code into multiple modules
Former-commit-id: 466aa8446b
2018-10-12 00:46:14 +02:00
Marco Castelluccio bfec0f94c6 Add another TODO
Former-commit-id: 335cd747a8
2018-10-11 22:43:21 +02:00
Marco Castelluccio 69ac23fe50 Read bugs iteratively
Former-commit-id: 581f1bc8b3
2018-10-11 22:42:34 +02:00
Marco Castelluccio 34682521a0 Update accuracy numbers
Former-commit-id: b21257ad66
2018-10-11 20:21:57 +02:00
Marco Castelluccio 84a2b08381 No need to retrieve both keys and values to generate y
Former-commit-id: 3a20373872
2018-10-11 20:13:21 +02:00
Marco Castelluccio 887514cafa Perform augmentation directly when retrieving labels
Former-commit-id: 8d0d63403b
2018-10-11 20:12:43 +02:00
Marco Castelluccio 8492939b08 Skip labels for which we have no bug data directly in get_labels
Former-commit-id: c5075c7e84
2018-10-11 20:04:26 +02:00
Marco Castelluccio 7462bbea6b Split downloading of bugs and retrieval of bugs for training
Former-commit-id: 67c300263f
2018-10-11 19:40:53 +02:00
Marco Castelluccio d733796ea7 Update number of bugs in the dataset
Former-commit-id: 5bb22f8fc9
2018-10-11 17:29:20 +02:00
Marco Castelluccio a58d187d0d Use higher compression ratio
Former-commit-id: 6a247f0ec6
2018-10-11 17:25:29 +02:00
Marco Castelluccio b174cec6a4 Don't use MongoDB
Former-commit-id: d25ee7891f
2018-10-11 15:43:44 +02:00
Marco Castelluccio e11bac35b9 Add more labelled bugs
Former-commit-id: d63848e639
2018-10-11 14:04:41 +02:00
Marco Castelluccio 31a487cb16 Add doc2vec to the list of things to try
Former-commit-id: a1a99b8ea4
2018-10-01 02:41:34 +02:00
Marco Castelluccio 0ce2382213 Add TODO about text cleanup
Former-commit-id: 58383767fc
2018-10-01 02:41:06 +02:00
Marco Castelluccio 5f8d64aa7f Add optional lemmatization using spaCy
Former-commit-id: 5262e12663
2018-10-01 02:40:06 +02:00
Marco Castelluccio f50971f7f9 Fix flake8 issues
Former-commit-id: 3eb73e0948
2018-10-01 01:22:59 +02:00
Marco Castelluccio 7dee06e724 Refactor code in a function
Former-commit-id: bb7931dade
2018-10-01 01:20:26 +02:00
Marco Castelluccio 0301ca6376 Add more labelled bugs
Former-commit-id: 535f4c66a1
2018-09-30 21:18:58 +02:00
Marco Castelluccio 191a25ec5d Add more labelled bugs
Former-commit-id: ca5db78213
2018-09-26 16:01:27 +01:00
Marco Castelluccio 19f9a7f942 Add more labelled bugs
Former-commit-id: d56b34face
2018-09-25 00:01:32 +01:00
Marco Castelluccio 1364627cdc Update README
Former-commit-id: bbd594e67b
2018-09-24 23:13:29 +01:00
Marco Castelluccio 186f91c31c Add more labelled bugs
Former-commit-id: 1b312094c7
2018-09-24 23:09:12 +01:00
Marco Castelluccio 2df82f0a1d Set a fixed seed for the random under-sampler, so we get consistent results
Former-commit-id: da8d6fc7b5
2018-09-24 23:08:49 +01:00
Marco Castelluccio 400b04858a Small cleanup by making get_labels directly return a dict
Former-commit-id: 1995e0ccb0
2018-09-24 14:14:15 +01:00
Marco Castelluccio 540b7ebaa7 Perform under-sampling of the majority class
Former-commit-id: 8d3c7c3ba4
2018-09-24 00:11:49 +01:00
Marco Castelluccio f9c03d0f8f Add missing requirements
Former-commit-id: 884f75adfd
2018-09-24 00:11:28 +01:00
Marco Castelluccio 2e8cb9d59c Print confusion matrix
Former-commit-id: 1c2cdbda42
2018-09-24 00:10:43 +01:00
Marco Castelluccio ab5795a834 Add more labelled bugs
Former-commit-id: 882d941044
2018-09-23 20:53:02 +01:00