Marco Castelluccio
3eee2f8c7a
Use utils.download_model for downloading models in the HTTP service instead of reimplementing it
...
Fixes #1242
2020-03-01 21:33:18 +01:00
Marco Castelluccio
289fc0a755
Install http_service with pip quietly and without caching in the integration test script
2020-03-01 21:07:01 +01:00
Marco Castelluccio
7da13fe1ce
In the integration test, make the HTTP service workers reuse the already cloned repository
...
Fixes #1334
2020-02-29 15:40:04 +01:00
Marco Castelluccio
2af1bc2672
Also take some test scheduling history into account, without using it for training, at the group level
...
This lets us properly calculate the failure statistics for the
earliest failures too, in the first part of the history used for
training.
2020-02-29 11:27:06 +01:00
Bastien Abadie
791c16ebe0
Misc fixes for the HTTP service and the integration tests (#1332)
...
* Simplify handling of the HTTP service directory where models are downloaded, and correct installation of the http_service package in the integration test
* Log http worker boot step and allow missing DB
* Retry for 10 minutes to allow the worker boot to finish
* Add more logging and wait longer
* Wait 30 seconds between requests as a workaround for https://github.com/mozilla/bugbug/issues/1340
Co-authored-by: Marco Castelluccio <mcastelluccio@mozilla.com>
2020-02-28 21:59:47 +01:00
Marco Castelluccio
9491e9da42
Use the test label scheduling DB for now for the TestFailure model and in the commit classifier
...
Regression from ba6c358ba7
It was fixed in fe74b9b480
for the TestSelect model
2020-02-28 11:56:11 +01:00
Bastien Abadie
0eb7f91a23
Use the new bugbug_http module, fixing tests and docker build
2020-02-28 10:49:40 +01:00
Marco Castelluccio
f3507ca1d5
Find the 4 months old commit by using the push date instead of git's --until
...
The old way was missing some commits in some cases (it stopped at the first commit
made on the given day).
The new way might return more commits than necessary, but then the tester will automatically
discard them based on their commit dates.
2020-02-27 17:30:44 +01:00
Marco Castelluccio
c59d40863b
Don't pass HEAD to the MethodDefectPredictor tester script, but an actual hash
...
pydriller has a bug when using HEAD (https://github.com/ishepard/pydriller/issues/90)
2020-02-27 14:25:10 +01:00
Marco Castelluccio
c227806c04
Update to a newer version of MethodDefectPredictor
...
It fixes some issues with the newest version of pydriller
2020-02-27 14:25:10 +01:00
Marco Castelluccio
282b25871f
Compress push_data.json files right after generating them
2020-02-26 10:45:45 +01:00
Marco Castelluccio
faed859eaa
Setup adr configuration directly in the Docker image
...
Otherwise we'd have to load adr after writing the config file.
2020-02-25 23:52:51 +01:00
Marco Castelluccio
2b41571263
Fix import of mozci
...
Regression from 48ccedb28e
2020-02-25 17:52:58 +01:00
Marco Castelluccio
34c931372d
Correct logging of failing regressor finder analysis
...
We were using enumerate, but we can't be sure of the order of future completion.
2020-02-25 12:39:16 +01:00
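The ordering problem described in this commit can be illustrated with a minimal sketch. It is not the project's actual code: `analyze` and `run_all` are hypothetical names. The idea is to keep a mapping from each future back to its input, so that a failure is reported against the right item no matter which future completes first.

```python
import concurrent.futures


def analyze(commit):
    # Hypothetical stand-in for the per-commit analysis.
    if commit == "bad":
        raise ValueError("analysis failed")
    return commit.upper()


def run_all(commits, max_workers=4):
    results, failures = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future to the input it was created for.
        future_to_commit = {executor.submit(analyze, c): c for c in commits}
        # as_completed yields futures in completion order, not submission
        # order, so an enumerate() index says nothing about the input.
        for future in concurrent.futures.as_completed(future_to_commit):
            commit = future_to_commit[future]
            exc = future.exception()
            if exc is not None:
                failures[commit] = exc
            else:
                results[commit] = future.result()
    return results, failures
```

Looking the input up through the dictionary is what makes the log message reliable; an index taken from `enumerate` over `as_completed` only reflects completion order.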
Marco Castelluccio
48ccedb28e
Implement gathering push data directly with adr and mozci instead of relying on ci-recipes
2020-02-24 14:38:15 +01:00
Marco Castelluccio
9d0a24f705
Upload bug-introducing commits DB periodically every hour instead of every 250 iterations
2020-02-24 14:33:58 +01:00
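A time-based trigger like the one this commit describes could look roughly like the sketch below. The class name, the `upload` callback and the injectable `clock` are assumptions made for illustration, not names from the repository.

```python
import time


class PeriodicUploader:
    """Call `upload` at most once per `interval_seconds` of elapsed time."""

    def __init__(self, upload, interval_seconds=3600, clock=time.monotonic):
        self.upload = upload
        self.interval_seconds = interval_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.last_upload = clock()

    def maybe_upload(self):
        # Meant to be called once per loop iteration; uploads only when
        # enough time has passed since the previous upload.
        now = self.clock()
        if now - self.last_upload >= self.interval_seconds:
            self.upload()
            self.last_upload = now
            return True
        return False
```

Compared with counting iterations, the elapsed-time check bounds how much work can be lost regardless of how long individual iterations take.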
Marco Castelluccio
093dd38419
Assert ci-recipes run successfully
2020-02-24 00:34:19 +01:00
Marco Castelluccio
057744530e
Update to latest ci-recipes (with mozci 1.2.5)
2020-02-23 20:12:27 +01:00
Marco Castelluccio
48c22e35c3
Use patch to apply patches to the git repo instead of 'git apply' which fails more easily
...
Also avoid writing a temporary file with the patch contents; pass it via stdin instead.
2020-02-21 16:15:11 +01:00
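As a rough sketch of the approach named in this commit, the patch contents can be fed to the `patch` command on standard input, with no temporary file. The helper name `apply_patch` and the `-p1` strip level are assumptions; the sketch also assumes a `patch` binary is available on the PATH.

```python
import subprocess


def apply_patch(repo_dir, patch_text, strip=1):
    # With no patch file argument, patch reads the diff from stdin.
    subprocess.run(
        ["patch", f"-p{strip}"],
        input=patch_text.encode("utf-8"),
        cwd=repo_dir,  # file names in the diff are resolved relative to this
        check=True,  # raise CalledProcessError if patch fails
        capture_output=True,
    )
```

With `check=True`, a patch that does not apply surfaces as an exception, and the captured output is available on the exception object for logging.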
Marco Castelluccio
05cb631f8d
Don't try to checkout the git repo to a base revision if it is tip
2020-02-21 16:13:48 +01:00
Marco Castelluccio
88dbd77f73
Define two separate model classes for label-level and group-level test selection
...
Instead of using a 'granularity' argument to choose between them.
This fits better with the rest of the architecture which relies on the model name.
2020-02-21 10:36:44 +01:00
Marco Castelluccio
fe74b9b480
Make it possible to train a TestSelect model on group-level test history
...
Fixes #1125
2020-02-20 17:06:07 +01:00
Marco Castelluccio
89ecb8f527
Import pydriller only for finding bug-introducing commits
...
To avoid issues in the bug-fixing commits task where git is not installed.
Regression from 7c81a5ece9
2020-02-19 16:11:17 +01:00
Marco Castelluccio
e8fc1aad8c
Don't write the same commits multiple times in the commits to ignore DB
2020-02-19 16:09:48 +01:00
Marco Castelluccio
ba6c358ba7
Support generating a test scheduling history DB for group granularity instead of just label granularity
...
And introduce a new task that generates a group-level test scheduling history DB.
Part of #1125
2020-02-19 13:15:08 +01:00
Marco Castelluccio
7c81a5ece9
Split regressor finder task into four separate tasks
...
A task to find commits to ignore, a task to classify commits between
bug-fixing vs not-bug-fixing, a task to find regressors using the
normal repo, and a task to find regressors using the tokenized
repo.
This way we can also find regressors for the two kinds of repos
in parallel.
Fixes #1273
Make the past bugs by function task depend on the task to classify
commits between bug-fixing and not-bug-fixing.
Fixes #1274
2020-02-19 12:25:18 +01:00
Marco Castelluccio
b83eb8048a
Skip some modifications that are certainly not useful for regressor finding (and slow down blaming)
2020-02-18 13:19:05 +01:00
Marco Castelluccio
b3e59739bf
Upload bug-introducing DB every 250 iterations rather than every 500
2020-02-18 13:19:05 +01:00
Marco Castelluccio
8aa9b85d68
Log exceptions happening while finding regressors as soon as detected
2020-02-18 13:19:05 +01:00
Marco Castelluccio
6b0d778e58
Log the number of workers used to find regressors
2020-02-18 13:19:05 +01:00
Marco Castelluccio
8afd7f44a6
Update to latest ci-recipes
2020-02-17 12:14:32 +01:00
Marco Castelluccio
6114a57bbe
Upload bug-introducing DB every 500 iterations rather than every 1000
2020-02-17 10:58:01 +01:00
Marco Castelluccio
049c3d2c5c
Update ADR cache URL to point to S3 instead of Taskcluster artifact
2020-02-16 02:50:51 +01:00
Marco Castelluccio
611bbadc44
Remove useless support_files argument usage
2020-02-15 20:17:20 +01:00
Marco Castelluccio
49e57e66a3
Upload ADR cache periodically while the task runs
...
This way we don't lose the work done so far if the (very long running) task fails
for unknown reasons.
2020-02-15 20:16:26 +01:00
Marco Castelluccio
77fa140e60
Update to latest ci-recipes
2020-02-14 19:21:38 +01:00
Marco Castelluccio
d9160d2fb8
Make pydriller initialize repository when initializing the threads
...
Since pydriller is changing a git configuration, it is acquiring a file lock.
So, we need to initialize the git repositories for the different threads
right away and behind a lock.
2020-02-14 00:01:13 +01:00
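One way to sketch the pattern this commit describes is a thread pool `initializer` that creates each thread's repository object while holding a shared lock. `open_repo` below is a hypothetical stand-in for the pydriller call, and the other names are illustrative as well.

```python
import concurrent.futures
import threading

thread_local = threading.local()
init_lock = threading.Lock()


def open_repo(path):
    # Hypothetical stand-in for creating a per-thread git repository object.
    return {"path": path, "thread": threading.get_ident()}


def init_thread(path):
    # Serialize initialization: only one thread at a time may touch the
    # shared git configuration.
    with init_lock:
        thread_local.repo = open_repo(path)


def work(item):
    # Each worker uses the repository object created for its own thread.
    return (thread_local.repo["path"], item)


def run(path, items, max_workers=4):
    with concurrent.futures.ThreadPoolExecutor(
        max_workers=max_workers, initializer=init_thread, initargs=(path,)
    ) as executor:
        return list(executor.map(work, items))
```

The initializer runs once in each worker thread before that thread executes any task, so no task ever sees an uninitialized `thread_local.repo`.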
Marco Castelluccio
5be6d82a0b
Update to latest ci-recipes (using 'fixed by commit' data)
2020-02-13 17:59:03 +01:00
Marco Castelluccio
9f5d7bb2f7
Instead of relying on the pushdate, re-analyze a few recent commits for finding commits to ignore
2020-02-13 17:09:43 +01:00
Marco Castelluccio
b3da6498e9
Update URL of the regressor-related DBs
2020-02-13 13:23:01 +01:00
Marco Castelluccio
b71799deac
Go back to checking the version in the microannotate generator script, now that we can run tasks for longer and we upload intermediate results to S3
2020-02-13 13:21:41 +01:00
Marco Castelluccio
e38fc1e1e9
Update URL of the microannotate DB
2020-02-13 13:20:53 +01:00
Marco Castelluccio
9915ec4574
Cancel all pending futures to find regressors when one fails
2020-02-13 13:19:15 +01:00
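A minimal sketch of cancelling outstanding work on the first failure, using only `concurrent.futures`, might look like this. `run_or_cancel` is a hypothetical helper, not the function used in the repository.

```python
import concurrent.futures


def run_or_cancel(func, items, max_workers=2):
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(func, item) for item in items]
        try:
            for future in concurrent.futures.as_completed(futures):
                # result() re-raises any exception raised inside the worker.
                future.result()
        except Exception:
            # cancel() only affects futures that have not started yet;
            # futures that are already running finish normally.
            for pending in futures:
                pending.cancel()
            raise
```

Re-raising after the cancellation lets the caller see the original error, while the queued futures are dropped rather than run to completion.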
Marco Castelluccio
b00d9d1577
Use ThreadPoolExecutorResult to make sure we fail if cloning fails
2020-02-13 13:18:24 +01:00
Marco Castelluccio
a9ab2acf70
Add some logging when cloning and updating git repositories
2020-02-12 21:58:19 +01:00
Marco Castelluccio
52a5a7335c
Make microannotate generator task upload the DB version to S3
...
This way, even if the task times out, we don't lose the work done so far.
N.B.: We'll need to also update the DB URLs, when the bugbug S3 bucket is made public.
2020-02-12 13:43:54 +01:00
Marco Castelluccio
caa10dc2e8
Upload regressor finder results to S3 periodically while the task is running
...
This way, even if the task times out, we don't lose the work done so far.
N.B.: We'll need to also update the DB URLs, when the bugbug S3 bucket is made public.
2020-02-12 13:42:21 +01:00
Marco Castelluccio
e838553c15
Add missing function calls
...
Regression from 7a1d2457ef
2020-02-12 10:04:54 +01:00
Marco Castelluccio
118870fdfd
Update to latest ci-recipes
2020-02-11 21:43:38 +01:00
Marco Castelluccio
26aca814be
Run push_data recipe to retrieve test scheduling info both at the label- and at the group- level
...
As part of this, update to latest ci-recipes, including a plethora of changes and improvements.
Part of #1125
2020-02-11 16:33:21 +01:00