Commit Graph

336 Commits

Author SHA1 Message Date
Marco Castelluccio 3eee2f8c7a Use utils.download_model for downloading models in the HTTP service instead of reimplementing it
Fixes #1242
2020-03-01 21:33:18 +01:00
Marco Castelluccio 289fc0a755 Install http_service with pip quietly and without caching in the integration test script 2020-03-01 21:07:01 +01:00
Marco Castelluccio 7da13fe1ce In the integration test, make the HTTP service workers reuse the already cloned repository
Fixes #1334
2020-02-29 15:40:04 +01:00
Marco Castelluccio 2af1bc2672 Make sure we take into account some test scheduling history without using it for training for group-level too
To properly calculate the failure statistics also for the
first failures in the first part of the history used for
training.
2020-02-29 11:27:06 +01:00
Bastien Abadie 791c16ebe0
Misc fixes for the HTTP service and the integration tests (#1332)
* Simplify handling of HTTP service directory where to download models and correct installation of http_service package in the integration test

* Log http worker boot step and allow missing DB

* Retry for 10 minutes to allow the worker boot to finish

* Add more logging and wait more time

* Wait 30 seconds between requests as a workaround for https://github.com/mozilla/bugbug/issues/1340

Co-authored-by: Marco Castelluccio <mcastelluccio@mozilla.com>
2020-02-28 21:59:47 +01:00
Marco Castelluccio 9491e9da42 Use the test label scheduling DB for now for the TestFailure model and in the commit classifier
Regression from ba6c358ba7

It was fixed in fe74b9b480 for the TestSelect model
2020-02-28 11:56:11 +01:00
Bastien Abadie 0eb7f91a23 Use the new bugbug_http module, fixing tests and docker build 2020-02-28 10:49:40 +01:00
Marco Castelluccio f3507ca1d5 Find the 4 months old commit by using the push date instead of git's --until
The old way was missing some commits in some cases (it was stopping at the first which
was committed on that given day).
The new way might return more commits than necessary, but then the tester will automatically
discard them based on their commit dates.
2020-02-27 17:30:44 +01:00
Marco Castelluccio c59d40863b Don't pass HEAD to the MethodDefectPredictor tester script, but an actual hash
pydriller has a bug when using HEAD (https://github.com/ishepard/pydriller/issues/90)
2020-02-27 14:25:10 +01:00
Marco Castelluccio c227806c04 Update to a newer version of MethodDefectPredictor
It fixes some issues with the newest version of pydriller
2020-02-27 14:25:10 +01:00
Marco Castelluccio 282b25871f Compress push_data.json files right after generating them 2020-02-26 10:45:45 +01:00
Marco Castelluccio faed859eaa Setup adr configuration directly in the Docker image
Otherwise we'd have to load adr after writing the config file.
2020-02-25 23:52:51 +01:00
Marco Castelluccio 2b41571263 Fix import of mozci
Regression from 48ccedb28e
2020-02-25 17:52:58 +01:00
Marco Castelluccio 34c931372d Correct logging of failing regressor finder analysis
We were using enumerate, but we can't be sure of the order of future completion.
2020-02-25 12:39:16 +01:00
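
The ordering problem mentioned in commit 34c931372d can be illustrated with a minimal sketch. It is not the bugbug implementation: `analyze` and `run_all` are hypothetical names, and the only assumption is that the work is submitted to a standard `concurrent.futures` executor. Because `as_completed` yields futures in completion order, a counter from `enumerate` does not identify which input failed; mapping each future back to its input does.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyze(commit):
    # Hypothetical stand-in for the per-commit analysis.
    if commit == "bad":
        raise ValueError("analysis failed")
    return commit.upper()


def run_all(commits):
    failures = []
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Remember which input each future belongs to, since futures
        # complete in an order unrelated to submission order.
        future_to_commit = {executor.submit(analyze, c): c for c in commits}
        for future in as_completed(future_to_commit):
            commit = future_to_commit[future]
            try:
                future.result()
            except Exception as exc:
                # Report the failure against the right input.
                failures.append((commit, str(exc)))
    return failures
```

Looking the input up through the future keeps the log message correct regardless of which task finishes first.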
Marco Castelluccio 48ccedb28e Implement gathering push data directly with adr and mozci instead of relying on ci-recipes 2020-02-24 14:38:15 +01:00
Marco Castelluccio 9d0a24f705 Upload bug-introducing commits DB periodically every hour instead of every 250 iterations 2020-02-24 14:33:58 +01:00
Marco Castelluccio 093dd38419 Assert ci-recipes run successfully 2020-02-24 00:34:19 +01:00
Marco Castelluccio 057744530e Update to latest ci-recipes (with mozci 1.2.5) 2020-02-23 20:12:27 +01:00
Marco Castelluccio 48c22e35c3 Use patch to apply patches to the git repo instead of 'git apply' which fails more easily
Also avoid writing a temporary file with the patch contents, input it via stdin.
2020-02-21 16:15:11 +01:00
Marco Castelluccio 05cb631f8d Don't try to checkout the git repo to a base revision if it is tip 2020-02-21 16:13:48 +01:00
Marco Castelluccio 88dbd77f73 Define two separate model classes for label-level and group-level test selection
Instead of using a 'granularity' argument to choose between them.
This fits better with the rest of the architecture which relies on the model name.
2020-02-21 10:36:44 +01:00
Marco Castelluccio fe74b9b480 Make it possible to train a TestSelect model on group-level test history
Fixes #1125
2020-02-20 17:06:07 +01:00
Marco Castelluccio 89ecb8f527 Import pydriller only for finding bug-introducing commits
To avoid issues in the bug-fixing commits task where git is not installed.

Regression from 7c81a5ece9
2020-02-19 16:11:17 +01:00
Marco Castelluccio e8fc1aad8c Don't write the same commits multiple times in the commits to ignore DB 2020-02-19 16:09:48 +01:00
Marco Castelluccio ba6c358ba7 Support generating a test scheduling history DB for group granularity instead of just label granularity
And introduce a new task that generates a group-level test scheduling history DB.

Part of #1125
2020-02-19 13:15:08 +01:00
Marco Castelluccio 7c81a5ece9 Split regressor finder task in four separate tasks
A task to find commits to ignore, a task to classify commits between
bug-fixing vs not-bug-fixing, a task to find regressors using the
normal repo, and a task to find regressors using the tokenized
repo.

This way we can also find regressors for the two kinds of repos
in parallel.

Fixes #1273

Make the past bugs by function task depend on the task to classify
commits between bug-fixing and not-bug-fixing.

Fixes #1274
2020-02-19 12:25:18 +01:00
Marco Castelluccio b83eb8048a Skip some modifications that are certainly not useful for regressor finding (and slow down blaming) 2020-02-18 13:19:05 +01:00
Marco Castelluccio b3e59739bf Upload bug-introducing DB every 250 iterations rather than every 500 2020-02-18 13:19:05 +01:00
Marco Castelluccio 8aa9b85d68 Log exceptions happening while finding regressors as soon as detected 2020-02-18 13:19:05 +01:00
Marco Castelluccio 6b0d778e58 Log the number of workers used to find regressors 2020-02-18 13:19:05 +01:00
Marco Castelluccio 8afd7f44a6 Update to latest ci-recipes 2020-02-17 12:14:32 +01:00
Marco Castelluccio 6114a57bbe Upload bug-introducing DB every 500 iterations rather than every 1000 2020-02-17 10:58:01 +01:00
Marco Castelluccio 049c3d2c5c Update ADR cache URL to point to S3 instead of Taskcluster artifact 2020-02-16 02:50:51 +01:00
Marco Castelluccio 611bbadc44 Remove useless support_files argument usage 2020-02-15 20:17:20 +01:00
Marco Castelluccio 49e57e66a3 Upload ADR cache periodically while the task runs
This way we don't lose the work done so far if the (very long running) task fails
for unknown reasons.
2020-02-15 20:16:26 +01:00
Marco Castelluccio 77fa140e60 Update to latest ci-recipes 2020-02-14 19:21:38 +01:00
Marco Castelluccio d9160d2fb8 Make pydriller initialize repository when initializing the threads
Since pydriller is changing a git configuration, it is acquiring a file lock.
So, we need to initialize the git repositories for the different threads
right away and behind a lock.
2020-02-14 00:01:13 +01:00
Marco Castelluccio 5be6d82a0b Update to latest ci-recipes (using 'fixed by commit' data) 2020-02-13 17:59:03 +01:00
Marco Castelluccio 9f5d7bb2f7 Instead of relying on the push date, re-analyze a few recent commits for finding commits to ignore 2020-02-13 17:09:43 +01:00
Marco Castelluccio b3da6498e9 Update URL of the regressor-related DBs 2020-02-13 13:23:01 +01:00
Marco Castelluccio b71799deac Go back to checking the version in the microannotate generator script, now that we can run tasks for longer and we upload intermediate results to S3 2020-02-13 13:21:41 +01:00
Marco Castelluccio e38fc1e1e9 Update URL of the microannotate DB 2020-02-13 13:20:53 +01:00
Marco Castelluccio 9915ec4574 Cancel all pending futures to find regressors when one fails 2020-02-13 13:19:15 +01:00
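
One way to get the behaviour described in commit 9915ec4574 is sketched below with the standard library only. This is an assumption about the general technique, not the code from the commit: `run_or_cancel` is a hypothetical helper, and `Future.cancel()` only affects futures that have not started running yet.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_or_cancel(func, items, max_workers=2):
    # Hypothetical helper: run func over items and stop scheduling
    # new work as soon as one call raises.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(func, item) for item in items]
        try:
            # Results are collected in completion order.
            return [future.result() for future in as_completed(futures)]
        except Exception:
            # Cancel everything still pending; already running calls
            # are left to finish when the executor shuts down.
            for future in futures:
                future.cancel()
            raise
```

The first exception is re-raised to the caller, so the surrounding task fails instead of continuing with partial results.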
Marco Castelluccio b00d9d1577 Use ThreadPoolExecutorResult to make sure we fail if cloning fails 2020-02-13 13:18:24 +01:00
Marco Castelluccio a9ab2acf70 Add some logging when cloning and updating git repositories 2020-02-12 21:58:19 +01:00
Marco Castelluccio 52a5a7335c Make microannotate generator task upload the DB version to S3
This way, even if the task times out, we don't lose the work done so far.
N.B.: We'll need to also update the DB URLs, when the bugbug S3 bucket is made public.
2020-02-12 13:43:54 +01:00
Marco Castelluccio caa10dc2e8 Upload regressor finder results to S3 periodically while the task is running
This way, even if the task times out, we don't lose the work done so far.
N.B.: We'll need to also update the DB URLs, when the bugbug S3 bucket is made public.
2020-02-12 13:42:21 +01:00
Marco Castelluccio e838553c15 Add missing function calls
Regression from 7a1d2457ef
2020-02-12 10:04:54 +01:00
Marco Castelluccio 118870fdfd Update to latest ci-recipes 2020-02-11 21:43:38 +01:00
Marco Castelluccio 26aca814be Run push_data recipe to retrieve test scheduling info both at the label- and at the group- level
As part of this, update to latest ci-recipes, including a plethora of changes and improvements.

Part of #1125
2020-02-11 16:33:21 +01:00