Occasionally, build/test runs fail in such a way that they produce a
significant amount of log spam, resulting in log files that are
hundreds of MB in size each. This can cause log parsing backlogs,
particularly when many jobs on the same push fail in this way.
The log parser now checks the `Content-Length` of log files prior to
streaming them, and skips the download/parse if it exceeds the set
threshold. The frontend has been adjusted to display an appropriate
message explaining why the parsed log is not available.
The threshold has been set to 5MB, since:
* the 99th percentile of download size on New Relic was ~2.8MB:
https://insights.newrelic.com/accounts/677903/dashboards/339080
* `Content-Length` is the size of the log prior to decompression, and
the chronic logspam cases have been known to have compression ratios
of 20-50x, which would translate to an uncompressed size limit of
up to 250MB (which is already much larger than buildbot's former 50MB
uncompressed size limit).
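As a rough sketch of the approach (not the actual parser code; `MAX_DOWNLOAD_SIZE_BYTES` and `fetch_log_lines` are illustrative names), the check amounts to:

```python
import requests

# Illustrative constant; the real threshold is configured as 5MB.
MAX_DOWNLOAD_SIZE_BYTES = 5 * 1024 * 1024


def fetch_log_lines(url):
    # stream=True means only the headers are fetched until the body is read.
    response = requests.get(url, stream=True)
    response.raise_for_status()
    size = int(response.headers.get('Content-Length', 0))
    if size > MAX_DOWNLOAD_SIZE_BYTES:
        # Too large: skip the download/parse; the frontend will explain
        # why no parsed log is available.
        return None
    return response.iter_lines()
```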
Previously any exceptions raised whilst loading the expected output JSON
fixtures were suppressed, which made debugging the Python 3 test failures
harder than it needed to be.
The reason failures were suppressed was to allow the test to continue far
enough that the actual output could be saved to the fixture when creating
new tests. However, reordering `do_test()` has the same effect without the
need for the `load_exp()` try-except handling.
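Very roughly, the reordered test helper looks like this (a sketch only; `parse_log` is a hypothetical stand-in for whatever produces the actual output):

```python
import json


def do_test(log_path, exp_fixture_path):
    # Produce the actual output first, so a new fixture can be created by
    # dumping `actual` at this point; no try-except around loading is needed.
    actual = parse_log(log_path)  # hypothetical call to the code under test

    with open(exp_fixture_path) as f:
        expected = json.load(f)

    assert actual == expected
```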
The `requests` package's `iter_lines()` returns bytes by default. We cannot
pass `decode_unicode=True` to `iter_lines()`, since it then splits on
Unicode newlines (which can exist in test message output), so instead we
manually `.decode()` each line before parsing it.
Fixes the following exception under Python 3:
`TypeError: a bytes-like object is required, not 'str'`
The test utility `load_exp()` had to be modified to no longer use append
mode when opening the expected output JSON files, in order to fix:
`json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)`
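The decode change itself is small; a sketch (the `parse_line` callback is illustrative):

```python
import requests


def iter_log_lines(url, parse_line):
    response = requests.get(url, stream=True)
    for line in response.iter_lines():
        if not line:
            continue
        # iter_lines() yields bytes; decode each line ourselves instead of
        # passing decode_unicode=True, which would also split on Unicode
        # newlines that can appear in test message output.
        parse_line(line.decode('utf-8'))
```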
Now that autophone and AWFY have migrated to Taskcluster, there are
no more submitters of jobs to the REST API (confirmed via New Relic
Insights). As such, this deprecated data ingestion method can now be
removed, along with support for API Hawk auth, API POST throttling
and `treeherder-client` job submission capability.
After this lands we'll need to manually drop the `credentials` table.
To avoid unnecessary dependencies, and use a more conventional
django-rest-framework testing approach:
http://www.django-rest-framework.org/api-guide/testing/#apiclient
APIClient has a few API differences:
* `.json` -> `.json()`
* `.status_int` -> `.status_code`
* GET parameters are passed as the keyword argument `data`, not `params`
* The default hostname is `http://testserver` not `http://localhost`
* Additional HTTP headers are passed directly as keyword arguments,
rather than nested under a `headers` property.
* It doesn't check the status code itself, so explicit checks are
required, along with removing `expect_errors`.
* The `.post_json()` and `.put_json()` methods don't exist.
See also the docs for the Django test client (which APIClient wraps):
https://docs.djangoproject.com/en/1.11/topics/testing/tools/#the-test-client
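For illustration, a test following those conventions might look like this (the endpoint paths, parameters and header are made up):

```python
from rest_framework.test import APIClient


def test_job_list():
    client = APIClient()
    # GET parameters go in `data`; extra HTTP headers are plain kwargs.
    response = client.get(
        '/api/project/mozilla-central/jobs/',  # hypothetical endpoint
        data={'count': 10},
        HTTP_ACCEPT='application/json',
    )
    # APIClient doesn't check the status code itself, so do it explicitly.
    assert response.status_code == 200
    assert response.json()['results'] is not None

    # Instead of the old .post_json()/.put_json(), pass format='json'.
    response = client.post('/api/jobs/', {'state': 'pending'}, format='json')
    assert response.status_code == 200
```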
Whilst making these changes, I also cleaned up the session fetching
in `test_auth.py` and `test_backends.py`, and added a `status_code`
check in `conftest.py`'s `mock_post_json()` - which makes the root
cause of test failures clearer.
Since after bug 1400069 it is no longer used by the UI.
This removes everything but the Job model field (since the table is
large enough that migrations need to be carefully coordinated, and we
can batch up that change with others).
This allows any job type to belong to any job group. It also
means that if someone accidentally
picked the wrong group for a job type, we don't need to
fix it in the DB for all new jobs. They can fix their task
definition, and all new jobs will go to the new job group.
This includes a management command to migrate the old
data from `job_type.job_group` to the new `job.job_group` field.
A follow-up PR will remove the old field and set the API to
read from the `job.job_group` field.
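A hedged sketch of what such a management command could look like (the real command's name, batching strategy and import paths may differ):

```python
from django.core.management.base import BaseCommand

from treeherder.model.models import Job  # import path assumed


class Command(BaseCommand):
    help = "Copy the legacy job_type.job_group onto each job's new job_group field"

    def handle(self, *args, **options):
        # Iterate rather than update in one statement, since the job table is
        # large; the real command may batch this more carefully.
        jobs = Job.objects.filter(job_group__isnull=True).select_related('job_type')
        for job in jobs.iterator():
            job.job_group = job.job_type.job_group
            job.save(update_fields=['job_group'])
```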
Adds a new environment variable, `PULSE_PUSH_SOURCES`.
Keep old `publish-resultset-runnable-job-action` task name by creating a
method that points to `publish_push_runnable_job_action`.
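The aliasing trick is roughly the following (a sketch using a plain Celery decorator; the project's actual task wiring and signatures differ):

```python
from celery import shared_task


@shared_task(name='publish-push-runnable-job-action')
def publish_push_runnable_job_action(*args, **kwargs):
    ...  # real implementation lives in treeherder


# Keep the old task name working by delegating to the renamed task.
@shared_task(name='publish-resultset-runnable-job-action')
def publish_resultset_runnable_job_action(*args, **kwargs):
    return publish_push_runnable_job_action(*args, **kwargs)
```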
Doing so also fixes incorrect line numbers in logs that have Windows
line endings (bug 1328880), which is why the expected logview output has
been updated.
The tests have to be adjusted to use `responses`, since `requests` doesn't
support `file://` URLs.
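For example, a fixture can be served through `responses` along these lines (the URL, fixture path and `parse_log` call are illustrative):

```python
import responses


@responses.activate
def test_parses_sample_log():
    url = 'http://example.com/fake-log.txt.gz'  # illustrative URL
    with open('tests/sample_data/fake-log.txt.gz', 'rb') as f:  # illustrative path
        responses.add(responses.GET, url, body=f.read(), status=200)

    # The code under test can now fetch the fixture over plain HTTP via requests.
    result = parse_log(url)  # hypothetical call to the code under test
    assert result
```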
This import only affects internal treeherder usage, people using the
PyPI package import from the `thclient` subdirectory instead.
Fixes:
treeherder/client/__init__.py:1:1: F401 '.thclient.*' imported but unused
* Can no longer store raw artifacts (anything treeherder doesn't understand
is ignored)
* Attempting to retrieve an artifact now returns a 405 (method not allowed)
And use `SITE_URL` rather than requiring the separate environment
variables just for requests to Treeherder's own API. (Thereby reducing
the number of environment variables I have to toggle back and forth for
the Heroku migration).
New resultsets will still store a value in their ``revision_hash`` field, but it will
just be the same value as their ``long_revision`` field.
This will log an exception in New Relic when a new resultset or job is posted
to the API with only a ``revision_hash`` and not a ``revision`` value.
This also switches to using the longer 40-char revisions alongside the
12-char revisions, though we use the longer ones for most actions. The
short revisions are stored so that people and the UI can still locate
a resultset (or set ranges) using short revisions.
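The deprecation logging amounts to something like the following (assuming the New Relic Python agent's `record_exception()` helper; the exception type and field access are illustrative):

```python
import newrelic.agent


def warn_on_revision_hash_only(data):
    if data.get('revision_hash') and not data.get('revision'):
        try:
            raise ValueError('revision_hash is deprecated; please submit a revision')
        except ValueError:
            # Record the exception in New Relic without failing the request,
            # so remaining submitters can be identified and migrated.
            newrelic.agent.record_exception()
```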
The `auth` parameter was intended to receive either a `TreeherderAuth`
instance (for OAuth credentials), or a `HawkAuth` instance. The former
no longer exists since OAuth support has been removed, and the latter is
not necessary, since the client now supports simply passing `client_id`
and `secret` to the `TreeherderClient` constructor, as a simpler way of
specifying the Hawk credentials. As such, the `auth` parameter is
superfluous and can be removed.
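That is, callers now just pass the credentials directly (other constructor arguments omitted/illustrative):

```python
from thclient import TreeherderClient

# Hawk credentials are passed straight to the constructor; no separate
# auth object is needed any more.
client = TreeherderClient(client_id='my-client-id', secret='my-secret')
```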
I added a `create_credentials` command to help set up the initial
development environment. The Puppet setup now creates a new user and sets
it as the owner of the `treeherder-etl` credentials.
It's unused in the UI and doesn't add any value, since we're
normally much more interested in the push_timestamp.
This can land without causing errors, since the field is DEFAULT NULL,
so it can be dropped at our leisure later.
They were never used for anything and take up a lot of space. They
also don't fit into the new performance model we're working on. We
can always bring back the useful bits later.
The MPL 2.0 terms state that as long as a LICENSE file is present, the
per-file header text is not required. See "Exhibit A" at the end of:
https://www.mozilla.org/MPL/2.0/
Created using `isort -p tests -rc .` and a couple of manual tweaks.
The order is:
* futures
* std library
* third party packages
* local imports
* relative local imports
...with each group ordered with "import x" before "from x import y", and
then alphabetically.
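A typical module's imports therefore end up grouped like this (module names are just examples):

```python
from __future__ import unicode_literals  # futures

import json                              # std library
import os
from datetime import timedelta

import pytest                            # third party packages
from django.conf import settings

from treeherder.model import models      # local imports

from .conftest import some_fixture       # relative local imports
```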
After the previous commit, the Objectstore is effectively "dead code".
So this commit removes all the dead code, now that anything left over in
the Objectstore has been drained and added to the DB.
* Create a generic TreeherderClient class
* Add a single method called `post_collection`, which takes care of all
the details of validation, submission and error raising
* Also add a new `update_parse_status` method for updating parse status
(replacing manual calls that posted the information via a raw `TreeherderRequest`)
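Submission with the new client then looks roughly like this (constructor/credential details omitted, and the collection contents are illustrative):

```python
from thclient import TreeherderClient, TreeherderJobCollection

client = TreeherderClient()  # connection/credential details omitted

tjc = TreeherderJobCollection()
# ... build TreeherderJob objects and add them to the collection ...

# post_collection handles validation, submission and raising on errors.
client.post_collection('mozilla-central', tjc)
```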
Generated using:
autopep8 --in-place --recursive .
Before:
$ pep8 | wc -l
1686
After:
$ pep8 | wc -l
57
A later autopep8 run will be performed using --aggressive, which makes
non-whitespace changes too.