Fixes pylint:
```
tests/etl/test_text.py:34,0: Anomalous Unicode escape in byte string:
'\U'. String constant might be missing an r or u prefix. (W1402:
anomalous-unicode-escape-in-string)
```
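The fix is to mark the literal so the backslash stays literal. A minimal illustration, assuming the offending string resembled a Windows-style path (the actual literal in `test_text.py` isn't shown here):

```python
# W1402 fires because '\U' in a non-raw literal looks like the start of
# a 32-bit Unicode escape (and is a hard SyntaxError on Python 3):
#   path = 'C:\Users\foo'
# Adding the raw-string prefix keeps the backslashes literal:
path = r'C:\Users\foo'
```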
In Python 3 these return view objects rather than lists, so when
used in contexts where iterators are not supported they must first
be cast to `list`.
These cases weren't caught by pylint `dict-keys-not-iterating` and
`dict-values-not-iterating` (since it isn't able to infer the type
of anything but straightforward `dict` usages), but instead by manual
auditing.
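A minimal illustration of the Python 3 behaviour (not code from Treeherder):

```python
d = {'a': 1, 'b': 2}

# Python 3: .keys()/.values() return view objects, not lists.
keys = d.keys()

# Views iterate fine, but don't support list operations:
#   keys[0]        -> TypeError
#   keys + ['c']   -> TypeError

# Casting restores list behaviour where it's needed:
first_key = list(keys)[0]
```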
Instead of casting `resp.json.keys()` in `test_performance_data_api.py`,
the asserts have been removed, since they duplicated the coverage
provided by the `assert` on the next line.
Python 3's `map()`, `filter()` and friends now return iterators rather
than lists, so their results must be cast with `list()` when used in
contexts where an iterator is not supported.
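For example (an illustration, not code from Treeherder):

```python
squares = map(lambda n: n * n, [1, 2, 3])

# In Python 3 this is a lazy, single-use iterator: len(squares) raises
# TypeError, and a second pass over it yields nothing. Cast it when a
# real list is required:
squares = list(squares)
```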
The runnable jobs API now fetches runnable-jobs.json if available, falling
back to full-task-graph.json.
The new file is less than a tenth of the original file and contains the minimum
amount of data required for the endpoint.
Drop support for full-task-graph.json and 'job_type_description'.
Now that no submissions are using revision_hash, it can be removed.
This removes everything but the model field, which will be handled
later.
I've removed revision_hash from the Pulse jobs schema without bumping
the version, which normally wouldn't be OK, but no one is using it any
more, and I'd rather have explicit failures later than leave the
schema unchanged.
* Makes TaskCluster jobs independent of profiles to set Tier. We take
whatever tier we are given by TaskCluster for the job.
* Removes use of exclusion profiles for Buildbot jobs. Jobs that
should be Tier-2 and Tier-3 are hard-coded in the Treeherder code
by their job signature.
This is to allow any job type to be able to belong to any
job group. This will also mean that if someone accidentally
picked the wrong group for a job type, we don't need to
fix it in the DB for all new jobs. They can fix their task
definition, and all new jobs will go to the new job group.
This includes a management command to migrate the old
data from job_type.job_group to the new field of job.job_group.
A follow-up PR will remove the old field and set the API to
read from the job.job_group field.
It's having to be added as a platform rather than a new job/group
name, since otherwise comparisons can't be made in Perfherder with
the existing tests.
Switches to a new environment variable, `PULSE_PUSH_SOURCES`.
Keep old `publish-resultset-runnable-job-action` task name by creating a
method that points to `publish_push_runnable_job_action`.
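A sketch of the aliasing approach, with hypothetical signatures and return values (the real functions are Celery tasks; decorator and naming details are omitted):

```python
def publish_push_runnable_job_action(project, push_id, requester):
    # ... publish the Pulse message under the new name ...
    # (return value is illustrative only)
    return ('publish_push_runnable_job_action', project, push_id, requester)

# Old task name kept for backwards compatibility; it simply delegates:
def publish_resultset_runnable_job_action(*args, **kwargs):
    return publish_push_runnable_job_action(*args, **kwargs)
```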
The new version treats whitespace slightly differently, and also
counts `concurrent.futures` as part of the stdlib (since it is in
Python 3, even though we have to use the `futures` package). However,
I'm fine with not overriding the latter, for simplicity and given
we'll be switching to Python 3 at some point in the future.
Since otherwise tests that ingest jobs that have structured error
summary logs will hit the network, causing non-deterministic test
failures, such as the failures in `test_ingest_pulse_jobs` currently
being seen on Travis (since the log in question no longer exists on
S3).
The data format for GitHub messages is different from Mercurial's, so
the message sent to New Relic was looking for a field from the Mercurial
message that didn't exist. This makes it check both places for the right field.
This uses the same mechanism we use for ingesting GitHub pushes.
This adds an additional Transformer for HG pushes, and requires
adding the Pulse exchange of ``exchange/hgpushes/v1`` to the
existing PULSE_RESULTSET_SOURCES environment variable.
Fixes:
tests/autoclassify/test_classify_failures.py:7:1: F401 'treeherder.model.models.TextLogErrorMetadata' imported but unused
tests/etl/test_job_loader.py:7:1: F401 'treeherder.model.models.Repository' imported but unused
tests/model/test_classified_failure.py:6:1: F401 'treeherder.model.models.FailureLine' imported but unused
tests/seta/conftest.py:2:1: F401 'django.utils.timezone' imported but unused
tests/seta/test_job_priorities.py:8:1: F401 'treeherder.seta.settings.SETA_LOW_VALUE_PRIORITY' imported but unused
tests/webapp/api/test_text_log_summary_lines.py:4:1: F401 'treeherder.model.models.TextLogError' imported but unused
treeherder/auth/backends.py:13:5: F401 'django.utils.encoding.smart_str as smart_bytes' imported but unused
treeherder/autoclassify/tasks.py:4:1: F401 'django.conf.settings' imported but unused
treeherder/autoclassify/tasks.py:6:1: F401 'treeherder.celery_app' imported but unused
treeherder/perfalert/__init__.py:1:1: F401 '.perfalert.*' imported but unused
treeherder/seta/analyze_failures.py:7:1: F401 'treeherder.etl.seta.valid_platform' imported but unused
treeherder/seta/job_priorities.py:10:1: F401 'treeherder.model.models.Repository' imported but unused
treeherder/seta/models.py:7:1: F401 'treeherder.model.models.Repository' imported but unused
The seta migrations file change is due to the seta models no longer
depending on `model` (since the unnecessary `Repository` import has
been removed).
* Bug 1330677 - Allow calling runnable jobs API without having to pass the Gecko decision task id.
This is useful if all you care about is determining the most up-to-date list of tasks that can be scheduled.
In the future, this will allow determining the "current set of runnable jobs" on a schedule
(caching the latest values) rather than on each API call.
* Bug 1330652 - SETA - Fix job priorities endpoint
We were not passing the project name down to the functionality that
retrieves runnable jobs, and were thus using 'mozilla-inbound' by default.
This change starts using the simplified ref_data_names() method which also
takes the project name.
This also paves the way to drop Treecodes from the code.
This naming was a relic of the old datasource code we were using. For
the most part, we don't need it. Where we do need it, we should call it
what it is: a repository name.
Until now, the only way to get runnable jobs information was by
querying the Treeherder runnable jobs API.
After this change, Treeherder modules won't need to call the API,
but can instead import the same function the API calls (list_runnable_jobs()).
_load_jobs() from treeherder/model/derived/jobs.py will add a signature
hash to signatures instead of the buildername if we don't use
'reference_data_name' for the buildername.
This change does the following:
* Replace 'buildername' with 'reference_data_name'
* Replace 'b2g26_v1_2' with 'release'
* Can no longer store raw artifacts (anything treeherder doesn't understand
is ignored)
* Attempting to retrieve an artifact now returns a 405 (not allowed)
It appears that the intent of this code is to do a phrase match of the
search string against the bug summary for relevance matching. However,
the code incorrectly tried to quote the string and as a result failed
to handle special characters in the AGAINST clause (e.g. + - ~ >
etc.). This change simply removes any existing quote characters from
the string and places the entire thing in quotes. Per the MySQL
documentation:
> A phrase that is enclosed within double quote (") characters
> matches only rows that contain the phrase literally, as it was
> typed
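The quoting logic can be sketched as follows (the function name is mine, not the actual Treeherder helper):

```python
def quote_search_phrase(term):
    # Strip any embedded double quotes, then wrap the whole phrase in
    # double quotes so MySQL's AGAINST() matches it literally;
    # boolean-mode operators such as + - ~ > lose their special
    # meaning inside the quotes.
    return '"{}"'.format(term.replace('"', ''))
```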
* Bug 1286578 - Retry job task if resultset doesn't exist
This removes the logic which creates `skeleton resultsets`
when a job is ingested that we don't have a resultset for yet.
The new approach is to fail and wait for the task to retry.
The buildbot job ingestion already skips and retries later if
it encounters a job for which it has no resultset.
This adds a similar check to the Pulse Job ingestion. If
a job comes in with a revision that doesn't have a resultset
yet, then this will raise a ValueError. That will invoke the
retryable_task actions which will wait a bit, then retry. Each
time it will wait a little longer to retry. After 9 retries it
waits something like 3900 seconds which should be plenty of time
for the resultset ingestion to complete.
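The backoff described above can be sketched as follows (the base delay and doubling factor are assumptions inferred from the ~3900s figure, not the exact Treeherder constants):

```python
def retry_delay(retry_number, base_seconds=15):
    # Doubles on each attempt: 15, 30, 60, ...; by the 9th retry
    # (retry_number=8) the wait is 3840s, close to the ~3900s quoted
    # above, giving resultset ingestion time to complete.
    return base_seconds * 2 ** retry_number
```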
* Bug 1264074 - Move to_timestamp function to a reusable location
* Bug 1264074 - Refactor JobConsumer to have a PulseConsumer super class
Much of what was in the JobConsumer is reusable by the upcoming
ResultsetConsumer. So refactor those parts out so that each specific
consumer can reuse code as much as possible.
* Bug 1264074 - Add ability to ingest Github Resultsets via Pulse
This introduces a ResultsetConsumer and a read_pulse_resultsets
management command to ingest resultsets from the TaskCluster
github exchanges.
When a supported Github repo has a Pull Request created or
updated, or a push is made to master, then it will kick off a
Pulse message. We will receive it and then fetch any additional
information we need from github's API and store the Resultset.
This follows a very similar pattern to the Job Pulse ingestion.
* Bug 1264074 - Old code/comments cleanup
* Bug 1264074 - Tests for the Github resultset pulse loader
errorsummary logs containing failure lines are, for historical reasons,
not supplied as normal logs, but instead appear in the jobInfo property
of TaskCluster submissions. Therefore, in order to process them the same
way as Buildbot logs, we need to extract this data from there and add it
to the other log data.
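A hedged sketch of the extraction, assuming `jobInfo` carries a list of links with `linkText`/`url` fields and that error summaries are identified by their filename suffix (these field names and the suffix are assumptions, not the actual submission schema):

```python
def extract_error_summary_logs(job_info):
    # Pull out any links whose text marks them as error summaries and
    # reshape them like ordinary log entries, so later processing can
    # treat them the same way as Buildbot-supplied logs.
    return [
        {'name': 'errorsummary_json', 'url': link['url']}
        for link in job_info.get('links', [])
        if link.get('linkText', '').endswith('_errorsummary.log')
    ]
```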
* Bug 1275589 - Prune runnable jobs
Prior to this fix, the list of runnable jobs would be pruned only when
the normal cycle-data process ran. This is not fast enough, however.
Now, defunct buildernames are removed every time the runnable jobs are
updated.
Remove JOB_TYPE_BUILDERNAME and extract_job_type(), as neither are used
anywhere.
The relevant test components have also been removed from test_buildbot.py.
These jobs have been added under the new Release group in the "other" platform:
- Version Bump
- Checksums Builder
- Uptake Monitor
- Updates
- Bouncer Aliases
- Bouncer Submission
- Update Verify (moved from Updates group)
Since these jobs do not contain platform info in the buildername, nor
are they associated with a specific OS platform, the regex captures
each of the jobs individually to assign them the "other" os_platform.
Some repos are longer-lived and do not yet have the TaskCluster
code that allows them to submit tasks with a revision. They only
have the older code to submit revision_hash. This prevents the
jobs from being ingested via Pulse. This commit adds support
for revision_hash until a time when it's no longer needed.