The MPL 2.0 terms state that as long as a LICENSE file is present, the
per-file header text is not required. See "Exhibit A" at the end of:
https://www.mozilla.org/MPL/2.0/
This test checks that:
* log content that falls in-between two step markers is captured in a
dummy unnamed step.
* if the final step is missing the "step finish" marker, we still save/
update the step and append any step errors to all_errors.
Bug 1060339 made check_errors always be true, since we want to parse all
logs, not just those for failing jobs. As such, we have no use for
check_errors.
Previously the tests were created after act["logurl"] was deleted, so
the resultant expected output .json file was missing the "logurl" key.
The json import was also missing; people now only have to uncomment the
test creation block, rather than also adding the missing import each time.
Created using |isort -p tests -rc .| and a couple of manual tweaks.
The order is:
* futures
* std library
* third party packages
* local imports
* relative local imports
...with each group ordered with "import x" before "from x import y", and
then alphabetically.
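For example, an import block following that ordering would look roughly
like this (the module names are only illustrative):

    from __future__ import unicode_literals  # futures

    import os                                # std library
    from datetime import datetime

    import requests                          # third party packages
    from django.conf import settings

    import treeherder                        # local imports
    from treeherder.etl import buildapi

    from . import utils                      # relative local imports
    from .models import Datasource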
* Simplify logic in talos parser (there was an optimization which didn't
save anything and just caused confusion before)
* Make it so if log parsing fails for a non-http reason, we don't try
again
urllib isn't handling the unicode found in some log lines correctly,
whereas requests does. This prevents UnicodeEncodeError exceptions when
making the request to the bugscache API to find the bug suggestions for
these log lines.
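A minimal sketch of the idea, assuming a hypothetical bugscache search
endpoint (the URL and parameter name below are not the real API):

    import requests

    def get_bug_suggestions(search_term):
        # requests encodes unicode query parameters correctly, whereas
        # the previous urllib-based call raised UnicodeEncodeError for
        # some log lines containing non-ASCII characters.
        response = requests.get(
            'https://treeherder.example.org/api/bugscache/',
            params={'search': search_term},
        )
        response.raise_for_status()
        return response.json()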
The sample notes added for the test normally have the same timestamp for
several notes, but not always. With the previous `ORDER BY`, this meant
the list of notes retrieved could vary in order depending on if the
timestamps were identical. We now additionally sort by id (descending,
to match the timestamp sort), so the returned list is deterministic.
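A sketch of the resulting ordering (the table and column names here are
illustrative, not the actual stored query):

    # Timestamp alone is ambiguous when several notes share a value, so
    # id is used as a tie-breaker, descending to match the timestamp sort.
    GET_NOTES_SQL = """
        SELECT id, job_id, who, note, note_timestamp
        FROM job_note
        ORDER BY note_timestamp DESC, id DESC
    """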
Since bug 1140349, the objectstore endpoint has been deprecated, and
performs the same function as the jobs endpoint. Now that there are no
remaining submitters to it, let's remove it.
To avoid continually attempting to re-ingest them, thereby reducing
task runtime and database load.
In order to make this behaviour easier to test, the
pending/running/build4hr jobs process run() method now returns True if
new jobs were loaded, and False otherwise. This method was used instead
of calling the transformer mixins from the test directly, since the test
would then have had to reimplement much of the run() method anyway.
After the previous commit, the Objectstore is effectively "dead code".
So this commit removes all the dead code after anything left over in
the Objectstore has been drained and added to the DB.
dj-database-url extracts DB host, port, username, password and database
name from the env variable 'DATABASE_URL' (unless another env variable
name is specified). If the env variable is not defined, it falls back to
the default passed to dj_database_url.config().
This means for Heroku and similar we can replace the multiple DB env
variables with just one URL for default & one for read_only.
This also effectively makes the setting of the read only DB variable
mandatory for stage/production/heroku, since DEFAULT_DATABASE_URL won't
be valid for them - so prevents us inadvertently not using the read only
DB.
The deployment script also had to be updated, so that we set the
prod/stage-specific environment variables before using manage.py, since
dj-database-url cannot rely on what's in the stage/prod local.py config
(which isn't a bad thing, since we're deprecating that file).
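A minimal settings sketch of the pattern, assuming the read-only URL is
supplied via a separately named env variable (the variable name and
default URL below are illustrative):

    import dj_database_url

    # Fallback used when the env variable is not defined (eg local Vagrant).
    DEFAULT_DATABASE_URL = 'mysql://user:password@localhost/treeherder'

    DATABASES = {
        # Parses DATABASE_URL into host, port, username, password and name.
        'default': dj_database_url.config(default=DEFAULT_DATABASE_URL),
        # A second, separately named env variable for the read-only replica.
        'read_only': dj_database_url.config(env='DATABASE_URL_RO',
                                            default=DEFAULT_DATABASE_URL),
    }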
This patch upgrades the version stored in the requirements file and fixes some issues introduced by breaking changes in the new version of the library:
- Writable nested fields are no longer available; the serializer needs an
explicit create() method to write a nested field.
- ModelViewSet now requires serializer_class and queryset attributes.
- The @action and @link decorators are now replaced by either @detail_route
or @list_route.
- Any attempt to create a ModelSerializer instance with an attribute whose
type is either dict or list will raise an exception.
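As a rough, hedged sketch of the kinds of adjustments involved (Job,
JobLog and the cancel endpoint below are made-up examples, not
Treeherder's actual models or routes):

    from rest_framework import serializers, viewsets
    from rest_framework.decorators import detail_route

    from myapp.models import Job, JobLog  # hypothetical models

    class JobSerializer(serializers.ModelSerializer):
        class Meta:
            model = Job

        # Writable nested fields now need an explicit create() method.
        def create(self, validated_data):
            logs = validated_data.pop('log_references', [])
            job = Job.objects.create(**validated_data)
            for log in logs:
                JobLog.objects.create(job=job, **log)
            return job

    class JobViewSet(viewsets.ModelViewSet):
        # serializer_class and queryset are now required attributes.
        serializer_class = JobSerializer
        queryset = Job.objects.all()

        # @action/@link are replaced by @detail_route/@list_route.
        @detail_route(methods=['post'])
        def cancel(self, request, pk=None):
            pass  # endpoint body omitted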
Since we use Celery for queueing job ingestion, the objectstore is
now irrelevant. This code is the first step: it bypasses the
Objectstore and ingests jobs directly into our ``jobs`` database.
Phase 2 is to remove all the Objectstore code (in a later commit).
Phase 3 is to delete the Objectstore databases and related fields in
other tables.
It appears that on occasion we parse a log more than once, which
resulted in duplicate performance series going into the database.
Let's be resilient about this by not inserting duplicate jobs into the
database (we always attach a unique job id to every datapoint, so there's
no chance of accidentally removing entries which happen to have the same
performance numbers).
Having the ability to use different DB hosts for each project sounded
like a good idea, but in reality, we have no need for it.
This switches us to using the global read-write and read-only database
host names rather than the fields on the datasource table. As such, the
'host', 'read_only_host' and 'type' (eg 'mysql') fields can be removed.
The Django model had a unique_together on host+name, so we now need to
make 'name' (ie database name) a unique key on its own.
In addition, this removes the 'creation_date' field, since we don't use
it anywhere, and we can just look at the commit history to see when a
repo was created. (I imagine it may have had more use if we had actually
started partitioning the databases using the old 'dataset' count field).
In a future bug, I'll remove the redundant substitution of 'engine' for
'InnoDB' in the template schema, given that engine is now always InnoDB
in create_db().
Since otherwise we get access denied errors using run_sql on Heroku.
All other calls use datasource, so have already been set up to pass the
SSL options.
dj-database-url extracts DB host, port, username, password and database
name from the env variable 'DATABASE_URL' (unless another env variable
name is specified). If the env variable is not defined, it falls back to
the default passed to dj_database_url.config().
This means for Heroku and similar we can replace the multiple DB env
variables with just one URL for default & one for read_only.
This also effectively makes the setting of the read only DB variable
mandatory for stage/production/heroku, since DEFAULT_DATABASE_URL won't
be valid for them - so prevents us inadvertently not using the read only
DB.
Before this is deployed, we'll need to update the stage/prod puppet
configs & Heroku settings to add the new environment variable.
* Put the number of runs directly beside averages/geomeans in the UI, and
put a dotted line underneath to make it easier to pull up the tooltip
* Use a bootstrap tooltip for displaying run information (clearer)
* Use a bootstrap abbreviation to make it more clear what low/med/high
confidence actually means
...when referring to the datetime that a classification was made. This
avoids confusion in ElasticsearchDocRequest, since previously we had
two similarly named variables: 'job_data["submit_timestamp"]' and
'self.submit_timestamp', the former referring to the time the job was
scheduled, the latter to the time the classification was submitted.
Since it's unused, and hg.mozilla.org now has this information available
via its API.
Note: This commit depends on bug 1178719, to prevent issues during
deployment. Also, due to https://code.djangoproject.com/ticket/25036 a
migrate will need to be run interactively after deployment, to clean up
the old repositoryversion content type.
Since otherwise we may end up with interactive prompts.
Note: When using call_command() we have to use 'interactive' instead of
'noinput' due to https://code.djangoproject.com/ticket/22985,
which is only fixed in Django 1.8+.
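For example, roughly (the command name is just for illustration):

    from django.core.management import call_command

    # On the command line the flag is --noinput, but when invoking a
    # command programmatically (before Django 1.8) the option must be
    # passed as interactive=False instead.
    call_command('migrate', interactive=False)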
The datasource table has a 'dataset' field, to allow for multiple
datasources of the same type (for partitioning; eg the "1" in
`mozilla-central_jobs_1`). However we have never used it, so let's just
remove it.
Previously, if two celery jobs were updating the same series, one would
overwrite the other, because the locking code did not actually work (it
always unconditionally got a new lock without checking whether anything
else was already holding it). This fixes that.
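A rough sketch of the intended behaviour, using MySQL's GET_LOCK and
checking its return value (the helper below is illustrative, not the
actual code in the series-updating task):

    from contextlib import contextmanager

    @contextmanager
    def series_lock(cursor, signature, timeout=60):
        # GET_LOCK() returns 1 if the lock was acquired and 0 on timeout;
        # the broken code effectively ignored this and carried on anyway.
        cursor.execute("SELECT GET_LOCK(%s, %s)", [signature, timeout])
        if cursor.fetchone()[0] != 1:
            raise RuntimeError("could not lock series %s" % signature)
        try:
            yield
        finally:
            cursor.execute("SELECT RELEASE_LOCK(%s)", [signature])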
* Using machine history is now optional (perfherder doesn't track it, and
we don't think we need it)
* Performance datums and analyze_t now take keyword arguments, to make the
API more intuitive
* Various other minor updates to make code easier to understand
* Create a proper setup.py so it can eventually be distributed on pypi
(and installed locally meanwhile)
* Make relevant tests run along with the rest of treeherder-service's tests
* Remove dashboard, old graphserver business logic
The fixtures path changed from:
webapp/test/mock/*
...to:
tests/ui/mock/*
However basePath in the Karma config was also changed from webapp/
to the repo root.
This reverts commit e71e781565.
That commit caused pending and running jobs to be put into the
objectstore, which meant their completed versions were not ingested.
This introduces two new ways to generate ``Bug suggestions`` artifacts from
a ``text_log_summary`` artifact:
1. POST a ``text_log_summary`` on the ``/artifact`` endpoint
2. POST a ``text_log_summary`` with a job on the ``/jobs`` endpoint.
Both of these cases will schedule an asynchronous task to generate the
``Bug suggestions`` artifact with ``celery``.
Artifact generation scenarios:
JobCollections
^^^^^^^^^^^^^^
Via the ``/jobs`` endpoint:
1. Submit a Log URL with no ``parse_status`` or ``parse_status`` set to "pending"
* This will generate ``text_log_summary`` and ``Bug suggestions`` artifacts
* Current *Buildbot* workflow
2. Submit a Log URL with ``parse_status`` set to "parsed" and a ``text_log_summary`` artifact
* Will generate a ``Bug suggestions`` artifact only
* Desired future state of *Task Cluster*
3. Submit a Log URL with ``parse_status`` of "parsed", with ``text_log_summary`` and ``Bug suggestions`` artifacts
* Will generate nothing
ArtifactCollections
^^^^^^^^^^^^^^^^^^^
Via the ``/artifact`` endpoint:
1. Submit a ``text_log_summary`` artifact
* Will generate a ``Bug suggestions`` artifact if it does not already exist for that job.
2. Submit ``text_log_summary`` and ``Bug suggestions`` artifacts
* Will generate nothing
* This is *Treeherder's* current internal log parser workflow
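As a hedged illustration of scenario 1 in the ArtifactCollections list
above, a submission might look roughly like this (the host, project and
field values are placeholders rather than the exact schema):

    import json
    import requests

    artifact = {
        'job_guid': 'abc123',  # placeholder job identifier
        'name': 'text_log_summary',
        'type': 'json',
        'blob': json.dumps({'step_data': {'steps': [], 'all_errors': []}}),
    }

    requests.post(
        'https://treeherder.example.org/api/project/mozilla-central/artifact/',
        json=[artifact],
    )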
As part of merging the UI repo into this one, the following directory
moves were performed:
webapp/app/ -> ui/
webapp/test/ -> tests/ui/
webapp/config/ -> tests/ui/config/
webapp/scripts/ -> tests/ui/scripts/
webapp/scripts/web-server.js -> web-server.js
* Create a generic TreeherderClient class
* Add a single method called `post_collection`, which takes care of all the
details of validation, submission and error handling
* Also add a new update_parse_status method, for updating status (replaces
manual calls to post information on raw TreeherderRequest)
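A usage sketch of the new client; the constructor arguments, credentials
and exact parameter names below are assumptions rather than the
definitive API:

    from thclient import TreeherderClient, TreeherderJobCollection

    client = TreeherderClient(protocol='https',
                              host='treeherder.example.org')

    tjc = TreeherderJobCollection()
    # ... add jobs to the collection ...

    # post_collection handles validation, submission and raising errors.
    client.post_collection('mozilla-central', 'oauth-key', 'oauth-secret',
                           tjc)

    # update_parse_status replaces the manual calls on a raw
    # TreeherderRequest.
    client.update_parse_status('mozilla-central', 'oauth-key',
                               'oauth-secret', job_log_url_id=123,
                               parse_status='parsed')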
The directory is empty apart from a .gitkeep, since it only exists to
house jstd.log, which is output by watchr.rb. I was going to move the
directory around in bug 1056877, but let's just delete it and move the
log file one directory higher up.
We were previously calling them OS X 10.8 for aesthetics in TBPL,
however it can cause confusion with developers. In addition, in
Treeherder the platform is not just used in the UI, but for downstream
analysis, so using the incorrect platform has more severe consequences.
This supports ingesting job ``log_references`` that have a
``parse_status`` value. This is so that external tools can submit jobs
that don’t require our internal log parsing. They will then submit
their own log summary artifact.
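A hedged sketch of a submitted job's ``log_references`` carrying this
value (the field names are loosely based on the job schema and should be
treated as illustrative):

    log_references = [
        {
            'url': 'https://example.com/builds/build.log',
            'name': 'buildbot_text',
            # 'pending' (the default) means Treeherder parses the log
            # itself; 'parsed' means the submitter will provide its own
            # log summary artifact, so no internal parsing is scheduled.
            'parse_status': 'parsed',
        }
    ]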
This commit changes all references to 'builds-4h' to 'buildbot_text'. The
changed files, along with the number of occurrences changed in each, are:
1. tests/etl/test_buildapi.py: 3 occurrences
2. tests/sample_data/job_data.txt: 304 occurrences
3. treeherder/etl/buildapi.py: 1 occurrence
4. treeherder/model/sample_data/job_data.json.sample: 2 occurrences
In treeherder/webapp/api/logslice.py, a conditional was removed; the TODO
above it said to remove it once this bug was addressed.
To make sure all tests run properly, three files were renamed. Only the portion
of the filename that said 'builds-4h' was changed to say 'buildbot_text'.
The 'treeherder-service' repo has been renamed to 'treeherder', ready
for when the treeherder-ui repo is imported into it. This means the
Github URL, Travis URL and directory name when cloned all change. The Read
The Docs URL cannot be changed, so for now we will leave it as-is, and in
the future (once the service and UI docs are combined) we will create a new
project on RTD with the name "treeherder".
This updates doc links and puppet/Vagrant configs, but leaves the
stage/prod deploy script alone, since renaming the directories on our
infra is non-trivial. The dev instance will need some TLC since unlike
stage/prod, it does use the puppet scripts in the repo.
Unlike the Pulse publishing, the code for consuming data from Pulse is
unused, being a leftover from initial attempts to ingest buildbot data
via pulse, rather than builds-{4hr,running,pending}.js
We're submitting to Elasticsearch (used by OrangeFactor), not directly
to OrangeFactor, so "Elasticsearch" is more appropriate. The use of
"Bug" in the name makes it sound like we're submitting a bug, which
we're not: we're submitting a bug comment (or ES doc) containing a
number of different fields, in response to a classification entry made
by a sheriff when the classification included a bug number. That
distinction is too nuanced to include in the name.
As such whilst not perfect, I think this is slightly clearer:
s/OrangeFactorRequest/ElasticsearchDocRequest/
and
s/BugzillaBugRequest/BugzillaCommentRequest/
Now that the job names have been made more consistent by bug 740142, we
can simplify our regex again :-)
This is a direct revert of the last three hunks in:
d7abe14635
...plus appropriate updates to the job names in the tests.
For debugging & also for when filing new intermittent failure bugs, it
is useful to see which search terms were extracted from a log failure
line, and used to query the bugscache for bug suggestions. In the future
this could be used by an intermittent bug filer to verify the bug
summary contained the term extracted for failures of that type.
* Use a text column instead of varchar for storing series property (since the
subtest signatures can be quite long). Also stop indexing it.
* Update the query for getting series signatures to not use a collation
(which also has a size limit by default)
The treeherder client is in the vendor directory, however that doesn't
get added to the sys.path until settings/base.py is loaded, so defer the
import until we need it.
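i.e. roughly the following pattern (the module path is an assumption):

    def get_client():
        # Deferred import: the vendor directory is only added to sys.path
        # once settings/base.py has loaded, so a module-level import here
        # would fail.
        from thclient import TreeherderClient
        return TreeherderClient()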
Currently if a TinderboxPrint line contains a space in the link title,
eg in 'hazard results' here:
TinderboxPrint: hazard results: https://ftp-ssl.mozilla.org/...
...then we ingest it as content_type 'raw_html' rather than 'link'.
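An illustrative pattern (not the parser's actual regex) that allows spaces
in the title:

    import re

    # Capture everything up to the ': ' preceding the URL as the title,
    # so titles containing spaces are still ingested as 'link' content.
    TINDERBOX_LINK_RE = re.compile(
        r'TinderboxPrint: (?P<title>.+?): (?P<url>https?://\S+)'
    )

    match = TINDERBOX_LINK_RE.match(
        'TinderboxPrint: hazard results: https://ftp-ssl.mozilla.org/...'
    )
    assert match.group('title') == 'hazard results'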
test_resultset_api.py:294:5: F841 local variable 'email' is assigned to but never used
test_resultset_api.py:305:5: F841 local variable 'resp' is assigned to but never used
Fixes:
tests/model/derived/test_jobs_model.py:343:23: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
treeherder/__init__.py:11:1: E731 do not assign a lambda expression, use a def
treeherder/etl/buildapi.py:107:16: E713 test for membership should be 'not in'
treeherder/log_parser/utils.py:183:33: W503 line break before binary operator
treeherder/model/derived/base.py:73:12: E713 test for membership should be 'not in'
treeherder/model/derived/base.py:82:12: E713 test for membership should be 'not in'
treeherder/model/derived/jobs.py:1998:26: W503 line break before binary operator
Generated using:
autopep8 --in-place --recursive --aggressive --aggressive
--max-line-length 999 --exclude='.git,__pycache__,.vagrant,build,vendor,
0001_initial.py,models.py,test_note_api.py,test_bug_job_map_api.py' .
autopep8's aggressive mode, unlike standard mode, makes non-whitespace
changes. It also uses lib2to3 to correct deprecated code (W690), some of
which aren't pep8 failures. Some of these changes are more dubious, but
rather than disable W690 completely, I've just excluded the files where
the unwanted changes would have been made, so we can benefit from the
rest.
Tests that are not aimed at the jobs API should not be dependent on the
order of jobs returned by get_job_list().
* test_tbpl.py does not even need to use get_job_list() since the only
accessed property is the job_id, which we are better off hard-coding.
* test_note_api.py should use the job_id found earlier in the test,
rather than hard-coding a wrong value.
* In test_bug_job_map_api.py, there is no ORDER BY clause for the stored
get_bug_job_map_list query. The current test only happens to pass
since the bug_job_map table currently uses the InnoDB engine, which
defaults to the order of the primary key. Were our test environment and
production bug_job_map tables to use different engines, the behaviour
would silently change, so it seems wrong for the test to give the
illusion of a guaranteed order. If in the future we wanted to give
such a guarantee, we should add an ORDER BY to the
get_bug_job_map_list query & update the test accordingly.
Bug 1097090 combined get_job_list and get_job_list_full, but the two
queries were actually subtly different. The former had an ORDER BY
push_timestamp, which was lost when they were combined. This means jobs
displayed in the similar jobs panel are from the past, and not the most
recent jobs of the same type.
The get_job_list query also sorted on platform, however I don't believe
this is necessary, so I've not added it back in here.
The sample config now points at the production service API, to make
the first-run experience for new contributors easier. However, the tests
use the sample config as part of the test run, so this breaks them. Since
the sample config is now optional, we can simply not use it during the
tests, which fixes the failures.
To stop logs with an excessive number of lines matching the error regex
from taking up too much space in the DB, and from making the API response
(and thus the UI) unwieldy, we cap the number of error lines at 100 per
step of the job. The 'errors_truncated' property can be used by the UI to
indicate that the error lines are only a subset of the total failures.
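A minimal sketch of the truncation (the names are illustrative, not the
parser's actual structures):

    MAX_ERRORS_PER_STEP = 100

    def cap_step_errors(step):
        """Cap the error lines kept for one step, flagging truncation."""
        errors = step.get('errors', [])
        if len(errors) > MAX_ERRORS_PER_STEP:
            step['errors'] = errors[:MAX_ERRORS_PER_STEP]
            # The UI can use this flag to show that only a subset of the
            # failure lines is displayed.
            step['errors_truncated'] = True
        return step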
Generated using:
autopep8 --in-place --recursive .
Before:
$ pep8 | wc -l
1686
After:
$ pep8 | wc -l
57
A later autopep8 run will be performed using --aggressive, which makes
non-whitespace changes too.