New resultsets will still store a value in their ``revision_hash`` field, but it will
just be the same value as their ``long_revision`` field.
This will log an exception in New Relic when a new resultset or job is posted
to the API with only a ``revision_hash`` and not a ``revision`` value.
This also switches to using the longer 40-char revisions alongside the
12-char revisions, though we leverage the longer ones for most actions. The
short revisions are stored and used so that people and the UI can still
locate a resultset (or set ranges) using short revisions.
Vagrant now supports a snapshots feature which makes this obsolete:
https://www.vagrantup.com/docs/cli/snapshot.html
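For example (the snapshot name is arbitrary):

[~/src/treeherder]$ vagrant snapshot save clean
...
[~/src/treeherder]$ vagrant snapshot restore clean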
By removing the 'scratch' VM config, we avoid confusing console messages
when performing Vagrant commands, e.g.:
[~/src/treeherder]$ vagrant provision
==> default: Running provisioner: puppet...
...
==> default: Notice: Finished catalog run in 14.75 seconds
==> scratch: VM not created. Moving on...
* Recommend using the same `client_id` for stage/prod.
* Mention the need to file a bug (and where) for requesting approval.
* Explain that prod approval requires the submission to be working on stage first.
Avoids this redirect seen in prod gunicorn logs:
[05/Jan/2016:05:55:42 -0800] "GET /docs HTTP/1.1" 301 -
"http://treeherder.readthedocs.org/retrieving_data.html"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:46.0) Gecko/20100101 Firefox/46.0"
Also remove the duplication between the two pages, by having the
submitting-data section not mention requesting credentials at all,
leaving that to the common tasks page instead.
If the worker is not running, any `apply_async()` calls are silently
thrown away, due to `ingest_push`'s use of `CELERY_ALWAYS_EAGER` and:
https://github.com/celery/celery/issues/2910
As such, running the worker after ingest_push doesn't help (since the
rabbitmq queues are empty) and so if people are interested in perf/log
data, then they must start the worker first instead.
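For example, roughly (the exact worker invocation may differ by setup):

[~/src/treeherder]$ celery -A treeherder worker -B
...then, in a second shell:
[~/src/treeherder]$ ./manage.py ingest_push mozilla-inbound <revision>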
In update.py, the line outputting revision.txt has to be moved later,
since the `dist/` directory won't exist until grunt build has run. In
addition, since `grunt build` removes the entire `dist/` directory, we
no longer need to manually remove *.gz.
We use the `--production` option for both `npm install` and
`grunt build`, so that the `devDependencies` in package.json are
ignored, and we only install/load the ones listed under `dependencies`
in package.json - since that's all that is required for the build.
We have to use `./node_modules/.bin/grunt` rather than `grunt`, since
grunt-cli is not installed globally on the treeherder admin machine for
greater isolation between stage and production.
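ie the build on the admin machine now boils down to roughly:

[~/src/treeherder]$ npm install --production
[~/src/treeherder]$ ./node_modules/.bin/grunt build --production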
Whilst the packages listed in package.json are pinned to exact versions,
they will have their own dependencies, which may be specified via
version ranges. In order to make production/local behaviour more
deterministic, these can be pinned too, using `npm shrinkwrap`.
However, the stock shrinkwrap command has a few deficiencies, so we're
using a wrapper around it:
https://github.com/uber/npm-shrinkwrap
Note: Only packages listed under `dependencies` will be shrinkwrapped,
not those under `devDependencies`. This is because using the `--dev`
option (which would include the dev packages in npm-shrinkwrap.json)
means there would then be no way to exclude the dev packages when
installing in production.
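Usage is then roughly as follows (the CLI name is assumed from the
wrapper project above):

[~/src/treeherder]$ ./node_modules/.bin/npm-shrinkwrap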
For more information about shrinkwrap in general, see:
https://docs.npmjs.com/cli/shrinkwrap
http://tilomitra.com/why-you-should-use-npm-shrinkwrap/
https://nodejs.org/en/blog/npm/managing-node-js-dependencies-with-shrinkwrap/
And also:
* Explain the process for request/approval on stage/prod
* Remove the unnecessary export_project_credentials step in the
"add a new repository" section, since Treeherder's ETL no longer uses
credentials.json.
With the new per-user Hawk credentials, the same auth object can be used
for the whole session, so it should just be passed when instantiating
TreeherderClient.
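ie something like this sketch (the credential parameter names are assumed):

    from thclient import TreeherderClient

    # Pass the Hawk credentials once, at instantiation:
    client = TreeherderClient(client_id='...', secret='...')
    # ...every subsequent call on `client` then reuses the same auth.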
Since they're not specific to the Django app 'webapp'.
Whilst we're there, the local & example settings files have been
renamed. In the future I'd like to combine settings_local.example.py
with puppet/files/treeherder/local.vagrant.py, but I'll do that in
another bug.
Heroku now generates it on deploy, and for stage/prod we generate it
fresh on the stage/prod branch and force push each time. As such, we
have no need for the directory on master, and by removing it we avoid
confusion when new contributors grep the repo.
As an added bonus, the stage/prod deploy script should fail if the dist
directory is missing, so the grunt build cannot be forgotten prior to
deploying. (Currently if it's forgotten, we end up deploying the ancient
dist directory from master that was last updated prior to us switching
to the new deployment strategy.)
This documentation instructs a user on how to set up their local
machine to ingest data from existing exchanges, as well as post to
their own exchange to test their jobs.
The MPL 2.0 terms state that as long as a LICENSE file is present, the
per-file header text is not required. See "Exhibit A" at the end of:
https://www.mozilla.org/MPL/2.0/
Since it can be performed whilst `vagrant up` is running. Whilst
modifying the hosts file is not necessary for running the tests, most
people will still want to do it.
Created using |isort -p tests -rc .| and a couple of manual tweaks.
The order is:
* futures
* std library
* third party packages
* local imports
* relative local imports
...with each group ordered with "import x" before "from x import y", and
then alphabetically.
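For example, a module's imports end up grouped like this (module names
purely illustrative):

    from __future__ import unicode_literals

    import os
    from datetime import timedelta

    import requests
    from django.conf import settings

    import treeherder
    from treeherder.etl import common

    from . import utils
    from .base import BaseModel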
* Reinforce that the push must be < 4 hours old
* s/mozilla-central/mozilla-inbound/ (a better example, since
  mozilla-inbound is more likely to have pushes less than 4 hours old)
Since bug 1140349, the objectstore endpoint has been deprecated, and
performs the same function as the jobs endpoint. Now that there are no
remaining submitters to it, let's remove it.
After the previous commit, the Objectstore is effectively "dead code".
So this commit removes all the dead code after anything left over in
the Objectstore has been drained and added to the DB.
This adds the ability to specify a custom log name and have the log
viewer use the ``logname`` param of the ``text_log_summary`` to get the
right log.
This also improves the error message returned by the /logslice/ API if a
log name is used that is not found.
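For example, a ``text_log_summary`` blob might now carry (a sketch;
``logname`` is the only relevant addition here):

    text_log_summary = {
        'logname': 'my_custom_log',  # used by the log viewer to pick the right log
        # ...the rest of the summary (step data etc.) is unchanged...
    }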
Since it only speeds up parsing by a few percent of total runtime, and
is therefore not worth the added complexity for deployment and local
hack-test-debug cycles when working on the log parser.
The .gitignore and update.py entries will be removed in a later commit,
once the stage/prod src directories have been cleaned up.
In order that we can serve the UI on Heroku, we wrap the Django wsgi app
with WhiteNoise, so both the UI and API requests are served by gunicorn.
In the Vagrant environment, Apache has been removed and Varnish instead
now proxies all requests to gunicorn/Django runserver directly, without
Apache as a go-between.
The UI on production will not be affected by this commit, since the
Apache config there will still intercept requests for the UI assets
rather than proxying them to gunicorn.
It's worth noting, too, that we're not able to make use of WhiteNoise's
automatic Django GZip/caching support, since that assumes we are using
Django templates and referring to resources using {% static "foo.css" %}.
However, we can sub-class WhiteNoise (or more specifically the
DjangoWhiteNoise class) and override the is_immutable_file() method to
add caching support at a later date:
http://whitenoise.evans.io/en/latest/base.html#caching-headers
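ie something along these lines (a sketch based on the WhiteNoise docs;
the URL prefix check is illustrative):

    from django.core.wsgi import get_wsgi_application
    from whitenoise.django import DjangoWhiteNoise

    class CustomWhiteNoise(DjangoWhiteNoise):
        def is_immutable_file(self, path, url):
            # Give far-future cache headers to hashed/versioned assets only.
            return url.startswith('/assets/')

    application = CustomWhiteNoise(get_wsgi_application())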
Documentation for WhiteNoise can be found at:
http://whitenoise.evans.io/
The Varnish config uses |return (pass)| unconditionally, so it never
caches anything; there's therefore no need to suggest restarting it
after making changes to the UI.
It's sometimes useful to be able to spin up an additional Vagrant
environment without affecting the first. To do this, we create two named
machines, both identical (since they inherit the main Vagrantfile
configuration) - one called default (to match the machine name used up
until now, so we don't force people to recreate their existing VMs when
we land this) and one called scratch.
The former is set to be the primary, so that single-machine commands
(e.g. `vagrant ssh`) work without having to append the machine name
every time.
The scratch machine has autostart set to false, so that it does not spin
up without explicitly using: `vagrant up scratch`
The name `scratch` is entirely arbitrary, and we can add additional
temporary machine names later, if people wish to have more than two
environments saved simultaneously.
For more information, see:
http://docs.vagrantup.com/v2/multi-machine/
These steps are already performed by puppet on provision, so don't need
to be performed again manually. They are mentioned on the "common tasks"
docs page if people need to run them again later.
Since gunicorn is only running on port 8000 inside the VM, and is then
proxied via Varnish/Apache to be accessible on port 80 outside the VM,
mentioning port 8000 is more confusing than helpful.
We're no longer using the in-repo puppet config in production, nor would I
recommend anyone do so again in the future. In addition, copying and pasting
default configs into the docs isn't great, since they are a pain to maintain.
Let's just remove this from the docs.
One of the few differences between vagrant.pp and production.pp is from
which directory the UI is served. This change adds the ability to toggle
the directory from the Vagrantfile directly, thereby bringing
production.pp (and the duplication it brings) closer to being obsolete.
Instead of always displaying the directory file listing, if index.html
is present we serve that instead. This not only fixes the remaining
broken URLs when using web-server.js, but also means people can click
the "Starting web server at <URL>" link in the console and immediately
see Treeherder, without having to then manually navigate to index.html
themselves.
This introduces two new ways to generate ``Bug suggestions`` artifacts from
a ``text_log_summary`` artifact:
1. POST a ``text_log_summary`` on the ``/artifact`` endpoint
2. POST a ``text_log_summary`` with a job on the ``/jobs`` endpoint.
Both of these cases will schedule an asynchronous task to generate the
``Bug suggestions`` artifact with ``celery``.
Artifact generation scenarios:
JobCollections
^^^^^^^^^^^^^^
Via the ``/jobs`` endpoint:
1. Submit a Log URL with no ``parse_status`` or ``parse_status`` set to "pending"
* This will generate ``text_log_summary`` and ``Bug suggestions`` artifacts
* Current *Buildbot* workflow
2. Submit a Log URL with ``parse_status`` set to "parsed" and a ``text_log_summary`` artifact
* Will generate a ``Bug suggestions`` artifact only
* Desired future state of *Task Cluster*
3. Submit a Log URL with ``parse_status`` of "parsed", with ``text_log_summary`` and ``Bug suggestions`` artifacts
* Will generate nothing
ArtifactCollections
^^^^^^^^^^^^^^^^^^^
Via the ``/artifact`` endpoint:
1. Submit a ``text_log_summary`` artifact
* Will generate a ``Bug suggestions`` artifact if it does not already exist for that job.
2. Submit ``text_log_summary`` and ``Bug suggestions`` artifacts
* Will generate nothing
* This is *Treeherder's* current internal log parser workflow
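For example, ArtifactCollections scenario 1 might be driven from the
Python client roughly like this (class/method names assumed from the
client of the time; the blob and guid are placeholders):

    from thclient import TreeherderArtifactCollection, TreeherderClient

    tac = TreeherderArtifactCollection()
    ta = tac.get_artifact()
    ta.add_name('text_log_summary')
    ta.add_type('json')
    ta.add_blob({'step_data': {'steps': []}})  # placeholder blob
    ta.add_job_guid('some-job-guid')           # placeholder guid
    tac.add(ta)

    client = TreeherderClient(client_id='...', secret='...')
    client.post_collection('mozilla-inbound', tac)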
As part of merging the UI repo into this one, the following directory
moves were performed:
webapp/app/ -> ui/
webapp/test/ -> tests/ui/
webapp/config/ -> tests/ui/config/
webapp/scripts/ -> tests/ui/scripts/
webapp/scripts/web-server.js -> web-server.js
The UI docs are now being built at the same time as the main service
docs. Until we combine them, let's at least make sure the UI parts are
still discoverable, by adding them to the main table of contents.
* Create a generic TreeherderClient class
* Add a single method called `post_collection` which takes care of all
details of validation, submitting stuff and raising errors
* Also add a new update_parse_status method, for updating status (replaces
manual calls to post information on raw TreeherderRequest)
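Submission then looks roughly like this sketch (argument details assumed):

    from thclient import TreeherderClient, TreeherderJobCollection

    client = TreeherderClient()
    tjc = TreeherderJobCollection()
    # ...populate the collection...
    client.post_collection('mozilla-inbound', tjc)

    # And instead of a raw TreeherderRequest, updating a log's parse
    # status is now just:
    job_log_url_id = 123  # placeholder id
    client.update_parse_status('mozilla-inbound', job_log_url_id, 'parsed')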
The 'treeherder-service' repo has been renamed to 'treeherder', ready
for when the treeherder-ui repo is imported into it. This means the
Github URL, Travis URL and directory name when cloned all change. The
Read The Docs URL cannot be changed, so for now we will leave it as-is,
and in the future (once the service and UI docs are combined) we will
create a new project on RTD with the name "treeherder".
This updates doc links and puppet/Vagrant configs, but leaves the
stage/prod deploy script alone, since renaming the directories on our
infra is non-trivial. The dev instance will need some TLC since unlike
stage/prod, it does use the puppet scripts in the repo.
At some point in the future, we may break the "Running the tests"
section out to its own file, but for now "Common Tasks" seems like a
better home than the installation instructions.
We're not currently using socketio - and if we start doing so in the
future we'll likely want to update to a newer version/adjust the
implementation anyway. Removing the dependencies from common.txt speeds
up the pip install on Travis. The old files will still be in version
control should we wish to refer to them :-)
Generated using:
autopep8 --in-place --recursive .
Before:
$ pep8 | wc -l
1686
After:
$ pep8 | wc -l
57
A later autopep8 run will be performed using --aggressive, which makes
non-whitespace changes too.
We only pass it in some cases, so setting it to anything other than the
default will break all the places where we don't use it. Django doesn't
require it to be set, since it will fall back to the default port for
that engine type anyway.
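For reference, a minimal sketch of the resulting settings, which will
fall back to the engine's default port (3306 for MySQL):

    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': 'treeherder',
            # 'PORT' deliberately omitted: Django falls back to the
            # engine's default port when it is unset.
        }
    }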
* Separates out the steps required for running the tests, from those for
setting up a local instance.
* The running ingestion tasks step now explains what they are, that the
API server must be running already, and how to ingest just a single
revision for testing.
* The log parser compile step is moved inline, so it's harder to forget.