Now that Treeherder's data ingestion process doesn't hit its own API:
* `./manage.py runserver` is less susceptible to memory issues.
* The runserver/gunicorn process doesn't need to be running whilst the
data ingestion takes place.
Since it's only used for local testing, there's no need to populate the
description or set an owner. (They can always be added afterwards via
the Django admin UI.)
Prior to this, the command would fail if the provided owner email address
did not correspond to a valid user, which meant the added hassle of first
creating a user if one did not already exist.
This makes it harder to inadvertently use HTTPS with local Vagrant
hostnames, and reduces the number of config variables users of the
client have to keep track of.
The docs have been tweaked to encourage people using production
Treeherder to omit the `server_url` argument entirely, which reduces
boilerplate and also means they'll be less affected by future changes.
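For example (a sketch; the local Vagrant hostname shown here is an
assumption):

    from thclient import TreeherderClient

    # Production: rely on the client's default server_url.
    client = TreeherderClient()

    # Local Vagrant instance: override explicitly.
    local_client = TreeherderClient(
        server_url='http://local.treeherder.mozilla.org')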
Rename the ``ingest_from_pulse`` management command to ``read_pulse_jobs``,
to indicate that this step does not actually do any ingesting. It just
populates the ``store_pulse_jobs`` celery queue, which DOES do the actual
ingesting.
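A minimal sketch of the resulting split (the task decorator, module
layout and signatures here are assumptions):

    from celery import shared_task

    @shared_task
    def store_pulse_jobs(job_data):
        ...  # the actual ingestion happens in this task

    def read_pulse_jobs(pulse_messages):
        # The command only drains Pulse and enqueues the payloads;
        # no ingesting happens at this step.
        for message in pulse_messages:
            store_pulse_jobs.apply_async(args=[message])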
Several people have not had logging set up when trying to debug issues
using the client/API, so instead of a helpful error message (eg a
reminder to sync their clock for Hawk auth), they get:
`No handlers could be found for logger "thclient.client"`
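The standard library-side fix (a sketch, using the logger name from the
message above) is to attach a `NullHandler`, so the warning disappears
while consumers who do configure logging still see the client's output:

    import logging

    # A do-nothing handler silences the "No handlers could be found"
    # warning for consumers who haven't configured logging themselves.
    logging.getLogger('thclient.client').addHandler(logging.NullHandler())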
We're soon going to start blacklisting some default scripting User
Agents, to make accidental API abuse easier to trace back to the
source.
The docs are being updated in advance, so that the newsgroup posts have
the real pages available to link to.
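For example, a script can send a descriptive User Agent instead of the
library default (the header value and endpoint here are illustrative):

    import requests

    # A traceable UA instead of eg the default 'python-requests/x.y.z':
    headers = {'User-Agent': 'my-team-dashboard/1.0 (foo@example.com)'}
    response = requests.get('https://treeherder.mozilla.org/api/repository/',
                            headers=headers)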
This new section holds content that isn't specific to only one of
submitting or retrieving data from the Treeherder API, to avoid
duplication.
Ideally the submitting/retrieving sections would be nested under this
new REST API section, however there isn't a way to get Sphinx to do this
that doesn't then mis-display the subheadings:
http://stackoverflow.com/questions/25276415/prevent-sub-section-nesting-in-python-sphinx-when-using-toctree
npm 3 (which ships with Node.js 5+) fixes many of the previous issues,
making the third-party npm-shrinkwrap tool redundant (the tool is also
not compatible with npm 3).
New resultsets will still store a value in their ``revision_hash`` field, but it will
just be the same value as their ``long_revision`` field.
This will log an exception in New Relic when a new resultset or job is
posted to the API with only a ``revision_hash`` and not a ``revision``
value.
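Roughly like so (a sketch; the helper name and payload shape are
assumed, though `newrelic.agent.record_exception()` is a real API):

    import newrelic.agent

    def check_revision_fields(data):
        # Flag submitters still relying on the deprecated revision_hash
        # field, so they surface in New Relic and can be followed up on.
        if data.get('revision_hash') and not data.get('revision'):
            try:
                raise ValueError('revision_hash passed without a revision')
            except ValueError:
                newrelic.agent.record_exception()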
This also switches to using the longer 40-char revisions alongside the
12-char revisions, leveraging the longer ones for most actions. The
short revisions are still stored and used so that people and the UI can
locate a resultset (or set ranges) using short revisions.
Vagrant now supports a snapshots feature which makes this obsolete:
https://www.vagrantup.com/docs/cli/snapshot.html
By removing the 'scratch' VM config, we avoid confusing console
messages when running Vagrant commands, eg:
[~/src/treeherder]$ vagrant provision
==> default: Running provisioner: puppet...
...
==> default: Notice: Finished catalog run in 14.75 seconds
==> scratch: VM not created. Moving on...
* Recommend using the same `client_id` for stage/prod.
* Mention the need to file a bug (and where) for requesting approval.
* Explain prod approval needs submission to be working on stage first.
Avoids this redirect seen in prod gunicorn logs:
[05/Jan/2016:05:55:42 -0800] "GET /docs HTTP/1.1" 301 -
"http://treeherder.readthedocs.org/retrieving_data.html"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:46.0) Gecko/20100101 Firefox/46.0"
Also remove the duplication between the two pages, by having the
submitting data section not mention requesting credentials at all,
leaving that to the common tasks page instead.
If the worker is not running, any `apply_async()` calls are silently
thrown away, due to `ingest_push`'s use of `CELERY_ALWAYS_EAGER` and:
https://github.com/celery/celery/issues/2910
As such, running the worker after `ingest_push` doesn't help (since the
rabbitmq queues are empty), so people interested in perf/log data must
start the worker first instead.
In update.py, the line outputting revision.txt has to be moved later,
since the `dist/` directory won't exist until grunt build has run. In
addition, since `grunt build` removes the entire `dist/` directory, we
no longer need to manually remove *.gz.
We use the `--production` option for both `npm install` and
`grunt build`, so that the `devDependencies` in package.json are
ignored, and we only install/load the ones listed under `dependencies`
in package.json - since that's all that is required for the build.
We have to use `./node_modules/.bin/grunt` rather than `grunt`, since
grunt-cli is not installed globally on the treeherder admin machine for
greater isolation between stage and production.
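Putting those three points together, the relevant part of update.py
ends up roughly like this (a sketch; `get_revision()` is a hypothetical
helper):

    import subprocess

    def build_ui():
        # --production skips the devDependencies for both commands.
        subprocess.check_call(['npm', 'install', '--production'])
        # grunt-cli isn't installed globally on the admin machine, so
        # use the local copy. `grunt build` wipes and recreates dist/.
        subprocess.check_call(
            ['./node_modules/.bin/grunt', 'build', '--production'])
        # dist/ only exists after the build, so write revision.txt now.
        with open('dist/revision.txt', 'w') as f:
            f.write(get_revision())  # hypothetical helper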
Whilst the packages listed in package.json are pinned to exact versions,
they will have their own dependencies, which may be specified via
version ranges. In order to make production/local behaviour more
deterministic, these can be pinned too, using `npm shrinkwrap`.
However the stock shrinkwrap command has a few deficiencies, so we're
using a wrapper around it:
https://github.com/uber/npm-shrinkwrap
Note: Only packages listed under `dependencies` will be shrinkwrapped,
not those under `devDependencies`. This is because using the `--dev`
option (which would include the dev packages in npm-shrinkwrap.json)
means there would then be no way to exclude the dev packages when
installing in production.
For more information about shrinkwrap in general, see:
https://docs.npmjs.com/cli/shrinkwrap
http://tilomitra.com/why-you-should-use-npm-shrinkwrap/
https://nodejs.org/en/blog/npm/managing-node-js-dependencies-with-shrinkwrap/