Rename the ``ingest_from_pulse`` management command to ``read_pulse_jobs``, to
indicate that this step does not actually do any ingesting. It just populates
the celery queue ``store_pulse_jobs``, which DOES do the actual ingesting.
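For illustration, a minimal sketch of the split (the command and task names
come from this change; everything else, including the import path and the
message-consuming helper, is hypothetical):

    from django.core.management.base import BaseCommand

    from treeherder.etl.tasks import store_pulse_jobs  # hypothetical path


    class Command(BaseCommand):
        help = "Read jobs from Pulse and queue them for storing"

        def handle(self, *args, **options):
            # No ingesting happens here: each message is simply handed to
            # the task listening on the ``store_pulse_jobs`` queue.
            for message in iter_pulse_messages():  # hypothetical helper
                store_pulse_jobs.apply_async(
                    args=[message],
                    routing_key="store_pulse_jobs",
                )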
`manage.py check --deploy` is now run during Travis testing and as part
of stage/prod/Heroku deployment. It checks for a number of common
configuration mistakes & ensures security best practices are being
followed:
https://docs.djangoproject.com/en/1.8/ref/checks/
...by defining 'SKIP_PREDEPLOY' in the environment.
This is useful in cases where we want to prevent potentially destructive
actions (such as running migrations) from taking place, or where we know
that a migration is just going to time out unless run by hand.
Previously, if the `bin/` directory scripts were run in the Vagrant
environment, the processes were not started via the New Relic wrapper
command, since the New Relic licence key is not set there.
For greater consistency between Vagrant and production, the wrapper
command is now always used.
This works since `NEW_RELIC_DEVELOPER_MODE` is defined in the Vagrant
environment (thanks to a previous commit in the same bug), which
prevents the agent from trying to submit real data:
https://docs.newrelic.com/docs/agents/python-agent/installation-configuration/python-agent-configuration#developer_mode
There are some backwards-incompatible changes:
http://whitenoise.evans.io/en/latest/changelog.html
https://github.com/evansd/whitenoise/compare/v2.0.6...v3.0
Specifically:
* The CLI compression utility must now be called via
`python -m whitenoise.compress` rather than `python -m whitenoise.gzip`.
* The `whitenoise.django.GzipManifestStaticFilesStorage` storage backend
has moved to `whitenoise.storage.CompressedManifestStaticFilesStorage`.
* The internal `add_files()` method has been split into two and the part
which we need to subclass is now named `update_files_dictionary()`. See:
07f9c0bece
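For example, the storage backend rename amounts to a one-line settings
change, and the subclass hook moves as sketched below (only the backend
paths and the `update_files_dictionary()` name come from this change; the
subclass body is illustrative):

    # settings.py
    # Old (WhiteNoise 2.x):
    #   STATICFILES_STORAGE = "whitenoise.django.GzipManifestStaticFilesStorage"
    # New (WhiteNoise 3.0):
    STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"

    # Where we previously subclassed add_files(), the equivalent part is
    # now update_files_dictionary():
    from whitenoise.storage import CompressedManifestStaticFilesStorage

    class CustomWhiteNoiseStorage(CompressedManifestStaticFilesStorage):
        def update_files_dictionary(self, *args, **kwargs):
            # Our customisation would go here (body omitted).
            super(CustomWhiteNoiseStorage, self).update_files_dictionary(*args, **kwargs)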
When first setting up a new app on Heroku, things like reporting the
deploy to New Relic will fail, since it requires that the app exist on
New Relic. However the app will only be created there once the Python
agent first reports app metadata, which won't happen until after the
deploy (there is no way to create the app via the web interface).
In addition, there may be cases in the future where stage/prod is broken,
and the pre-deploy tasks therefore fail, but we still want the deploy to
proceed.
To avoid needing to constantly edit this file, the environment variable
`IGNORE_PREDEPLOY_ERRORS` can now be set, in cases where the deploy
should continue even if there were errors. (Note this uses the bash 4.2+
`-v` option, see http://stackoverflow.com/a/18448624).
Requires that `NEW_RELIC_APP_NAME` and `NEW_RELIC_API_KEY` be set in the
environment. NB: `NEW_RELIC_API_KEY` is different from the existing
`NEW_RELIC_LICENSE_KEY`.
We're also making use of the runtime-dyno-metadata labs feature, which
sets the slug/release related environment variables used in this PR:
https://devcenter.heroku.com/articles/dyno-metadata
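As a rough sketch, the deploy-reporting step could look something like the
following (the endpoint and payload details are assumptions based on New
Relic's legacy deployments API, not a copy of our script; `HEROKU_SLUG_COMMIT`
is one of the variables set by runtime-dyno-metadata):

    import os

    import requests

    response = requests.post(
        "https://api.newrelic.com/deployments.xml",
        headers={"x-api-key": os.environ["NEW_RELIC_API_KEY"]},
        data={
            "deployment[app_name]": os.environ["NEW_RELIC_APP_NAME"],
            # Set by the runtime-dyno-metadata labs feature:
            "deployment[revision]": os.environ.get("HEROKU_SLUG_COMMIT", ""),
        },
    )
    response.raise_for_status()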
Since we'll soon be adding the reporting of deploys to New Relic, which will
be too verbose to include in the Procfile. Also adds additional log output
(which follows the buildpack compile log formatting convention) to make
it easier to find & follow the release tasks on Papertrail.
Uses the `set -euo pipefail` recommendation from:
http://redsymbol.net/articles/unofficial-bash-strict-mode/
The Python buildpack has now rewritten the automatic collectstatic
feature, such that it no longer does an unnecessary (and time-consuming)
dry-run every time. As such, we can switch back to the buildpack's
automatic collectstatic:
https://github.com/heroku/heroku-buildpack-python/blob/master/bin/steps/collectstatic
Prior to this landing, I'll update us to the latest version of the
buildpack (using the `heroku buildpacks:set -i X ...` command) and
remove the `DISABLE_COLLECTSTATIC` environment variable.
[ci skip]
Since once the `grunt build` has run they are no longer required, and
only serve to bloat the slug size, increase attack surface & overwrite
the Python .profile.d script's environment variables.
Since we're not calculating ETAs (the UI does that once it knows the
start time and expected duration), we're calculating recent average
durations instead.
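A rough sketch of the idea (the `Job` model and its field names are
hypothetical, standing in for whatever stores start/end times):

    from datetime import timedelta

    from django.db.models import Avg, DurationField, ExpressionWrapper, F
    from django.utils import timezone


    def recent_average_duration(job_type_id, days=7):
        """Average runtime of this job type over the last few days."""
        cutoff = timezone.now() - timedelta(days=days)
        duration = ExpressionWrapper(
            F("end_time") - F("start_time"), output_field=DurationField()
        )
        return (
            Job.objects.filter(job_type_id=job_type_id, end_time__gte=cutoff)
            .annotate(duration=duration)
            .aggregate(avg_duration=Avg("duration"))["avg_duration"]
        )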
This creates a 'runnable_job' table in the database, as well as an API
endpoint at /api/project/{branch}/runnable_jobs listing all existing
buildbot jobs and their symbols. A new daily task 'fetch_allthethings' is
added to update this table.
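Example usage of the new endpoint (the project name is just an example, and
the exact response shape isn't shown here):

    import requests

    # List the runnable buildbot jobs and their symbols for a repository
    # ("branch" in the URL pattern):
    resp = requests.get(
        "https://treeherder.mozilla.org/api/project/mozilla-central/runnable_jobs/"
    )
    resp.raise_for_status()
    runnable_jobs = resp.json()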
The Python buildpack's automatic collectstatic is slower, since it does
a `--dry-run` first. To avoid the time penalty of this, we disable it by
setting `DISABLE_COLLECTSTATIC` in the Heroku environment, and run
collectstatic manually in `bin/post_compile`. See:
https://github.com/heroku/heroku-buildpack-python/issues/252
This commit relies on the nodejs buildpack being added to the list of
buildpacks for the app, and prior to the Python buildpack. See:
https://devcenter.heroku.com/articles/using-multiple-buildpacks-for-an-app
The nodejs buildpack will automatically install the packages listed in
`dependencies` in package.json, so that we have the requirements for
the grunt build. We don't actually need node or all of the files in
node_modules after we've run the grunt build, so in the future we could try
to remove them to reduce the resultant slug size (though it only
increased from 55MB to 70MB, so it's not urgent).
The dist directory has been added to `.slugignore` to prevent the
in-repo directory from being uploaded, since we'll be generating a new
one as part of the deploy. Once `dist/` is deleted from master, that
entry can be removed.
This adds an autoclassify command and a detect_intermittents command.
The former is designed to take an incoming job with an error summary
and look for existing results marked as intermittent that are a close
match for the new result. At present only one matcher is implemented;
this requires an exact match in terms of test name, result and error
message. Matching is also constrained to be based on single lines; it
is anticipated that future iterations may add support for matching on
groups of lines.
The detect_intermittents command is designed to take a group of jobs
running on the same push and with the same build job (i.e. same
testsuite, same chunk, etc.) and look for new intermittents to add to
the database. This currently only looks for test failures where there
is at least one green job and one non-green job.
There is currently no UI for seeing matches or for adding new
prototypical intermittents as match candidates. There is also no
integration with bugzilla; future development should add association
of frequent intermittents with bugs.
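To make the matching rule concrete, here is an illustrative sketch of the
single matcher described above, plus the detect_intermittents precondition
(class, function and attribute names are invented for the example):

    class PreciseTestMatcher:
        """Exact, single-line match on test name, result and message."""

        def match(self, failure_line, known_intermittents):
            return [
                known for known in known_intermittents
                if (known.test == failure_line.test
                    and known.status == failure_line.status
                    and known.message == failure_line.message)
            ]


    def worth_checking(jobs):
        # detect_intermittents only considers groups with at least one
        # green job and at least one non-green job.
        results = {job.result for job in jobs}
        return "success" in results and len(results - {"success"}) > 0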
`IS_HEROKU` isn't something that will differ between stage/prod, and is
not going to change any time soon. So let's just get post_compile to
add it to the dyno profile script, so it's one less variable to have to
remember to set via the Heroku CLI/dashboard.
Quoting from:
http://blog.doismellburning.co.uk/2014/10/06/twelve-factor-config-misunderstandings-and-advice/
"12factor says your applications should read their config from the
environment; it has very little to say about how you populate the
environment ? use whatever works for you".
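In settings terms this amounts to something like the following (the
SECURE_PROXY_SSL_HEADER line is only an example of a Heroku-only setting):

    import os

    IS_HEROKU = "IS_HEROKU" in os.environ

    if IS_HEROKU:
        # Example of Heroku-only configuration: trust the platform
        # router's X-Forwarded-Proto header.
        SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")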
On Heroku, there is no load balancer or Varnish-like cache in front of
gunicorn, so we must handle gzipping responses in the app.
In order for WhiteNoise to serve gzipped static content, assets must be
gzipped on disk in advance (doing so on-demand in Python would not be
as performant). WhiteNoise will then serve the `.gz` version of files in
preference to the original, if the client indicated it supported gzip.
For assets covered by Django's collectstatic, gzipping the assets only
requires using WhiteNoise's GzipManifestStaticFilesStorage backend,
which wraps Django's ManifestStaticFilesStorage to create hashed+gzipped
versions of static assets:
http://whitenoise.evans.io/en/latest/django.html#add-gzip-and-caching-support
The collectstatic generated files will then contain the file hash in
their filename, so WhiteNoise can also serve them with a large max-age
to avoid further requests if the file contents have not changed.
For the UI files under `dist/`, we cannot rely on the Django storage
backend, since the directory isn't covered by STATICFILES_DIRS (it is
instead made known to WhiteNoise via `WHITENOISE_ROOT`). As such, files
under `dist/` are gzipped via an additional step during deployment. See:
http://whitenoise.evans.io/en/latest/base.html#gzip-support
Files whose extension is on the blacklist, or that are not >5% smaller
when compressed, are skipped during compression.
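The relevant settings, approximately (the path is illustrative and BASE_DIR
is assumed to be defined as usual):

    import os

    # Hashed + gzipped collectstatic output:
    STATICFILES_STORAGE = "whitenoise.django.GzipManifestStaticFilesStorage"

    # dist/ is not in STATICFILES_DIRS, so it is served via WHITENOISE_ROOT
    # and gzipped by a separate deploy step instead:
    WHITENOISE_ROOT = os.path.join(BASE_DIR, "dist")

The additional deployment step is the `python -m whitenoise.gzip` CLI
utility, run against `dist/`.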
The MPL 2.0 terms state that as long as a LICENSE file is present, the
per-file header text is not required. See "Exhibit A" at the end of:
https://www.mozilla.org/MPL/2.0/
After the previous commit, the Objectstore is effectively "dead code".
This commit therefore removes all of that dead code, now that anything left
over in the Objectstore has been drained and added to the DB.
Since it only speeds up parsing by a few percent of total runtime, and
is therefore not worth the added complexity for deployment and local
hack-test-debug cycles when working on the log parser.
The .gitignore and update.py entries will be removed in a later commit,
once the stage/prod src directories have been cleaned up.
Previously all three of the buildapi ETL ingestion tasks (pending,
running, 4hr) were run under one queue. In order to be able to tell
issues (such as backlogs or leaks) apart, these have now been split onto
their own queues.
On Heroku, these queues now also have a dyno each - which should mean we
can easily tell which is leaking and possibly also downgrade from the
expensive performance dyno even before the leak is fixed.
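With old-style celery settings the split looks roughly like this (the task
and queue names are illustrative):

    # One queue per buildapi ingestion task, so backlogs and leaks show
    # up separately in monitoring.
    CELERY_ROUTES = {
        "fetch-buildapi-pending": {"queue": "buildapi_pending"},
        "fetch-buildapi-running": {"queue": "buildapi_running"},
        "fetch-buildapi-4hr": {"queue": "buildapi_4hr"},
    }

Each queue can then be consumed by its own dyno, e.g. a worker started with
`celery worker -Q buildapi_pending`.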
This introduces two new ways to generate ``Bug suggestions`` artifacts from
a ``text_log_summary`` artifact:
1. POST a ``text_log_summary`` on the ``/artifact`` endpoint
2. POST a ``text_log_summary`` with a job on the ``/jobs`` endpoint.
Both of these cases will schedule an asynchronous task to generate the
``Bug suggestions`` artifact with ``celery``.
Artifact generation scenarios:
JobCollections
^^^^^^^^^^^^^^
Via the ``/jobs`` endpoint:
1. Submit a Log URL with no ``parse_status`` or ``parse_status`` set to "pending"
* This will generate ``text_log_summary`` and ``Bug suggestions`` artifacts
* Current *Buildbot* workflow
2. Submit a Log URL with ``parse_status`` set to "parsed" and a ``text_log_summary`` artifact
* Will generate a ``Bug suggestions`` artifact only
* Desired future state of *Task Cluster*
3. Submit a Log URL with ``parse_status`` of "parsed", with ``text_log_summary`` and ``Bug suggestions`` artifacts
* Will generate nothing
ArtifactCollections
^^^^^^^^^^^^^^^^^^^
Via the ``/artifact`` endpoint:
1. Submit a ``text_log_summary`` artifact
* Will generate a ``Bug suggestions`` artifact if it does not already exist for that job.
2. Submit ``text_log_summary`` and ``Bug suggestions`` artifacts
* Will generate nothing
* This is *Treeherder's* current internal log parser workflow
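For illustration, scenario 1 of the ArtifactCollections flow might look like
this (field names, values and the unauthenticated request are simplified
assumptions; real submissions are OAuth-signed):

    import json

    import requests

    payload = [{
        "job_guid": "abc123",  # illustrative
        "type": "json",
        "name": "text_log_summary",
        "blob": json.dumps({"step_data": {"steps": []}}),
    }]
    # POSTing only a text_log_summary schedules asynchronous generation of
    # the Bug suggestions artifact, if one doesn't already exist.
    requests.post(
        "https://treeherder.mozilla.org/api/project/mozilla-central/artifact/",
        json=payload,
    )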
I added a Procfile listing all the different python services treeherder needs.
Heroku provides deployment-specific settings via environment variables, so I had to modify the settings file to read from them where that wasn't already the case. I created an environment variable, IS_HEROKU, which allows us to have a Heroku-only configuration where needed.
The db service is provided by Amazon RDS, which requires an SSL connection. To enable SSL in the MySQLdb Python client I had to modify Datasource (and bump the version used).
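On the Django side this is an OPTIONS change, shown here as a sketch (env
var names and the certificate path are illustrative; the Datasource change
is the equivalent at that layer):

    import os

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.mysql",
            "NAME": os.environ["TREEHERDER_DATABASE_NAME"],  # illustrative
            "HOST": os.environ["TREEHERDER_DATABASE_HOST"],
            "USER": os.environ["TREEHERDER_DATABASE_USER"],
            "PASSWORD": os.environ["TREEHERDER_DATABASE_PASSWORD"],
            # MySQLdb requires SSL to be requested via OPTIONS:
            "OPTIONS": {"ssl": {"ca": "/path/to/rds-combined-ca-bundle.pem"}},
        }
    }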
The cache service is provided by the MemCachier Heroku addon. Heroku recommends using pylibmc, so I set it up according to the docs here: https://devcenter.heroku.com/articles/memcachier#python.
The amqp service is provided by the CloudAMQP addon.
I added a post_compile script that runs every time we deploy. We should run every build step we require in there, like static asset minification, collection, etc.
To share the OAuth credentials among the various services I used an environment variable. I also added an option to export_project_credentials so that the credentials can be printed to stdout. This should come in handy when we need to update the environment-stored credentials with the ones in the db.
We're no longer using the vendor directory & this script wasn't entirely
reliable anyway, so let's remove it. The virtualenv package can be
removed from dev.txt, since virtualenv is installed globally, and
nothing inside our virtualenv (which is where the packages in dev.txt
end up) needs a local installation of it.