Since:
* Redis has additional features we need (eg for bug 1409679)
* the Redis server, python client and Django backend are more
actively maintained (so have persistent connections/pooling that
actually works, which gives a big perf win)
* we can use Heroku's own Redis addon rather than relying on a
third-party's (to hopefully prevent a repeat of the certificate
expiration downtime)
This commit:
* Switches pylibmc/django-pylibmc to redis-py/django-redis, and
configures the new backend according to:
http://niwinz.github.io/django-redis/latest/
https://github.com/andymccurdy/redis-py
* Uses redis-py's native support for TLS to connect to the Heroku
Redis server's stunnel port directly, avoiding the complexity of
using a buildpack to create an stunnel between the dyno and server:
https://devcenter.heroku.com/articles/securing-heroku-redis#connecting-directly-to-stunnel
* Uses explicit `REDIS_URL` values on Travis/Vagrant rather than
relying on the django-environ `default` value (for parity with
how we configure `DATABASE_URL` and others).
* Removes the pylibmc connection-closing workaround from `wsgi.py`.
Note: Whilst the Heroku docs suggest using `django-redis-cache`, it's
not as actively maintained, so we're using `django-redis` instead:
https://devcenter.heroku.com/articles/heroku-redis
https://github.com/niwinz/django-redis
https://github.com/sebleier/django-redis-cache
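As a rough sketch of the backend configuration described above (not the
exact code in this commit): the helper names `get_tls_redis_url` and
`build_caches` and the `on_heroku` flag are made up for illustration, and
the "stunnel port is one higher than the Redis port" rule is an assumption
taken from the Heroku stunnel article linked above.

```python
from urllib.parse import urlparse, urlunparse


def get_tls_redis_url(redis_url):
    """Return the TLS variant of a Redis URL (hypothetical helper).

    Assumes the stunnel port is one higher than the plain Redis port,
    and that the client selects TLS via the `rediss://` scheme.
    """
    parsed = urlparse(redis_url)
    # Keep any credentials and the host, swapping only the port.
    host_part = parsed.netloc.rsplit(':', 1)[0]
    netloc = '{}:{}'.format(host_part, parsed.port + 1)
    return urlunparse(parsed._replace(scheme='rediss', netloc=netloc))


def build_caches(redis_url, on_heroku=False):
    """Build a Django CACHES dict that uses the django-redis backend."""
    if on_heroku and redis_url.startswith('redis://'):
        redis_url = get_tls_redis_url(redis_url)
    return {
        'default': {
            'BACKEND': 'django_redis.cache.RedisCache',
            'LOCATION': redis_url,
            'OPTIONS': {
                'CLIENT_CLASS': 'django_redis.client.DefaultClient',
            },
        },
    }


# Local development (Travis/Vagrant style) keeps the plain URL.
CACHES = build_caches('redis://localhost:6379')
```

The URL rewriting is kept separate from the settings dict so that it can
be tested without a running Redis server.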
Before this is merged, Heroku Redis instances will need to be created
for stage/production (prototype done) - likely on plan `premium-3`:
https://elements.heroku.com/addons/heroku-redis
We'll also need to pick an eviction policy:
https://devcenter.heroku.com/articles/heroku-redis#maxmemory-policy
Once deployed, the `memcachier-tls-buildpack` buildpack will no longer
be required and can be removed from prototype/stage/prod, along with
the `TREEHERDER_MEMCACHED` environment variable and Memcachier addon.
The UI has already been removed. This cleans up the data ingestion
and removes the `JobDuration` model, but leaves the `running_eta`
field on the `Job` model until the next time that table is touched
(the table is large, so altering the schema would likely require
downtime).
After my update to the Travis images, the pre-installed version of
Elasticsearch matches the one we're looking for, so the
download/install can be skipped, saving time.
I've adjusted the Travis images to always use the latest yarn at the
time the images are created, so it's not worth installing the latest
version on every run:
https://github.com/travis-ci/travis-cookbooks/pull/935
Since it reduces the intermittent failure rate in bug 1401048 and is
also likely more representative of what people are using to access
Treeherder in production (whilst still being more stable than Nightly).
This:
* reduces duplication
* opens the door to sharing functionality with `vagrant/setup.sh`
* will make it easier to visualise the Travis bootstrap process
when moving both Travis and Vagrant to a unified Docker environment.
set-env was renamed in the buildpack to set_env, breaking this script.
Rather than just renaming our usage, it makes more sense to stop using
what is really an internal buildpack API and instead use `.profile`:
https://devcenter.heroku.com/articles/dynos#the-profile-file
On Heroku, the `DYNO` environment variable is always set. As such, we
can just use that to determine whether code is being run on Heroku,
rather than having to add our own environment variable.
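A minimal sketch of that check (the function name `running_on_heroku` is
made up for illustration; only the presence of `DYNO` matters, not its
value):

```python
import os


def running_on_heroku(environ=None):
    """Return True if the `DYNO` environment variable is set."""
    # Accept an explicit mapping so the check is easy to test.
    environ = os.environ if environ is None else environ
    return 'DYNO' in environ
```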
To ensure that:
* the Heroku router's 30 second timeout doesn't beat gunicorn to it,
given that routing time is included in Heroku's timing
* the app is more likely to remain responsive, when receiving many
badly filtered API calls
Changes to the new environment variable name `PULSE_PUSH_SOURCES`.
Keeps the old `publish-resultset-runnable-job-action` task name by
creating a method that points to `publish_push_runnable_job_action`.
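The aliasing pattern looks roughly like the sketch below. The function
bodies, the keyword arguments, the `TASKS` dict (a stand-in for the real
task registry) and the `publish-push-runnable-job-action` key are all
hypothetical; only the old task name and the new function name come from
this commit.

```python
def publish_push_runnable_job_action(**kwargs):
    """New implementation (placeholder body that echoes its arguments)."""
    return {'published': kwargs}


def publish_resultset_runnable_job_action(**kwargs):
    """Old name, kept as a thin wrapper around the new function."""
    return publish_push_runnable_job_action(**kwargs)


# Hypothetical stand-in for a task registry keyed by task name.
TASKS = {
    'publish-push-runnable-job-action': publish_push_runnable_job_action,
    'publish-resultset-runnable-job-action': publish_resultset_runnable_job_action,
}
```

Anything still referring to the old task name gets the new behaviour
without needing to be updated at the same time.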
All environments are now using the native MySQL 5.7 libmysqlclient
library, so the custom vendoring script is unused. The tests checking
that the library isn't vulnerable aren't needed any more, since
heroku-16 doesn't even have that package installed (it's not available
on Xenial).
The Python buildpack now restores the cache prior to the
`bin/pre_compile` script being run, which makes the workaround
unnecessary. In fact, the workaround now breaks clean installs.
At some point soon we'll want to switch Heroku stage/production to
their new Heroku-16 (Ubuntu 16.04) stack, so it makes sense to trial
it locally first. Their Heroku-16 docker image will make switching
to docker simpler too.
Notable changes to the provision steps:
* We're using the Bento boxes rather than the 'official' Canonical
ones, at the recommendation of Vagrant:
https://www.vagrantup.com/docs/boxes.html#official-boxes
* openjdk-8-jre-headless is now available from the ubuntu.com
repository, so doesn't require a PPA.
* There isn't an official mysql-server-5.6 package (only 5.7+), so
we have to start using a PPA.
* Services must now be managed using systemd rather than SysV init.
* The `eth0` network interface has changed name:
https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
* The `vendor-libmysqlclient.sh` script will still be run on Ubuntu
14.04 on Travis/Heroku for now, so has to be compatible with both.
* The previous temporary cleanup steps can be removed since the
vagrant destroy means a clean slate.
Adds the virtualenv directory to `PATH` in `env.sh`, to save having
to activate the virtualenv (which is about to be removed) or set
`PATH` in each of the wrapper bin scripts.
Now that we're no longer using datasource, the memory leaks previously
seen have gone. As such there is no need to make the celery worker
processes restart so frequently (they will now only be restarted at
the daily Heroku dyno restart), which will improve performance.
In addition, this will reduce the number of transactions that don't get
reported to New Relic (when the maxtasksperchild threshold is
reached, the worker is forcibly killed before it can submit the
New Relic payload).
The log parser tasks have been left unchanged for now, since they
appear to still be leaking.
Since it's faster, deterministic and doesn't give obscure errors when
using `--no-bin-links` (which is required for both npm and yarn on
Windows hosts), and as such unblocks the work in bug 1343624.
Many of the commands are the same as with npm. See:
https://yarnpkg.com/en/docs/usage
Routing commands via npm/yarn is preferred, since it avoids
having to do global installs of grunt-cli, which simplifies contributor
setup, and means less effort when we switch to Yarn (since it requires
manual PATH setup for globally installed packages).
Hopefully, now that we're no longer using datasource, the leaks should
have gone. The gunicorn processes will now only be restarted at the
daily Heroku dyno restart, rather than multiple times per minute,
improving performance.
Outputting to the console rather than a log file:
* is more user-friendly during development
* is more consistent with Heroku
* means the Vagrant-specific Django LOGGING config is now closer to the
one in settings.py, and so more easily combined with it
Both gunicorn and celery default to outputting to stdout/stderr, so the
`logfile` options can be omitted entirely.
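For illustration, a console-only Django-style `LOGGING` dict might look
like the sketch below. The formatter string, the `INFO` level and the
`treeherder` logger name used in the demo call are assumptions, not the
values from the actual settings.

```python
import logging
import logging.config

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'standard': {
            'format': '[%(asctime)s] %(levelname)s [%(name)s:%(lineno)s] %(message)s',
        },
    },
    'handlers': {
        # StreamHandler writes to stderr by default, so no log file is needed.
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'standard',
        },
    },
    'root': {
        'handlers': ['console'],
        'level': 'INFO',
    },
}

# Django applies this dict itself; here it is applied directly for the demo.
logging.config.dictConfig(LOGGING)
logging.getLogger('treeherder').info('Logging to the console')
```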
The packages have been moved under the archives path, since there are
newer releases now available. The official download link has remained
unchanged, however it's a 302 redirect to a non-HTTPS version of these
links, so we intentionally deep-link, even if it means the URLs will
occasionally need updating.
Fixes the 404 during Vagrant provisioning, and failures that would occur
with a clean cache on Heroku/Travis.
This changes ingestion, the API endpoints, and the frontend to match
the new structure. For now we continue to store text_log_summary artifacts,
though they don't do anything anymore.
Previously `<site-root>/revision.txt` would 404 if `SERVE_MINIFIED_UI`
was unset on Heroku, which would then cause the next deploy to fail.
It's now made available in both the `ui/` and `dist/` directories, so
it can be found regardless of the value of `SERVE_MINIFIED_UI`.
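A sketch of the idea, assuming the revision is simply written to a
`revision.txt` in each directory (the helper name `write_revision_file`
and the example revision string are made up; the real deploy step may
copy the file differently):

```python
import tempfile
from pathlib import Path


def write_revision_file(revision, root, directories=('ui', 'dist')):
    """Write revision.txt into each directory under root."""
    written = []
    for name in directories:
        target_dir = Path(root) / name
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / 'revision.txt'
        target.write_text(revision + '\n')
        written.append(target)
    return written


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    paths = write_revision_file('abc123', root)
    contents = [path.read_text() for path in paths]
```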
* Bug 1264074 - Move to_timestamp function to a reusable location
* Bug 1264074 - Refactor JobConsumer to have a PulseConsumer super class
Much of what was in the JobConsumer is reusable by the upcoming
ResultsetConsumer. So refactor those parts out so that each specific
consumer can reuse code as much as possible.
* Bug 1264074 - Add ability to ingest Github Resultsets via Pulse
This introduces a ResultsetConsumer and a read_pulse_resultsets
management command to ingest resultsets from the TaskCluster
github exchanges.
When a supported Github repo has a Pull Request created or
updated, or a push is made to master, then it will kick off a
Pulse message. We will receive it and then fetch any additional
information we need from github's API and store the Resultset.
This follows a very similar pattern to the Job Pulse ingestion.
* Bug 1264074 - Old code/comments cleanup
* Bug 1264074 - Tests for the Github resultset pulse loader
We're using Bash errexit mode (`set -e`) so cannot directly use a
command that returns non-zero inside `[` style brackets (unless in a
`$(...)`), however the `(` form does work with errexit and is cleaner.
There's a strange bug when creating relative symlinks on top of an
already existing symlink, in that the target resolves to the wrong file.
As such, we now explicitly check for an existing symlink first, to avoid
overwriting.
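The check amounts to something like the following sketch (the helper name
`ensure_symlink` and the example paths are hypothetical, and a POSIX
filesystem is assumed):

```python
import os
import tempfile


def ensure_symlink(target, link_path):
    """Create link_path -> target, unless a symlink already exists there.

    Returns True if a new symlink was created, False if one was left alone.
    """
    if os.path.islink(link_path):
        return False
    os.symlink(target, link_path)
    return True


# Demo: the second call must not overwrite the existing link.
with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, 'current')
    first = ensure_symlink('releases/1', link)
    second = ensure_symlink('releases/2', link)
    resolved = os.readlink(link)
```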
The SCL3 system check disabling has also been tweaked since #1770, since
the `HOSTNAME` environment variable doesn't appear to be set during
deploy, even though it appears in `env` on the same machine.
Yey for deployment code that can only really be tested by deploying \o/
The Heroku pre_compile script is currently run prior to the cache being
restored (https://github.com/heroku/heroku-buildpack-python/pull/321),
which means we have to tweak PATH so vendor-libmysqlclient.sh can find
the binaries from the cache instead of the app directory.
However, the workaround added in #1770 only added one of the two extra
required PATHs; this adds the other.
Prior to this the buildpack compile would output:
> ./bin/vendor-libmysqlclient.sh: line 65: pip: command not found
...and so wouldn't purge the old mysqlclient package, which is needed to
force recompilation against the newer libmysqlclient.
Once the PR against heroku-buildpack-python is merged, these workarounds
can be removed.
The latest versions of libmysqlclient 5.5/5.6 (used by mysqlclient) are
still vulnerable to TLS stripping, even after last year's backports of
5.7.x fixes:
- https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3152
- http://bugs.mysql.com/bug.php?id=82383
Ideally we'd just use the standalone Connector/C library instead of the
libmysqlclient packages, however the latest release is too old:
- http://bugs.mysql.com/bug.php?id=82448
Heroku's cedar-14 stack comes with libmysqlclient 5.5.x, so until it is
updated to 5.7.x (see https://github.com/heroku/stack-images/pull/38) we
must manually vendor 5.7.x ourselves, so that connections between the
Heroku dynos and our public RDS instances are secure. We can do this and
still remain on MySQL server 5.6, since newer client releases are
backwards compatible with older server versions.
Whilst the Vagrant/Travis MySQL instances don't use TLS (and so aren't
affected), we still want them to use libmysqlclient 5.7, to be
consistent with production.
Installing the newer libmysqlclient isn't sufficient on its own. Any
packages compiled against the older version (in our case mysqlclient)
need to be recompiled. We ensure this happens by pip uninstalling the
existing package if it was already installed.