We're already using pylibmc on Heroku, since we need its SASL
authentication support. However we were still using python-memcached on
Vagrant, Travis, stage & prod.
To reduce the number of simultaneous changes when we migrate to Heroku,
and to ensure at that point we're testing what we ship, this switches us
to pylibmc pre-emptively for all environments.
Django does have a native pylibmc backend [1]; however it doesn't
support SASL authentication, so we have to use the custom django-pylibmc
backend [2] instead. For consistency we use it everywhere, even though
only Heroku's memcache instances require auth.
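For reference, the resulting cache config looks roughly like this (a
sketch based on the django-pylibmc docs rather than a copy of our
settings; the OPTIONS values and env variable names are assumptions):
    import os
    CACHES = {
        'default': {
            # django-pylibmc also reads SASL credentials from the
            # MEMCACHE_USERNAME/MEMCACHE_PASSWORD environment variables.
            'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
            'LOCATION': os.environ.get('MEMCACHE_SERVERS', 'localhost:11211'),
            'BINARY': True,  # SASL requires the binary protocol
            'OPTIONS': {
                'tcp_nodelay': True,
                'ketama': True,
            },
        }
    }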
Installing pylibmc requires libmemcached-dev, which is installed by
default on Travis, has just been installed on stage/prod (bug 1243767),
and as of this change, is now installed in the Vagrant environment too.
The comment mentioning that pylibmc must be present in the root
requirements.txt file no longer applies, since the Python buildpack now
uses pip-grep (rather than regex) to determine whether it is in the
requirements files [3], and so handles included requirements files too.
Example: https://emorley.pastebin.mozilla.org/8858007
I'll be checking for any changes in performance once this is deployed;
if anything I expect it to be faster, since pylibmc is a C wrapper
around libmemcached rather than pure Python.
[1] https://github.com/django/django/blob/1.8.7/django/core/cache/backends/memcached.py#L171
[2] https://github.com/django-pylibmc/django-pylibmc/blob/master/django_pylibmc/memcached.py
[3] https://github.com/heroku/heroku-buildpack-python/blob/v75/bin/steps/pylibmc#L22
Since Heroku doesn't use nginx/Apache, we must perform the redirect to
HTTPS (and set the HSTS header) via WSGI middleware. We cannot use
Django's HTTPS/HSTS features, since they won't help with requests that
are served by WhiteNoise directly (eg the site homepage).
Instead we use wsgi-sslify, as recommended by:
https://github.com/evansd/whitenoise/issues/53#issuecomment-166972824
We only enable it when IS_HEROKU is set, since stage/prod is handled by
Apache, and for local development we have to use HTTP.
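A rough sketch of the wsgi.py wiring (hedged; the settings module path
here is illustrative):
    import os
    from django.core.wsgi import get_wsgi_application
    from wsgi_sslify import sslify
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'treeherder.settings')
    application = get_wsgi_application()
    # Only redirect to HTTPS (and emit HSTS) on Heroku; locally we use HTTP.
    if os.environ.get('IS_HEROKU'):
        application = sslify(application)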
We don't currently have code coverage enabled, and whilst we want to
add it, it will likely be in a different form from what's here (eg
coveralls.io). The packages removed by this commit are also older
releases, and the latest versions have changed their dependencies (for
example cov-core is no longer a separate package). Removing them
reduces the number of packages we have to install locally & on Travis,
and means fewer out-of-date package warnings from requires.io.
pytest-django doesn't set up a test database for every single test,
only for those tests that actually require a db. Tests that require a
db need to either be marked with `@pytest.mark.django_db` or use a
fixture that depends on `db` or `transactional_db`.
Using the non-transactional db would make test execution much faster,
but unfortunately it doesn't play well with the treeherder datasource
creation, so I used transactional_db.
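For example, a hypothetical test needing the (transactional) test
database would look like:
    import pytest
    from django.contrib.auth.models import User
    # Equivalent to depending on the transactional_db fixture.
    @pytest.mark.django_db(transaction=True)
    def test_can_create_user():
        User.objects.create_user('kermit')
        assert User.objects.filter(username='kermit').exists()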
pytest-django also allows you to specify the settings file to use for
tests in a pytest.ini file, which is nicer than monkeypatching the
original settings file in the pytest session start function 😃.
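eg something along these lines (the settings module name here is
illustrative):
    [pytest]
    DJANGO_SETTINGS_MODULE = treeherder.settings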
These are the settings (which can be overridden by environment
variables) that will be used to specify which Pulse exchanges we ingest
data from.
This also introduces the ``django-environ`` package, which will be used
elsewhere as we move away from ``local.py`` files on the stage/prod
servers.
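A sketch of how such a setting might be read (the setting name, env
variable and default value here are made up for illustration):
    import environ
    env = environ.Env()
    # Hypothetical example: the Pulse exchange we ingest pushes from,
    # overridable via an environment variable of the same name.
    PULSE_PUSH_EXCHANGE = env('PULSE_PUSH_EXCHANGE',
                              default='exchange/hgpushes/v1')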
The latest version of django-browserid removes a view that we used to
fetch basic config params for the browserid client initialization. We
now have loginUrl and logoutUrl hardcoded in the client, and we fetch
the user's login status from a dedicated endpoint.
The MPL 2.0 terms state that as long as a LICENSE file is present, the
per-file header text is not required. See "Exhibit A" at the end of:
https://www.mozilla.org/MPL/2.0/
We had to do at least one deploy after the initial landing of
bug 1169944, before removing prod.txt, to avoid errors during update.py.
That has now occurred, so we can remove the file.
For bug 1124278, we're going to want to sprinkle New Relic annotations
around the codebase, so by always installing it, we save having to stub
these out in development/on Travis. It also seems wise for prod to run
packages as close as possible to those used in development.
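For example, the kind of annotation we'd expect to add (the function
here is hypothetical; the decorator is from the newrelic package):
    import newrelic.agent
    # Hypothetical example: record this step as its own segment in New
    # Relic transaction traces.
    @newrelic.agent.function_trace()
    def summarise_log(log_lines):
        return {'line_count': len(log_lines)}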
Since NEW_RELIC_LICENSE_KEY isn't set locally, and
NEW_RELIC_DEVELOPER_MODE is set to true, the New Relic agent doesn't
submit anything. See:
https://docs.newrelic.com/docs/agents/python-agent/installation-configuration/python-agent-configuration#developer_mode
dj-database-url extracts DB host, port, username, password and database
name from the env variable 'DATABASE_URL' (unless another env variable
name is specified). If the env variable is not defined, it falls back to
the default passed to dj_database_url.config().
This means for Heroku and similar we can replace the multiple DB env
variables with just one URL for default & one for read_only.
This also effectively makes setting the read-only DB variable mandatory
for stage/production/Heroku, since DEFAULT_DATABASE_URL won't be valid
for them - so it prevents us from inadvertently not using the read-only
DB.
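A sketch of the resulting settings (the read-only env variable name is
an assumption):
    import dj_database_url
    # Only valid for local/Vagrant; stage/production/Heroku must
    # provide real URLs via the environment.
    DEFAULT_DATABASE_URL = 'mysql://root@localhost/treeherder'
    DATABASES = {
        'default': dj_database_url.config(default=DEFAULT_DATABASE_URL),
        'read_only': dj_database_url.config(env='DATABASE_URL_RO',
                                            default=DEFAULT_DATABASE_URL),
    }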
The deployment script also had to be updated, so that we set the
prod/stage-specific environment variables before using manage.py, since
dj-database-url cannot rely on what's in the stage/prod local.py config
(which isn't a bad thing, since we're deprecating that file).
This patch upgrades the version stored in the requirements file and
fixes some issues introduced by breaking changes in the new version of
the library (illustrated in the sketch below):
- Writable nested fields are no longer available; you need an explicit
create method on the serializer to write a nested field.
- ModelViewSet now requires serializer_class and queryset attributes.
- The @action and @link decorators have been replaced by @detail_route
and @list_route.
- Any attempt to create a ModelSerializer instance with an attribute
whose type is either dict or list will raise an exception.
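As an illustration of the new style (a hypothetical viewset, not code
from this patch):
    from django.contrib.auth.models import User
    from rest_framework import serializers, viewsets
    from rest_framework.decorators import detail_route
    from rest_framework.response import Response
    class UserSerializer(serializers.ModelSerializer):
        class Meta:
            model = User
            fields = ('id', 'username', 'is_active')
    class UserViewSet(viewsets.ModelViewSet):
        # serializer_class and queryset must now be set explicitly.
        queryset = User.objects.all()
        serializer_class = UserSerializer
        # Previously @action; now @detail_route (or @list_route).
        @detail_route(methods=['post'])
        def deactivate(self, request, pk=None):
            user = self.get_object()
            user.is_active = False
            user.save()
            return Response({'is_active': user.is_active})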
* Fixes the `offset` parameter, since it previously used the value for
`limit` instead.
* The `limit` and `offset` parameters are now cast to int, to prevent
SQL injection if those parameters were not sanitised in the app (see
the sketch below).
Note: This intentionally removes the ability to pass a comma delimited
`limit` string of say "100,200" - since the now-working `offset`
parameter makes this redundant.
https://github.com/jeads/datasource/compare/v0.8...v0.9
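A hypothetical sketch of the int-cast guard described above (not the
actual datasource code):
    def apply_paging(sql, limit=None, offset=None):
        # int() means a value like "1; DROP TABLE jobs" raises
        # ValueError instead of being interpolated into the SQL.
        if limit is not None:
            sql += " LIMIT %d" % int(limit)
            if offset is not None:
                sql += " OFFSET %d" % int(offset)
        return sql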
dj-database-url extracts DB host, port, username, password and database
name from the env variable 'DATABASE_URL' (unless another env variable
name is specified). If the env variable is not defined, it falls back to
the default passed to dj_database_url.config().
This means for Heroku and similar we can replace the multiple DB env
variables with just one URL for default & one for read_only.
This also effectively makes setting the read-only DB variable mandatory
for stage/production/Heroku, since DEFAULT_DATABASE_URL won't be valid
for them - so it prevents us from inadvertently not using the read-only
DB.
Before this is deployed, we'll need to update the stage/prod puppet
configs & Heroku settings to add the new environment variable.
Since it only speeds up parsing by a few percent of total runtime, it
is not worth the added complexity for deployment and for local
hack-test-debug cycles when working on the log parser.
The .gitignore and update.py entries will be removed in a later commit,
once the stage/prod src directories have been cleaned up.
In order that we can serve the UI on Heroku, we wrap the Django wsgi app
with WhiteNoise, so both the UI and API requests are served by gunicorn.
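A hedged sketch of the wsgi.py wiring (the settings module path and UI
directory are illustrative):
    import os
    from django.core.wsgi import get_wsgi_application
    from whitenoise.django import DjangoWhiteNoise
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'treeherder.settings')
    # WhiteNoise serves the Django static assets; add_files() lets us
    # also serve the UI directory from the same gunicorn process.
    application = DjangoWhiteNoise(get_wsgi_application())
    application.add_files(os.path.join(os.path.dirname(__file__), '..', 'ui'))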
In the Vagrant environment, Apache has been removed, and Varnish now
proxies all requests to gunicorn/Django runserver directly, without
Apache as a go-between.
The UI on production will not be affected by this commit, since the
Apache config there will still intercept requests for the UI assets
rather than proxying them to gunicorn.
It's worth noting too that we're not able to make use of WhiteNoise's
automatic Django GZip/caching support, since that assumes we are using
Django templates and referring to resources using {% static "foo.css" %}.
However, we can sub-class WhiteNoise (or more specifically the
DjangoWhiteNoise class) and override the is_immutable_file() method to
add caching support at a later date:
http://whitenoise.evans.io/en/latest/base.html#caching-headers
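eg something along these lines (a hedged sketch of the override, not
something implemented in this commit):
    import re
    from whitenoise.django import DjangoWhiteNoise
    class CustomWhiteNoise(DjangoWhiteNoise):
        # Treat filenames containing a content hash (eg "app.a1b2c3d4.js")
        # as immutable, so they get far-future cache headers.
        IMMUTABLE_RE = re.compile(r'\.[0-9a-f]{8,}\.')
        def is_immutable_file(self, path, url):
            if self.IMMUTABLE_RE.search(url):
                return True
            return super(CustomWhiteNoise, self).is_immutable_file(path, url)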
Documentation for WhiteNoise can be found at:
http://whitenoise.evans.io/
Since otherwise we play tug of war with Read the Docs' build process.
Instead let's allow them to update us as they see fit, but at the same
time retain a docs.txt that will work locally.
I added a Procfile listing all the different Python services treeherder needs.
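A trimmed sketch of what it contains (the process names and commands
here are illustrative rather than the exact file):
    web: gunicorn treeherder.webapp.wsgi:application
    worker_default: celery -A treeherder worker -Q default --concurrency=3
    worker_log_parser: celery -A treeherder worker -Q log_parser --concurrency=3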
Heroku provides deployment-specific settings via environment variables,
so I had to modify the settings file to listen to them where that
wasn't already the case. I created an environment variable IS_HEROKU,
which allows us to have a Heroku-only configuration where needed.
The db service is provided by Amazon RDS, which requires an SSL
connection. To enable SSL in the MySQLdb Python client I had to modify
Datasource (and bump up the version used).
The cache service is provided by the MemCachier Heroku addon. Heroku
recommends using pylibmc, so I set it up according to the docs here:
https://devcenter.heroku.com/articles/memcachier#python
The amqp service is provided by the CloudAMQP addon.
I added a post_compile script that runs every time we deploy. We should
run every build step we require in there, like static asset
minification, collection, etc.
To share the OAuth credentials among the various services I used an
environment variable. I also added an option to
export_project_credentials so that the credentials can be printed to
stdout. This should come in handy when we need to update the
environment-stored credentials with the ones in the db.
Changes:
e8d1d57145...v0.10
Using a specific version release of django-browserid (vs the Git zip
archive for a specific revision) means peep doesn't have to re-download
the package each time.
Changes:
e8d1d57145...v0.11.1
This will fix the spurious "Setting BROWSERID_VERIFY_CLASS not found"
errors in logs, as well as possibly help with people getting logged out
intermittently. It also brings us up to the latest django-browserid
release, which means updating later to a (yet to be released) Django 1.8
compatible version of django-browserid should be much easier.
Using a specific version release of django-browserid (vs the Git zip
archive for a specific revision) also means peep doesn't have to
re-download the package each time.
This is a no-op: the v0.6 release tag corresponds to the same revision.
It just avoids peep's SHA special-casing, which causes continual
re-installation of the package & subsequent Travis cache invalidation.
https://github.com/jeads/datasource/releases/tag/v0.6
Changes:
f236a3487e...1.1
Using a specific version release of treeherder-client (vs the Git zip
archive for a specific revision) also means peep doesn't have to
re-download the package each time.
We're no longer using the vendor directory & this script wasn't entirely
reliable anyway, so let's remove it. The virtualenv package can be
removed from dev.txt, since virtualenv is installed globally, and
nothing inside our virtualenv (which is where the packages in dev.txt
end up) needs a local installation of it.
This differentiation was only useful when explaining which packages
could be listed in which requirements file (since compiled packages
could not be added to checked-in.txt). Now that all packages are peep
installed, common.txt contains both pure and compiled packages.
Previously, the requests package had to be listed in dev.txt even
though it was in the vendor directory, since it was used by conftest.py
before the vendor directory was added to the Python path. Now that the
packages in checked-in.txt have been moved to common.txt, 'requests' is
listed in two requirements files that are peep installed, so we can
remove the dupe.
Now that we're using virtualenvs and peep to manage packages in
production, there's no need to use an in-repo vendor directory. As
such, all packages that were in checked-in.txt have been moved to
common.txt, so they will now be peep installed during deployment/testing
and also during the provision of the Vagrant environment.
The whole point of peep is that it errors out if (a) hashes aren't
specified for a package, or (b) the provided hash is incorrect. As
such, before we can start using peep, we must add the hashes. The
requirements files are still compatible with pip, since it just treats
the hash lines like any other comment.
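For reference, a peep-style requirements entry looks something like
this (the hash value here is made up):
    # sha256: jkybSl7T7wvXXKMEu8BiyXq7qBBggn6Lk-fQLicUHTk
    Django==1.8.7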
We're not currently using socketio - and if we start doing so in the
future we'll likely want to update to a newer version/adjust the
implementation anyway. Removing the dependencies from common.txt speeds
up the pip install on Travis. The old files will still be in version
control should we wish to refer to them :-)
The packages in this file are already a mixture of pure and compiled
packages. It's not worth moving the pure packages to checked-in.txt,
since we'll eventually be removing checked-in.txt and the associated
vendor/ and moving everything in there to this file. As such, common.txt
more accurately reflects the purpose of this file.
There were many packages that end up being installed via dependency
chains but are not themselves listed in the requirements files. To
ensure determinism with pip (and to prevent errors with peep, since it
uses --no-deps by default), all packages must be listed explicitly.
I've avoided adding any more packages to checked-in.txt since we will
soon be deleting the vendor directory, so it seems silly to pollute it
further. compiled.txt is now rather unfortunately named, since it lists
packages that are pure and could have been in vendor/.
Versions have been set to match those currently used in production.
The exception is blessings, which is not installed in production (we're
just lucky our use of mozlog doesn't hit the import & so haven't seen
the error); for it I've just used the latest available version.
Using git+git URLs means pip has to clone the repo and try to determine
whether the specified ref is a tag, revision, branch etc. Instead,
GitHub provides direct archive zips, which are much faster to pip
install.
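ie preferring requirements entries of this form (the SHA is shown as a
placeholder):
    # before: pip has to clone the whole repo just to resolve the ref
    git+git://github.com/jeads/datasource.git@<sha>#egg=datasource
    # after: GitHub serves a zip of that exact revision directly
    https://github.com/jeads/datasource/archive/<sha>.zip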
For changes in compiled.txt, the version matches that installed globally
in production. For those in checked-in.txt I've used the version in
vendor/ which is actually different from that in production's global
site-packages, but is what we're actually using, since vendor/ is
earlier in the Python path.
flake8 is pyflakes+pep8. In a later PR I'll add a mention of it to the
docs - particularly how to set it up as a local git commit hook, but for
now I'm just keen to not regress the passing flake8 run. We may also
need to further tweak the ignore settings in setup.cfg if we find
certain warning types to be too annoying.
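The kind of tweak that might mean (hypothetical values, not the current
setup.cfg contents):
    [flake8]
    # Codes we might choose to silence if they prove too noisy, eg
    # E501 (line too long) and W601 (.has_key() used).
    ignore = E501,W601
    exclude = .git,__pycache__,vendor,node_modules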
To pick up the version number bump. This doesn't change the contents of
vendor/datasource/, so in theory it shouldn't be necessary. However prod
pip installs the requirements in pure.txt when it shouldn't, and the
vendor directory is later in the Python path than site-packages, so we
end up using the version installed globally. Unfortunately some of the
nodes have an older version of datasource installed, but without the
version bump it's hard to tell which.