The `manage.py check --deploy` command causes the Django models to be
initialised, which requires Elasticsearch to be accessible if
`ELASTICSEARCH_URL` is set. The variable is set globally for the Travis
runs; however, Elasticsearch isn't actually installed for the linters
job (which is where `check --deploy` was previously being run).
Whilst we could have just unset `ELASTICSEARCH_URL` for the linters job,
it's best to move the test to the main Python job chunk, since:
* The command hits not only Elasticsearch but also the DB, and the
linters job is running the older mysql 5.5 rather than mysql 5.6.
* Whilst Elasticsearch isn't currently enabled anywhere other than
the prototype Heroku instance, we'll soon be using it everywhere, so
should get `check --deploy` to actually test what we plan to deploy.
Note: The exception during `check --deploy` was not turning the Travis
run red, due to `check --deploy` being piped to awk. When we update to
Django 1.10 the awk workaround can be removed; in the meantime we can
use the bash `pipefail` option to ensure the exit code from the
`manage.py` command propagates through to the one that Travis sees.
Add support for matching test failures where the test, subtest, status,
and expected status are all exact matches, but the message is not an
exact match. The matching uses Elasticsearch and is initially optimised
for cases where the messages differ only in numeric values, since this
is a relatively common case.
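A minimal sketch of this style of query, using the raw Elasticsearch
query DSL (the index name, field names and client wiring here are
illustrative assumptions, not the actual implementation):

```python
import os

from elasticsearch import Elasticsearch  # official Python client

es = Elasticsearch([os.environ["ELASTICSEARCH_URL"]])

def find_similar_failures(line):
    # Exact term filters for the structured fields, plus an analysed
    # `match` on the message, so near-identical messages (e.g. ones
    # differing only in numbers) still score as hits.
    query = {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"test": line.test}},
                    {"term": {"subtest": line.subtest or ""}},
                    {"term": {"status": line.status}},
                    {"term": {"expected": line.expected}},
                ],
                "must": {"match": {"message": line.message}},
            }
        }
    }
    return es.search(index="failure-lines", body=query)
```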
This commit also adds Elasticsearch to the Travis environment.
Since they are now taking much longer than before (40% of the total
test suite runtime). Prior to this change, chunk A was taking ~9 mins
and chunk B ~4 mins; after it, the three chunks should take roughly
4, 4 and 6 mins respectively.
Since if not specified it defaults to `/usr/bin/python`, which on
Travis Precise containers is 2.7.3 and on Trusty GCE is 2.7.6, because
the Python 2.7.11 install they provide is not system-installed but
instead just placed on the `PATH`.
This means that until now we've not actually been testing using the
same version of Python as has been running in production :-/
Test names, messages, etc. may contain Unicode characters from beyond
the Basic Multilingual Plane ("astral" characters). Unfortunately
MySQL's "utf8" character set is nothing of the sort and will only store
a maximum of three bytes per character, thus restricting it to BMP
characters. The correct fix is to switch to the utf8mb4 character set.
Since such a change is somewhat involved, however, we address the
immediate problem with a hack.
When storing failure lines, if the operation fails for character set
related reasons, try again with any non-BMP characters replaced by a
marker of the form <U+codepoint> e.g. <U+10FFFF>.
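A minimal sketch of that replacement (assuming a wide/UCS-4 Python
build; the helper name is illustrative):

```python
import re

# Any character outside the Basic Multilingual Plane (above U+FFFF).
ASTRAL_RE = re.compile(u"[\U00010000-\U0010FFFF]", re.UNICODE)

def replace_astral(text):
    # e.g. u"\U0001F600" becomes u"<U+1F600>"
    return ASTRAL_RE.sub(lambda m: u"<U+%X>" % ord(m.group(0)), text)
```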
Note further that whether MySQL fails here or silently replaces each
byte of the original character with a U+FFFD replacement character
depends on the value of the sql_mode setting: if it is set to
STRICT_ALL_TABLES we get an error, otherwise silent data loss. It is
therefore important that this setting is consistent across all
environments.
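One way to keep it consistent (a sketch, not necessarily how the
project configures it) is to force the mode on every Django connection:

```python
# settings.py sketch: pin sql_mode for each new MySQL connection, so
# character-set problems fail loudly in every environment.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "treeherder",
        "OPTIONS": {
            "init_command": "SET sql_mode='STRICT_ALL_TABLES'",
        },
    }
}
```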
`manage.py check --deploy` is now run during Travis testing and as part
of stage/prod/Heroku deployment. It checks for a number of common
configuration mistakes & ensures security best practices are being
followed:
https://docs.djangoproject.com/en/1.8/ref/checks/
Since it's now supported:
https://blog.travis-ci.com/2016-05-03-caches-are-coming-to-everyone
This will reduce the job runtime by 60-80s.
Similar to the container job (job chunk 1), since we're now caching the
virtualenv, we have to work around the default virtualenv being polluted
with existing packages (by creating a clean one in another directory),
to avoid constant cache churn.
As of pip 8, peep has now been integrated into pip.
Migrating from peep to this native feature has several advantages:
* It avoids the complexity/learning curve of using a wrapper around pip.
* It means we do not need to fork the official Heroku Python buildpack
(which handles pip installation of requirements files) in order to use
hash verification on Heroku. (Once the buildpack updates to pip 8.)
* Omitted sub-dependencies result in install-time errors rather than
the user discovering omissions at run-time.
* pip's native caching is used, and all packages are installed in one
pip invocation, so it's significantly faster.
* It has better handling of errors and corner cases.
Key facts about the native feature:
* hash-checking mode is enabled if at least one hash is found in the
requirements files passed to pip, or can be force-enabled by passing
`--require-hashes` when running `pip install` (an example follows
this list).
* Once enabled, hash-checking mode enforces that all packages:
- are pinned to a specific version
- have hashes listed
- have all sub-dependencies specified
* Older versions of pip will error out if either `--require-hashes` or
the requirements file `--hash` syntax is used, meaning it's not
possible to accidentally lose hash-checking protection if the pip used
is older than expected.
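For example, a hashed requirement looks roughly like this (the package
pin and hash values are illustrative placeholders):

```
Django==1.8.9 \
    --hash=sha256:<hash of the sdist> \
    --hash=sha256:<hash of the wheel>
```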
For more details, see:
https://pip.pypa.io/en/stable/user_guide/#hash-checking-mode
https://pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode
The pip version on Travis and in the Vagrant virtualenv has been updated
to 8.0.2 in bug 1241144, and the stage/prod virtualenv in bug 1241519.
The Heroku Python buildpack pip was updated in bug 1241909.
The requirements files hashes were ported using `peep port`, and then
comments/URLs re-added by hand.
So that we can confirm that peep still operates under pip v8, prior to
the transition from peep to pip v8's new hashing feature.
https://pip.readthedocs.org/en/stable/news/
The individual RABBITMQ_* variables are never used on their own, so it
makes more sense to switch to a URL variable that combines all of them.
Travis/Vagrant have also had BROKER_URL explicitly set, since we're
generally moving away from having testing defaults set in settings.py.
Once stage/prod have BROKER_URL set, we can remove the fallback to the
old variable names, and also remove defaults entirely, making missing
settings fail fast.
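A sketch of the interim fallback (the legacy variable names below are
illustrative of the RABBITMQ_* set, not an exact copy of the config):

```python
import os

# Prefer the new combined URL; fall back to composing it from the old
# individual variables until stage/prod have BROKER_URL set.
BROKER_URL = os.environ.get("BROKER_URL") or "amqp://%s:%s@%s:%s/%s" % (
    os.environ.get("RABBITMQ_USER", "guest"),
    os.environ.get("RABBITMQ_PASSWORD", "guest"),
    os.environ.get("RABBITMQ_HOST", "localhost"),
    os.environ.get("RABBITMQ_PORT", "5672"),
    os.environ.get("RABBITMQ_VHOST", "/"),
)
```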
Previously the python chunks were approx 3m55s and 2m55s respectively,
because we couldn't move one of the longest-running directories (etl)
to the second chunk: the tests would fail due to bug 1219922.
After this change the python chunk runtimes are 3m15s and 3m25s.
Since they take approx 35% of the py.test run time. Ideally we'd move
the ETL tests instead of the E2E ones to this separate job (since they
are more related to the log parsing tests than some of the others, and
would make the split more even), but they currently break when run
standalone, so we'll do that once bug 1219922 is fixed.
The python-tests job is run on non-container infra so can't use caching.
Peep install takes 60s compared to 30s for pip (part hashing cost, part
peep's unavoidable design inefficiencies due to not being built into
pip), which is painful given that we do a full install every time due
to the lack of caching.
The linters step (which can use caching) is already testing the peep
install works, so we can fall back to plain pip for this job part to
save an extra 30s from the runtime.
This also adds the `language: python` key which was the missing
ingredient to getting a virtualenv set up for us. Now that we're not
using peep, it doesn't matter that the provided virtualenv is using
pip v7.x (which is incompatible with peep).
Move the JS Karma tests and the grunt build smoketest to their own job,
which runs on the container infra. The results are now available after
~50s, rather than ~4 mins previously. An added advantage is that since
we can now use `language: node_js`, we can pin to a specific version,
reducing the chance of breakage.
In addition, the python test run now no longer needs to npm install
(which isn't cacheable) and doesn't have the JS test/grunt build
runtime, saving ~120s.
Since this job sets `sudo: false`, it will run on the container infra,
so be able to use caching. The `env: linters` is so that a nice label
appears in the Travis sub-job UI, eg see:
https://travis-ci.org/mozilla/treeherder/builds/87763293
We install the full set of node and python dependencies, since
hardcoding just the packages we need here could lead to the versions
getting out of sync with package.json/the pip requirements files. In
addition isort requires a populated virtualenv, and npm sucks if
passing package names directly (it reinstalls them even if installed).
It's worth noting that with the cache, full dependency installation
only takes 2.5s, so this is all moot.
This job (once it has populated the cache) now only takes 24s
end-to-end, whereas previously it would have been ~3.5 mins before the
linters steps had completed.
Now that we have better integration between pytest and Django, we don't
need to create the test db upfront. The test db will be created by the
first test that requires it.
pytest-django doesn't set up a test database for every single test, but
only for those tests that actually require a db. Tests that require a db
need to either be marked with `@pytest.mark.django_db` or use a fixture
that has a dependency on `db` or `transactional_db`.
Using a non-transactional db would make test execution much faster, but
unfortunately it doesn't play well with the Treeherder datasource
creation, so I used `transactional_db`.
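For instance (a minimal sketch; the test bodies are illustrative):

```python
import pytest

# No db marker or fixture: pytest-django does no database setup at all.
def test_pure_logic():
    assert 1 + 1 == 2

# The marker (transaction=True is equivalent to depending on the
# `transactional_db` fixture) opts this test in to database access;
# the test database is created lazily when the first such test runs.
@pytest.mark.django_db(transaction=True)
def test_needs_db():
    from django.contrib.auth.models import User
    assert User.objects.count() == 0
```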
pytest-django also allows you to specify a settings file to use for
tests in a pytest.ini file, which is nicer than monkeypatching the
original settings file in the pytest session start function 😃.
We were previously using the same database (test_treeherder) for both
the jobs and reference data models. I centralized the new db name in the
test settings file. All the tests requiring the jobs db or its
repository counterpart can now access it using the `test_project`
fixture, while utility functions use the aforementioned setting
directly. Where the project name is hardcoded in a static file, I just
replaced it with the new name `test_treeherder_jobs`.
Stage/production/Vagrant/Heroku's RDS all use mysql 5.6; however,
Travis is currently running v5.5. Installing mysql 5.6 manually on the
Travis container infra is currently broken:
https://github.com/travis-ci/apt-package-whitelist/issues/1206#issuecomment-149884653
To use sudo (for apt-get) we either have to fall back to the legacy
non-container infra, or else use the new Trusty beta infra:
http://blog.travis-ci.com/2015-10-14-opening-up-ubuntu-trusty-beta/
http://docs.travis-ci.com/user/trusty-ci-environment/#Runtimes
The Trusty beta infra is also non-container, but at least isn't EOL.
Unfortunately similar to the legacy non-container infra, it doesn't
offer caching, so incurs a setup time penalty of approx 3 minutes
(including the mysql 5.6 install, npm install and peep install). See:
https://github.com/travis-ci/travis-ci/issues/4997
If/when the container infra uses mysql 5.6 by default, or the bug
preventing it from being installed using the apt travis.yml option is
fixed, we should switch back to the container infra to speed up the
Travis run.
In this commit, we use `--user` with the peep install, since the
non-container infra doesn't set up a virtualenv, and we cannot use sudo
due to:
https://github.com/travis-ci/travis-ci/issues/4989
In addition, the current user ("travis") now doesn't have permissions to
create the Treeherder DB, so we have to use `-u root` (the password is
blank).
The new infra is running Python 2.7.10 (rather than the v2.7.9 of the
container infra) which now matches what runs in the Vagrant environment.
In future bugs we should update stage/prod and Heroku to 2.7.10 too.
Previously if TREEHERDER_DJANGO_SECRET_KEY was not set, we'd silently
fall back to a default value for SECRET_KEY, meaning we wouldn't realise
we were using an insecure key on a live deployment instance.
With this change, TREEHERDER_DJANGO_SECRET_KEY being missing from the
environment is fatal, resulting in:
"ImproperlyConfigured: The SECRET_KEY setting must not be empty."
The MPL 2.0 terms state that as long as a LICENSE file is present, the
per-file header text is not required. See "Exhibit A" at the end of:
https://www.mozilla.org/MPL/2.0/
dj-database-url extracts DB host, port, username, password and database
name from the env variable 'DATABASE_URL' (unless another env variable
name is specified). If the env variable is not defined, it falls back to
the default passed to dj_database_url.config().
This means for Heroku and similar we can replace the multiple DB env
variables with just one URL for default & one for read_only.
This also effectively makes the setting of the read only DB variable
mandatory for stage/production/heroku, since DEFAULT_DATABASE_URL won't
be valid for them - so prevents us inadvertently not using the read only
DB.
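A minimal sketch of the settings usage (the read-only variable name
and default URL are illustrative assumptions):

```python
import dj_database_url

DATABASES = {
    # Parses DATABASE_URL, falling back to the given default, which is
    # only valid for local/Vagrant-style environments.
    "default": dj_database_url.config(
        default="mysql://root@localhost/treeherder"),
    # Parses the read-only URL from a separate variable; with no
    # default, stage/prod/Heroku must set it explicitly.
    "read_only": dj_database_url.config(env="DATABASE_URL_RO"),
}
```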
The deployment script also had to be updated, so that we set the
prod/stage-specific environment variables before using manage.py, since
dj-database-url cannot rely on what's in the stage/prod local.py config
(which isn't a bad thing, since we're deprecating that file).
Before this is deployed, we'll need to update the stage/prod puppet
configs & Heroku settings to add the new environment variable.
Since it only speeds up parsing by a few percent of total runtime, and
is therefore not worth the added complexity for deployment and local
hack-test-debug cycles when working on the log parser.
The .gitignore and update.py entries will be removed in a later commit,
once the stage/prod src directories have been cleaned up.
Sets the default values (and now also those used by Vagrant) to the same
as those used by Travis, so we can avoid specifying different values all
over the place.
There's no need to make multiple calls to peep - we can just combine
them into one. Not changing the puppet instances for Vagrant, since the
calls are made in two separate puppet modules and so would require a bit
of refactoring, which is going to occur in bug 1074151 and friends.
This merges the service and UI Travis configs, to get the Karma UI
tests running on Travis in the new repo. We can only set 'language' to
one value; however, that doesn't matter, since nodejs is installed by
default and all that 'language: node_js' did was set a few default
build cycle steps - and we can define those ourselves manually.
We install the deps using npm install, ensure they are cached by adding
the node_modules directory to the cache list, get xvfb running for Karma
(see http://docs.travis-ci.com/user/gui-and-headless-browsers/) and use
`npm test` to run Karma using karma.conf.js.
The end to end tests (karma-e2e.conf.js) are not currently running, same
as before the repo merge.
We want to start using peep in production, to alleviate security
concerns with the idea of auto-updating packages from PyPI on deploy.
As a first step, we switch to using peep in the Vagrant environment,
on Travis and in the Docker build - so we can confirm the hashes are
correct.
Close bug 1143350.
Steps in before_script (and any other part of the build lifecycle apart
from 'script') terminate the run immediately on failure. This means in
the case of flake8 failures, the job changes from "in progress" to
"errored" virtually immediately (with the virtualenv caching changes) so
the user can see the problem and re-push straight away. See:
http://docs.travis-ci.com/user/build-lifecycle/