The split of test directories between each Python chunk has been
adjusted to more evenly balance runtime. The `tests/e2e` directory
has been moved from chunk B to chunk A, and the `tests/webapp/embed`
+ `tests/webapp/graphql` directories from chunk C to chunk A.
Short of moving tests in the `tests/webapp/api` directory around,
this is the best we can do for now. Hopefully bug 1348947 can
improve the situation in the future.
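As a rough illustration of the balancing goal only, the sketch below uses the longest-processing-time-first heuristic: each directory, slowest first, goes into whichever chunk currently has the least total runtime. The durations are hypothetical, as is the `tests/model` directory; `balance_chunks` is not part of the Travis config.

```python
import heapq


def balance_chunks(durations, num_chunks):
    """Assign test directories to chunks using the longest-processing-time-first heuristic.

    durations maps directory name -> runtime in seconds.
    Returns (chunks, totals): the directories in each chunk and each chunk's total runtime.
    """
    # Min-heap of (total_runtime, chunk_index): the least-loaded chunk is popped first.
    heap = [(0.0, index) for index in range(num_chunks)]
    heapq.heapify(heap)
    chunks = [[] for _ in range(num_chunks)]
    # Place the slowest directories first, each into the currently lightest chunk.
    for name, seconds in sorted(durations.items(), key=lambda item: item[1], reverse=True):
        total, index = heapq.heappop(heap)
        chunks[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    totals = [0.0] * num_chunks
    for total, index in heap:
        totals[index] = total
    return chunks, totals


# Hypothetical per-directory runtimes, in seconds.
durations = {
    "tests/webapp/api": 300,
    "tests/model": 200,
    "tests/etl": 150,
    "tests/e2e": 120,
    "tests/webapp/embed": 60,
    "tests/webapp/graphql": 40,
}
chunks, totals = balance_chunks(durations, 3)
print(totals)  # [300.0, 300.0, 270.0]
```

LPT is a heuristic, so it can miss the optimal split, and a single large directory (such as `tests/webapp/api` above) caps how even any per-directory split can be.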
Travis occasionally update the stable release Trusty image, however
not as often as would be ideal. The newer images now have recent
yarn pre-installed, meaning the manual install steps can be removed.
In addition, the `$HOME/bin` directory now exists by default in the
image.
Whilst the sudo-less EC2 container builds have a faster boot time than
the fully virtualised GCE jobs (1s vs 20-30s), they have less CPU/RAM
available due to resource contention, so are slower than GCE for
longer running jobs.
Switching roughly halves the runtime of this job.
As of a few days ago, the `latest` alias now maps to Firefox 55 rather
than Firefox 54.0.1. However the JS tests for some reason frequently
time out with Firefox 55, so until that's resolved we must explicitly
use the older version.
Verifies installed packages have compatible dependencies, to help
prevent issues like bug 1324707. This will reduce the time taken to
review pyup bot PRs.
Example output if errors found:
```
Running pip check
celery 3.1.25 has requirement kombu<3.1,>=3.0.37, but you have kombu 4.1.0.
```
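If the `pip check` output ever needs post-processing (for example to summarise conflicts in a CI log), a small parser is enough. This is a sketch that assumes the message format shown above, which may differ between pip versions; `parse_pip_check` is a hypothetical helper, not part of pip.

```python
import re

# Matches lines like:
# "celery 3.1.25 has requirement kombu<3.1,>=3.0.37, but you have kombu 4.1.0."
CONFLICT_RE = re.compile(
    r"^(?P<package>\S+) (?P<version>\S+) has requirement (?P<requirement>.+), "
    r"but you have (?P<installed_name>\S+) (?P<installed_version>\S+?)\.?$"
)


def parse_pip_check(output):
    """Return one dict per dependency conflict found in `pip check` output."""
    conflicts = []
    for line in output.splitlines():
        match = CONFLICT_RE.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts
```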
Now that we're using MySQL 5.7 server, there is no longer a conflict
between our desired choice of client library (which had to be 5.7 to
avoid security vulnerabilities) and the server version.
The library is still vendored on Heroku for now (in `bin/pre_compile`),
but that can stop too, once we switch to the Heroku-16 stack which is
based on Ubuntu 16.04 (bug 1365567).
The Vagrant environment is running Ubuntu 16.04, which has a native
mysql-server-5.7 package available, so doesn't need a custom APT
repository to be set.
Travis is still on Ubuntu 14.04 (since they don't offer 16.04 yet),
so has to use the upstream mysql.com repository instead.
In the future I'll be switching both Travis and local development to
use the same Docker images to avoid these kinds of differences.
After this lands, we'll want to open a PR against the Terraform configs
to switch the treeherder-dev RDS instance to MySQL 5.7, before doing
the same for stage/prod.
Initially these will continue to be run in Jenkins, however later they
will be converted to run on Travis along with the existing local
Selenium tests.
* ES 5.x now requires JDK 8, and Ubuntu 14.04 only ships with openjdk 7,
so a third party PPA must be used in Vagrant:
https://launchpad.net/~openjdk-r/+archive/ubuntu/ppa
* ES 5.x changed the way it manages the heap size, such that:
- The variables for controlling it have changed (now set via e.g.
`ES_JAVA_OPTS="-Xms1g -Xmx1g"` or in the jvm.options file). See:
https://www.elastic.co/guide/en/elasticsearch/reference/5.2/heap-size.html
- The default heap size has increased from ~(min:256MB, max:1GB) to
(min: 2GB, max: 2GB) which causes OOM in the VM, unless either
lowered back down or the VM RAM increased.
* The Python ES clients must be updated to the latest releases:
https://elasticsearch-py.readthedocs.io/en/master/#compatibility
* The previous test failures were fixed in #2403.
* The Vagrant provision now also waits for Elasticsearch to be ready
before trying to run the Django migrations, since Elasticsearch can
take a while to start (and always has). This prevents failures when
the pip/yarn install steps are no-ops (when already up to date),
causing the Django migration to run immediately after the ES install
step.
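The wait-for-Elasticsearch step amounts to polling until the HTTP endpoint answers, and only then running the migrations. A minimal Python sketch follows; the provisioning itself may be a shell script, and the `http://localhost:9200/` URL (Elasticsearch's default port), the timeouts and both function names are assumptions.

```python
import time
import urllib.request


def wait_for(check, timeout=60.0, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True; give up after roughly `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def elasticsearch_is_up(url="http://localhost:9200/"):
    """Return True if Elasticsearch answers the URL with a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return 200 <= response.status < 300
    except OSError:
        # Connection refused, timeouts and HTTP errors (URLError/HTTPError) all land here.
        return False


# Block the migrations until Elasticsearch is reachable, or fail after a minute:
# if not wait_for(elasticsearch_is_up):
#     raise SystemExit("Elasticsearch did not start in time")
```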
The MySQL service takes several seconds to restart, so we skip doing so
if the config file already exists with the expected content.
The privileges/database creation steps are idempotent and extremely
fast, so it's not worth the boilerplate of checking before running.
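The skip-the-restart check boils down to comparing the desired config with what is already on disk. It is sketched in Python below; `write_if_changed` is hypothetical and the real provisioning step may be written differently.

```python
def write_if_changed(path, content):
    """Write content to path unless the file already holds exactly that content.

    Returns True if the file was (re)written, i.e. the service needs a restart.
    """
    try:
        with open(path, encoding="utf-8") as existing:
            if existing.read() == content:
                return False  # Already up to date: skip the slow restart.
    except FileNotFoundError:
        pass  # First provision: the config file doesn't exist yet.
    with open(path, "w", encoding="utf-8") as target:
        target.write(content)
    return True
```

A caller would then only restart MySQL when the function returns `True`.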
Since it's faster, deterministic and doesn't give obscure errors when
using `--no-bin-links` (which is required for both npm and yarn on
Windows hosts), and as such unblocks the work in bug 1343624.
Many of the commands are the same as with npm. See:
https://yarnpkg.com/en/docs/usage
For the same reason as the previous commit.
Ideally we'd remove the grunt abstraction entirely and call eslint from
the `lint` command, but we might as well save that to the Neutrino PR.
Routing commands via npm/yarn is preferred, since it avoids
having to do global installs of grunt-cli, which simplifies contributor
setup, and means less effort when we switch to Yarn (since it requires
manual PATH setup for globally installed packages).
Vagrant uses the latest 7.x.x release, which is now 7.7.2. To reduce
differences between environments whilst the Neutrino/webpack work is
stabilised, it makes sense to update Heroku/Travis again too.
* Written in react.js for speed
* Does not require custom Treeherder backend APIs
* Improved scrolling
* Fixes issues in URL when linking directly to line numbers
The Elasticsearch background process takes up to 8 seconds to start but
isn't required until the tests run, so needn't block the other setup
tasks from running in parallel.
Travis have just announced a beta for container-based Ubuntu Trusty
builds:
https://blog.travis-ci.com/2016-11-08-trusty-container-public-beta/
This means the linter jobs can now run on the same version of Ubuntu
used by the other tests and production. This also reduces runtime since
Travis can skip the glibc patching step required on the older image.
The image has received several updates, including now defaulting to
mysql-server 5.6:
https://docs.travis-ci.com/user/build-environment-updates/2016-12-01/
...which means we can skip the 20-25s install of mysql-server-5.6.
The issue with Java on the newer image has now been fixed, so we no
longer need to remain on the older (`group: deprecated`) release.
This is required because the older version of grunt-angular-templates
was hanging with component templates. In order to upgrade everything it
was also necessary to switch from grunt-cache-busting, which is no longer
maintained, to grunt-cache-bust, which is.
To reduce duplication and ensure the configurations remain in sync.
Note: The Vagrant config does set `bind-address` to allow non-localhost
connections, which isn't necessary on Travis, however this is fine since
(a) it's only Travis, (b) the user grants on Travis won't actually allow
non-localhost connections anyway, even if Travis' network settings
allowed connections between test nodes.
In addition:
* Quietens curl's output to avoid progress bar logspam.
* Removes the unnecessary dpkg option `--force-confnew` (since it's only
needed if overwriting an existing Elasticsearch installation).
* Reduces the number of places where we duplicate the version number.
The latest versions of libmysqlclient 5.5/5.6 (used by mysqlclient) are
still vulnerable to TLS stripping, even after last year's backports of
5.7.x fixes:
- https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3152
- http://bugs.mysql.com/bug.php?id=82383
Ideally we'd just use the standalone Connector/C library instead of the
libmysqlclient packages, however the latest release is too old:
- http://bugs.mysql.com/bug.php?id=82448
Heroku's cedar-14 stack comes with libmysqlclient 5.5.x, so until it is
updated to 5.7.x (see https://github.com/heroku/stack-images/pull/38) we
must manually vendor 5.7.x ourselves, so that connections between the
Heroku dynos and our public RDS instances are secure. We can do this and
still remain on MySQL server 5.6, since newer client releases are
backwards compatible with older server versions.
Whilst the Vagrant/Travis MySQL instances don't use TLS (and so aren't
affected), we still want them to use libmysqlclient 5.7, to be
consistent with production.
Installing the newer libmysqlclient isn't sufficient on its own. Any
packages compiled against the older version (in our case mysqlclient)
need to be recompiled. We ensure this happens by pip uninstalling the
existing package if it was already installed.
The `manage.py check --deploy` command causes the Django models to be
initialised, which requires Elasticsearch to be accessible, if
`ELASTICSEARCH_URL` is set. The variable is set globally for the Travis
runs, however for the linters job (which is where `check --deploy` was
previously being run) Elasticsearch isn't actually installed.
Whilst we could have just unset `ELASTICSEARCH_URL` for the linters job,
it's best to move the test to the main Python job chunk, since:
* The command doesn't only hit Elasticsearch, but also the DB, and the
linters job is running the older mysql 5.5, rather than mysql 5.6.
* Whilst Elasticsearch isn't currently enabled for anywhere other than
the prototype Heroku instance, we'll soon be using it everywhere, so
should get `check --deploy` to actually test what we plan to deploy.
Note: The exception during `check --deploy` was not turning the Travis
run red, due to `check --deploy` being piped to awk. When we update to
Django 1.10 the awk workaround can be removed, however in the meantime
we can use the bash `pipefail` option to ensure the exit code from the
`manage.py` command propagates through to the one that Travis sees.
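Why the failure was hidden: without `pipefail`, bash reports only the last command's exit status for a pipeline, so `manage.py check --deploy | awk ...` exits 0 whenever awk succeeds. With `set -o pipefail`, bash reports the rightmost non-zero status instead (or 0 if every command succeeded). The function below is a small Python model of that rule, for illustration only.

```python
def pipeline_status(statuses, pipefail=False):
    """Return the exit status bash reports for a pipeline.

    statuses lists each command's exit status, left to right.
    """
    if not statuses:
        raise ValueError("a pipeline has at least one command")
    if not pipefail:
        # Default bash behaviour: only the last command's status counts.
        return statuses[-1]
    # With pipefail: the rightmost non-zero status, or 0 if all succeeded.
    for status in reversed(statuses):
        if status != 0:
            return status
    return 0


# `manage.py check --deploy` fails (1) but awk succeeds (0):
print(pipeline_status([1, 0]))                 # 0, so Travis stays green
print(pipeline_status([1, 0], pipefail=True))  # 1, so Travis goes red
```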
Add support for matching test failures where the test, subtest, status,
and expected status are all exact matches, but the message is not an
exact match. The matching uses Elasticsearch and is initially optimised
for cases where the messages differ only in numeric values since this is
a relatively common case.
This commit also adds Elasticsearch to the Travis environment.
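The commit does this matching with Elasticsearch; the pure-Python sketch below only illustrates the idea of treating two messages as equal when they differ solely in numeric values. The `<N>` placeholder and the function names are hypothetical, and this is not the query Treeherder uses.

```python
import re

# Integers or decimals, e.g. "30000" or "0.5".
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def normalise_message(message):
    """Replace every number in a failure message with a placeholder."""
    return NUMBER_RE.sub("<N>", message)


def messages_match_ignoring_numbers(a, b):
    """True if the two messages are identical apart from their numeric values."""
    return normalise_message(a) == normalise_message(b)
```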
Since they are now taking much longer than before (40% of the total
test suite runtime). Prior to this change, chunk A was taking ~9 mins
and chunk B ~4 mins. After, the three chunks should take ~4, ~4 and ~6 mins.
Since if not specified it defaults to `/usr/bin/python`, which on
Travis Precise containers is 2.7.3 and Trusty GCE is 2.7.6, since the
Python 2.7.11 install they provide is not system-installed but instead
just placed on the `PATH`.
This means that until now we've not actually been testing using the
same version of Python as has been running in production :-/
Test names, messages, etc. may contain UTF-8 characters from beyond the
Basic Multilingual Plane ("astral" characters). Unfortunately MySQL's
"utf8" character set is nothing of the sort and will only store a
maximum of three bytes per character, thus restricting it to BMP
characters. The correct fix to this is to switch to the utf8mb4
character set. Since such a change is somewhat involved, however, we
address the immediate problem with a hack.
When storing failure lines, if the operation fails for character set
related reasons, try again with any non-BMP characters replaced by a
marker of the form <U+codepoint> e.g. <U+10FFFF>.
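A minimal sketch of that fallback, assuming Python 3 strings (where `ord()` returns the full code point). `replace_astral` and `store_with_fallback` are hypothetical names; `store` and `is_charset_error` stand in for the real DB write and for whatever check identifies MySQL's character set error.

```python
def replace_astral(text):
    """Replace each character above U+FFFF with a marker such as <U+1F600>."""
    return "".join(
        "<U+{:04X}>".format(ord(char)) if ord(char) > 0xFFFF else char
        for char in text
    )


def store_with_fallback(store, text, is_charset_error):
    """Try to store text as-is; on a character set error, retry with markers."""
    try:
        return store(text)
    except Exception as error:
        if not is_charset_error(error):
            raise  # Unrelated failures must still surface.
        return store(replace_astral(text))


print(replace_astral("caf\u00e9 \U0001F600"))  # café <U+1F600>
```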
Note further that whether MySQL fails here or silently replaces
each byte of the original character with a U+FFFD replacement character
depends on the value of the sql_mode setting. If this is set to
STRICT_ALL_TABLES, we get an error, otherwise silent data
loss. Therefore it is important this setting is consistent across all
environments.
`manage.py check --deploy` is now run during Travis testing and as part
of stage/prod/Heroku deployment. It checks for a number of common
configuration mistakes & ensures security best practices are being
followed:
https://docs.djangoproject.com/en/1.8/ref/checks/
Since it's now supported:
https://blog.travis-ci.com/2016-05-03-caches-are-coming-to-everyone
This will reduce the job runtime by 60-80s.
Similar to the container job (job chunk 1), since we're now caching the
virtualenv, we have to work around the default virtualenv being polluted
with existing packages (by creating a clean one in another directory),
to avoid constant cache churn.
As of pip 8, peep has now been integrated into pip.
Migrating from peep to this native feature has several advantages:
* It avoids the complexity/learning curve of using a wrapper around pip.
* It means we do not need to fork the official Heroku Python buildpack
(which handles pip installation of requirements files) in order to use
hash verification on Heroku. (Once the buildpack updates to pip 8.)
* Omitted sub-dependencies result in install-time errors rather than
the user discovering omissions at run-time.
* pip's native caching is used, and all packages are installed in one
pip invocation, so it's significantly faster.
* It has better handling of errors and corner cases.
Key facts about the native feature:
* hash-checking mode is enabled if at least one hash is found in the
requirements files passed to pip, or can be force enabled by passing
`--require-hashes` when running `pip install`.
* Once enabled, hash-checking mode enforces that all packages:
- are pinned to a specific version
- have hashes listed
- have all sub-dependencies specified
* Older versions of pip will error out if either `--require-hashes` or
the requirements file `--hash` syntax is used, meaning it's not
possible to accidentally lose hash-checking protection if the pip used
is older than expected.
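To make the pinning and hashing rules above concrete, here's a simplified checker that flags requirement lines which hash-checking mode would reject for lacking a `==` pin or a `--hash` option. It's a sketch, not pip's parser: it handles comments and backslash continuations but ignores `-r` includes, URLs, environment markers and the sub-dependency rule (which can only be checked against the resolved install). The function names are hypothetical.

```python
import re

# A "#" at the start of a line, or preceded by whitespace, starts a comment.
COMMENT_RE = re.compile(r"(^|\s)#.*$")


def logical_lines(text):
    """Yield requirement lines with comments removed and continuations joined."""
    current = ""
    for raw in text.splitlines():
        line = COMMENT_RE.sub("", raw).rstrip()
        if line.endswith("\\"):
            # Backslash continuation: keep accumulating the logical line.
            current += line[:-1] + " "
            continue
        current += line
        if current.strip():
            yield current.strip()
        current = ""
    if current.strip():
        yield current.strip()


def hash_mode_problems(text):
    """Return (requirement, reason) pairs that hash-checking mode would reject."""
    problems = []
    for line in logical_lines(text):
        if line.startswith("-"):
            continue  # An option line such as --require-hashes, not a requirement.
        requirement = line.split()[0]
        if "==" not in requirement:
            problems.append((requirement, "not pinned"))
        if "--hash=" not in line:
            problems.append((requirement, "not hashed"))
    return problems
```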
For more details, see:
https://pip.pypa.io/en/stable/user_guide/#hash-checking-mode
https://pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode
The pip version on Travis and in the Vagrant virtualenv has been updated
to 8.0.2 in bug 1241144, and the stage/prod virtualenv in bug 1241519.
The Heroku Python buildpack pip was updated in bug 1241909.
The requirements files hashes were ported using `peep port`, and then
comments/URLs re-added by hand.
So that we can confirm that peep still operates under pip v8, prior to
the transition from peep to pip v8's new hashing feature.
https://pip.readthedocs.org/en/stable/news/
The individual RABBITMQ_* variables are never used on their own, so it
makes more sense to switch to a URL variable that combines all of them.
Travis/Vagrant have also had BROKER_URL explicitly set, since we're
generally moving away from having testing defaults set in settings.py.
Once stage/prod have BROKER_URL set, we can remove the fallback to the
old variable names, and also remove defaults entirely, making missing
settings fail fast.
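A minimal sketch of the lookup, assuming an `amqp://` URL. The legacy `RABBITMQ_*` variable names below are hypothetical placeholders for the old individual variables, and `broker_url_from_env` is not a real Treeherder function. It shows the no-defaults end state described above: indexing the mapping (instead of `.get()` with defaults) makes a missing setting fail fast.

```python
from urllib.parse import quote


def broker_url_from_env(env):
    """Return the Celery broker URL from an environment mapping such as os.environ."""
    if "BROKER_URL" in env:
        return env["BROKER_URL"]
    # Transitional fallback to the old individual variables (names hypothetical).
    # Indexing instead of .get() with defaults makes a missing setting fail fast.
    return "amqp://{user}:{password}@{host}:{port}/{vhost}".format(
        user=quote(env["RABBITMQ_USER"], safe=""),
        password=quote(env["RABBITMQ_PASSWORD"], safe=""),
        host=env["RABBITMQ_HOST"],
        port=env["RABBITMQ_PORT"],
        vhost=quote(env["RABBITMQ_VHOST"], safe=""),
    )
```

Once stage/prod set `BROKER_URL`, the fallback branch can be deleted, leaving just `env["BROKER_URL"]`.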
Previously the python chunks were approx 3m55s and 2m55s respectively,
since we couldn't move one of the largest running directories (etl) to
the second chunk, as the tests would fail due to bug 1219922.
After this change the python chunk runtimes are 3m15s and 3m25s.
Since they take approx 35% of the py.test run time. Ideally we'd move
the ETL tests instead of the E2E ones to this separate job (since they
are more related to the log parsing tests than some of the others, and
would make the split more even), but they currently break when run
standalone, so we'll do that once bug 1219922 is fixed.
The python-tests job is run on non-container infra so can't use caching.
Peep install takes 60s compared to 30s for pip (part hashing cost, part
peep's unavoidable design inefficiencies due to not being built into
pip), which is painful given that we do a full install every time due
to lack of caching.
The linters step (which can use caching) is already testing the peep
install works, so we can fall back to plain pip for this job part to
save an extra 30s from the runtime.
This also adds the `language: python` key which was the missing
ingredient to getting a virtualenv set up for us. Now that we're not
using peep, it doesn't matter that the provided virtualenv is using
pip v7.x (which is incompatible with peep).
Move the JS karma tests and the grunt build smoketest to their own job,
that runs on the container infra. The results are now available after
~50s rather than ~4 mins previously. An added advantage is that since
we can now use `language: node_js`, we can pin to a specific version,
reducing the chance of breakage.
In addition, the python test run now no longer needs to npm install
(which isn't cacheable) and doesn't have the JS test/grunt build
runtime, saving ~120s.
Since this job sets `sudo: false`, it will run on the container infra,
and so can use caching. The `env: linters` is so that a nice label
appears in the Travis sub-job UI, eg see:
https://travis-ci.org/mozilla/treeherder/builds/87763293
We install the full set of node and python dependencies, since
hardcoding just the packages we need here could lead to the versions
getting out of sync with package.json/the pip requirements files. In
addition isort requires a populated virtualenv, and npm sucks if
passing package names directly (it reinstalls them even if installed).
It's worth noting that with the cache, full dependency installation
only takes 2.5s, so this is all moot.
This job (once it has populated the cache) now only takes 24s
end-to-end, whereas previously it would have been ~3.5 mins before the
linters steps had completed.
Now that we have better integration between pytest and django we don't
need to create the test db upfront. The test db will be created by the
first test that requires it.
pytest-django doesn't setup a test database for every single test, but
only for those tests that actually require a db. Tests that require a db
need to either be marked with `@pytest.mark.django_db` or use a fixture
that has a dependency on `db` or `transactional_db`.
Using a non-transactional db would make test execution much faster, but
unfortunately it doesn't play well with the treeherder datasource
creation so I used a transactional_db.
pytest-django also allows you to specify a settings file to use for
tests in a pytest.ini file, which is nicer than monkeypatching the original
settings file in the pytest session start function 😃.
We were previously using the same database (test_treeherder) for both the
jobs and reference data model. I centralized the new db name in the test
settings file. All the tests requiring the jobs db or its repository counterpart
can now access it using the `test_project` fixture, while utility functions use
the mentioned setting directly. Where the project name is hardcoded in a static
file, I just replaced it with the new name `test_treeherder_jobs`.
Stage/production/Vagrant/Heroku's RDS all use mysql 5.6, however Travis
is currently running v5.5. Installing mysql 5.6 manually on the Travis
container infra is currently broken:
https://github.com/travis-ci/apt-package-whitelist/issues/1206#issuecomment-149884653
To use sudo (for apt-get) we either have to fall back to the legacy
non-container infra, or else use the new Trusty beta infra:
http://blog.travis-ci.com/2015-10-14-opening-up-ubuntu-trusty-beta/
http://docs.travis-ci.com/user/trusty-ci-environment/#Runtimes
The Trusty beta infra is also non-container, but at least isn't EOL.
Unfortunately similar to the legacy non-container infra, it doesn't
offer caching, so incurs a setup time penalty of approx 3 minutes
(including the mysql 5.6 install, npm install and peep install). See:
https://github.com/travis-ci/travis-ci/issues/4997
If/when the container infra uses mysql 5.6 by default, or the bug
preventing installing it using the apt travis.yml option is fixed, we
should switch back to the container infra, to speed up the Travis run.
In this commit, we use `--user` with the peep install, since the
non-container infra doesn't set up a virtualenv, and we cannot use sudo
due to:
https://github.com/travis-ci/travis-ci/issues/4989
In addition, the current user ("travis") now doesn't have permissions to
create the Treeherder DB, so we have to use `-u root` (the password is
blank).
The new infra is running Python 2.7.10 (rather than the v2.7.9 of the
container infra) which now matches what runs in the Vagrant environment.
In future bugs we should update stage/prod and Heroku to 2.7.10 too.
Previously if TREEHERDER_DJANGO_SECRET_KEY was not set, we'd silently
fall back to a default value for SECRET_KEY, meaning we wouldn't realise
we were using an insecure key on a live deployment instance.
With this change, TREEHERDER_DJANGO_SECRET_KEY being missing from the
environment is fatal, resulting in:
"ImproperlyConfigured: The SECRET_KEY setting must not be empty."
The MPL 2.0 terms state that as long as a LICENSE file is present, the
per-file header text is not required. See "Exhibit A" at the end of:
https://www.mozilla.org/MPL/2.0/
dj-database-url extracts DB host, port, username, password and database
name from the env variable 'DATABASE_URL' (unless another env variable
name is specified). If the env variable is not defined, it falls back to
the default passed to dj_database_url.config().
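To illustrate that behaviour, here's a stdlib-only sketch of splitting such a URL into Django-style settings and falling back to a default when the variable is unset. It's not dj-database-url's implementation (the real library also sets `ENGINE` from the URL scheme and handles more cases), the function names are hypothetical, and the example credentials and host are made up.

```python
from urllib.parse import unquote, urlparse


def parse_database_url(url):
    """Split a URL like mysql://user:password@host:3306/name into Django-style keys."""
    parsed = urlparse(url)
    return {
        "NAME": unquote(parsed.path.lstrip("/")),
        "USER": unquote(parsed.username or ""),
        "PASSWORD": unquote(parsed.password or ""),
        "HOST": parsed.hostname or "",
        "PORT": parsed.port or "",
    }


def database_config(env, var="DATABASE_URL", default=None):
    """Parse env[var] if set, otherwise the default URL; {} if neither is available."""
    url = env.get(var, default)
    return parse_database_url(url) if url else {}
```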
This means for Heroku and similar we can replace the multiple DB env
variables with just one URL for default & one for read_only.
This also effectively makes the setting of the read only DB variable
mandatory for stage/production/heroku, since DEFAULT_DATABASE_URL won't
be valid for them - so prevents us inadvertently not using the read only
DB.
The deployment script also had to be updated, so that we set the
prod/stage-specific environment variables before using manage.py, since
dj-database-url cannot rely on what's in the stage/prod local.py config
(which isn't a bad thing, since we're deprecating that file).
Before this is deployed, we'll need to update the stage/prod puppet
configs & Heroku settings to add the new environment variable.
Since it only speeds up parsing by a few percent of total runtime, and
is therefore not worth the added complexity for deployment and local
hack-test-debug cycles when working on the log parser.
The .gitignore and update.py entries will be removed in a later commit,
once the stage/prod src directories have been cleaned up.
Sets the default values (and now also those used by Vagrant) to the same
as those used by Travis, so we can avoid specifying different values all
over the place.
There's no need to make multiple calls to peep - we can just combine
them into one. Not changing the puppet instances for Vagrant, since the
calls are made in two separate puppet modules and so would require a bit
of refactoring, which is going to occur in bug 1074151 and friends.
This merges the service and UI Travis configs, to get the Karma UI tests
running on Travis in the new repo. We can only set 'language' to one
value, however that doesn't matter, since nodejs is installed by default
and all the 'language: node_js' did was set a few default build cycle
steps - and we can define those ourselves manually.
We install the deps using npm install, ensure they are cached by adding
the node_modules directory to the cache list, get xvfb running for Karma
(see http://docs.travis-ci.com/user/gui-and-headless-browsers/) and use
`npm test` to run Karma using karma.conf.js.
The end to end tests (karma-e2e.conf.js) are not currently running, same
as before the repo merge.
We want to start using peep in production, to alleviate security
concerns with the idea of auto-updating packages from PyPI on deploy.
As a first step, we switch to using peep in the Vagrant environment,
on Travis and in the Docker build - so we can confirm the hashes are
correct.
Close bug 1143350.