Previously test_new_job_in_exclusion_profile was attempting to download
logs from ftp.mozilla.org, due to the log parser not being mocked, which
caused intermittent test timeouts on Travis.
The Heroku pre_compile script is currently run prior to the cache being
restored (https://github.com/heroku/heroku-buildpack-python/pull/321),
which means we have to tweak PATH so vendor-libmysqlclient.sh can find
the binaries from the cache instead of the app directory.
However the workaround added in #1770 only added one of the two extra
required PATHs; this adds the other.
Prior to this the buildpack compile would output:
> ./bin/vendor-libmysqlclient.sh: line 65: pip: command not found
...and so wouldn't purge the old mysqlclient package, which is needed to
force recompilation against the newer libmysqlclient.
Once the PR against heroku-buildpack-python is merged, these workarounds
can be removed.
If mysqlclient has been compiled against a vulnerable version of
libmysqlclient then this test will fail. There is overlap between this
and our custom Django system check for ensuring mysqlclient has been
compiled against libmysqlclient >= 5.7.11; however there are advantages
in having both:
* the system check is run during deploy, unlike this test
* however this test is more thorough since it actually checks TLS
behaviour and not just version numbers (but this method cannot be used
in the system check run during production deployment, since it relies on
having a MySQL server instance that doesn't support TLS, to emulate the
TLS being stripped by an attacker)
This registers a custom Django system check (that is run as part of
`./manage.py check` during testing/deploys, and also prior to commands
such as migrate), to check that mysqlclient has been compiled against a
version of libmysqlclient that isn't vulnerable to TLS stripping. See:
https://docs.djangoproject.com/en/1.8/topics/checks/#writing-your-own-checks
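The heart of such a check is a version comparison against the first fixed
release. The sketch below shows that comparison as a standalone function;
the function name and version-string handling are illustrative, not the
actual Treeherder implementation (which wraps this kind of logic in a
`@register()`-decorated check returning `Error` objects).

```python
# Minimum libmysqlclient release containing the TLS-stripping fix.
MINIMUM_LIBMYSQLCLIENT = (5, 7, 11)

def libmysqlclient_is_vulnerable(version_string):
    """Return True if the client library version predates 5.7.11.

    Hypothetical helper: parses strings like "5.5.52" or "5.6.30-log"
    as reported by the MySQL client library.
    """
    # Ignore any trailing suffix (e.g. "-log") before splitting.
    parts = version_string.split("-")[0].split(".")
    version = tuple(int(p) for p in parts[:3])
    return version < MINIMUM_LIBMYSQLCLIENT

print(libmysqlclient_is_vulnerable("5.5.52"))  # True
print(libmysqlclient_is_vulnerable("5.7.17"))  # False
```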
The latest versions of libmysqlclient 5.5/5.6 (used by mysqlclient) are
still vulnerable to TLS stripping, even after last year's backports of
5.7.x fixes:
- https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3152
- http://bugs.mysql.com/bug.php?id=82383
Ideally we'd just use the standalone Connector/C library instead of the
libmysqlclient packages, however the latest release is too old:
- http://bugs.mysql.com/bug.php?id=82448
Heroku's cedar-14 stack comes with libmysqlclient 5.5.x, so until it is
updated to 5.7.x (see https://github.com/heroku/stack-images/pull/38) we
must manually vendor 5.7.x ourselves, so that connections between the
Heroku dynos and our public RDS instances are secure. We can do this and
still remain on MySQL server 5.6, since newer client releases are
backwards compatible with older server versions.
Whilst the Vagrant/Travis MySQL instances don't use TLS (and so aren't
affected), we still want them to use libmysqlclient 5.7, to be
consistent with production.
Installing the newer libmysqlclient isn't sufficient on its own. Any
packages compiled against the older version (in our case mysqlclient)
need to be recompiled. We ensure this happens by pip uninstalling the
existing package if it was already installed.
This is required in order to create a unique index on title,
value and job_id to prevent duplicates. The index will be
created in a later PR.
This also uses update_or_create instead of get_or_create as
this will be the mechanism going forward to prevent duplicates.
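The behavioural difference can be shown with a pure-Python mimic of the
two Django QuerySet methods (the dict-backed store and key/field names
here are purely illustrative; the real code operates on model rows):

```python
store = {}  # maps a lookup key to a row of field values

def get_or_create(key, defaults):
    """Return the existing row unchanged, creating it only if missing."""
    if key in store:
        return store[key], False
    store[key] = dict(defaults)
    return store[key], True

def update_or_create(key, defaults):
    """Create the row if missing, otherwise overwrite it with defaults."""
    created = key not in store
    store[key] = dict(defaults)
    return store[key], created

# get_or_create leaves a stale row in place...
store[("title", 1)] = {"value": "old"}
row, created = get_or_create(("title", 1), {"value": "new"})
print(row["value"], created)  # old False

# ...whereas update_or_create refreshes it in place rather than
# silently keeping (or duplicating) the old data.
row, created = update_or_create(("title", 1), {"value": "new"})
print(row["value"], created)  # new False
```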
* Bug 1292270 - Pass a User object down to JobManager.update_after_verification.
This is required to create the BugJobMap instance in the post-datasource world.
Also to the machine_platform table.
This is necessary because we use a get_or_create() on these
tables, but without the unique index, we can (and did) get
duplicates which then blocked data ingestion of jobs on try.
We need a 'branch' field on the repository so that we can determine
which repo to use for incoming resultsets from pulse exchanges. In
the past, projects like gaia-taskcluster have had their own maps of
github repositories/names/branches to Treeherder projects. But
Treeherder should be the one owning that mapping. The only thing on
this table was the branch that's used. So here it is.
The "branch" field will default to "master" which is appropriate for
several of the repos. But a few will need more custom values set.
These are laid out in the fixtures/repository.json file. But they
will need to be manually entered into the databases on Prod/Stage
and Heroku.
Whilst most of the time there will be no port specified (and so the two
would be equivalent), it's really just the hostname we want, so let's be
clear about it.
urlparse's `netloc` attribute (which I'd copied from the `SITE_HOSTNAME`
usage elsewhere in settings.py) includes the port number as well as the
hostname, and so was causing the SCL3 hostname check to not match,
meaning TLS was enabled for celery on SCL3 when the rabbitmq instance
there doesn't support it.
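The mismatch is easy to reproduce with the standard library (the hostname
below is made up):

```python
from urllib.parse import urlparse

# netloc keeps any credentials and the port; hostname is just the host:
url = urlparse("amqp://guest:guest@rabbitmq.example.scl3.mozilla.com:5672/")
print(url.netloc)    # guest:guest@rabbitmq.example.scl3.mozilla.com:5672
print(url.hostname)  # rabbitmq.example.scl3.mozilla.com

# So comparing netloc against a bare hostname constant will never match
# when the URL carries a port or credentials.
```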
Celery uses Kombu to connect to the RabbitMQ instance, which defaults to
not enabling SSL, unless the URL scheme is `amqps://` or the query
string contains `ssl=1` / `ssl=true`.
On Heroku we're using CloudAMQP, who don't use either string in their
automatically defined `CLOUDAMQP_URL` environment variable, so we must
set the Celery preference `BROKER_USE_SSL` to ensure TLS is still used:
http://docs.celeryproject.org/en/latest/configuration.html#broker-use-ssl
I've contacted CloudAMQP to encourage them to use `amqps://` in their
URLs, however even if they do switch, using `BROKER_USE_SSL` is a
sensible defence-in-depth measure we should take regardless.
TLS support isn't set up for the rabbitmq servers on SCL3, Travis or in
the Vagrant environment, so `BROKER_USE_SSL` must not be set there. In
the future we may decide it's worth the effort to use self-signed
certificates to add support for TLS to Travis/Vagrant too.
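A minimal sketch of the kind of settings.py logic this describes, assuming
a hostname-based allowlist (the function name and hostnames are
illustrative; the real settings code differs):

```python
from urllib.parse import urlparse

def broker_use_ssl(broker_url):
    """Enable TLS unless the broker is one we know lacks TLS support.

    Hypothetical helper: the hostnames below stand in for the SCL3,
    Travis and Vagrant RabbitMQ instances.
    """
    hostname = urlparse(broker_url).hostname or ""
    no_tls_hosts = ("localhost", "rabbitmq.example.scl3.mozilla.com")
    return hostname not in no_tls_hosts

# CloudAMQP-style URL: TLS forced on via BROKER_USE_SSL.
print(broker_use_ssl("amqp://user:pass@host.cloudamqp.com/vhost"))  # True
# Local Vagrant broker: TLS left off.
print(broker_use_ssl("amqp://guest:guest@localhost:5672//"))  # False
```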
Now that Treeherder's data ingestion process doesn't hit its own API:
* `./manage.py runserver` is less susceptible to memory issues.
* The runserver/gunicorn process doesn't need to be running whilst the
data ingestion takes place.
Since:
* It avoids unnecessary code duplication.
* Avoids us accidentally making any new API endpoints writable to
anonymous users (since if not specified, the default is `AllowAny`).
* Makes it easier to temporarily block API access in the case of
maintenance (eg for bug 1277304), since there are fewer places where
`permission_classes` will need updating.
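Concretely, this means setting a project-wide default in the REST
framework settings rather than per-view (the module path for the custom
permission class below is an assumption, not necessarily where it lives
in the Treeherder tree):

```python
REST_FRAMEWORK = {
    # Applied to every endpoint that doesn't override permission_classes,
    # replacing DRF's built-in default of AllowAny:
    "DEFAULT_PERMISSION_CLASSES": (
        "treeherder.webapp.api.permissions.HasHawkPermissionsOrReadOnly",
    ),
}
```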
The previous message implied the client_id wasn't known at all, when in
reality it could either be unknown or just not authorised. In the future
we may allow authenticating even if the credentials are not 'authorised'
(for example for higher rate limits for GETs, even if the user isn't
permitted to make POSTs to job endpoints etc), but for now we should try
and avoid confusion.
The tests previously used the `HasHawkPermissions` permissions class,
which is not actually used in Treeherder at all. More useful would be to
test `HasHawkPermissionsOrReadOnly`, since that is used. The only
difference is that for the latter, read-only requests (such as GETs) are
allowed to succeed even if no credentials were provided.
Note however, that even if a request is read-only, if incorrect
credentials are given (or that user isn't 'authorised'), then the
request won't succeed regardless.
Since anonymous GETs are now allowed to succeed, the expected response
content has been changed to something more generic to avoid confusion.
This change makes `HasHawkPermissions` unused, and so it will be merged
into `HasHawkPermissionsOrReadOnly` later.
Since currently only no-auth GETs are tested. In a later commit, the
GET case will be made to succeed, since the test will instead use the
`permission_classes` of `HasHawkPermissionsOrReadOnly`, which is actually
what the rest of Treeherder uses.