50 commits

Author SHA1 Message Date
Ed Morley 6945c4c471
Bug 1165259 - Add infrastructure documentation (#4766)
- Adds a new "infrastructure" section to the docs, which describes
  architecture, administration and troubleshooting (fixes bug 1165259).
- Adds code comments to any deployment-related files in the repository.
- Adds documentation for the various ways in which users can access
  Treeherder data (fixes bug 1335172).
- Reorganises the structure of some of the existing non-infrastructure
  docs, to make the documentation easier to navigate.
2019-03-11 21:39:34 +00:00
Ed Morley 6211b27794
Bug 1513506 - Stop using --maxtasksperchild with log parser tasks (#4751)
Since the memory leaks that made it necessary to restart the workers
after they had processed 50 tasks appear to no longer occur.
2019-03-05 06:50:46 +00:00
Ed Morley 844aa9ea56
Bug 1470243 - Use REMAP_SIGTERM=SIGQUIT with Celery on Heroku (#4747)
As recommended by:
https://devcenter.heroku.com/articles/celery-heroku#using-remap_sigterm

For more context see:
https://github.com/celery/celery/issues/2839
2019-03-04 16:09:07 +00:00
Ed Morley 03c94bf286
Bug 1531271 - Remove unused Celery queues (#4719)
The `store_failure_lines*` and `crossreference_error_lines*` queues have
been unused since the log parsing tasks were combined in #2544.
2019-02-28 17:22:40 +00:00
Ed Morley 0a3131b2fe
Bug 1530206 - Remove the now emptied store_pulse_resultsets queue (#4715)
In #4709 push ingestion was changed to use a new `store_pulse_pushes`
queue (since we now prefer the term "push" instead of "resultset"), with
the old queue left behind to ensure any tasks in it at the time of
deployment would still be run.

Now that the old queue is empty it can be removed.
2019-02-27 17:56:01 +00:00
Ed Morley 5e729a389f Bug 1530206 - Rename read_pulse_* to pulse_listener_*
To make it clearer that the commands are listening to Pulse, as opposed
to the processing/storing of pulse data that occurs later.
2019-02-27 07:12:33 +00:00
Ed Morley 2961c74a60 Bug 1530206 - Rename store_pulse_resultsets to store_pulse_pushes
Leaving behind the old queue/task for now so that any tasks in it at the
time of deployment can be completed.
2019-02-27 07:12:33 +00:00
Ed Morley b832f4002d Bug 1530206 - Heroku: Re-organise the Procfile
* Adds documentation
* Renames several of the `Procfile` process types to be clearer.
* Moves pushlog ingestion into `worker_misc`.
* Combines push and job storing into `worker_store_pulse_data`

Before:
- web
- worker_beat
- worker_pushlog
- worker_store_pulse_jobs
- worker_store_pulse_resultsets
- worker_read_pulse_jobs
- worker_read_pulse_pushes
- worker_default
- worker_log_parser

After:
- web
- celery_scheduler
- pulse_listener_pushes
- pulse_listener_jobs
- worker_store_pulse_data
- worker_log_parser
- worker_misc
2019-02-27 07:12:33 +00:00
Sarah Clements 24bda1ed83
Bug 1508228 - Remove Intermittents Commenter celery task (#4300)
Remove celery task and change Commenter weekly_mode default arg in
preparation for the move to the heroku scheduler
2018-11-27 14:14:38 -08:00
Ed Morley 445766d958
Bug 1443251 - Remove support for buildbot job ingestion (#4087)
The buildapi celerybeat tasks were disabled previously in #4007, so
these tasks are unused.
2018-10-02 11:07:27 +01:00
Ed Morley ffaa2e4b2a
Bug 1443251 - Remove runnable jobs support for buildbot (#4071)
Runnable jobs for buildbot were calculated via a celerybeat task
(that was disabled in #4007) and the results stored in the
`runnable_jobs` table. This can all be removed now that buildbot is
EOL, since the remaining support for Taskcluster runnable jobs does
not use that celery task/Django model.
2018-09-27 19:15:47 +01:00
Ed Morley eae4fa006f
Bug 1492462 - Remove retrigger/cancel APIs and pulse publisher (#4042)
Since as of #3980 (bug 1470622) the frontend no longer calls the
`/retrigger/`, `/cancel/` or `/cancel_all/` Treeherder APIs.

Whilst looking at the pulse related fixtures, I spotted that the
`mock_message_broker` fixture was already unused.
2018-09-21 17:39:03 +01:00
Cameron Dawson 982c1ec2fe
Bug 1176492 - Move fetch_bugs and cycle_data to heroku scheduler (#4019) 2018-09-11 13:51:15 -07:00
Ed Morley 1b52b07728
Bug 1445325 - Stop mirroring classifications to OrangeFactor (#3677)
Now that consumers of OrangeFactor have been switched to the new
intermittent failures view UI/API, we can stop submitting failure
classifications to OrangeFactor's Elasticsearch instance.
2018-08-03 00:16:35 +02:00
Sarah Clements 5386f7ed40
Bug 1453760 - bug commentor (#3571)
* Bug 1453760 - bug commentor
replacement for Orange Factor bug commentor
2018-07-20 16:36:40 -07:00
Ed Morley bc0ca97102
Bug 1419965 - Remove the estimated job time remaining backend (#2990)
The UI has already been removed. This cleans up the data ingestion
and removes the `JobDuration` model, however leaves the `running_eta`
field on the `Job` model for the next time that table is touched (since
the table is large, so altering the schema would likely require
downtime).
2017-12-04 22:09:38 +00:00
Alisha Aneja e842ae7701 Bug 1330773 - Use the term 'runnablejobs' rather than 'allthethings' (#2743) 2017-09-03 22:10:56 +01:00
Ed Morley 33fa97ec84 Bug 1387487 - Reduce gunicorn maximum request time to 20 seconds
To ensure that:
* the Heroku router's 30 second timeout doesn't beat gunicorn to it,
  given that routing time is included in Heroku's timing
* the app is more likely to remain responsive, when receiving many
  badly filtered API calls
2017-08-04 18:49:35 +02:00
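A minimal sketch of how the 20 second limit from commit 33fa97ec84 might be expressed in a gunicorn Python config file. The file name and the `HEROKU_ROUTER_TIMEOUT` constant are illustrative assumptions, not taken from the repository (which may pass `--timeout` on the command line instead); `timeout` is gunicorn's own setting name.

```python
# gunicorn.conf.py -- hypothetical file name, for illustration only.

# Heroku's router timeout in seconds, as stated in the commit message.
# Illustrative constant, used only for the sanity check below.
HEROKU_ROUTER_TIMEOUT = 30

# gunicorn's `timeout` setting: a worker that stays silent for longer than
# this many seconds is killed and restarted.
timeout = 20

# Keep a margin so that gunicorn gives up before the router does, even
# though routing time is included in Heroku's measurement.
assert timeout < HEROKU_ROUTER_TIMEOUT
```

The 10 second margin is what lets gunicorn, rather than the router, end a slow request.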
Max Chehab c6e0c26bc8 Bug 1336272 - Refactor changing wording from resultset to push (#2644)
Switches to the new environment variable `PULSE_PUSH_SOURCES`.

Keep old `publish-resultset-runnable-job-action` task name by creating a 
method that points to `publish_push_runnable_job_action`.
2017-08-04 09:38:57 -07:00
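The backwards-compatible alias described in commit c6e0c26bc8 can be sketched roughly as follows. This is plain Python with a toy task registry standing in for Celery; the registry, the new task's registered name, and the function signatures are all hypothetical.

```python
# Toy stand-in for a task registry keyed by task name (hypothetical).
TASKS = {}


def task(name):
    """Register a function under a task name, like a simplified task decorator."""
    def register(func):
        TASKS[name] = func
        return func
    return register


@task("publish-push-runnable-job-action")  # assumed name for the new task
def publish_push_runnable_job_action(project, push_id):
    # Placeholder body: the real task publishes a message.
    return {"project": project, "push_id": push_id}


@task("publish-resultset-runnable-job-action")  # old name, kept as an alias
def publish_resultset_runnable_job_action(project, resultset_id):
    # Messages queued under the old name still resolve, and simply
    # delegate to the renamed implementation.
    return publish_push_runnable_job_action(project, resultset_id)
```

Keeping the old name registered means tasks already queued at deployment time are not rejected as unknown.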
Ed Morley f6e320aa92 Bug 1340132 - Reduce usage of --maxtasksperchild with celery
Now we're no longer using datasource, the memory leaks previously
seen have gone. As such there is no need to make the celery worker
processes restart so frequently (they will now only be restarted at
the daily Heroku dyno restart), which will improve performance.

In addition, this will reduce the number of transactions that don't get
reported to New Relic (when the maxtasksperchild threshold is
reached, the worker is forcibly killed before it can submit the
New Relic payload).

The log parser tasks have been left unchanged for now, since they
appear to still be leaking.
2017-05-06 17:30:57 +01:00
Ed Morley 6502ec24b1 Bug 1340123 - Stop using --max-requests with gunicorn
Now that we're no longer using datasource, the leaks should hopefully
have gone. The gunicorn processes will now only be restarted at the
daily Heroku dyno restart, rather than multiple times per minute,
improving performance.
2017-02-16 17:38:01 +00:00
jgraham 10677eda26 Bug 1339510 - Remove unused detect intermittents code. (#2169) 2017-02-15 22:27:35 +00:00
Ed Morley 3abb18b46e Bug 1215102 - Adjust Celery settings to match CloudAMQP recommendations
CloudAMQP recommend disabling events/heartbeats/gossip/mingle messages,
in order to reduce unnecessary load on the rabbitmq server:
https://www.cloudamqp.com/docs/celery.html

For settings reference, see:
http://docs.celeryproject.org/en/3.1/configuration.html
http://docs.celeryproject.org/en/3.1/whatsnew-3.1.html#gossip-worker-worker-communication
http://docs.celeryproject.org/en/3.1/reference/celery.bin.worker.html#cmdoption-celery-worker--without-gossip

The Procfile is now becoming exceptionally verbose, however that's a
problem to consider solving in another bug, perhaps via a runner script.
2017-02-13 23:02:06 +00:00
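As a rough illustration of commit 3abb18b46e, the sketch below builds a Celery worker command line with gossip, mingle and heartbeats disabled. The three `--without-*` flags are Celery worker options; the helper function, app name and queue name are hypothetical, and this is not the repository's actual Procfile entry.

```python
def celery_worker_command(app, queues, concurrency):
    """Return an argv list for a Celery worker with gossip, mingle and
    heartbeats disabled."""
    return [
        "celery", "-A", app, "worker",
        "-Q", ",".join(queues),
        "--concurrency", str(concurrency),
        # Skip worker-to-worker gossip messages.
        "--without-gossip",
        # Skip synchronising with other workers at startup.
        "--without-mingle",
        # Do not send worker event heartbeats.
        "--without-heartbeat",
    ]


# Hypothetical app and queue names, for illustration only.
command = celery_worker_command("treeherder", ["default"], concurrency=3)
print(" ".join(command))
```

Task events stay off here simply because `-E` is not passed. A helper like this is one form the "runner script" mentioned in the commit message could take.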
Armen Zambrano G ed8e39221d Bug 1306709 - Add SETA to Treeherder 2017-01-05 14:33:55 -05:00
Ed Morley 4df19bddd2 Bug 1318021 - Vagrant: Stop using log files for gunicorn/celery output
Outputting to the console rather than a log file:
* is more user-friendly during development
* is more consistent with Heroku
* means the Vagrant-specific Django LOGGING config is now closer to the
  one in settings.py, and so more easily combined with it

Both gunicorn and celery default to outputting to stdout/stderr, so the
`logfile` options can be omitted entirely.
2016-11-23 11:47:04 +00:00
Ed Morley b6b9966921 Bug 1307785 - Lower gunicorn --max-requests on Heroku to match SCL3
This works around the apparent pre-existing memory leak until we're able
to fix it.
2016-10-05 14:49:21 +01:00
William Lachance 0300fb69ba Bug 1303055 - Increase the number of log processing workers (#1857)
This should increase throughput, especially in cases where we are backlogged
due to network latency.
2016-09-15 11:51:34 -04:00
William Lachance f0105d47f5 Bug 1258861 - Move text log steps and errors into main database (#1696)
This changes ingestion, the API endpoints, and the frontend to match
the new structure. For now we continue to store text_log_summary artifacts,
though they don't do anything anymore.
2016-09-12 12:30:36 -04:00
camd b2e5e714aa Bug 1264074 - Use Pulse for creation of Github resultsets (#1692)
* Bug 1264074 - Move to_timestamp function to a reusable location

* Bug 1264074 - Refactor JobConsumer to have a PulseConsumer super class

Much of what was in the JobConsumer is reusable by the upcoming
ResultsetConsumer.  So refactor those parts out so that each specific
consumer can reuse code as much as possible.

* Bug 1264074 - Add ability to ingest Github Resultsets via Pulse

This introduces a ResultsetConsumer and a read_pulse_resultsets
management command to ingest resultsets from the TaskCluster
github exchanges.

When a supported Github repo has a Pull Request created or
updated, or a push is made to master, then it will kick off a
Pulse message.  We will receive it and then fetch any additional
information we need from github's API and store the Resultset.

This follows a very similar pattern to the Job Pulse ingestion.

* Bug 1264074 - Old code/comments cleanup

* Bug 1264074 - Tests for the Github resultset pulse loader
2016-08-22 16:29:55 -07:00
Cameron Dawson bd366539bf Bug 1266229 - Allocate Heroku dynos for pulse queue reading 2016-06-06 12:09:05 -07:00
Cameron Dawson 3402f00fbc Bug 1266229 - Create store_pulse_jobs queue and prep to turn on Pulse ingestion
Rename ``ingest_from_pulse`` management command to ``read_pulse_jobs`` to
indicate that this step does not actually do any ingesting.  It just populates
the celery queue ``store_pulse_jobs`` that DOES do the actual ingesting.
2016-06-01 16:31:14 -07:00
Ed Morley 92793e569e Bug 1191934 - Remove the now-redundant fetch-missing-pushlogs task
The task was a workaround for us missing pushes, however the root causes
of these have since been fixed.
2016-05-27 15:47:04 +01:00
Ed Morley a54d292673 Bug 1275761 - Remove the now unused high priority log parsing queues
Since they are unused after bug 1273231.
2016-05-26 15:34:55 +01:00
James Graham b98f7a81fa Remove support for parsing mozlog json logs 2016-04-19 19:16:08 +01:00
James Graham 7e4084d24f Bug 1252854 - Backend work for matching unstructured and structured log summary lines. 2016-04-19 19:16:08 +01:00
Ed Morley 859230c9bf Bug 1165229 - Heroku: Move the deploy tasks to their own script
Since we'll soon be adding reporting of deploys to New Relic, which will be
too verbose to include in the Procfile. Also adds additional log output
(which follows the buildpack compile log formatting convention) to make
it easier to find & follow the release tasks on Papertrail.

Uses the `set -euo pipefail` recommendation from:
http://redsymbol.net/articles/unofficial-bash-strict-mode/
2016-03-17 11:50:21 +00:00
James Graham b6f4533bd7 Bug 1255087 - Add detect_intermittents and autoclassify queues to celery worker files
This will not cause any additional tasks to actually run since
the code that adds messages to these queues is gated on the
AUTOCLASSIFY_JOBS setting.
2016-03-16 17:46:16 +00:00
Ed Morley bf2b5d1370 Bug 1245472 - Heroku: Update to the new release-phase implementation
The beta release-phase feature is making a backwards incompatible change
today - moving from an app.json (and corresponding buildpack) approach
to specifying a `release` Procfile entry that blocks the app deploy.

For more info, see:
https://devcenter.heroku.com/articles/release-phase?preview=1
2016-02-03 12:33:54 +00:00
Cameron Dawson b6258d9cd4 Bug 1237474 - add error_summary and store_error_summary tasks to Heroku procfile 2016-01-26 09:19:00 -08:00
William Lachance a8e663d61d Bug 1228154 - Generate new performance alerts as data is ingested 2015-12-02 13:20:17 -05:00
Ed Morley 620228cbf3 Bug 1196764 - Rename calculate_eta to calculate_durations
Since we're not calculating ETAs (the UI does that once it knows the
start time and expected duration), we're calculating recent average
durations instead.
2015-11-30 11:36:18 +00:00
Alice Scarpa 5d9e430cac Bug 1194830: Add a runnable_job API endpoint
This creates a 'runnable_job' table in the database, as well as an API
endpoint at /api/project/{branch}/runnable_jobs listing all existing
buildbot jobs and their symbols. A new daily task 'fetch_allthethings' is
added to update this table.
2015-11-14 13:56:06 -02:00
Ed Morley a5999ac2b9 Bug 1197186 - Move wsgi.py to a generic config/ directory
Since it's not specific to the Django app 'webapp'.
2015-10-08 19:59:44 +01:00
William Lachance 066f437ca5 Bug 1192976 - Refactor performance data + store in master db 2015-09-14 10:16:25 -04:00
Ed Morley 90ba77e596 Bug 1192801 - Remove per-file MPL boilerplate since it's unnecessary
The MPL 2.0 terms state that as long as a LICENSE file is present, the
per-file header text is not required. See "Exhibit A" at the end of:
https://www.mozilla.org/MPL/2.0/
2015-08-18 23:32:11 +01:00
Cameron Dawson 00cfe6643d Bug 1140349 - Remove the objectstore code
After the previous commit, the Objectstore is effectively "dead code".
So this commit removes all the dead code after anything left over in
the Objectstore has been drained and added to the DB.
2015-07-21 14:13:21 -07:00
Cameron Dawson d4c0de6276 Bug 1164888 - Tune Heroku build4hr handling
By taking the concurrency down to 1, we can control how much memory is
used and increase throughput by assigning more dynos.  This prevents us
overflowing the memory allocations on each dyno due to the growth of
the build4hr process.
2015-06-18 08:23:19 -07:00
Ed Morley 5b74d1d9c1 Bug 1164868 - Split the buildapi tasks into {pending,running,4hr} queues
Previously all three of the buildapi ETL ingestion tasks (pending,
running, 4hr) were run under one queue. In order to be able to tell
issues (such as backlogs or leaks) apart, these have now been split onto
their own queues.

On Heroku, these queues now also have a dyno each - which should mean we
can easily tell which is leaking and possibly also downgrade from the
expensive performance dyno even before the leak is fixed.
2015-05-21 15:44:58 +01:00
Jonathan French 19b71bc4b4 Bug 1164881 - Add MPL2.0 headers to recent treeherder repo files 2015-05-14 11:45:26 -04:00
Mauro Doglio 597282fe58 Bug 1145606 - Setup treeherder to deploy on heroku
I added a Procfile listing all the different Python services Treeherder needs.
Heroku provides deployment-specific settings via environment variables, so I had to modify the settings file to read them where that wasn't already the case. I created an environment variable, IS_HEROKU, which allows Heroku-only configuration where needed.
The db service is provided by Amazon RDS, which requires an SSL connection. To enable SSL in the MySQLdb Python client I had to modify Datasource (and bump up the version used).
The cache service is provided by the memcachier Heroku addon. Heroku recommends using pylibmc, so I set it up according to the docs here: https://devcenter.heroku.com/articles/memcachier#python
The amqp service is provided by the CloudAMQP addon.
I added a post_compile script that runs every time we deploy. We should run every build step we require in there, like static asset minification, collection, etc.
To share the OAuth credentials among the various services I used an environment variable. I also added an option to export_project_credentials so that the credentials can be printed to stdout. This should come in handy when we need to update the environment-stored credentials with the ones in the db.
2015-05-14 13:54:41 +01:00
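The IS_HEROKU switch described in commit 597282fe58 could look roughly like the sketch below. The helper function, the returned setting names, and the addon variable names (`CLOUDAMQP_URL`, `MEMCACHIER_SERVERS`) are assumptions made for illustration, not the repository's actual settings code.

```python
import os


def heroku_overrides(environ=None):
    """Return settings overrides that apply only when IS_HEROKU is set."""
    environ = os.environ if environ is None else environ
    # Any non-empty value counts as "on" in this sketch.
    if not environ.get("IS_HEROKU"):
        return {}
    servers = environ.get("MEMCACHIER_SERVERS", "")
    return {
        # Variable names are assumptions about what the addons export.
        "BROKER_URL": environ.get("CLOUDAMQP_URL"),
        "MEMCACHED_SERVERS": [s for s in servers.split(",") if s],
    }
```

Passing the environment in as a parameter keeps the Heroku-only branch easy to exercise locally with a plain dict.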