Commit Graph

245 Commits

Author SHA1 Message Date
Frank Bertsch ad47959239 Prefix constants with AUTH0 2018-11-07 08:57:25 -06:00
Frank Bertsch 2ca728a01c Remove talisman; add headers explicitly 2018-11-07 08:57:25 -06:00
Frank Bertsch 79f4b7d800 Add authorization header 2018-11-07 08:57:25 -06:00
Frank Bertsch ece0d725c7 Add Talisman for CORS headers 2018-11-07 08:57:25 -06:00
Frank Bertsch dfaf2d3b1b WIP: Authorization for release aggregates 2018-11-07 08:57:25 -06:00
Frank Bertsch f1ed189361 Aggregate event counts 2018-10-01 14:46:52 -05:00
Rob Hudson 1509016ea7 Follow master branch removal of telemetry enabled 2018-09-20 14:37:33 -07:00
Rob Hudson dd1dae49fd Add tests for parquet aggregates 2018-09-20 14:37:33 -07:00
Rob Hudson 26d8ee0a4f Write aggregates to parquet (bug 1345064) 2018-09-20 14:37:33 -07:00
Frank Bertsch de96552e50 Remove trailing slash 2018-09-12 16:07:06 -05:00
Frank Bertsch e5481c85f7 Enable clients without telemetry enabled 2018-09-12 16:07:06 -05:00
Frank Bertsch 5322ba5212 Address review feedback 2018-08-24 10:42:10 -05:00
Frank Bertsch aa11f09bb6 Prevent public viewing of release data
To enable release data, there are two options:
1. Add the probes you wish to display to PUBLIC_RELEASE_METRICS
2. Set the $SHOW_RELEASE_METRICS env var to "True" (and all
   probes will be displayed, with no restrictions)
2018-08-24 10:42:10 -05:00
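The two options described in aa11f09bb6 can be pictured with a minimal sketch. Only the names PUBLIC_RELEASE_METRICS and SHOW_RELEASE_METRICS come from the commit message; the function name, the channel check, and the example probe are assumptions made for illustration and may not match the repository's code.

```python
import os

# Hypothetical allowlist; the real contents are not given in the commit message.
PUBLIC_RELEASE_METRICS = {"EXAMPLE_PUBLIC_PROBE"}


def is_metric_viewable(channel, metric, environ=None):
    """Return True if `metric` may be shown for `channel` (illustrative only)."""
    environ = os.environ if environ is None else environ
    if channel != "release":
        # Assumption: only release data is restricted.
        return True
    if environ.get("SHOW_RELEASE_METRICS") == "True":
        # Option 2: the env var lifts every restriction.
        return True
    # Option 1: the probe must be explicitly allowlisted.
    return metric in PUBLIC_RELEASE_METRICS
```

Passing `environ` explicitly keeps the check easy to test without touching the process environment.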
Frank Bertsch 72873a3999 Add 404 descriptions for easier debugging 2018-08-24 10:42:10 -05:00
Rob Hudson e7194b5cde Add `osVersion` to allowed query strings (bug 1481832) 2018-08-15 09:08:04 -07:00
Rob Hudson 1bf46275f7 Aggregate Dev Edition (bug 1476323) 2018-08-09 13:24:48 -07:00
Jeff Klukas ca8a6bdf61 Bug 1474590 Client-side logging for database warnings 2018-07-30 10:50:25 -04:00
Jeff Klukas e2de6e9b1e Bug 1472621 Truncate aggregates on bigint overflow
This logs warnings in PostgreSQL whenever we truncate,
including the pre-truncation value so we have evidence of how
severe the truncation is.
2018-07-10 09:38:11 -04:00
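The commit above does its truncation inside PostgreSQL. The same idea, clamping at the bigint maximum and logging the pre-truncation value, can be sketched in Python. The function and logger names here are hypothetical, and the sketch assumes non-negative counters.

```python
import logging

logger = logging.getLogger("aggregates")  # hypothetical logger name

PG_BIGINT_MAX = 2**63 - 1  # upper bound of a PostgreSQL bigint


def add_clamped(a, b):
    """Add two non-negative counters, truncating at the bigint maximum."""
    total = a + b  # Python integers do not overflow, so this is exact
    if total > PG_BIGINT_MAX:
        # Record the pre-truncation value as evidence of severity.
        logger.warning("Truncating aggregate value %d to bigint max", total)
        return PG_BIGINT_MAX
    return total
```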
Frank Bertsch 04938aeeb6 Update comment for null dimensions 2018-06-13 13:19:22 -07:00
Frank Bertsch 9d31d64ec8 Bug 1467860 - Ignore aggregates with null chars
This extends the "ignore null labels"
to any dimensions with null values.
2018-06-13 13:19:22 -07:00
Frank Bertsch f628b2adff Blacklist dynamic event summaries 2018-05-02 11:38:21 -05:00
Frank Bertsch c09a6d8b1a Bug 1451779: 404 event summaries 2018-05-02 11:38:21 -05:00
Rob Hudson fb1e249e3a Bump python_moztelemetry version and update code 2018-04-10 15:30:24 -07:00
Rob Hudson dcc0b0f3b8 Import Histogram from where it is defined 2018-04-10 12:31:56 -07:00
Chris H-C 14acfd0db4 bug 1451779 - Do not aggregate event counts scalars r?frank 2018-04-05 13:00:04 -05:00
Rob Hudson 8ace3284af Remove e10s from aggregates dimensions (bug 1441586) 2018-03-29 21:13:35 -07:00
Rob Hudson 386ca7177d Misc code cleanup 2018-03-29 21:13:35 -07:00
Rob Hudson db80847a9e Remove aurora 2018-03-20 08:26:39 -07:00
Rob Hudson bd1e131795 Don't use cloudwatch logs in development 2018-03-14 16:02:44 -07:00
Rob Hudson 4672fd68a2 Use nightly histogram URL as default 2018-03-14 09:32:03 -05:00
Rob Hudson e4476c92db Cleanup on aisle pep8 2018-03-14 09:32:03 -05:00
Harold Woo b14172c9a5 [Bug 1342947] - Refactor mozaggregator service to use flask dockerflow 2018-03-09 10:39:33 -08:00
Rob Hudson 8f57ca7aa6 Fix test 404s for missing metrics 2018-03-08 23:36:26 -06:00
Rob Hudson 229c08a6e5 Rewrite service tests (bug 1342027)
By using Flask's built-in app we can avoid running a server for the tests.
2018-03-08 23:36:26 -06:00
Frank Bertsch 4321495521 Make missing scalars 404
Bug 1432547
2018-01-23 12:55:41 -06:00
Chris H-C 8ae5ba5050 Remove aggregator-side use counter handling.
bug 1412384

Now that the service is overwriting the false values this code so carefully
(and incorrectly) crafts, we can just remove it. And the test.
2018-01-17 10:07:03 -05:00
Chris H-C ad74f877c7 Fix Use Counters at the Service Level
bug 1412382

Use counters in the database are wrong, and have been since bug 1204994 changed
_extract_main_histograms. It only counted "False" values in pings that had at
least one "True" value (because if there were only "False" values, we didn't
send a use counter in that ping).

This fixes this by subbing in the False value from
(TOP_LEVEL_)CONTENT_DOCUMENTS_DESTROYED, which has the correct number.

This results in an interesting side-effect that use counters that don't exist
will get valid responses from the service. This is because the service can't
tell the difference between a use counter that doesn't exist and one that just
didn't happen to have a single 'True' value in that row in that table.

This is beneficial for testing, so that I don't have to manipulate
histograms_template or generate_payload in dataset.py to be able to support
probes that aren't in every ping. The math for expected_count in test_db
would only get worse.

Instead I can ask for a use counter that doesn't exist and ensure that it
reports the correct number of False values and the correct count.
2018-01-17 10:07:03 -05:00
Frank Bertsch 8bb49e0e08 Make build_id insertions add histograms correctly
Bug 1403994

Previously, old build_id histograms would merge with new ones
by just adding; but instead we want to first add the histograms,
then the sums and counts separately.

Unfortunately this means that [0] did not fix historical build_id
aggregates. Those are broken in the db for good.

[0] https://github.com/mozilla/python_mozaggregator/pull/57
2018-01-12 13:36:25 -06:00
Frank Bertsch cba42043d7 Aggregate histograms of different lengths correctly
This separates aggregating the histogram buckets (keys 0:n-2)
from the sum and ping counts (keys n-1 and n). This way we can
add new buckets without the new values polluting those last
two buckets.

See bug 1403994
2018-01-10 15:39:08 -06:00
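A minimal sketch of the separation described in cba42043d7, assuming each aggregate is a list laid out as [bucket_0, ..., bucket_k, sum, count]. The function name and the zero-padding of the shorter bucket list are assumptions for illustration, not the repository's implementation.

```python
def merge_aggregates(old, new):
    """Merge two aggregates laid out as [bucket_0, ..., bucket_k, sum, count]."""
    old_buckets, old_tail = old[:-2], old[-2:]
    new_buckets, new_tail = new[:-2], new[-2:]

    # Pad the shorter bucket list with zeros so the buckets line up.
    width = max(len(old_buckets), len(new_buckets))
    old_buckets = old_buckets + [0] * (width - len(old_buckets))
    new_buckets = new_buckets + [0] * (width - len(new_buckets))

    # Buckets are added to buckets; sum and count are added separately.
    buckets = [a + b for a, b in zip(old_buckets, new_buckets)]
    tail = [a + b for a, b in zip(old_tail, new_tail)]
    return buckets + tail
```

With `old = [1, 2, 10, 3]` (two buckets) and `new = [1, 1, 1, 20, 4]` (three buckets), a naive element-wise addition would add the old sum (10) into the third bucket. Splitting first keeps the sum and count in the last two positions.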
Frank Bertsch 1bde537dd0 Ignore NULL main histogram representations
Previously, an edge case for a NULL histogram caused the job to
fail.

See bug 1425464
2017-12-18 14:55:54 -06:00
Frank Bertsch dc09968003 Fix build
* Fix travis issue https://github.com/travis-ci/travis-ci/issues/7940

* Remove expired histogram usage

The new keyed histogram should never expire
2017-12-18 14:10:38 -06:00
Frank Bertsch 3c286a59c1 Ignore stream describe throttling 2017-05-22 11:26:16 -05:00
Frank Bertsch 0859301595 Ignore cloudwatch log upload failures
Also fixes a bug where the response from cloudwatch logs is
never set.
2017-05-15 18:35:47 +01:00
Frank Bertsch cdf5e74001 Add read replica for service 2017-03-14 07:49:05 +00:00
Frank Bertsch 817c6b8f1f Use moztelemetry Scalar implementation 2017-03-08 11:20:04 -05:00
Frank Bertsch 88b781ab5d Add Cache Control and ETag headers
This change implements both cache control and ETag headers.

For Cache-Control:
For all requests but submission-date aggregates, the max-age
is set until the data is kicked from the local cache
(we know the response won't change until then).

For submission-date aggregates, max-age is always set to 24h.

ETags:
ETags are not set on any requests but submission-date aggregates.
The ETags are the same for all values, since submission-date
aggregates will never change, unless we do a backfill.

Thus, the single ETag value can be updated, invalidating all
previous ETags. This should only be done after a backfill.
2017-03-07 11:59:06 -05:00
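The header rules in 88b781ab5d can be summarized in a short sketch. The function name, the "submission_date" prefix string, and the ETag value are hypothetical; only the 24h max-age and the single shared ETag come from the commit message.

```python
SUBMISSION_DATE_MAX_AGE = 24 * 60 * 60  # 24h, per the commit message
SUBMISSION_DATE_ETAG = "v1"  # hypothetical value; change it after a backfill


def cache_headers(prefix, seconds_until_cache_expiry):
    """Build caching headers for a response (illustrative only)."""
    if prefix == "submission_date":
        # Fixed max-age plus one ETag shared by every submission-date response.
        return {
            "Cache-Control": "max-age={}".format(SUBMISSION_DATE_MAX_AGE),
            "ETag": '"{}"'.format(SUBMISSION_DATE_ETAG),
        }
    # Other requests: cacheable until the local cache evicts the data; no ETag.
    max_age = max(0, int(seconds_until_cache_expiry))
    return {"Cache-Control": "max-age={}".format(max_age)}
```

Because every submission-date response carries the same ETag, changing that one constant invalidates all previously issued ETags at once.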
Frank Bertsch 889a3a573d Aggregate process and gpu scalars 2017-03-07 05:26:47 -05:00
Frank Bertsch 0931f319a6 Enable content/parent process types
In https://github.com/mozilla/python_mozaggregator/pull/29, we
added the new processes to the get_filter_options, but not to
get_metrics (so they weren't correctly transformed for querying
the database). This change transforms content => true and
parent => false for child GET params.
2017-02-23 12:38:32 -06:00
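The transformation described in 0931f319a6 amounts to a small lookup. The names below are hypothetical, and representing the stored values as the strings "true" and "false" is an assumption based on the neighboring commit about gpu histograms.

```python
# Assumed mapping from the public `child` GET parameter to the stored value.
CHILD_PARAM_TO_DB = {"content": "true", "parent": "false"}


def normalize_child(value):
    """Translate a `child` GET parameter into the value used for querying."""
    # Values without a mapping (for example "gpu") pass through unchanged.
    return CHILD_PARAM_TO_DB.get(value, value)
```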
Frank Bertsch 70c7fd59ff Change logging to include Origin
Previously, logging was done for referer URL and referer.
Unfortunately, the data was not properly accessed, so the fields
were never present. This change fixes that by properly retrieving
HTTP Referer, and also using HTTP Origin.
2017-02-23 13:38:02 +00:00
Frank Bertsch ee4ffc30ab Fix retrieval of child/parent histograms
With the change to add gpu histograms, the values for "child"
went from true/false to "true"/"false"/"gpu". This means that
querying for {"child": True} only returns data from before
we made that change [0].

With this change, we now query for both {"child": True}
and for {"child": "true"}. The database function is
also backwards compatible, just in case anyone is running
an older version of the service.

[0] https://github.com/mozilla/python_mozaggregator/pull/29
2017-02-22 09:41:12 -06:00
Frank Bertsch 580e593c7a Aggregate Fennec saved-session pings 2017-02-21 16:40:58 -06:00
Frank Bertsch 1073d73494 Replace papertrail logging with CloudWatch Logs 2017-02-17 17:50:57 +00:00
George Wright e1d897fe98 bug 1314227 - Support payloads with GPU process Histograms
Original implementation by :gw280, credit to him. (I just shined it up a tad)

GPU processes are coming online for various clients, so we'd like to see their
measurements in places like tmo.

Add child=(gpu, content, parent) values to filter sets, with content and parent
mapping to true and false, respectively, to maintain backward compatibility.

Needed to increase the number of pings per dimension in the tests so we can
test that two new-style pings can aggregate the gpu process information
properly.

Also, properly check that each process_type has the appropriate counts in the
tests. Previously we were getting away with just checking that the entire sum
was correct, when we could have done better.
2017-02-03 07:29:25 -06:00
Frank Bertsch 3a466853f8 Make prefixes and labels constant 2017-01-11 08:49:46 -06:00
Frank Bertsch 02f01af61e Get scalar descriptions from Scalars.yml
The class that is added here will eventually be moved to
python_moztelemetry, and imported.
2017-01-11 08:49:46 -06:00
Frank Bertsch a406c2e876 Ignore "browser.engagement.navigation" scalars
Business development requires that these scalars not be made public
2017-01-11 08:49:46 -06:00
Frank Bertsch a20c7825ab Aggregate keyed scalars 2017-01-11 08:49:46 -06:00
Frank Bertsch 23724cee51 Aggregate numeric scalars
This change does not include keyed numeric scalars.
2017-01-11 08:49:46 -06:00
Frank Bertsch 31aea0445b Log requests against the service
We want to be able to track how the service is being used. This analysis
will aid in future iterations of the aggregator.
2016-12-30 06:11:58 -06:00
Frank Bertsch e8dad504e9 Use main pings instead of saved_session
Saved_session pings are being deprecated.
However, main pings are both opt-in and opt-out, while saved_session
were just opt-in. For this reason we are filtering to include only
opt-in users, so the results should be similar.

Note that results will not be identical, since saved_session pings
often lag main pings, due to the main ping submission on date split.
2016-12-15 10:22:24 -06:00
Frank Bertsch e821905701 Handle null unicode char in labels
Postgres doesn't handle char \u0000, so we'll have to ignore it.
We can't just use printable chars, since there aren't any requirements
for keyed histograms labels to be limited to those.
2016-12-08 14:54:02 -06:00
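One way to ignore such labels, sketched under the assumption that a keyed histogram is a dict from label to histogram. The function name is hypothetical.

```python
NULL_CHAR = u"\u0000"  # the character PostgreSQL text columns cannot store


def drop_null_labels(keyed_histogram):
    """Return a copy without labels that contain the null character."""
    return {
        label: value
        for label, value in keyed_histogram.items()
        if NULL_CHAR not in label
    }
```

Non-printable labels other than the null character are kept, matching the commit's note that keyed histogram labels are not limited to printable characters.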
Roberto Agostino Vitillo 2ef938d4d1 Revert "Use main pings instead of saved_session"
This reverts commit da1f17a72e. The
change has caused catastrophic failures of the aggregation job.
2016-11-25 07:32:22 +00:00
Roberto Agostino Vitillo 7db8ca2767 Deal with unexpected types when checking if telemetry is enabled. 2016-11-24 14:13:51 +00:00
Frank Bertsch da1f17a72e Use main pings instead of saved_session
Saved_session pings are being deprecated.
However, main pings are both opt-in and opt-out, while saved_session
were just opt-in. For this reason we are filtering to include only
opt-in users, so the results should be similar.

Note that results will not be identical, since saved_session pings
often lag main pings, due to the main ping submission on date split.
2016-11-23 15:36:38 +00:00
Frank Bertsch 0f1c7099ab Use Dataset API
The get_pings API has been deprecated
2016-11-21 09:34:22 -06:00
Chris H-C 5def37884f Bug 1286951 - Update mozaggregator to account for change in childPayloads
When bug 1218576 lands, Firefox will start sending child histograms in
.processes.content.{keyedH|h}istograms
instead of
.childPayloads[i].{keyedH|h}istograms

Handle pings that have child histograms in either (or a mix of both) config.
2016-08-08 10:09:21 -04:00
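A minimal sketch of reading child histograms from either layout, or a mix of both. It assumes the dotted paths in the commit message are relative to a `payload` dict; the function name is hypothetical, and keyed histograms would be handled the same way.

```python
def child_histograms(payload):
    """Collect child histogram dicts from both the old and new ping layouts."""
    found = []
    # Old layout: payload["childPayloads"][i]["histograms"]
    for child in payload.get("childPayloads") or []:
        if "histograms" in child:
            found.append(child["histograms"])
    # New layout: payload["processes"]["content"]["histograms"]
    content = (payload.get("processes") or {}).get("content") or {}
    if "histograms" in content:
        found.append(content["histograms"])
    return found
```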
Roberto Agostino Vitillo 11f4938d97 Ignore keyed histograms with too many labels.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1275010 and
https://bugzilla.mozilla.org/show_bug.cgi?id=1275019.
2016-05-24 08:39:23 +01:00
Roberto Agostino Vitillo 588850c688 Deal with invalid USE_COUNTER_2 histograms. 2016-05-19 10:47:29 +01:00
Roberto Agostino Vitillo 6ff6d3e8fc Ignore pings with non-numeric build-id.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1271961.
2016-05-11 13:21:23 +00:00
Roberto Agostino Vitillo edb4966e7a Ignore pings with invalid environment section.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1271961.
2016-05-11 13:21:19 +00:00
Roberto Agostino Vitillo 01a6ba3c98 Fix BotoClientError when open connection to S3.
When using SigV4 a 'host' parameter must be specified.
2016-03-08 08:46:16 +00:00
anthony 26ddfdc8d4 Fix MIME types, document testing 2016-02-11 11:38:23 -05:00
Roberto Agostino Vitillo f29601ff13 Don't return SEARCH_COUNTS.
SEARCH_COUNTS telemetry is considered confidential data and must not be
made public, see https://bugzilla.mozilla.org/show_bug.cgi?id=1247303.
2016-02-10 18:12:43 +00:00
Roberto Agostino Vitillo c8ca9a4a7c Use e10sEnabled setting instead of E10S_AUTOSTART.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1111701.
2016-01-19 11:15:45 +00:00
Roberto Agostino Vitillo 331662fbf2 Make sure values fit within a pgsql bigint. 2015-12-04 18:03:03 +00:00
Roberto Agostino Vitillo 1c57d4d299 Make sure values fit within a pgsql bigint. 2015-12-04 17:40:44 +00:00
Roberto Agostino Vitillo ef84b4591f Make sure values fit within a pgsql bigint. 2015-12-04 13:03:51 +00:00
Roberto Agostino Vitillo bed777b1f9 Clean up. 2015-12-03 10:43:59 +00:00
Roberto Agostino Vitillo fe113e228a Disable final vacuum step. 2015-12-03 10:34:58 +00:00
Roberto Agostino Vitillo 2489c4bb8e Read RDS configuration in driver and limit number of reducers. 2015-12-03 07:24:42 +00:00
Roberto Agostino Vitillo 756083fb8f Get rid of the groupBy operation. 2015-12-02 23:00:13 +00:00
Roberto Agostino Vitillo 5e7574306a Bug 1228126
* mozaggregator/aggregator.py:
2015-11-26 13:11:32 +00:00
Roberto Agostino Vitillo 5bf82bdf01 Add new lines. 2015-10-23 12:45:24 -04:00
Roberto Agostino Vitillo 2ea2470f0d Refactor project configuration. 2015-10-22 09:51:32 -04:00
Roberto Agostino Vitillo 938880ac95 Allow histogram names to have lowercase characters (USE_COUNTER2_*). 2015-09-25 12:16:29 +01:00
Roberto Agostino Vitillo da35665ebf Use isinstance instead for type checks. 2015-09-24 17:48:45 +01:00
Roberto Agostino Vitillo 2313187226 Deal with invalid data. 2015-09-23 20:00:20 +01:00
Roberto Agostino Vitillo 526ea9e18f Track both documents and pages for USE_COUNTER2 histograms. 2015-09-23 19:48:55 +01:00
Roberto Agostino Vitillo 6130408770 Add support for USE_COUNTER2_* histograms. 2015-09-23 18:06:01 +01:00
Roberto Agostino Vitillo 0ecb3b1fad Ignore USE_COUNTER2_* histograms until Bug 1204994 lands. 2015-09-16 11:36:48 +01:00
Roberto Agostino Vitillo 3499865471 Redirect http to https. 2015-08-26 13:33:33 +00:00
Roberto Agostino Vitillo dde7a8fe4b Use precomputed sampleid. 2015-08-24 14:18:10 +01:00
Roberto Agostino Vitillo 8171c92d8a Properly handle missing histogram definition. 2015-08-06 13:54:52 +01:00
Roberto Agostino Vitillo 468f2d365f Catch all errors due to a missing definition. 2015-08-06 13:36:10 +01:00
Roberto Agostino Vitillo b88926e79e Bump version. 2015-08-06 13:23:51 +01:00
Roberto Agostino Vitillo ca17305c2c Ignore submissions where extended telemetry is disabled. 2015-08-06 13:00:40 +01:00
Roberto Agostino Vitillo 35cf8b8198 Yield 404 on missing histogram definition. 2015-08-06 12:03:07 +01:00
Roberto Agostino Vitillo 319f6bc063 Issue 404 on empty search result. 2015-08-06 11:57:26 +01:00
Roberto Agostino Vitillo 2e45424987 Use simple cache as long as one process is good enough. 2015-07-29 11:52:44 +01:00
Roberto Agostino Vitillo 40f90a1e21 Speedup database service. 2015-07-29 11:36:38 +01:00