Bug 1403994
Previously, old build_id histograms were merged with new ones
by simply adding them element-wise; instead, we want to add the
histogram buckets first, then the sums and counts separately.
Unfortunately, this means that [0] did not fix historical build_id
aggregates. Those are broken in the db for good.
[0] https://github.com/mozilla/python_mozaggregator/pull/57
This aggregates the histogram buckets (keys 0:n-2) separately
from the sum and ping count (keys n-1 and n). This way we can
add new buckets without their values polluting those last
two entries.
See bug 1403994
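A minimal Python sketch of the merge order described above. It assumes each aggregate is a flat list laid out as [bucket_0, ..., bucket_k, sum, count] and that a newer build may carry more buckets than an older one. merge_aggregates and naive_merge are hypothetical helpers for illustration, not the actual database function.

```python
from itertools import zip_longest


def merge_aggregates(old, new):
    """Merge two aggregates laid out as [bucket_0, ..., bucket_k, sum, count].

    Hypothetical helper: buckets are added position by position (a missing
    bucket counts as 0), then the trailing sum and count are added
    separately, so extra buckets never spill into them.
    """
    old_buckets, old_sum, old_count = old[:-2], old[-2], old[-1]
    new_buckets, new_sum, new_count = new[:-2], new[-2], new[-1]
    buckets = [a + b for a, b in zip_longest(old_buckets, new_buckets, fillvalue=0)]
    return buckets + [old_sum + new_sum, old_count + new_count]


def naive_merge(old, new):
    """The previous behaviour: plain element-wise addition."""
    return [a + b for a, b in zip_longest(old, new, fillvalue=0)]


old = [1, 2, 10, 3]     # 2 buckets, sum 10, count 3
new = [1, 1, 1, 20, 3]  # 3 buckets, sum 20, count 3
print(merge_aggregates(old, new))  # [2, 3, 1, 30, 6]
print(naive_merge(old, new))       # [2, 3, 11, 23, 3]: sum/count are polluted
```

With naive addition the old sum lines up with the new third bucket, which is the pollution this change avoids.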
This change implements both cache control and ETag headers.
For Cache-Control:
For all requests except submission-date aggregates, max-age
is set to the time remaining until the data is evicted from the local cache
(we know the response won't change until then).
For submission-date aggregates, max-age is always set to 24h.
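A sketch of the Cache-Control rule above. cache_control_header is a hypothetical helper, and seconds_until_eviction is assumed to be the time left before the response is evicted from the service's local cache.

```python
SUBMISSION_DATE_MAX_AGE = 24 * 60 * 60  # 24h, in seconds


def cache_control_header(is_submission_date, seconds_until_eviction):
    """Build a Cache-Control value following the rules above (hypothetical)."""
    if is_submission_date:
        max_age = SUBMISSION_DATE_MAX_AGE
    else:
        # The cached response can't change before it is evicted, so clients
        # may reuse it until then. Clamp at 0 in case eviction is overdue.
        max_age = max(0, int(seconds_until_eviction))
    return "max-age={}".format(max_age)


print(cache_control_header(True, 5))       # max-age=86400
print(cache_control_header(False, 300.7))  # max-age=300
```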
ETags:
ETags are set only on submission-date aggregate requests.
The ETag is the same for all of them, since submission-date
aggregates never change unless we do a backfill.
Thus, the single ETag value can be updated to invalidate all
previously issued ETags. This should only be done after a backfill.
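A sketch of how a single shared ETag answers conditional requests. SUBMISSION_DATE_ETAG and its value are hypothetical; bumping it after a backfill makes every previously issued ETag stop matching, so clients refetch. Weak ("W/") tags are treated as equal, which is what If-None-Match comparison allows.

```python
# Hypothetical value: bump it after a backfill to invalidate every ETag
# handed out so far.
SUBMISSION_DATE_ETAG = '"submission-date-v1"'


def conditional_response(request_headers, body):
    """Return (status, headers, body) for a submission-date aggregate request."""
    headers = {"ETag": SUBMISSION_DATE_ETAG, "Cache-Control": "max-age=86400"}
    if_none_match = request_headers.get("If-None-Match", "")
    # If-None-Match may list several comma-separated tags.
    candidates = set()
    for tag in if_none_match.split(","):
        tag = tag.strip()
        if tag.startswith("W/"):
            tag = tag[2:]
        candidates.add(tag)
    if SUBMISSION_DATE_ETAG in candidates or "*" in candidates:
        # Client already holds the current data: no body needed.
        return 304, headers, b""
    return 200, headers, body
```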
In https://github.com/mozilla/python_mozaggregator/pull/29, we
added the new process types to get_filter_options but not to
get_metrics, so they weren't correctly transformed before querying
the database. This change maps child=content to true and
child=parent to false in the GET params.
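A sketch of the GET-param transformation described above. transform_child_param is a hypothetical stand-in for the logic in get_metrics, and it assumes the database values are the strings "true"/"false"; "gpu" and any other value pass through unchanged.

```python
# content/parent are the new, human-readable names; the database still
# stores the older true/false values (assumed here to be strings).
CHILD_PARAM_MAP = {"content": "true", "parent": "false"}


def transform_child_param(params):
    """Return a copy of the GET params with `child` mapped for querying."""
    transformed = dict(params)
    child = transformed.get("child")
    if child in CHILD_PARAM_MAP:
        transformed["child"] = CHILD_PARAM_MAP[child]
    return transformed


print(transform_child_param({"child": "content", "os": "Linux"}))
# {'child': 'true', 'os': 'Linux'}
```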
Previously, we tried to log the referer URL and referer, but the
data was not accessed correctly, so the fields were never present.
This change fixes that by correctly retrieving the HTTP Referer header,
and also logs the HTTP Origin header.
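A sketch of reading both headers, assuming the values are taken from a WSGI environ. WSGI exposes request headers as HTTP_<NAME> keys (PEP 3333), so Referer lives under HTTP_REFERER and Origin under HTTP_ORIGIN. The service may read them through its web framework instead. request_log_fields is hypothetical.

```python
def request_log_fields(environ):
    """Collect Referer and Origin from a WSGI environ for logging.

    Headers the client did not send are left out rather than logged as None.
    """
    fields = {
        "referer": environ.get("HTTP_REFERER"),
        "origin": environ.get("HTTP_ORIGIN"),
    }
    return {name: value for name, value in fields.items() if value is not None}


print(request_log_fields({"HTTP_ORIGIN": "https://telemetry.mozilla.org"}))
# {'origin': 'https://telemetry.mozilla.org'}
```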
With the change to add gpu histograms, the values for "child"
went from true/false to "true"/"false"/"gpu". This means that
querying for {"child": True} only returns data from before
we made that change [0].
With this change, we now query for both {"child": True}
and {"child": "true"}. The database function is
also backwards compatible, in case anyone is running
an older version of the service.
[0] https://github.com/mozilla/python_mozaggregator/pull/29
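The two values are not interchangeable once serialized: {"child": True} becomes {"child": true} in JSON, while {"child": "true"} keeps a string, so a match on one never finds rows stored with the other. A sketch of expanding a filter into both forms; child_filter_variants is a hypothetical helper, not the database function.

```python
# Older rows stored child as a boolean; newer rows store a string.
LEGACY_CHILD_VALUES = {"true": True, "false": False}


def child_filter_variants(filters):
    """Return every dimension filter that should be queried for `filters`."""
    child = filters.get("child")
    if child in LEGACY_CHILD_VALUES:
        # Query the new string form and the old boolean form.
        return [dict(filters), dict(filters, child=LEGACY_CHILD_VALUES[child])]
    # "gpu" (or no child filter) only ever existed in the new form.
    return [dict(filters)]


print(child_filter_variants({"child": "true", "os": "Linux"}))
# [{'child': 'true', 'os': 'Linux'}, {'child': True, 'os': 'Linux'}]
```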
Original implementation by :gw280, credit to him. (I just shined it up a tad)
GPU processes are coming online for various clients, so we'd like to see their
measurements in places like tmo.
Add child=(gpu, content, parent) values to filter sets, with content and parent
mapping to true and false, respectively, to maintain backward compatibility.
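Going the other way, a sketch of how stored child values could collapse into the three filter options, assuming the filter set is built from the distinct values found in the db (old booleans and new strings alike). STORED_TO_DISPLAY and child_filter_options are hypothetical.

```python
# Display name for each stored child value. Old rows may hold booleans,
# newer rows hold strings; content/parent keep mapping to true/false.
STORED_TO_DISPLAY = {
    True: "content",
    "true": "content",
    False: "parent",
    "false": "parent",
    "gpu": "gpu",
}


def child_filter_options(stored_values):
    """Map stored child values to sorted, de-duplicated filter options."""
    return sorted(
        {STORED_TO_DISPLAY[value] for value in stored_values if value in STORED_TO_DISPLAY}
    )


print(child_filter_options([True, "true", "false", "gpu"]))
# ['content', 'gpu', 'parent']
```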
We needed to increase the number of pings per dimension in the tests so we can
check that two new-style pings aggregate the gpu process information
correctly.
Also, properly check that each process_type has the correct counts in the
tests. Previously we only checked that the overall sum was correct, which
could hide a miscount in an individual process type.
Saved_session pings are being deprecated.
However, main pings are sent by both opt-in and opt-out users, while saved_session
pings were sent only by opt-in users. For this reason we filter main pings to
include only opt-in users, so the results should be similar.
Note that results will not be identical, since saved_session pings
often lag main pings, because main pings are also submitted on a date split.
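A sketch of the opt-in filter, assuming the opt-in flag is read from environment.settings.telemetryEnabled in each main ping; the real job may read it differently or apply it as a Spark filter. is_opt_in and opt_in_main_pings are hypothetical.

```python
def is_opt_in(ping):
    """True if the ping comes from an opted-in (extended Telemetry) client.

    Assumption: the flag lives at environment.settings.telemetryEnabled.
    """
    settings = ping.get("environment", {}).get("settings", {})
    return settings.get("telemetryEnabled") is True


def opt_in_main_pings(pings):
    """Keep only main pings from opted-in clients."""
    return [ping for ping in pings if is_opt_in(ping)]
```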
Postgres can't store the NUL character (\u0000) in text, so we have to ignore it.
We can't simply restrict labels to printable characters, since keyed histogram
labels aren't required to be printable.
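A sketch of ignoring the character. It assumes labels containing NUL are skipped entirely; stripping just the character would be an alternative, but could merge two distinct labels. Other non-printable characters, such as a tab, are kept. drop_nul_labels is hypothetical.

```python
NUL = "\u0000"


def drop_nul_labels(keyed_histograms):
    """Skip keyed-histogram labels that Postgres could not store."""
    return {label: hist for label, hist in keyed_histograms.items() if NUL not in label}


print(drop_nul_labels({"ok": [1], "bad\u0000label": [2], "tab\tlabel": [3]}))
# {'ok': [1], 'tab\tlabel': [3]}
```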