Use approximate client count in GLAM scalar_percentiles_v1 (#3039)

This is a follow-up to https://github.com/mozilla/bigquery-etl/pull/3037 which unblocked `scalar_bucket_counts_v1`.
`scalar_percentiles_v1` uses the same source table (`clients_scalar_aggregates_v1`) and started failing today with the same error (disk/memory limits exceeded for shuffle operations).

`APPROX_COUNT_DISTINCT` runs HLL (HyperLogLog) under the hood, so `total_users` becomes an estimate rather than an exact count. We use it because, unlike in the aforementioned PR, we can't split the aggregation into two stages: the quantiles calculation prevents it.
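
To illustrate why the quantiles calculation blocks a two-stage split, here is a minimal Python sketch. It assumes the two-stage approach means "pre-aggregate per partition, then combine", which may not match the other PR exactly. The rows and partitions are hypothetical, and each client is assumed to appear in only one partition.

```python
from statistics import median

# Hypothetical (client_id, value) rows, split into two partitions.
# Assumption: each client appears in exactly one partition.
partition_a = [("c1", 1), ("c2", 2), ("c3", 9)]
partition_b = [("c4", 3), ("c5", 4), ("c6", 5), ("c7", 6), ("c8", 7)]
all_rows = partition_a + partition_b


def distinct_clients(rows):
    """Exact distinct client count for a set of rows."""
    return len({client_id for client_id, _ in rows})


def median_value(rows):
    """Median (50th percentile) of the values in a set of rows."""
    return median(value for _, value in rows)


# Counts compose: per-partition counts can simply be summed.
two_stage_count = distinct_clients(partition_a) + distinct_clients(partition_b)
single_stage_count = distinct_clients(all_rows)

# Quantiles do not compose: the median of per-partition medians
# differs from the median over all rows.
two_stage_median = median([median_value(partition_a), median_value(partition_b)])
single_stage_median = median_value(all_rows)

print(two_stage_count, single_stage_count)    # same result
print(two_stage_median, single_stage_median)  # different results
```

Because the percentiles need all values of a group in a single aggregation, the client count has to be computed in that same pass, and an approximate count keeps that pass cheaper.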

I have run this query locally and confirmed that it works.
akkomar 2022-06-21 16:55:08 +02:00, committed by GitHub
Parent 98549e3cb8
Commit ceda6dd35f
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
3 changed files: 15 additions and 4 deletions


@@ -23,7 +23,7 @@ percentiles AS (
{{ aggregate_attributes }},
agg_type AS client_agg_type,
'percentiles' AS agg_type,
- COUNT(DISTINCT(client_id)) AS total_users,
+ APPROX_COUNT_DISTINCT(client_id) AS total_users,
APPROX_QUANTILES(value, 1000) AS aggregates
FROM
all_combos


@@ -52,8 +52,8 @@ percentiles AS (
key,
agg_type AS client_agg_type,
'percentiles' AS agg_type,
- COUNT(DISTINCT(client_id)) AS total_users,
- APPROX_QUANTILES(value, 100) AS aggregates
+ APPROX_COUNT_DISTINCT(client_id) AS total_users,
+ APPROX_QUANTILES(value, 1000) AS aggregates
FROM
all_combos
GROUP BY
@@ -69,7 +69,10 @@ percentiles AS (
)
SELECT
* REPLACE (
- mozfun.glam.map_from_array_offsets([5.0, 25.0, 50.0, 75.0, 95.0], aggregates) AS aggregates
+ mozfun.glam.map_from_array_offsets_precise(
+ [0.1, 1.0, 5.0, 25.0, 50.0, 75.0, 95.0, 99.0, 99.9],
+ aggregates
+ ) AS aggregates
)
FROM
percentiles


@@ -1,5 +1,9 @@
- agg_type: percentiles
aggregates:
+ - key: '0.1'
+ value: 4.0
+ - key: '1'
+ value: 4.0
- key: '5'
value: 4.0
- key: '25'
@@ -10,6 +14,10 @@
value: 8.0
- key: '95'
value: 8.0
+ - key: '99'
+ value: 8.0
+ - key: '99.9'
+ value: 8.0
app_build_id: '*'
app_version: 84
channel: '*'