Use approximate client count in GLAM scalar_percentiles_v1 (#3039)
This is a follow-up to https://github.com/mozilla/bigquery-etl/pull/3037, which unblocked `scalar_bucket_counts_v1`. `scalar_percentiles_v1` reads from the same source table (`clients_scalar_aggregates_v1`) and started failing today with the same error (disk/memory limits exceeded for shuffle operations). `APPROX_COUNT_DISTINCT`, used here, runs HLL under the hood. Unlike in the aforementioned PR, the aggregation can't be split into two stages here because of the quantiles calculation. I have run this query locally and confirmed that it works.
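For illustration only (this is not part of the PR), the following BigQuery sketch puts the exact and approximate distinct counts side by side with a quantile aggregation in a single pass. The `sample` CTE and its rows are hypothetical stand-ins for `all_combos`, and the bucket count of 4 is chosen just to keep the output small.

```sql
-- Hypothetical inline rows standing in for all_combos; not the real table.
WITH sample AS (
  SELECT
    client_id,
    value
  FROM
    UNNEST(
      [
        STRUCT('a' AS client_id, 4.0 AS value),
        STRUCT('a', 8.0),
        STRUCT('b', 4.0),
        STRUCT('c', 8.0)
      ]
    )
)
SELECT
  -- Exact distinct count: the expression this commit replaces.
  COUNT(DISTINCT client_id) AS exact_users,
  -- Approximate distinct count: an estimate rather than an exact figure.
  APPROX_COUNT_DISTINCT(client_id) AS approx_users,
  -- Quantiles need every value in the group, so they are computed
  -- in the same aggregation; with 4 buckets this returns 5 boundaries.
  APPROX_QUANTILES(value, 4) AS quartile_boundaries
FROM
  sample
```

On an input this small the two counts would normally agree; the approximation only becomes a trade-off at the scale where the exact count hits the shuffle limits described above.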
Parent: 98549e3cb8
Commit: ceda6dd35f
@@ -23,7 +23,7 @@ percentiles AS (
     {{ aggregate_attributes }},
     agg_type AS client_agg_type,
     'percentiles' AS agg_type,
-    COUNT(DISTINCT(client_id)) AS total_users,
+    APPROX_COUNT_DISTINCT(client_id) AS total_users,
     APPROX_QUANTILES(value, 1000) AS aggregates
   FROM
     all_combos
@@ -52,8 +52,8 @@ percentiles AS (
     key,
     agg_type AS client_agg_type,
     'percentiles' AS agg_type,
-    COUNT(DISTINCT(client_id)) AS total_users,
-    APPROX_QUANTILES(value, 100) AS aggregates
+    APPROX_COUNT_DISTINCT(client_id) AS total_users,
+    APPROX_QUANTILES(value, 1000) AS aggregates
   FROM
     all_combos
   GROUP BY
@@ -69,7 +69,10 @@ percentiles AS (
 )
 SELECT
   * REPLACE (
-    mozfun.glam.map_from_array_offsets([5.0, 25.0, 50.0, 75.0, 95.0], aggregates) AS aggregates
+    mozfun.glam.map_from_array_offsets_precise(
+      [0.1, 1.0, 5.0, 25.0, 50.0, 75.0, 95.0, 99.0, 99.9],
+      aggregates
+    ) AS aggregates
   )
 FROM
   percentiles
@@ -1,5 +1,9 @@
 - agg_type: percentiles
   aggregates:
+  - key: '0.1'
+    value: 4.0
+  - key: '1'
+    value: 4.0
   - key: '5'
     value: 4.0
   - key: '25'
@@ -10,6 +14,10 @@
     value: 8.0
   - key: '95'
     value: 8.0
+  - key: '99'
+    value: 8.0
+  - key: '99.9'
+    value: 8.0
   app_build_id: '*'
   app_version: 84
   channel: '*'