bigquery-etl

Commit graph


Author

SHA1

Message

Date

Frank Bertsch

65053ad5e1

Fix monthly_searches to account for null (#612)

* Fix monthly_searches to account for null

* Fix missing closing parens

* Add UDF to get NULL key

2019-12-19 15:19:10 -05:00

Frank Bertsch

6c825425b3

Search clients last seen (#451)

* Improve error message for ndjson parsing

* Make JSON error messages nicer

* Cast BYTES fields to/from string

BYTES types are not JSON-serializable. To deal with that, we do
two things:
1. Assume the input tables are hex strings, and decode them
   to get the BYTES fields values (on input)
2. Encode BYTES fields as hex strings (on output)

This means that any data files use hex strings for BYTES fields.

Note: This only works on top-level fields
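The hex round trip described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper names `decode_bytes_fields` and `encode_bytes_fields` and the field name `days_seen_bytes` are hypothetical, and only top-level fields are handled, matching the note above.

```python
import json


def decode_bytes_fields(row, bytes_fields):
    """Input side: turn hex strings into bytes for the named top-level fields."""
    return {
        key: bytes.fromhex(value)
        if key in bytes_fields and value is not None
        else value
        for key, value in row.items()
    }


def encode_bytes_fields(row):
    """Output side: turn top-level bytes values into hex strings for JSON."""
    return {
        key: value.hex() if isinstance(value, bytes) else value
        for key, value in row.items()
    }


# One ndjson line whose BYTES field is stored as a hex string (hypothetical data).
line = '{"client_id": "a", "days_seen_bytes": "00ff"}'
row = decode_bytes_fields(json.loads(line), {"days_seen_bytes"})
round_tripped = json.dumps(encode_bytes_fields(row), sort_keys=True)
```

Nested BYTES values (inside structs or arrays) pass through untouched in this sketch, which is the limitation the note refers to.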

* Add better discrepancy reporting for test assertions

When JSON blobs differ, it can be hard to tell what is wrong.
These functions easily show what's different, and automatically
print the differences so they are available when tests fail.
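One way such discrepancy reporting might look is sketched below. The function names `describe_differences` and `assert_json_equal` are assumptions made for illustration and are not taken from the repository.

```python
def describe_differences(expected, actual, path=""):
    """Return a list of human-readable differences between two JSON values."""
    diffs = []
    if isinstance(expected, dict) and isinstance(actual, dict):
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}.{key}" if path else str(key)
            if key not in actual:
                diffs.append(f"{child}: missing from actual")
            elif key not in expected:
                diffs.append(f"{child}: unexpected in actual")
            else:
                diffs.extend(
                    describe_differences(expected[key], actual[key], child)
                )
    elif isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            diffs.append(f"{path}: length {len(expected)} != {len(actual)}")
        # Compare the elements both lists have in common, position by position.
        for i, (exp_item, act_item) in enumerate(zip(expected, actual)):
            diffs.extend(describe_differences(exp_item, act_item, f"{path}[{i}]"))
    elif expected != actual:
        diffs.append(f"{path}: expected {expected!r}, got {actual!r}")
    return diffs


def assert_json_equal(expected, actual):
    """Print every difference, then fail, so test output shows what differs."""
    diffs = describe_differences(expected, actual)
    if diffs:
        report = "\n".join(diffs)
        print(report)
        raise AssertionError(report)
```

Reporting a path for each mismatch keeps the failure message short even when the compared rows are large.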

* Add search_clients_last_seen for 1-year of history

This new dataset, search_clients_last_seen, contains a year
of history for each client. It is split into 3 main parts:

1. Recent info that is contained in search_clients_daily,
   similar to how we store that in clients_last_seen
2. A year of history, represented as a BYTES field,
   indicating which days they were active for different
   types of activity
3. Among the major search providers, arrays of totals of
   different metrics, split into 12 parts, to account for
   each month's total

This dataset will power LTV.
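To make part 2 concrete, here is a rough sketch of how a year of daily activity could be packed into a BYTES value. It is an illustration under assumptions, not the UDFs from the commit: the 365-bit width, the choice of the lowest bit for the most recent day, and the function names are all hypothetical.

```python
NUM_BITS = 365                   # assumed window: one bit per day
NUM_BYTES = (NUM_BITS + 7) // 8  # bytes needed to hold NUM_BITS bits


def update_days_active(previous: bytes, active_today: bool) -> bytes:
    """Shift the history one day older and record today in the lowest bit."""
    history = int.from_bytes(previous, "big")
    history = (history << 1) | int(active_today)
    history &= (1 << NUM_BITS) - 1  # drop days that fell out of the window
    return history.to_bytes(NUM_BYTES, "big")


def days_since_active(pattern: bytes):
    """Return days since the most recent active day, or None if never active."""
    history = int.from_bytes(pattern, "big")
    if history == 0:
        return None
    # history & -history isolates the lowest set bit.
    return (history & -history).bit_length() - 1


empty = bytes(NUM_BYTES)
pattern = update_days_active(empty, True)     # active today
pattern = update_days_active(pattern, False)  # next day, inactive
```

One such pattern per activity type gives a compact per-client history that can be updated each day from the previous day's row.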

* Fix linting issues

* Enforce sampling on search_clients_daily

* Address review feedback

- Change all bits/bytes functions to include no. of bits
- Use fileobj for tests
- Rename some vars
- Use base64 for bytes in/out
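The last item replaces the earlier hex representation with base64. A minimal sketch of that encoding, using hypothetical helper names, might be:

```python
import base64


def bytes_to_json_value(value: bytes) -> str:
    """Encode a BYTES value as a base64 string so it can be written as JSON."""
    return base64.b64encode(value).decode("ascii")


def json_value_to_bytes(value: str) -> bytes:
    """Decode a base64 string read from JSON back into bytes."""
    return base64.b64decode(value)


encoded = bytes_to_json_value(b"\x00\xff")
decoded = json_value_to_bytes(encoded)
```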

* Generate sql

* Add missing comma

* Move search_clients_ls to search_derived

* Generate moar sql

* Use clients_daily_v8

* Fix query

* Move tests to search_derived

* Fix tests for search_clients_daily_v8

* Don't dryrun with search_clients_last_seen

* Update udf/new_monthly_engine_searches_struct.sql

Co-Authored-By: Jeff Klukas <jeff@klukas.net>

* sample_id is now an int

* Add documentation

* Update schemas

* Make tests use int sample-id

2019-12-12 12:43:09 -05:00

2 commits