Commit graph

2 commits

Author SHA1 Message Date
Frank Bertsch 65053ad5e1
Fix monthly_searches to account for null (#612)
* Fix monthly_searches to account for null

* Fix missing closing parens

* Add UDF to get NULL key
2019-12-19 15:19:10 -05:00
Frank Bertsch 6c825425b3
Search clients last seen (#451)
* Improve error message for ndjson parsing

* Make JSON error messages nicer

* Cast BYTES fields to/from string

BYTES types are not JSON-serializable. To deal with that, we do
two things:
1. Assume BYTES fields in the input tables are hex strings, and decode
   them to get the BYTES field values (on input)
2. Encode BYTES fields as hex strings (on output)

This means that all data files must use hex strings for BYTES fields.

Note: this only works on top-level fields.
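A minimal sketch of the hex round trip this commit describes, assuming rows are plain Python dicts loaded from ndjson and that the caller knows which top-level columns are BYTES. The function names and the `bytes_fields` parameter are hypothetical, not the repository's actual helpers; nested fields are deliberately left untouched, matching the top-level-only limitation.

```python
import json


def decode_bytes_fields(row: dict, bytes_fields: set) -> dict:
    """Input side: turn hex strings in known BYTES columns into bytes.

    Only top-level keys are inspected; nested records pass through unchanged.
    """
    return {
        key: bytes.fromhex(value)
        if key in bytes_fields and value is not None
        else value
        for key, value in row.items()
    }


def encode_bytes_fields(row: dict) -> dict:
    """Output side: hex-encode any top-level bytes value so JSON accepts it."""
    return {
        key: value.hex() if isinstance(value, bytes) else value
        for key, value in row.items()
    }


def dump_row(row: dict) -> str:
    """Serialize a row that may contain bytes values as one ndjson line."""
    return json.dumps(encode_bytes_fields(row))
```

Decoding `{"days_seen_bytes": "0f01"}` with `{"days_seen_bytes"}` (a hypothetical column name) yields `b"\x0f\x01"`, and `dump_row` turns it back into the original hex string.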

* Add better discrepancy reporting for test assertions

When JSON blobs differ, it can be hard to tell what is wrong.
These functions show exactly what differs and automatically
print it, so the differences are visible when tests fail.
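One common way to get this kind of discrepancy report is to render both JSON values with sorted keys and indentation, then diff them line by line. The sketch below uses the standard library's `difflib`; `json_diff` and `assert_json_equal` are hypothetical names, not necessarily the functions added in this commit.

```python
import difflib
import json


def json_diff(expected, actual) -> str:
    """Return a unified diff of two JSON-serializable values ('' if they render the same)."""

    def render(obj):
        # sort_keys + indent gives a stable, one-field-per-line layout that diffs cleanly
        return json.dumps(obj, indent=2, sort_keys=True).splitlines()

    return "\n".join(
        difflib.unified_diff(
            render(expected),
            render(actual),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


def assert_json_equal(expected, actual):
    """Like `assert expected == actual`, but prints a readable diff on failure."""
    diff = json_diff(expected, actual)
    if diff:
        print(diff)
    assert expected == actual, diff
```

Printing before the assertion means the diff shows up in the captured output of a failing test even if the assertion message gets truncated.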

* Add search_clients_last_seen for 1 year of history

This new dataset, search_clients_last_seen, contains a year
of history for each client. It is split into 3 main parts:

1. Recent information from search_clients_daily, stored
   similarly to how clients_last_seen stores it
2. A year of history, represented as a BYTES field,
   indicating which days they were active for different
   types of activity
3. For the major search providers, arrays of totals for
   different metrics, split into 12 parts to hold each
   month's total
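Parts 2 and 3 can be pictured with a small sketch. It assumes a 366-bit daily history where bit 0 is the most recent day, and a 12-slot array where index 0 is the current month; both layouts, the constants, and the function names are illustrative assumptions rather than the actual schema of search_clients_last_seen.

```python
HISTORY_BITS = 366  # assumption: about one year of daily activity bits
MONTHS = 12         # one slot per month, index 0 = current month (assumption)


def update_history(history: int, active_today: bool) -> int:
    """Shift the bit pattern one day older and set bit 0 if the client was active today.

    Bits that move past HISTORY_BITS days are masked off, so the window stays one year.
    """
    mask = (1 << HISTORY_BITS) - 1
    return ((history << 1) | int(active_today)) & mask


def days_active(history: int) -> int:
    """Count the days in the window on which the activity bit is set."""
    return bin(history).count("1")


def update_monthly_totals(totals: list, value: int, new_month: bool) -> list:
    """Add today's value to the current month, rolling the array when a month starts.

    On a new month every total moves one slot older and the oldest month drops off.
    The input list is never mutated.
    """
    if new_month:
        totals = [0] + totals[: MONTHS - 1]
    else:
        totals = list(totals)
    totals[0] += value
    return totals
```

Keeping the history as a fixed-width bit pattern makes questions like "active in the last 28 days" a single mask-and-count, which is why such fields are typically stored as BYTES rather than as arrays of dates.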

This dataset will power LTV.

* Fix linting issues

* Enforce sampling on search_clients_daily

* Address review feedback

- Change all bits/bytes functions to take the number of bits
- Use fileobj for tests
- Rename some vars
- Use base64 for bytes in/out
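The review feedback above (an explicit number of bits for the bits/bytes functions, base64 instead of hex for BYTES in/out) can be sketched as follows. The helper names are hypothetical and big-endian packing is an assumption; the point is that passing `num_bits` fixes the byte width, so leading inactive days are not silently dropped.

```python
import base64


def bits_to_bytes(bits: int, num_bits: int) -> bytes:
    """Pack an integer bit pattern into exactly ceil(num_bits / 8) big-endian bytes."""
    if bits < 0 or bits >> num_bits:
        raise ValueError(f"value does not fit in {num_bits} bits")
    return bits.to_bytes((num_bits + 7) // 8, "big")


def bytes_to_bits(data: bytes, num_bits: int) -> int:
    """Unpack big-endian bytes into an integer, checking it fits in num_bits."""
    value = int.from_bytes(data, "big")
    if value >> num_bits:
        raise ValueError(f"value does not fit in {num_bits} bits")
    return value


def bytes_to_b64(data: bytes) -> str:
    """Encode BYTES for JSON output as a base64 string."""
    return base64.b64encode(data).decode("ascii")


def b64_to_bytes(text: str) -> bytes:
    """Decode a base64 string from JSON input back into BYTES."""
    return base64.b64decode(text)
```

For example, `bits_to_bytes(0b101, 16)` gives `b"\x00\x05"`, which round-trips through base64 as `"AAU="`.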

* Generate SQL

* Add missing comma

* Move search_clients_ls to search_derived

* Generate more SQL

* Use clients_daily_v8

* Fix query

* Move tests to search_derived

* Fix tests for search_clients_daily_v8

* Don't dryrun with search_clients_last_seen

* Update udf/new_monthly_engine_searches_struct.sql

Co-Authored-By: Jeff Klukas <jeff@klukas.net>

* sample_id is now an int

* Add documentation

* Update schemas

* Make tests use an int sample_id
2019-12-12 12:43:09 -05:00