* Improve error message for ndjson parsing
* Make JSON error messages nicer
* Cast BYTES fields to/from string
BYTES types are not JSON-serializable. To deal with that, we do
two things:
1. Assume BYTES fields in input tables are hex strings, and
decode them to get the field values (on input)
2. Encode BYTES fields as hex strings (on output)
This means that all data files use hex strings for BYTES fields.
Note: This only works on top-level fields, not nested ones
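
The two steps above can be sketched as follows. This is a minimal
illustration, not the code from this change: the function names and
the `days_seen_bytes` field are hypothetical, and the set of BYTES
field names is assumed to come from the table schema.

```python
import json


def decode_bytes_fields(row, bytes_fields):
    """On input: turn hex strings into bytes for the named top-level fields."""
    return {
        key: bytes.fromhex(value)
        if key in bytes_fields and value is not None
        else value
        for key, value in row.items()
    }


def encode_bytes_fields(row):
    """On output: turn top-level bytes values into hex strings for json.dumps."""
    return {
        key: value.hex() if isinstance(value, bytes) else value
        for key, value in row.items()
    }


# Hypothetical row; only top-level fields are handled, as noted above.
row = json.loads('{"client_id": "a", "days_seen_bytes": "00ff"}')
decoded = decode_bytes_fields(row, {"days_seen_bytes"})
round_tripped = json.dumps(encode_bytes_fields(decoded), sort_keys=True)
```

Nested BYTES values pass through untouched in this sketch, which is
why `json.dumps` would still fail on them.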
* Add better discrepancy reporting for test assertions
When JSON blobs differ, it can be hard to tell what is wrong.
These functions show what differs, and automatically print the
differences so they are available when tests fail.
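
One way to get this kind of report is a line diff over pretty-printed
JSON. The sketch below uses only the standard library; `json_diff`
and `assert_json_equal` are hypothetical names and are not
necessarily the helpers added in this change.

```python
import difflib
import json


def json_diff(expected, actual):
    """Return a unified diff of two JSON-serializable values ('' if equal)."""
    expected_lines = json.dumps(expected, indent=2, sort_keys=True).splitlines()
    actual_lines = json.dumps(actual, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(
            expected_lines, actual_lines, "expected", "actual", lineterm=""
        )
    )


def assert_json_equal(expected, actual):
    """Fail with the diff in the assertion message so test output shows it."""
    diff = json_diff(expected, actual)
    assert not diff, "JSON blobs differ:\n" + diff
```

Sorting keys before diffing keeps key order from showing up as a
false difference.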
* Add search_clients_last_seen for 1 year of history
This new dataset, search_clients_last_seen, contains a year
of history for each client. It is split into 3 main parts:
1. Recent info that is contained in search_clients_daily,
similar to how we store that in clients_last_seen
2. A year of history, represented as a BYTES field,
indicating which days the client was active for different
types of activity
3. For the major search providers, arrays of totals of
different metrics, split into 12 parts, one for each
month's total
This dataset will power LTV.
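
To illustrate part 2, a BYTES bit pattern can be advanced one day at
a time as sketched below. This is an assumption-laden example, not
the UDFs from this change: it assumes big-endian bytes with the
least-significant bit meaning "today", and the function names are
hypothetical.

```python
def shift_and_mark(history: bytes, active_today: bool) -> bytes:
    """Advance the bit pattern by one day and record today's activity."""
    num_bits = len(history) * 8
    value = int.from_bytes(history, "big")
    # Shift left by one day and drop any bit that falls off the far end.
    value = (value << 1) & ((1 << num_bits) - 1)
    if active_today:
        value |= 1
    return value.to_bytes(len(history), "big")


def days_since_last_active(history: bytes):
    """Return how many days ago the client was last active, or None if never."""
    value = int.from_bytes(history, "big")
    if value == 0:
        return None
    # Isolate the lowest set bit; its index is the number of days ago.
    return (value & -value).bit_length() - 1


# A 2-byte (16-day) history for brevity; a year would need about 46 bytes.
history = shift_and_mark(bytes(2), True)
history = shift_and_mark(history, False)
```

After those two calls the client was last active one day ago, so
`days_since_last_active(history)` returns 1.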
* Fix linting issues
* Enforce sampling on search_clients_daily
* Address review feedback
- Change all bits/bytes functions to include the number of bits
- Use fileobj for tests
- Rename some vars
- Use base64 for bytes in/out
* Generate sql
* Add missing comma
* Move search_clients_ls to search_derived
* Generate moar sql
* Use clients_daily_v8
* Fix query
* Move tests to search_derived
* Fix tests for search_clients_daily_v8
* Don't dryrun with search_clients_last_seen
* Update udf/new_monthly_engine_searches_struct.sql
Co-Authored-By: Jeff Klukas <jeff@klukas.net>
* sample_id is now an int
* Add documentation
* Update schemas
* Make tests use int sample_id