Commit Graph

1614 Commits

Author SHA1 Message Date
Anna Scholtz f6bf253144 Move copy_deduplicate logic to bigquery_etl 2020-09-24 08:55:35 -07:00
Anna Scholtz 6f31338ecd Move view related scripts to view module 2020-09-24 08:55:35 -07:00
Jeff Klukas de9dbc4346
Bug 1666768 Truncate histogram values above 2^31 (#1334)
* Bug 1666768 Truncate histogram values above 2^31

Reverts the temporary fix from #1333

* Use LEAST

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
2020-09-24 08:51:48 -04:00
Jeff Klukas 504c007d6c
Bug 1635906 Add client ID lookup table for AET (#1335)
* Bug 1635906 Add client ID lookup table for AET

Builds on top of the eco_uid lookup table added in #1323

* Exempt from dry run
2020-09-24 08:43:18 -04:00
Daniel Thorn 7e4054fb5e
Don't assert on import when running bqetl command (#1338)
* Don't assert on import when running bqetl command

* Update cli.py
2020-09-23 17:41:48 -07:00
Daniel Thorn 6534b78814
Update stripe schemas (#1337)
and include them in load jobs when available
2020-09-23 17:17:23 -07:00
Anthony Miyaguchi dd283c264f
Add glam cli for incremental backfill (#1313)
* Add glam cli for listing processed app ids

* Make backfill scripts more consistent

* Add export to glam glean cli

* Add pandas dependency

* Add black format of glam-cli

* Commit hashes based on bigquery-etl container

* Fix various linting issues

* Be stricter with is_logical matching

* Fix more linting issues
2020-09-23 14:45:44 -07:00
dependabot[bot] 4032ed4fcf
Bump google-cloud-bigquery from 1.27.2 to 1.28.0 (#1332) 2020-09-23 20:09:13 +00:00
Jeff Klukas d539fafb59
Bug 1635906 Add bqetl support for scripts and script for AET lookup (#1323)
* Bug 1635906 Add bqetl support for scripts and script for AET lookup

There are some code changes here for DAG generation and for testing.

* Apply suggestions from code review

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* DAG fixups

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2020-09-23 15:28:25 -04:00
Ryan VanderMeulen 6201f5610b
Add the Windows 10 October 2020 Update to Windows 10 Aggregates (#1328)
Microsoft will be rolling out the October 2020 update (version 20H2) soon. Update the query to break on the final build number for it, 19042.

Co-authored-by: Frank Bertsch <fbertsch@mozilla.com>
2020-09-23 14:56:10 -04:00
Jeff Klukas db873bc8f5
Bug 1666768 Exclude specific pathological record from main_summary (#1333)
* Bug 1666768 Exclude specific pathological record from main_summary

* Include temp filter in part2 as well
2020-09-23 10:30:24 -04:00
Frank Bertsch da5786fe2a
Drop submission_date from event types view (#1331) 2020-09-22 12:44:22 -04:00
dependabot[bot] e8718da836
Bump google-cloud-storage from 1.31.0 to 1.31.1 (#1330) 2020-09-22 15:41:24 +00:00
Daniel Thorn 7f448e17ae
Handle legacy IDs from glean apps in shredder (#1326) 2020-09-17 14:34:51 -07:00
Frank Bertsch 957e63c9a8
Remove trailing comma (#1325) 2020-09-17 11:24:26 -07:00
Ben Miroglio f00e0681c3
Update query.sql (#1324)
* Update query.sql

Hide revenue and total LTV fields from view

* Update sql/revenue_derived/client_ltv_normalized/query.sql

Co-authored-by: Frank Bertsch <fbertsch@mozilla.com>
2020-09-17 10:58:32 -07:00
Daniel Thorn 208c551209
Add stripe import script (#1316) 2020-09-17 10:06:06 -07:00
Daniel Thorn 4ff4ae198b
Install bqetl in docker (#1322) 2020-09-16 15:16:11 -07:00
Frank Bertsch ad8e11b2ac
Fix fields for normalized view (#1321) 2020-09-16 12:53:03 -07:00
Anna Scholtz f592bc947c Fix UDF publish metadata tests 2020-09-16 10:40:42 -07:00
Anna Scholtz 268ea08e81 Add tests for UDF description publishing 2020-09-16 10:40:42 -07:00
Anna Scholtz 94212a1479 Publish UDF descriptions 2020-09-16 10:40:42 -07:00
Arkadiusz Komarzewski 4608e6595d
Bug 1661250 - prepare Shredder script for adding Pioneer support (#1314)
This adds a new `environment` parameter to Shredder delete script, accepting either `telemetry` or `pioneer` values.
The goal of this parameter is to have a way for choosing different source and target tables. By default it is set to `telemetry` (which keeps current behavior unchanged).

Next step is to define logic for discovering Pioneer study tables in `config.py` and invoke it in delete script (`NotImplementedError`).
2020-09-16 17:52:16 +02:00
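
The `environment` parameter described in the commit above could look roughly like the following. This is a minimal sketch, not the actual Shredder code: the `--environment` flag spelling, `build_parser`, and `find_targets` are hypothetical names chosen for illustration. Only the two accepted values, the `telemetry` default, and the `NotImplementedError` placeholder come from the commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical --environment flag."""
    parser = argparse.ArgumentParser(description="Illustrative delete-script CLI")
    parser.add_argument(
        "--environment",
        choices=["telemetry", "pioneer"],
        default="telemetry",  # keeps the existing behavior when the flag is omitted
        help="Selects which source and target tables are used",
    )
    return parser


def find_targets(environment: str) -> str:
    """Return a placeholder describing the tables for the given environment."""
    if environment == "telemetry":
        # Stand-in for the existing telemetry table configuration.
        return "telemetry tables"
    # Per the commit message, Pioneer table discovery is left for a later step.
    raise NotImplementedError("Pioneer study table discovery is not defined yet")


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(find_targets(args.environment))
```

Restricting the flag with `choices` means an unsupported value is rejected at parse time, before any delete logic would run.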
Anna Scholtz de6037755d
Add project ID to events_daily view definition (#1319) 2020-09-16 08:33:13 -04:00
Frank Bertsch 53dcd214e4
Remove order by clause (#1315) 2020-09-15 14:51:54 -04:00
Frank Bertsch 5257299104
Add submission date parameter (#1312) 2020-09-14 16:53:28 -04:00
dependabot[bot] 05461e5d54
Bump pytest from 6.0.1 to 6.0.2 (#1311) 2020-09-14 18:14:12 +00:00
Frank Bertsch c48bef314c
Require destination_table or sql_file_path (#1310) 2020-09-14 12:23:04 -04:00
Anna Scholtz 682c3bad6d Add docs for scheduling queries using bqetl CLI 2020-09-11 14:26:59 -07:00
Anna Scholtz ce23af6d0b Add bqetl docs 2020-09-11 14:26:59 -07:00
Anna Scholtz 0eda40c051 CLI improvements 2020-09-11 14:26:59 -07:00
Frank Bertsch ceb7379858
Create view for event_types from recent day's data (#1306)
* Add option to specify (or not) dest table

* Add option to DAG generation for single dag

* Add query for recent day's event_types

* Use explicit None for no destination table

* Reformat SQL

* Use event_types view in events_daily

* Update tests

* Run black

* Add view for events_daily

* Format SQL

* Ignore event properties in query

* Shorten events_name_v1

* Fix naming for test

* Update README.md

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2020-09-11 16:39:07 -04:00
Jeff Klukas 38c91925db
Allow dry runs to succeed for events daily init.sql (#1308)
CI is currently failing for other PRs.
2020-09-11 16:32:59 -04:00
Jeff Klukas b86a6595f2
Tolerate new 10-digit Fenix app_build format (#1250)
Fixes https://github.com/mozilla/bigquery-etl/issues/1248
2020-09-11 16:17:04 -04:00
Frank Bertsch 722e6ece6e
Init statement & sample id addition (#1304)
* Include sample_id in events_daily

* Add init query for events_daily

* Add sample_id to init
2020-09-11 11:53:21 -04:00
Jeff Klukas a9f00f3b15 Reformat 2020-09-11 11:31:34 -04:00
Jeff Klukas 30c70346bc Remove unused string_to_arr function 2020-09-11 11:31:34 -04:00
Jeff Klukas 9295a5d30f Add DETERMINISTIC modifier to JS functions
This recently released feature allows query results using JS UDFs to be cached.

See https://issuetracker.google.com/issues/138310623
2020-09-11 11:31:34 -04:00
Frank Bertsch f7090245d2
Drop order by clause in favor of clustering (#1303) 2020-09-11 07:10:56 -04:00
Frank Bertsch 079a409672
Use submission_date for join criteria (#1302) 2020-09-10 20:19:31 -04:00
Frank Bertsch 818f680052
Create DAG for events rollup (#1301)
* Create DAG for events rollup

* Update sql/org_mozilla_firefox_derived/events_daily_v1/metadata.yaml

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2020-09-10 18:50:45 -04:00
Frank Bertsch 5a17168e63
Create events daily rollup (#1296)
* Create events daily rollup

This takes the most recent days' data, rolls up the events,
and encodes them as unicode.

Add tests for android events daily

* Remove unnecessary file

* Use map.mode_last

* Reformat sql

* Fix experiment aggregation

* Address review feedback

* Fix submission_date

* Give proper alias to events table
2020-09-10 17:48:31 -04:00
Frank Bertsch 8ed92c9838
Give events ping a proper alias (#1300) 2020-09-10 17:35:00 -04:00
Frank Bertsch ddbab093ab
Fix event timestamps (#1299)
* Use correct timestamp representation

* Fix event timestamps for query
2020-09-10 17:14:41 -04:00
Frank Bertsch 156fd98d3d
Revenue LTV updates (#1291)
* Revenue LTV updates

* Remove explicit date references
2020-09-10 15:54:30 -04:00
Frank Bertsch aca88a3d45
Query for updates to event_types_v1 (#1295)
* Query for update event_types_v1

This query takes yesterday's events, yesterday's event_types,
and adds the new events, event_properties, and property values.

It writes it out to a new partition. This is not strictly
necessary but will aid debugging and redoes.

* Format SQL
2020-09-10 12:53:04 -04:00
Frank Bertsch e71840cec6
Android event types init (#1289)
* Init SQL for event_types_v1

* Fix comparison of differently-sized lists

* Add support for tests of init stmts

* Include metadata for event_types_v1

* Add tests for event_types init

* Reformat SQL

* Run black

* Skip invalid unicode sections in event_code_points_to_string

* Allow for init-only queries

* Partition events_daily_v1 by submission-date

This is not strictly required, but will aid in
debugging and reruns.

* Add assertion for not null

* Lint

* Alias events ping name

* Ignore time_ms event property
2020-09-10 12:12:56 -04:00
Anna Scholtz 7566b7c3f0 Add CLI UDF tests 2020-09-09 14:10:44 -07:00
Anna Scholtz f04d7bf507 Add UDF CLI command for publishing UDFs 2020-09-09 14:10:44 -07:00
Anna Scholtz e87bb3f4d4 Add CLI command for validating UDFs 2020-09-09 14:10:44 -07:00