1378 commits

Author SHA1 Message Date
Alexander 4cfa504e49
chore(deploys): Catch and raise for missing dataset/project in table deploys (#6067)
* chore(deploys): Catch NotFound for missing dataset/project in table deploys

* Update bigquery_etl/deploy.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* update test exception text match

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2024-08-15 13:56:36 -04:00
Anna Scholtz 26bcdb0410
Remove parallelism parameter from dependency CLI command (#6041) 2024-08-08 21:57:47 -07:00
Anna Scholtz 72269ac955
Fix dependency record (#6040)
* Undo change on parallelizing dependency record

* fix flake8
2024-08-08 17:26:16 -07:00
Anna Scholtz 0dda2651a4
Fall back to running query to get view schema (#6035) 2024-08-08 16:32:28 -07:00
Anna Scholtz 7024b6976e
Pass ID token to dryrun instances to speed things up (#6019)
* Pass ID token to dryrun instances to speed things up

* Parallelize metadata and dependency generation

* Use table schema from dryrun function
2024-08-08 12:38:43 -07:00
Lucia 851ac84f17
Auxiliary functions for shredder mitigation (#6002)
* Auxiliary functions required to generate the query for a backfill with shredder mitigation.

* Exception handling.

* isort & docstrings.

* Apply flake8 to test file.

* Remove variable assignment to different types.

* Make search case insensitive in function.

* Add test cases for function and update naming in a function's parameters for clarity.

* Update bigquery_etl/backfill/shredder_mitigation.py

Co-authored-by: Leli <33942105+lelilia@users.noreply.github.com>

* Add test cases for missing parameters or not matching parameters where expected. Minimize the calls for get_bigquery_type().

---------

Co-authored-by: Leli <33942105+lelilia@users.noreply.github.com>
2024-08-05 20:02:16 +02:00
Ben Wu 117d1e67e2
Remove glam intermediate tables from shredder (#5982) 2024-07-26 17:59:29 -04:00
Eduardo Filho 54d4ef0d53
fix(GLAM): remove dangling comma that breaks a query (#5979) 2024-07-26 11:42:36 -04:00
Winnie Chan 6bf63b0dd4
DENG-4283: Updated managed backfills backup table name (#5908)
* Updated back up table name

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-07-25 14:47:11 -07:00
akkomar ad659415ad
Add Glean FxA tables to shredder config (#5972) 2024-07-25 18:14:17 +02:00
Ben Wu de40bdfe93
Use updated table object in view publish (#5973) 2024-07-25 11:56:45 -04:00
Eduardo Filho b994884098
GLAM purge percentile calculations and prep downstream (#5966)
* Remove percentiles

* Remove tests that test percentiles

* Refresh scripts insert null to new percentiles

* Remove percentile columns from queries and schemas

* Delete more percentile tables

* Formatting

* histogram_cast_struct's keys are strings

* Re-add test after fixing failure cause
2024-07-25 10:44:43 -04:00
Ben Wu b77db6d083
Update view publish to not replace project names in comments (#5969) 2024-07-24 16:32:46 -04:00
Ben Wu ce4abdc092
Add deletion_request_v1 to fx_accounts_v1 shredder config (#5963)
Co-authored-by: akkomar <akkomar@users.noreply.github.com>
2024-07-24 11:51:47 +02:00
Ben Wu ea890a07b4
Update shredder config for new ltv tables to use legacy deletions (#5958) 2024-07-23 10:40:52 -04:00
Sean Rose cdf9564157
Include task group in `TaskRef.task_key` if present. (#5894)
There could be multiple tasks in a DAG with the same task ID but in different task groups.
2024-07-09 09:05:24 -07:00
akkomar 7d06e5a1cf
Bug 1895503 - Add script to export telemetry data for DSARs (#5833) 2024-07-03 19:50:28 +02:00
Eduardo Filho 16a1d44cf7
GLAM filter out versions above latest (#5868) 2024-07-02 12:28:42 -04:00
akkomar b5f99c4475
Bug 1889144 - Add fx_accounts_v1 to Shredder config (#5865) 2024-07-01 12:57:50 +02:00
Anna Scholtz 57bd939905
Fully qualified identifiers in SQL queries (#5764)
* Add fully-qualified identifiers when formatting queries

* Fully-qualified identifiers for queries in sql/

* Check in only formatted SQL to generated-sql branch

* Add comment

* Fully qualify more tables

* Fully qualify test files

* Formatting improvements around CTEs and unit tests

* Option to skip auto qualifying queries
2024-06-27 09:53:33 -07:00
Anna Scholtz b64c845782
Revert "Fix initialize logic for materialized views (#5640)" (#5826)
This reverts commit 9552136dca.
2024-06-21 09:25:11 -07:00
Anna Scholtz 9552136dca
Fix initialize logic for materialized views (#5640) 2024-06-20 09:44:20 -07:00
Ben Wu 7dc9d000f2
[DENG-3918] Ignore glean stable tables with no client id for shredder (#5783) 2024-06-13 09:54:22 -04:00
Eduardo Filho 4ba55f8232
fix(glam): disambiguate column name (#5774) 2024-06-10 10:29:56 -04:00
Sean Rose 4b8574f3bd
Fix bugs in `derived_view_schemas` SQL generator (#5592)
* Limit `derived_view_schemas` SQL generator to actual view directories.

* Fix the `derived_view_schemas` SQL generator to get the view schemas by dry-running their latest SQL and/or from their latest `schema.yaml` file.  Getting the schema from the currently deployed view wasn't appropriate because it wouldn't reflect the latest view code.

* Rename `View.view_schema` to `View.schema`.

* Change `View` so its schema dry-runs use the cloud function (CI doesn't have permission to run dry-run queries directly).

* Apply partition column filters in view dry-run queries when possible for speed/efficiency.

* Don't allow missing fields to prevent view schema enrichment.

* Only copy column descriptions during view schema enrichment.

* Only try enriching view schemas from their reference table `schema.yaml` files if those files actually exist.

* Change `main_1pct` view to select directly from the `main_remainder_1pct_v1` table, so the `derived_view_schemas` SQL generator can detect the partition column to use and successfully dry-run the view to determine its schema.

* Formalize the order `bqetl generate all` runs the SQL generators in.

* Have `bqetl generate all` run `derived_view_schemas` last, in case other SQL generators create derived views.

* Fix `Schema._traverse()` to only recurse if both fields are records.
2024-06-07 19:05:36 -07:00
Ben Wu 80de43a57f
Print list of failed queries at end of cli dry run (#5771) 2024-06-07 18:18:42 -04:00
Eduardo Filho 36cf98ca65
fix(glam): remove extra space that breaks table name (#5770) 2024-06-07 17:30:05 -04:00
Winnie Chan c58c7fbd5c
DENG-3869 Create schema before deploying backfill staging table (#5643)
* Added schema from query file

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-06-07 12:47:09 -07:00
Eduardo Filho 99d982af0d
fix(glam): add missing fields for percentiles and probe counts with non-norm data (#5769) 2024-06-07 13:53:12 -04:00
Eduardo Filho 73208383a0
fix(glam): remove extra parameter to fill_buckets on non_norm data (#5768) 2024-06-07 15:22:06 +02:00
Eduardo Filho c129115660
feature(glam): Add non-normalized histograms on Glean metrics (#5684)
* feature(glam): Add non-normalized histograms on Glean metrics

* Remove useless column definition
2024-06-06 15:45:23 -04:00
Sean Rose 5a7d7985c1
Fix `format_timedelta` function's parsing of negative timedeltas (#5740)
* Fix `format_timedelta` function's parsing of negative timedeltas.

The entire timedelta can be negative.

* Refactor to use a single timedelta regular expression.

* Fix typo in `format_timedelta` function argument.
2024-06-05 09:05:12 -07:00
Winnie Chan 11b891febb
Added dataset id (#5721) 2024-06-04 13:16:21 -07:00
Alexander 40c2b58482
fix(deploy): skip (instead of fail) deploys with explicitly null destination_table (#5700) 2024-05-31 12:50:47 -04:00
Alexander 0cd4295478
chore: refactor schema deploys, add and use deploy utils (#5674)
* chore: refactor schema deploys, add and use utils

* Update tests

* Add deploy tests

* Use string representation of table object in log statements
2024-05-30 16:23:53 -04:00
Anna Scholtz 22dce0557a
Fix setting partitioning metadata (#5690) 2024-05-30 12:11:38 -07:00
Ben Wu 8626f02bff
[DENG-3905] Support queries with temp udfs when billing project is set (#5668) 2024-05-29 10:50:07 -04:00
Alexander da9293fa24
fix(backfills): switch from process to thread to avoid pickling bigquery object (#5656) 2024-05-24 11:16:28 -04:00
Alexander 9f5faf697c
fix(backfills): followup to concurrent.futures, raise Error if any failed dates (#5654) 2024-05-23 16:38:30 -04:00
Alexander d7b5bad870
fix(backfills): switch to concurrent.futures to improve debuggability (#5653) 2024-05-23 16:27:10 -04:00
Ben Wu 9f3135ba33
Use information_schema to find experiment tables for shredder (#5635) 2024-05-22 12:39:43 -04:00
Winnie Chan ce9b8c40c1
DENG-3719: Allow setting billing project for managed backfills (#5605)
* Added default billing project and param
2024-05-21 12:27:55 -07:00
Lucia 71c5d1a8a9
Add clients_last_seen_v2 to shredder config (#5593) 2024-05-16 09:53:34 -05:00
Ben Wu 82c06afbcd
Add unit test to verify shredder delete target and source counts (#5589) 2024-05-15 16:03:46 -05:00
Braunk bd5ffe4916
feat(query-backfill): adding a more flexible approach to overriding scheduling attributes (#5540)
* feat(query-backfill): adding a more flexible approach to overriding scheduling attributes

* Update tests/cli/test_cli_query.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* feat(query.py): adding cleaner override logic per PR comments and also cleaning up comments and rogue print statement

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-05-15 15:16:51 -05:00
Katie Windau 760ce16de0
DENG-3288 Create Mobile Engagement Model Tables & Views (#5525) 2024-05-15 14:47:03 -05:00
Ben Wu 7b23ec68d8
[DENG-3335] Set project in experiments shredder config (#5574) 2024-05-14 17:09:34 -04:00
Marlene Hirose 5ebb5dd418
Deng 3187 desktop retention model (#5496)
* initial commit for desktop_retention_clients view

* initial commit for desktop_retention_clients view - after formatting

* initial add of query script for telemetry_derived.retention_v1 table

* change view to table as I need to use the submission_date query parameter

* move view to query for desktop_retention_clients

* reformat file and add COALESCE on submission_date for new_profiles

* take out app_name from retention_v1 query. Change isp_name to isp, take out mozfun UDF on normalized_os_version in CTE and update main query from retention_clients

* run formatting on retention_v1/query.sql

* move files from retention to desktop_retention, retention_clients to desktop_retention_clients

* add newline to end of schema.yaml file

* reinstate retention_v1 deprecated folder/metadata.yaml file

* refactor desktop_retention_clients_v1/desktop_retention metadata.yaml - add in clustering, take out extra space indent

* refactor metadata.yaml, add schema.yaml, change query.sql to pull from telemetry_derived.desktop_retention_clients_v1 not view

* add in metric_date for desktop_retention_v1 partition, take out require partition date filtering for both retention and retention_clients

* add desktop_retention_model to dags.yaml

* remove retention_clients_v1

* change column name is_new_profile to new_profile_metric_date

* take out 'app_name' from group by

* add telemetry_derived.desktop_retention_clients_v1 to shredder config
2024-05-14 13:50:15 -07:00
Katie Windau 6108d64d37
DENG-3186 Update shredder config to match updated table name (#5572)
* DENG-3186 update shredder config to match updated table name

* DENG-3186 remove dryrun skip
2024-05-14 13:24:24 -05:00
Eduardo Filho 8bd936e1bc
fix(glam) add fully qualified table names in legacy telemetry queries (#5559)
* fix(glam) fix table names to fully qualified

* Fix column order for glean probe counts

* Add fully qualified table name to tests
2024-05-13 14:58:37 -04:00