bigquery-etl

Commit graph

Author	SHA1	Message	Date
Alexander	4cfa504e49	chore(deploys): Catch and raise for missing dataset/project in table deploys (#6067 ) * chore(deploys): Catch NotFound for missing dataset/project in table deploys * Update bigquery_etl/deploy.py Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com> * update test exception text match --------- Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>	2024-08-15 13:56:36 -04:00
Anna Scholtz	26bcdb0410	Remove parallelism parameter from dependency CLI command (#6041 )	2024-08-08 21:57:47 -07:00
Anna Scholtz	72269ac955	Fix dependency record (#6040 ) * Undo change on parallelizing dependency record * fix flake8	2024-08-08 17:26:16 -07:00
Anna Scholtz	0dda2651a4	Fall back to running query to get view schema (#6035 )	2024-08-08 16:32:28 -07:00
Anna Scholtz	7024b6976e	Pass ID token to dryrun instances to speed things up (#6019 ) * Pass ID token to dryrun instances to speed things up * Parallelize metadata and dependency generation * Use table schema from dryrun function	2024-08-08 12:38:43 -07:00
Lucia	851ac84f17	Auxiliary functions for shredder mitigation (#6002 ) * Auxiliary functions required to generate the query for a backfill with shredder mitigation. * Exception handling. * isort & docstrings. * Apply flake8 to test file. * Remove variable assignment to different types. * Make search case insensitive in function. * Add test cases for function and update naming in a funcion's parameters for clarity. * Update bigquery_etl/backfill/shredder_mitigation.py Co-authored-by: Leli <33942105+lelilia@users.noreply.github.com> * Add test cases for missing parameters or not matching parameters where expected. minimize the calls for get_bigquery_type(). --------- Co-authored-by: Leli <33942105+lelilia@users.noreply.github.com>	2024-08-05 20:02:16 +02:00
Ben Wu	117d1e67e2	Remove glam intermediate tables from shredder (#5982 )	2024-07-26 17:59:29 -04:00
Eduardo Filho	54d4ef0d53	fix(GLAM): remove dangling comma that breaks a query (#5979 )	2024-07-26 11:42:36 -04:00
Winnie Chan	6bf63b0dd4	DENG-4283: Updated managed backfills backup table name (#5908 ) * Updated back up table name --------- Co-authored-by: Alexander <anicholson@mozilla.com>	2024-07-25 14:47:11 -07:00
akkomar	ad659415ad	Add Glean FxA tables to shredder config (#5972 )	2024-07-25 18:14:17 +02:00
Ben Wu	de40bdfe93	Use updated table object in view publish (#5973 )	2024-07-25 11:56:45 -04:00
Eduardo Filho	b994884098	GLAM purge percentile calculations and prep downstream (#5966 ) * Remove percentiles * Remove tests that test percentiles * Refresh scripts insert null to new percentiles * Remove percentile columns from queries and schemas * Delete more percentile tables * Formatting * histogram_cast_struct's keys are strings * Re-add test after fixing failure cause	2024-07-25 10:44:43 -04:00
Ben Wu	b77db6d083	Update view publish to not replace project names in comments (#5969 )	2024-07-24 16:32:46 -04:00
Ben Wu	ce4abdc092	Add deletion_request_v1 to fx_accounts_v1 shredder config (#5963 ) Co-authored-by: akkomar <akkomar@users.noreply.github.com>	2024-07-24 11:51:47 +02:00
Ben Wu	ea890a07b4	Update shredder config for new ltv tables to use legacy deletions (#5958 )	2024-07-23 10:40:52 -04:00
Sean Rose	cdf9564157	Include task group in `TaskRef.task_key` if present. (#5894 ) There could be multiple tasks in a DAG with the same task ID but in different task groups.	2024-07-09 09:05:24 -07:00
akkomar	7d06e5a1cf	Bug 1895503 - Add script to export telemetry data for DSARs (#5833 )	2024-07-03 19:50:28 +02:00
Eduardo Filho	16a1d44cf7	GLAM filter out versions above latest (#5868 )	2024-07-02 12:28:42 -04:00
akkomar	b5f99c4475	Bug 1889144 - Add fx_accounts_v1 to Shredder config (#5865 )	2024-07-01 12:57:50 +02:00
Anna Scholtz	57bd939905	Fully qualified identifiers in SQL queries (#5764 ) * Add fully-qualified identifiers when formatting queries * Fully-qualified identifiers for queries in sql/ * Check in only formatted SQL to generated-sql branch * Add comment * Fully qualify more tables * Fully qualify test files * Formatting improvements around CTEs and unit tests * Option to skip auto qualifying queries	2024-06-27 09:53:33 -07:00
Anna Scholtz	b64c845782	Revert "Fix initialize logic for materialized views (#5640 )" (#5826 ) This reverts commit `9552136dca`.	2024-06-21 09:25:11 -07:00
Anna Scholtz	9552136dca	Fix initialize logic for materialized views (#5640 )	2024-06-20 09:44:20 -07:00
Ben Wu	7dc9d000f2	[DENG-3918] Ignore glean stable tables with no client id for shredder (#5783 )	2024-06-13 09:54:22 -04:00
Eduardo Filho	4ba55f8232	fix(glam): disambiguate column name (#5774 )	2024-06-10 10:29:56 -04:00
Sean Rose	4b8574f3bd	Fix bugs in `derived_view_schemas` SQL generator (#5592 ) * Limit `derived_view_schemas` SQL generator to actual view directories. * Fix the `derived_view_schemas` SQL generator to get the view schemas by dry-running their latest SQL and/or from their latest `schema.yaml` file. Getting the schema from the currently deployed view wasn't appropriate because it wouldn't reflect the latest view code. * Rename `View.view_schema` to `View.schema`. * Change `View` so its schema dry-runs use the cloud function (CI doesn't have permission to run dry-run queries directly). * Apply partition column filters in view dry-run queries when possible for speed/efficiency. * Don't allow missing fields to prevent view schema enrichment. * Only copy column descriptions during view schema enrichment. * Only try enriching view schemas from their reference table `schema.yaml` files if those files actually exist. * Change `main_1pct` view to select directly from the `main_remainder_1pct_v1` table, so the `derived_view_schemas` SQL generator can detect the partition column to use and successfully dry-run the view to determine its schema. * Formalize the order `bqetl generate all` runs the SQL generators in. * Have `bqetl generate all` run `derived_view_schemas` last, in case other SQL generators create derived views. * Fix `Schema._traverse()` to only recurse if both fields are records.	2024-06-07 19:05:36 -07:00
Ben Wu	80de43a57f	Print list of failed queries at end of cli dry run (#5771 )	2024-06-07 18:18:42 -04:00
Eduardo Filho	36cf98ca65	fix(glam): remove extra space that breaks table name (#5770 )	2024-06-07 17:30:05 -04:00
Winnie Chan	c58c7fbd5c	DENG-3869 Create schema before deploying backfill staging table (#5643 ) * Added schema from query file --------- Co-authored-by: Alexander <anicholson@mozilla.com>	2024-06-07 12:47:09 -07:00
Eduardo Filho	99d982af0d	fix(glam): add missing fields for percentiles and probe counts with non-norm data (#5769 )	2024-06-07 13:53:12 -04:00
Eduardo Filho	73208383a0	fix(glam): remove extra parameter to fill_buckets on non_norm data (#5768 )	2024-06-07 15:22:06 +02:00
Eduardo Filho	c129115660	feature(glam): Add non-normalized histograms on Glean metrics (#5684 ) * feature(glam): Add non-normalized histograms on Glean metrics * Remove useless column definition	2024-06-06 15:45:23 -04:00
Sean Rose	5a7d7985c1	Fix `format_timedelta` function's parsing of negative timedeltas (#5740 ) * Fix `format_timedelta` function's parsing of negative timedeltas. The entire timedelta can be negative. * Refactor to use a single timedelta regular expression. * Fix typo in `format_timedelta` function argument.	2024-06-05 09:05:12 -07:00
Winnie Chan	11b891febb	Added dataset id (#5721 )	2024-06-04 13:16:21 -07:00
Alexander	40c2b58482	fix(deploy): skip (instead of fail) deploys with explicitly null destination_table (#5700 )	2024-05-31 12:50:47 -04:00
Alexander	0cd4295478	chore: refactor schema deploys, add and use deploy utils (#5674 ) * chore: refactor schema deploys, add and use utils * Update tests * Add deploy tests * Use string representation of table object in log statements	2024-05-30 16:23:53 -04:00
Anna Scholtz	22dce0557a	Fix setting partitioning metadata (#5690 )	2024-05-30 12:11:38 -07:00
Ben Wu	8626f02bff	[DENG-3905] Support queries with temp udfs when billing project is set (#5668 )	2024-05-29 10:50:07 -04:00
Alexander	da9293fa24	fix(backfills): switch from process to thread to avoid pickling bigquery object (#5656 )	2024-05-24 11:16:28 -04:00
Alexander	9f5faf697c	fix(backfills): followup to concurrent.futures, raise Error if any failed dates (#5654 )	2024-05-23 16:38:30 -04:00
Alexander	d7b5bad870	fix(backfills): switch to concurrent.futures to improve debuggability (#5653 )	2024-05-23 16:27:10 -04:00
Ben Wu	9f3135ba33	Use information_schema to find experiment tables for shredder (#5635 )	2024-05-22 12:39:43 -04:00
Winnie Chan	ce9b8c40c1	DENG-3719: Allow setting billing project for managed backfills (#5605 ) * Added default billing project and param	2024-05-21 12:27:55 -07:00
Lucia	71c5d1a8a9	Add clients_last_seen_v2 to shredder config (#5593 )	2024-05-16 09:53:34 -05:00
Ben Wu	82c06afbcd	Add unit test to verify shredder delete target and source counts (#5589 )	2024-05-15 16:03:46 -05:00
Braunk	bd5ffe4916	feat(query-backfill): adding a more flexible approach to overriding scheduling attributes (#5540 ) * feat(query-backfill): adding a more flexible approach to overriding scheduling attributes * Update tests/cli/test_cli_query.py Co-authored-by: Alexander <anicholson@mozilla.com> * feat(query.py): adding cleaner override logic per PR comments and also cleaning up comments and rogue print statement --------- Co-authored-by: Alexander <anicholson@mozilla.com>	2024-05-15 15:16:51 -05:00
Katie Windau	760ce16de0	DENG-3288 Create Mobile Engagement Model Tables & Views (#5525 )	2024-05-15 14:47:03 -05:00
Ben Wu	7b23ec68d8	[DENG-3335] Set project in experiments shredder config (#5574 )	2024-05-14 17:09:34 -04:00
Marlene Hirose	5ebb5dd418	Deng 3187 desktop retention model (#5496 ) * initial commit for desktop_retention_clients view * initial commit for desktop_retention_clients view - after formatting * initial add of query script for telemetry_derived.retention_v1 table * change view to table as I need to use the submission_date query parameter * move view to query for desktop_retention_clients * reformat file and add COALESCE on submission_date for new_profiles * take out app_name from retention_v1 query. Change isp_name to isp, take out mozfun UDF on normalized_os_version in CTE and update main query from retention_clients * run formatting on retention_v1/query.sql * move files from retention to desktop_retention, retention_clietns to desktop_retention_clients * add newline to end of schema.yaml file * reinstate retention_v1 deprecated folder/metadata.yaml file * refactor desktop_retention_clients_v1/desktop_retention metadata.yaml - add in clustering, take out extra space indent * refactor metadata.yaml, add schema.yaml, change query.sql to pull from telemetry_derived.desktop_retention_clients_v1 not view * add in metric_date for desktop_retention_v1 partition, take out require partition date filtering for both retention and retention_clients * add desktop_retention_model to dags.yaml * remove retention_clients_v1 * change column name is_new_profile to new_profile_metric_date * take out 'app_name' from group by * add telemetry_derived.desktop_retention_clients_v1 to shredder config	2024-05-14 13:50:15 -07:00
Katie Windau	6108d64d37	DENG-3186 Update shredder config to match updated table name (#5572 ) * DENG-3186 update shredder config to match updated table name * DENG-3186 remove dryrun skip	2024-05-14 13:24:24 -05:00
Eduardo Filho	8bd936e1bc	fix(glam) add fully qualified table names in legacy telemetry queries (#5559 ) * fix(glam) fix table names to fully qualified * Fix column order for glean probe counts * Add fully qualified table name to tests	2024-05-13 14:58:37 -04:00

1378 commits