bigquery-etl

Commit Graph

Author	SHA1	Message	Date
Lucia	b83224393c	Improve checks to account for null replacements (#6508 ) * Backfill info. * Update tests, * Update tests,	2024-11-15 17:03:02 +00:00
kik-kik	1f91163f6c	feat: adding an additional case for when metric.metric_type is null inside _update_bigconfig (#6482 )	2024-11-14 10:05:59 +00:00
Anna Scholtz	2bd130526e	Improve Bigeye error reporting (#6493 )	2024-11-14 00:55:36 +00:00
Ben Wu	3c13b9af65	Make shredder detect glean derived tables with client_info.client_id (#6459 ) * Make shredder detect glean derived tables with client_info.client_id * Add skip list	2024-11-12 23:01:47 +00:00
Katie Windau	26d3e9faf0	DENG-4835 - Deleting deprecated views & tables (#6455 ) * DENG-4835 - Delete deprecated views * DENG-4835 delete deprecated tables * DENG-4835 delete deprecated tables * DENG-4835 Remove a deleted table from shredder processing	2024-11-09 18:51:07 +00:00
Ben Wu	1f66c24445	Add logging to stage clean and fix --delete-expired (#6466 )	2024-11-08 18:16:01 +00:00
Ben Wu	69aba6a473	[DENG-5962] Change shredder state to only use end_date (#6450 )	2024-11-06 15:19:10 +00:00
Anna Scholtz	4ad4c6b9d4	Add CLI command to run Bigeye checks (#6383 ) * Add support for defining custom SQL rules * Added monitoring rollback command * Reformat * Add tests * Address review feedback * Add flags for deleting Bigeye checks * Add support for defining custom SQL rules * Added monitoring rollback command * Add run command to trigger Bigeye checks * Use bigquery_bigeye_check to trigger Bigeye runs * Add unit tests for monitoring run CLI command * Update DAG tests * Fix imports * Address review feedback * Format custom sql rules file * Remove bigconfig_custom_rules.sql * Fix bigeye update for public datasets	2024-11-05 18:02:42 +00:00
Anna Scholtz	209b75ae57	Bigeye - Add support for deploying and removing custom SQL rules (#6379 ) * Add support for defining custom SQL rules * Added monitoring rollback command * Reformat * Add tests * Address review feedback * Add flags for deleting Bigeye checks * Fix formatting * Fix tests	2024-11-04 17:41:46 +00:00
Anna Scholtz	c7521153be	CLI command to automate some migration of ETL checks to Bigeye (#6362 ) * Add command to automate migration of ETL checks * Migrate ETL checks for ssl_ratios * Update bigconfigs	2024-11-01 12:27:57 +00:00
Sean Rose	10f15c4fa4	Change Stripe tax transaction exchange rate to BIGNUMERIC type. (#6428 ) Some historical Stripe tax transactions contain exchange rates with 16 digits after the decimal, which exceeds the normal NUMERIC type's maximum scale of 9 digits after the decimal.	2024-10-31 21:05:02 +00:00
Ben Wu	70b794b434	Populate empty query.py in stage deploy instead of CI config (#6430 )	2024-10-31 20:50:22 +00:00
Alekhya	7f72235eb4	Switch to baseline beginning Aug 01st 2024 (#6425 ) * Switch to baseline beginning Aug 01st 2024 * Modify the search_engine_daily view to handle the cut off * Fix sql format * Skip mobile_clients_daily_v2 from dry run * Remove sql_generator files populating v1 * Add tests for mobile_search_clients_daily_v2 * remove unwanted tests	2024-10-31 19:11:36 +00:00
Anna Scholtz	381107db71	Revert #6418 (#6429 )	2024-10-31 15:41:23 +00:00
Ben Wu	759cbe3231	Fix backfill validation tests in CI (#6398 ) * Fix logging in backfill validation * validate backfill should fail * validate backfill should still fail * revert entry, move validate-backfills to after generate-sql	2024-10-30 20:25:41 +00:00
Ben Wu	b68db470bd	Sort query references result when parallelism > 0 (#6427 )	2024-10-30 20:25:31 +00:00
Anna Scholtz	922f5ee70f	Fully-qualify INFORMATION_SCHEMA datasets in ./bqetl view clean (#6426 )	2024-10-30 20:01:27 +00:00
Anna Scholtz	d486bed1b9	Use INFORMATION_SCHEMA to get existing views for cleaning (#6424 ) * Use INFORMATION_SCHEMA to get existing views for cleaning * Update _list_managed_views caller	2024-10-30 18:32:01 +00:00
Ben Wu	26adc67513	Fix materialized views failing stage deploy (#6421 ) * Create test materialized view * fix copy_sql_to_tmp_dir renaming * Add materialized views to view dependencies * remove test files	2024-10-30 15:33:28 +00:00
Anna Scholtz	2add865249	Speed up schema updates (#6418 ) * Parallelize dependency graph * Use GCP API to get table schema when not using cloud function * Reuse GCP credentials * Update dependency tests * Remove print	2024-10-30 14:52:05 +00:00
Anna Scholtz	6c87dfb547	Reuse GCP API credentials for view updates (#6415 ) * Reuse GCP API credentials for view updates * isort	2024-10-29 15:31:29 +00:00
Anna Scholtz	3b222a21e9	Parallelize metadata publish (#6403 )	2024-10-28 19:03:08 +00:00
Anna Scholtz	f8fa9ef3a4	Speed up view deploys by using processing pool (#6401 )	2024-10-28 19:02:45 +00:00
Anna Scholtz	03e88b1875	Skip updating schemas of tables that are skipped for deploys (#6410 )	2024-10-28 18:25:54 +00:00
Anna Scholtz	b06c2836ae	Skip list for schema deploys (#6404 )	2024-10-25 22:46:57 +00:00
Ben Wu	16a50f1378	Create query and script for alerts for missing shredder targets (#6385 )	2024-10-24 18:57:12 +00:00
Ben Wu	553307ab4a	Prevent shredder sampling for null partition (#6376 ) * Prevent shredder sampling for null partition * black	2024-10-24 18:56:39 +00:00
Anna Scholtz	c3c5cad7c1	Deploy all BigConfig files at once (#6351 )	2024-10-15 17:36:40 +00:00
whd	5dfb2ef0a5	Set analysis dataset default retention to 180 days (#6346 )	2024-10-14 21:46:40 +00:00
Mathijs Miermans	5e080716e0	[MC-1458] Add newtab_merino_priors DAG (#6303 ) * [MC-1458] Add newtab_merino_priors DAG * Extract shared JSON export function --------- Co-authored-by: Chelsey Beck <64881557+chelseybeck@users.noreply.github.com>	2024-10-14 17:06:12 +00:00
kik-kik	221cd9b9c2	fix: only generate Airflow task for BigEye if monitoring enabled in the metadata (#6326 )	2024-10-10 21:50:42 +00:00
Anna Scholtz	c0114f4626	Generate BigConfig files for views (#6312 ) * Generate BigConfig files for views * Re-enable monitoring for telemetry.releases	2024-10-10 14:12:54 +00:00
Ben Wu	5b5bdd98b7	Fix client id field in accounts_frontend events_stream shredder config (#6308 )	2024-10-08 15:17:26 +00:00
Anna Scholtz	9475778423	Add bqetl CLI command for setting partition columns (#6302 ) * Add bqetl CLI command for setting partition columns * Fix tests --------- Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com>	2024-10-07 20:31:24 +00:00
Anna Scholtz	a1a12791aa	Update Bigeye warehouse ID (#6297 ) * Update Bigeye warehouse ID * Update test bigeye warehouse ID	2024-10-03 21:12:24 +00:00
kik-kik	4257698cc8	feat(DENG-4602): add BigEye RunMetricOperator to DAG generation (#6285 ) * feat: add BigEye RunMetricOperator to DAG generation * Fix Bigeye DAG generation and move configuration to bqetl_project.yaml * Reformatting; test fixing * Install Airflow override requirements --------- Co-authored-by: Anna Scholtz <anna@scholtzan.net>	2024-10-03 15:13:27 +00:00
Ben Wu	a9c904df37	Give reader role to dry run account for datasets in stage (#6288 ) * Give reader role to dry run account for datasets in stage * add sql changes to deploy * hardcode service accounts * Revert "add sql changes to deploy" * Remove func in dryrun * f	2024-10-02 21:55:19 +00:00
Ben Wu	66443eee29	Bug 1920544 Create view to union firefox desktop crashes (#6257 )	2024-09-27 17:48:44 +00:00
Anna Scholtz	f826580177	Allow specifying a collection for monitored tables (#6256 ) * Allow specifying a collection for monitored tables * Update bigquery_etl/cli/monitoring.py Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com> * Format bqetl --------- Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>	2024-09-24 19:39:11 +00:00
Curtis Morales	247729d217	In validation, only read schema files when necessary (#6252 ) * Remove deleted table from skip list * Parse schema file in validate_shredder_mitigation function so it works on priv-bqetl * Parse schema file in validate_shredder_mitigation function so it works on priv-bqetl * Clean up test	2024-09-24 17:03:28 +00:00
Lucia	29bb468663	Deng 877 autogenerate checks during shredder mitigation (#6243 ) * Reference to the shredder mitigation process during backfills. * missing dash * Auto-generate and run data checks. Validate shredder_mitigation label. * Checks template. * Fix tests. * Update checks to include additional EXCEPT, use table_id in staging dataset, and ensure that tests run and generate a failure that stops the backfill.	2024-09-24 15:35:25 +00:00
Anna Scholtz	66b37ada4a	Add support for secrets when generating Airflow DAGs (#6241 ) * Add support for secrets when generating Airflow DAGs * update test_publish_metadata owner * Conditional import of Airflow Secrets	2024-09-23 19:58:43 +00:00
Anna Scholtz	61f920cd3e	[Bug 1823724] Add flag to missing columns views to indicate that column exists in schema (#6215 ) * Add `column_exists_in_schema` field to structured_missing_columns * Add column_exists_in_schema to telemetry_missing_columns * Add UDF to convert column names to be compatible with schema conventions * Add UDF test for snake_case_columns * Fix stage deploys for INFORMATION_SCHEMA * Fix UDF test * Code review feedback * Review feedback	2024-09-19 21:12:31 +00:00
Ben Wu	8f493685d2	Set find_glean_targets thread count to 6 in shredder (#6219 )	2024-09-18 20:34:17 +00:00
Lucia	074731db7b	CI validation of tables with the shredder_mitigation label (#6217 ) * Larger wildcards to reduce the chance of collision with actual values. * Formatting * Update bigquery_etl/metadata/validate_metadata.py Co-authored-by: Ben Wu <12437227+BenWu@users.noreply.github.com> * Add test for validate_metadata.validate, add profile_id and profile_group_id to id-level_columns file. * Update tests/cli/test_cli_metadata.py Co-authored-by: Ben Wu <12437227+BenWu@users.noreply.github.com> --------- Co-authored-by: Ben Wu <12437227+BenWu@users.noreply.github.com>	2024-09-18 18:44:06 +00:00
Eduardo Filho	68937b4ad0	fix(glam_fog) SUM overflow (#6218 ) * fix(glam_fog): Create function histogram_filter_high_values to prevent INT64 overflow * fix(glam_fog): Use histogram_filter_high_values to avoid overflow	2024-09-18 16:40:41 +00:00
Anna Scholtz	13349e8589	Generate Bigeye monitoring configs in CI (#6194 ) * Generate Bigeye monitoring configs in CI * Ensure Bigconfig files are loaded exactly once * Avoid duplicate validation of Bigconfig files * Authentication using API key to Bigeye * Remove api-key option for Bigeye and rely on env var instead	2024-09-17 16:15:20 +00:00
Lucia	68159a7f1d	Test mitigation (#6205 ) * Add default values in template to fix sqlglot parsing error. * Adding backfill_date to exception message. Formatting. * Improve getting collumn dtypes by including the schema files. Change custom_query for custom_query_path for readibility. * DENG_4733. Improve getting column dtypes by including the schema files. Change custom_query for custom_query_path for readibility. * Formatting. * Set values back to NULL were corresponds. Improve output information. * Rename custom_query to custom_query_path to match the expected parameter. * Missing import * Larger wildcards to reduce the chance of collision with actual values.	2024-09-17 12:09:17 +00:00
Ben Wu	53f385510c	[DENG-4641] Add support for shredding per sample id to shredder (#6197 ) * [DENG-4641] Add support for shredding per sample id to shredder * Add comment explaining connection_pool_max_size	2024-09-16 17:34:38 +00:00
Anna Scholtz	47b9a51972	Add CLI commands to deploy BigConfig files (#6169 ) * Add CLI commands to deploy BigConfig files * Review feedback * Ignore bigeye_credentials changes	2024-09-12 20:16:35 +00:00

1440 commits