Commit graph

1440 commits

Author SHA1 Message Date
Lucia b83224393c
Improve checks to account for null replacements (#6508)
* Backfill info.

* Update tests,

* Update tests,
2024-11-15 17:03:02 +00:00
kik-kik 1f91163f6c
feat: adding an additional case for when metric.metric_type is null inside _update_bigconfig (#6482) 2024-11-14 10:05:59 +00:00
Anna Scholtz 2bd130526e
Improve Bigeye error reporting (#6493) 2024-11-14 00:55:36 +00:00
Ben Wu 3c13b9af65
Make shredder detect glean derived tables with client_info.client_id (#6459)
* Make shredder detect glean derived tables with client_info.client_id

* Add skip list
2024-11-12 23:01:47 +00:00
Katie Windau 26d3e9faf0
DENG-4835 - Deleting deprecated views & tables (#6455)
* DENG-4835 - Delete deprecated views

* DENG-4835 delete deprecated tables

* DENG-4835 delete deprecated tables

* DENG-4835 Remove a deleted table from shredder processing
2024-11-09 18:51:07 +00:00
Ben Wu 1f66c24445
Add logging to stage clean and fix --delete-expired (#6466) 2024-11-08 18:16:01 +00:00
Ben Wu 69aba6a473
[DENG-5962] Change shredder state to only use end_date (#6450) 2024-11-06 15:19:10 +00:00
Anna Scholtz 4ad4c6b9d4
Add CLI command to run Bigeye checks (#6383)
* Add support for defining custom SQL rules

* Added monitoring rollback command

* Reformat

* Add tests

* Address review feedback

* Add flags for deleting Bigeye checks

* Add support for defining custom SQL rules

* Added monitoring rollback command

* Add run command to trigger Bigeye checks

* Use bigquery_bigeye_check to trigger Bigeye runs

* Add unit tests for monitoring run CLI command

* Update DAG tests

* Fix imports

* Address review feedback

* Format custom sql rules file

* Remove bigconfig_custom_rules.sql

* Fix bigeye update for public datasets
2024-11-05 18:02:42 +00:00
Anna Scholtz 209b75ae57
Bigeye - Add support for deploying and removing custom SQL rules (#6379)
* Add support for defining custom SQL rules

* Added monitoring rollback command

* Reformat

* Add tests

* Address review feedback

* Add flags for deleting Bigeye checks

* Fix formatting

* Fix tests
2024-11-04 17:41:46 +00:00
Anna Scholtz c7521153be
CLI command to automate some migration of ETL checks to Bigeye (#6362)
* Add command to automate migration of ETL checks

* Migrate ETL checks for ssl_ratios

* Update bigconfigs
2024-11-01 12:27:57 +00:00
Sean Rose 10f15c4fa4
Change Stripe tax transaction exchange rate to BIGNUMERIC type. (#6428)
Some historical Stripe tax transactions contain exchange rates with 16 digits after the decimal, which exceeds the normal NUMERIC type's maximum scale of 9 digits after the decimal.
2024-10-31 21:05:02 +00:00
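The scale limit mentioned in the commit above can be illustrated with a minimal sketch. It is not taken from the repository: it only uses Python's `decimal` module to mimic BigQuery's documented scales (9 fractional digits for NUMERIC, 38 for BIGNUMERIC). The helper `fits_scale` and the sample rate are hypothetical.

```python
from decimal import Decimal

NUMERIC_SCALE = 9      # BigQuery NUMERIC: up to 9 digits after the decimal point
BIGNUMERIC_SCALE = 38  # BigQuery BIGNUMERIC: up to 38 digits after the decimal point


def fits_scale(value: Decimal, scale: int) -> bool:
    """Return True if a finite value has at most `scale` fractional digits."""
    # For a finite Decimal the exponent is an int; -exponent is the
    # number of digits stored after the decimal point.
    return -value.as_tuple().exponent <= scale


# Hypothetical exchange rate with 16 digits after the decimal point.
rate = Decimal("0.0123456789012345")

print(fits_scale(rate, NUMERIC_SCALE))     # too many fractional digits for NUMERIC
print(fits_scale(rate, BIGNUMERIC_SCALE))  # fits within BIGNUMERIC's scale

# Forcing the value to 9 fractional digits rounds away the trailing digits.
print(rate.quantize(Decimal("1e-9")))
```

Under these assumptions, a 16-digit rate does not fit NUMERIC without rounding, which is consistent with the commit's move to BIGNUMERIC.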
Ben Wu 70b794b434
Populate empty query.py in stage deploy instead of CI config (#6430) 2024-10-31 20:50:22 +00:00
Alekhya 7f72235eb4
Switch to baseline beginning Aug 01st 2024 (#6425)
* Switch to baseline beginning Aug 01st 2024

* Modify the search_engine_daily view to handle the cut off

* Fix sql format

* Skip mobile_clients_daily_v2 from dry run

* Remove sql_generator files populating v1

* Add tests for mobile_search_clients_daily_v2

* remove unwanted tests
2024-10-31 19:11:36 +00:00
Anna Scholtz 381107db71
Revert #6418 (#6429) 2024-10-31 15:41:23 +00:00
Ben Wu 759cbe3231
Fix backfill validation tests in CI (#6398)
* Fix logging in backfill validation

* validate backfill should fail

* validate backfill should still fail

* revert entry, move validate-backfills to after generate-sql
2024-10-30 20:25:41 +00:00
Ben Wu b68db470bd
Sort query references result when parallelism > 0 (#6427) 2024-10-30 20:25:31 +00:00
Anna Scholtz 922f5ee70f
Fully-qualify INFORMATION_SCHEMA datasets in ./bqetl view clean (#6426) 2024-10-30 20:01:27 +00:00
Anna Scholtz d486bed1b9
Use INFORMATION_SCHEMA to get existing views for cleaning (#6424)
* Use INFORMATION_SCHEMA to get existing views for cleaning

* Update _list_managed_views caller
2024-10-30 18:32:01 +00:00
Ben Wu 26adc67513
Fix materialized views failing stage deploy (#6421)
* Create test materialized view

* fix copy_sql_to_tmp_dir renaming

* Add materialized views to view dependencies

* remove test files
2024-10-30 15:33:28 +00:00
Anna Scholtz 2add865249
Speed up schema updates (#6418)
* Parallelize dependency graph

* Use GCP API to get table schema when not using cloud function

* Reuse GCP credentials

* Update dependency tests

* Remove print
2024-10-30 14:52:05 +00:00
Anna Scholtz 6c87dfb547
Reuse GCP API credentials for view updates (#6415)
* Reuse GCP API credentials for view updates

* isort
2024-10-29 15:31:29 +00:00
Anna Scholtz 3b222a21e9
Parallelize metadata publish (#6403) 2024-10-28 19:03:08 +00:00
Anna Scholtz f8fa9ef3a4
Speed up view deploys by using processing pool (#6401) 2024-10-28 19:02:45 +00:00
Anna Scholtz 03e88b1875
Skip updating schemas of tables that are skipped for deploys (#6410) 2024-10-28 18:25:54 +00:00
Anna Scholtz b06c2836ae
Skip list for schema deploys (#6404) 2024-10-25 22:46:57 +00:00
Ben Wu 16a50f1378
Create query and script for alerts for missing shredder targets (#6385) 2024-10-24 18:57:12 +00:00
Ben Wu 553307ab4a
Prevent shredder sampling for null partition (#6376)
* Prevent shredder sampling for null partition

* black
2024-10-24 18:56:39 +00:00
Anna Scholtz c3c5cad7c1
Deploy all BigConfig files at once (#6351) 2024-10-15 17:36:40 +00:00
whd 5dfb2ef0a5
Set analysis dataset default retention to 180 days (#6346) 2024-10-14 21:46:40 +00:00
Mathijs Miermans 5e080716e0
[MC-1458] Add newtab_merino_priors DAG (#6303)
* [MC-1458] Add newtab_merino_priors DAG

* Extract shared JSON export function

---------

Co-authored-by: Chelsey Beck <64881557+chelseybeck@users.noreply.github.com>
2024-10-14 17:06:12 +00:00
kik-kik 221cd9b9c2
fix: only generate Airflow task for BigEye if monitoring enabled in the metadata (#6326) 2024-10-10 21:50:42 +00:00
Anna Scholtz c0114f4626
Generate BigConfig files for views (#6312)
* Generate BigConfig files for views

* Re-enable monitoring for telemetry.releases
2024-10-10 14:12:54 +00:00
Ben Wu 5b5bdd98b7
Fix client id field in accounts_frontend events_stream shredder config (#6308) 2024-10-08 15:17:26 +00:00
Anna Scholtz 9475778423
Add bqetl CLI command for setting partition columns (#6302)
* Add bqetl CLI command for setting partition columns

* Fix tests

---------

Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com>
2024-10-07 20:31:24 +00:00
Anna Scholtz a1a12791aa
Update Bigeye warehouse ID (#6297)
* Update Bigeye warehouse ID

* Update test bigeye warehouse ID
2024-10-03 21:12:24 +00:00
kik-kik 4257698cc8
feat(DENG-4602): add BigEye RunMetricOperator to DAG generation (#6285)
* feat: add BigEye RunMetricOperator to DAG generation

* Fix Bigeye DAG generation and move configuration to bqetl_project.yaml

* Reformatting; test fixing

* Install Airflow override requirements

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2024-10-03 15:13:27 +00:00
Ben Wu a9c904df37
Give reader role to dry run account for datasets in stage (#6288)
* Give reader role to dry run account for datasets in stage

* add sql changes to deploy

* hardcode service accounts

* Revert "add sql changes to deploy"

* Remove func in dryrun

* f
2024-10-02 21:55:19 +00:00
Ben Wu 66443eee29
Bug 1920544 Create view to union firefox desktop crashes (#6257) 2024-09-27 17:48:44 +00:00
Anna Scholtz f826580177
Allow specifying a collection for monitored tables (#6256)
* Allow specifying a collection for monitored tables

* Update bigquery_etl/cli/monitoring.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Format bqetl

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2024-09-24 19:39:11 +00:00
Curtis Morales 247729d217
In validation, only read schema files when necessary (#6252)
* Remove deleted table from skip list

* Parse schema file in validate_shredder_mitigation function so it works on priv-bqetl

* Parse schema file in validate_shredder_mitigation function so it works on priv-bqetl

* Clean up test
2024-09-24 17:03:28 +00:00
Lucia 29bb468663
Deng 877 autogenerate checks during shredder mitigation (#6243)
* Reference to the shredder mitigation process during backfills.

* missing dash

* Auto-generate and run data checks. Validate shredder_mitigation label.

* Checks template.

* Fix tests.

* Update checks to include additional EXCEPT, use table_id in staging dataset, and ensure that tests run and generate a failure that stops the backfill.
2024-09-24 15:35:25 +00:00
Anna Scholtz 66b37ada4a
Add support for secrets when generating Airflow DAGs (#6241)
* Add support for secrets when generating Airflow DAGs

* update test_publish_metadata owner

* Conditional import of Airflow Secrets
2024-09-23 19:58:43 +00:00
Anna Scholtz 61f920cd3e
[Bug 1823724] Add flag to missing columns views to indicate that column exists in schema (#6215)
* Add `column_exists_in_schema` field to structured_missing_columns

* Add column_exists_in_schema to telemetry_missing_columns

* Add UDF to convert column names to be compatible with schema conventions

* Add UDF test for snake_case_columns

* Fix stage deploys for INFORMATION_SCHEMA

* Fix UDF test

* Code review feedback

* Review feedback
2024-09-19 21:12:31 +00:00
Ben Wu 8f493685d2
Set find_glean_targets thread count to 6 in shredder (#6219) 2024-09-18 20:34:17 +00:00
Lucia 074731db7b
CI validation of tables with the shredder_mitigation label (#6217)
* Larger wildcards to reduce the chance of collision with actual values.

* Formatting

* Update bigquery_etl/metadata/validate_metadata.py

Co-authored-by: Ben Wu <12437227+BenWu@users.noreply.github.com>

* Add test for validate_metadata.validate, add profile_id and profile_group_id to id-level_columns file.

* Update tests/cli/test_cli_metadata.py

Co-authored-by: Ben Wu <12437227+BenWu@users.noreply.github.com>

---------

Co-authored-by: Ben Wu <12437227+BenWu@users.noreply.github.com>
2024-09-18 18:44:06 +00:00
Eduardo Filho 68937b4ad0
fix(glam_fog) SUM overflow (#6218)
* fix(glam_fog): Create function histogram_filter_high_values to prevent INT64 overflow

* fix(glam_fog): Use histogram_filter_high_values to avoid overflow
2024-09-18 16:40:41 +00:00
Anna Scholtz 13349e8589
Generate Bigeye monitoring configs in CI (#6194)
* Generate Bigeye monitoring configs in CI

* Ensure Bigconfig files are loaded exactly once

* Avoid duplicate validation of Bigconfig files

* Authentication using API key to Bigeye

* Remove api-key option for Bigeye and rely on env var instead
2024-09-17 16:15:20 +00:00
Lucia 68159a7f1d
Test mitigation (#6205)
* Add default values in template to fix sqlglot parsing error.

* Adding backfill_date to exception message. Formatting.

* Improve getting column dtypes by including the schema files. Change custom_query to custom_query_path for readability.

* DENG_4733. Improve getting column dtypes by including the schema files. Change custom_query to custom_query_path for readability.

* Formatting.

* Set values back to NULL where applicable. Improve output information.

* Rename custom_query to custom_query_path to match the expected parameter.

* Missing import

* Larger wildcards to reduce the chance of collision with actual values.
2024-09-17 12:09:17 +00:00
Ben Wu 53f385510c
[DENG-4641] Add support for shredding per sample id to shredder (#6197)
* [DENG-4641] Add support for shredding per sample id to shredder

* Add comment explaining connection_pool_max_size
2024-09-16 17:34:38 +00:00
Anna Scholtz 47b9a51972
Add CLI commands to deploy BigConfig files (#6169)
* Add CLI commands to deploy BigConfig files

* Review feedback

* Ignore bigeye_credentials changes
2024-09-12 20:16:35 +00:00