* chore(deploys): Catch NotFound for missing dataset/project in table deploys
* Update bigquery_etl/deploy.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* update test exception text match
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
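A minimal sketch of the pattern the first commit describes: catching `NotFound` when a table deploy targets a dataset or project that doesn't exist, and re-raising with a clearer message. `deploy_table` and `MissingDatasetError` are hypothetical names, not the actual `bigquery_etl/deploy.py` code. The sketch assumes the `google-cloud-bigquery` client's `create_table(table, exists_ok=True)` call and falls back to a local stand-in for `google.api_core.exceptions.NotFound` when that package isn't installed.

```python
try:
    from google.api_core.exceptions import NotFound
except ImportError:  # stand-in so the sketch runs without google-cloud installed
    class NotFound(Exception):
        """Local stand-in for google.api_core.exceptions.NotFound."""


class MissingDatasetError(Exception):
    """Hypothetical error raised when the target dataset or project is missing."""


def deploy_table(client, table, destination):
    """Hypothetical deploy helper: create `table`, translating NotFound errors."""
    try:
        # exists_ok=True makes re-deploying an existing table a no-op.
        return client.create_table(table, exists_ok=True)
    except NotFound as e:
        # BigQuery returns NotFound when the dataset (or project) is missing;
        # surface the destination so the failing deploy is easy to identify.
        raise MissingDatasetError(
            f"Unable to deploy {destination}: dataset or project not found"
        ) from e
```

Chaining with `from e` keeps the original API error attached, so the underlying BigQuery message is still visible in tracebacks.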
* Auxiliary functions required to generate the query for a backfill with shredder mitigation.
* Exception handling.
* isort & docstrings.
* Apply flake8 to test file.
* Remove variable assignment to different types.
* Make search case insensitive in function.
* Add test cases for the function and rename a function's parameters for clarity.
* Update bigquery_etl/backfill/shredder_mitigation.py
Co-authored-by: Leli <33942105+lelilia@users.noreply.github.com>
* Add test cases for missing parameters, or parameters that don't match where expected. Minimize the calls to `get_bigquery_type()`.
---------
Co-authored-by: Leli <33942105+lelilia@users.noreply.github.com>
* Remove percentiles
* Remove tests that test percentiles
* Refresh scripts insert null into new percentiles
* Remove percentile columns from queries and schemas
* Delete more percentile tables
* Formatting
* histogram_cast_struct's keys are strings
* Re-add test after fixing failure cause
* Add fully-qualified identifiers when formatting queries
* Fully-qualified identifiers for queries in sql/
* Check in only formatted SQL to generated-sql branch
* Add comment
* Fully qualify more tables
* Fully qualify test files
* Formatting improvements around CTEs and unit tests
* Option to skip auto qualifying queries
* Limit `derived_view_schemas` SQL generator to actual view directories.
* Fix the `derived_view_schemas` SQL generator to get the view schemas by dry-running their latest SQL and/or from their latest `schema.yaml` file. Getting the schema from the currently deployed view wasn't appropriate because it wouldn't reflect the latest view code.
* Rename `View.view_schema` to `View.schema`.
* Change `View` so its schema dry-runs use the cloud function (CI doesn't have permission to run dry-run queries directly).
* Apply partition column filters in view dry-run queries when possible for speed/efficiency.
* Don't allow missing fields to prevent view schema enrichment.
* Only copy column descriptions during view schema enrichment.
* Only try enriching view schemas from their reference table `schema.yaml` files if those files actually exist.
* Change `main_1pct` view to select directly from the `main_remainder_1pct_v1` table, so the `derived_view_schemas` SQL generator can detect the partition column to use and successfully dry-run the view to determine its schema.
* Formalize the order `bqetl generate all` runs the SQL generators in.
* Have `bqetl generate all` run `derived_view_schemas` last, in case other SQL generators create derived views.
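The enrichment rules above (only copy column descriptions, and don't let a missing field stop enrichment) can be sketched as a recursive walk over BigQuery-style schema field dicts. `enrich_view_descriptions` is a hypothetical name; the sketch assumes the caller has already checked that the reference table's `schema.yaml` exists and loaded its `fields` list, and it overwrites the view's description whenever the reference has one (an assumption).

```python
def _is_record(field):
    # BigQuery schema JSON uses "RECORD"; "STRUCT" is accepted as a synonym here.
    return field.get("type") in ("RECORD", "STRUCT")


def enrich_view_descriptions(view_fields, reference_fields):
    """Hypothetical sketch: copy descriptions from reference_fields into view_fields in place."""
    reference_by_name = {f["name"]: f for f in reference_fields}
    for field in view_fields:
        reference = reference_by_name.get(field["name"])
        if reference is None:
            # A field missing from the reference schema is skipped rather than
            # aborting enrichment of the remaining fields.
            continue
        if reference.get("description"):
            # Only the description is copied; type and mode stay as the dry run reported.
            field["description"] = reference["description"]
        if _is_record(field) and _is_record(reference):
            enrich_view_descriptions(field.get("fields", []), reference.get("fields", []))
    return view_fields
```

Keeping the view's own `type` and `mode` matters because those come from dry-running the latest view SQL, which is the source of truth for the view's shape.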
* Fix `Schema._traverse()` to only recurse if both fields are records.
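A simplified sketch of the `_traverse()` fix: when walking two schemas in parallel, only recurse into a field when it is a record on both sides, since a record on one side and a scalar on the other has no nested fields to pair up. `traverse` and its signature are hypothetical and do not mirror the real `Schema._traverse()`; fields are assumed to be BigQuery-style dicts with `name`, `type`, and optional `fields`.

```python
def _is_record(field):
    # BigQuery schema JSON uses "RECORD"; "STRUCT" is accepted as a synonym here.
    return field.get("type") in ("RECORD", "STRUCT")


def traverse(prefix, fields, other_fields, visit):
    """Hypothetical sketch: call visit(path, field, other) for fields present in both schemas."""
    other_by_name = {f["name"]: f for f in other_fields}
    for field in fields:
        other = other_by_name.get(field["name"])
        if other is None:
            continue
        path = f"{prefix}.{field['name']}" if prefix else field["name"]
        visit(path, field, other)
        # Recurse only if BOTH fields are records; checking just one side would
        # descend into a scalar that has no "fields" to compare against.
        if _is_record(field) and _is_record(other):
            traverse(path, field.get("fields", []), other.get("fields", []), visit)
```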
* Fix `format_timedelta` function's parsing of negative timedeltas.
The entire timedelta can be negative.
* Refactor to use a single timedelta regular expression.
* Fix typo in `format_timedelta` function argument.
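A sketch of parsing a timedelta string with a single regular expression where the leading minus sign applies to the entire value. The accepted format (`1h30m`, `-2h`, `45s`) and the name `parse_timedelta_string` are assumptions for illustration, not the actual `format_timedelta` input format or output. The pitfall it avoids: applying the sign to only the first component, since `timedelta(hours=-1, minutes=30)` is -30 minutes, not -90.

```python
import re
from datetime import timedelta

# Assumed format: optional leading "-" followed by hour/minute/second components.
TIMEDELTA_RE = re.compile(
    r"^(?P<sign>-)?(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?(?:(?P<seconds>\d+)s)?$"
)


def parse_timedelta_string(value: str) -> timedelta:
    """Hypothetical sketch: parse e.g. '-1h30m' into timedelta(minutes=-90)."""
    match = TIMEDELTA_RE.match(value)
    if not match or not any(match.group(g) for g in ("hours", "minutes", "seconds")):
        raise ValueError(f"Invalid timedelta string: {value!r}")
    delta = timedelta(
        hours=int(match.group("hours") or 0),
        minutes=int(match.group("minutes") or 0),
        seconds=int(match.group("seconds") or 0),
    )
    # Negate the whole timedelta rather than just the first component.
    return -delta if match.group("sign") else delta
```

A sign in the middle (`1h-30m`) fails the single anchored regex, so it is rejected instead of being silently misparsed.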
* feat(query-backfill): adding a more flexible approach to overriding scheduling attributes
* Update tests/cli/test_cli_query.py
Co-authored-by: Alexander <anicholson@mozilla.com>
* feat(query.py): adding cleaner override logic per PR comments, and cleaning up comments and a rogue print statement
---------
Co-authored-by: Alexander <anicholson@mozilla.com>
* initial commit for desktop_retention_clients view
* initial commit for desktop_retention_clients view - after formatting
* initial add of query script for telemetry_derived.retention_v1 table
* change view to table as I need to use the submission_date query parameter
* move view to query for desktop_retention_clients
* reformat file and add COALESCE on submission_date for new_profiles
* take out app_name from retention_v1 query; change isp_name to isp; take out mozfun UDF on normalized_os_version in CTE; update main query from retention_clients
* run formatting on retention_v1/query.sql
* move files from retention to desktop_retention, retention_clients to desktop_retention_clients
* add newline to end of schema.yaml file
* reinstate retention_v1 deprecated folder/metadata.yaml file
* refactor desktop_retention_clients_v1/desktop_retention metadata.yaml - add in clustering, take out extra space indent
* refactor metadata.yaml, add schema.yaml, change query.sql to pull from telemetry_derived.desktop_retention_clients_v1 not view
* add in metric_date for desktop_retention_v1 partition, take out required partition filtering for both retention and retention_clients
* add desktop_retention_model to dags.yaml
* remove retention_clients_v1
* change column name is_new_profile to new_profile_metric_date
* take out 'app_name' from group by
* add telemetry_derived.desktop_retention_clients_v1 to shredder config