Commit graph

96 commits

Author SHA1 Message Date
Curtis Morales 247729d217
In validation, only read schema files when necessary (#6252)
* Remove deleted table from skip list

* Parse schema file in validate_shredder_mitigation function so it works on priv-bqetl

* Parse schema file in validate_shredder_mitigation function so it works on priv-bqetl

* Clean up test
2024-09-24 17:03:28 +00:00
Lucia 074731db7b
CI validation of tables with the shredder_mitigation label (#6217)
* Larger wildcards to reduce the chance of collision with actual values.

* Formatting

* Update bigquery_etl/metadata/validate_metadata.py

Co-authored-by: Ben Wu <12437227+BenWu@users.noreply.github.com>

* Add test for validate_metadata.validate, add profile_id and profile_group_id to id-level_columns file.

* Update tests/cli/test_cli_metadata.py

Co-authored-by: Ben Wu <12437227+BenWu@users.noreply.github.com>

---------

Co-authored-by: Ben Wu <12437227+BenWu@users.noreply.github.com>
2024-09-18 18:44:06 +00:00
Lucia 9fc513ba9f
Generate query with shredder mitigation (#6060)
* Auxiliary functions required to generate the query for a backfill with shredder mitigation.

* Exception handling.

* isort & docstrings.

* Apply flake8 to test file.

* Remove variable assignment to different types.

* Make search case insensitive in function.

* Add test cases for function and update naming in a function's parameters for clarity.

* Update bigquery_etl/backfill/shredder_mitigation.py

Co-authored-by: Leli <33942105+lelilia@users.noreply.github.com>

* Add test cases for missing parameters or not matching parameters where expected. Minimize the calls to get_bigquery_type().

* Encapsulate actions to generate and run custom queries to generate the subsets for shredder mitigation.

* Query template for shredder mitigation.

* Query template for shredder mitigation and formatting.

* Add check for "GROUP BY 1, 2, 3", improve code readability, remove unnecessary properties in classes.

* Test coverage. Check for "GROUP BY 1, 2, 3", improve readability, remove unrequired properties in class Subset.

* Increase test coverage. Expand DataType INTEGER required for UNION queries.

* Increase test coverage. Expand DataType INTEGER required for UNION queries.

* Separate INTEGER and NUMERIC types.

* Move util functions and convert method to property, both to resolve a circular import. Adjust tests. Update function return and tests.

* Adding backfill_date to exception message. Formatting.

* Adding backfill_date to exception message. Formatting.

---------

Co-authored-by: Leli <33942105+lelilia@users.noreply.github.com>
2024-09-06 16:21:56 +02:00
Anna Scholtz 7260510cc3
Add monitoring metadata support (#6152)
Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com>
2024-09-04 13:03:59 -04:00
Lucia 851ac84f17
Auxiliary functions for shredder mitigation (#6002)
* Auxiliary functions required to generate the query for a backfill with shredder mitigation.

* Exception handling.

* isort & docstrings.

* Apply flake8 to test file.

* Remove variable assignment to different types.

* Make search case insensitive in function.

* Add test cases for function and update naming in a function's parameters for clarity.

* Update bigquery_etl/backfill/shredder_mitigation.py

Co-authored-by: Leli <33942105+lelilia@users.noreply.github.com>

* Add test cases for missing parameters or not matching parameters where expected. Minimize the calls to get_bigquery_type().

---------

Co-authored-by: Leli <33942105+lelilia@users.noreply.github.com>
2024-08-05 20:02:16 +02:00
Winnie Chan 6bf63b0dd4
DENG-4283: Updated managed backfills backup table name (#5908)
* Updated backup table name

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-07-25 14:47:11 -07:00
Winnie Chan c58c7fbd5c
DENG-3869 Create schema before deploying backfill staging table (#5643)
* Added schema from query file

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-06-07 12:47:09 -07:00
Winnie Chan 11b891febb
Added dataset id (#5721)
2024-06-04 13:16:21 -07:00
Alexander 0cd4295478
chore: refactor schema deploys, add and use deploy utils (#5674)
* chore: refactor schema deploys, add and use utils

* Update tests

* Add deploy tests

* Use string representation of table object in log statements
2024-05-30 16:23:53 -04:00
Winnie Chan ce9b8c40c1
DENG-3719: Allow setting billing project for managed backfills (#5605)
* Added default billing project and param
2024-05-21 12:27:55 -07:00
Braunk bd5ffe4916
feat(query-backfill): adding a more flexible approach to overriding scheduling attributes (#5540)
* feat(query-backfill): adding a more flexible approach to overriding scheduling attributes

* Update tests/cli/test_cli_query.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* feat(query.py): adding cleaner override logic per PR comments and also cleaning up comments and rogue print statement

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-05-15 15:16:51 -05:00
Winnie Chan 78151e2b15
DENG-3680: Added depends on past error handling in managed backfills (#5551)
* Added depends on past error

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-05-10 15:43:57 -07:00
Ben Wu b1cbbbf700
Add billing project option for bqetl queries and backfills (#5442)
2024-05-09 12:13:10 -04:00
Winnie Chan b60bc09aa6
DENG-2928 Add metadata.yaml support for static tables in bqetl (#5105)
* Added metadata yaml files

Co-authored-by: Sean Rose <srose@mozilla.com>
2024-04-25 11:27:39 -07:00
Winnie Chan 31afb2e282
DENG-3443: Added backfill initiate test (#5383)
* Added backfill initiate tests

* Update bigquery_etl/cli/backfill.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* Updated parallelism to 16

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-04-19 09:37:50 -07:00
Braunk e87ad28626
feat(query.py): adding the ability to override scheduling parameters for backfills since backfill can only parse one kind of date parameter (#5365)
* feat(query.py): adding the ability to override scheduling parameters for backfills since backfill can only parse one kind of date parameter

* Update tests/cli/test_cli_query.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* Update bigquery_etl/cli/query.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* feat(query.py): adding back code needed for scheduling_parameters initial value

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-04-16 15:07:41 -05:00
Anna Scholtz eac0ac80c2
Remove telemetry_derived init.sql files (#5342)
* Remove init.sql files for telemetry_derived queries

* Remove init.sql for events_daily

* Remove init.sql from skip lists

* Remove init.sql references from tooling

* Add schema for baseline_clients_first_seen
2024-04-10 15:36:30 -07:00
Alexander ebbe30632e
fix(managed-backfills) Incorrect type for partition string when completing backfill (#5311)
2024-04-02 14:49:56 -04:00
Alexander 5ce5c20265
fix(managed-backfills): fix date formats from CLI and from backfill initiate (#5282)
* fix(managed-backfills): fix date formats from CLI and from backfill initiate

* Update tests
2024-03-26 17:11:17 -04:00
Winnie Chan 325d982f31
DENG-1019: Cleaned backfill validations (#5248)
* Cleaned backfill validations
2024-03-22 14:54:13 -07:00
Winnie Chan 33f9017c75
DENG-2823: Added deprecate cli command (#5219)
* Added deprecate cli command

* Fixed typo

* Fixed failed tests

* Fixed deletion date label

* Update bigquery_etl/metadata/parse_metadata.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Fixed deletion date

* Fixed arguments optional

* Added return back

* Added invalid deletion date test

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2024-03-19 11:17:32 -07:00
Alexander 8fc842d5ae
Write backfills json even if no backfills to process (#5209)
2024-03-14 12:04:42 -04:00
Alexander 47196d3ba3
DENG-2950 - Support getting scheduled backfills that need processing as well (#5170)
2024-03-08 12:42:35 -05:00
Alexander 027eb69562
DENG-2950 - Rename drafting -> initiate, modify json output of scheduled command (#5164)
* Update json write for scheduled commands to include date and watchers for DAG

* Rename drafting to initiate
2024-03-05 09:12:47 -05:00
Winnie Chan 6c501a620c
DENG-2845: Remove default deprecated false (#5118)
* Removed deprecated field

* Removed deprecated false in metadata yaml

* Fixed test
2024-02-27 10:33:50 -08:00
Winnie Chan 2387660ab7
DENG-822: Validate workgroups in backfills (#5081)
* Updated workgroup validations

* Fixed indentation in config.yml

* removed sys exit

Co-authored-by: Alexander <anicholson@mozilla.com>

* raise value error

Co-authored-by: Alexander <anicholson@mozilla.com>

* added workgroup constant

* Fixed value error

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-02-23 12:54:38 -08:00
Winnie Chan 8ec7516157
Issue 4135: Added publish metadata cli command (#5011)
* Added publish metadata cli command

* Removed publish metadata script
2024-02-12 11:12:14 -08:00
Anna Scholtz b0a1a32246
Speed up generate-sql (#4921)
* Speed up glean_usage generator

* Refactoring
2024-01-31 12:08:29 -08:00
Anna Scholtz 073b1f050d
Fix dataset deprecation metadata (#4874)
* Update dataset workgroup_access when deprecated: true

* Update deprecation metadata tests

* Add metadata.yaml files in telemetry_derived for tables that are managed through other tooling

* Deprecate telemetry_derived datasets
2024-01-26 22:03:54 +00:00
Frank Bertsch 401b8e7351
Use state_values_v2 for Android LTV pipeline (#4887)
* Don't try to write existing view files

* Use state_values_v2 for client ad click predictions

* Normalize countries in client_ltv

* Don't get view if unavailable

* Add test for new version of existing table

* Fully qualify tables in view defn
2024-01-26 13:56:39 -05:00
Alexander 6c5e59634e
Support backfilling unpartitioned tables and non-date query parameters (#4769)
* Initial commit

* Support non-date parameters and formatting
2024-01-10 13:37:33 -05:00
Alexander f045e9d849
Support offset backfills, require metadata (#4627)
* Skip backfills for queries without metadata.yaml

* Support date_partition_offset

* Fixed exclude, modified exception

* Add test for offset backfill

* Apply suggestions from code review

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>

* Formatting

---------

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
2023-12-05 14:07:09 -05:00
Anna Scholtz 35ae323487
Funnel generators POC (#4390)
* Add funnel generation logic

* Example funnel config

* Fix funnel columns

* funnel generation dimensions

* Optimize segmenting generated funnels

* Add funnel generation docs

* Schedule generated funnels

* Skip DAGs with no tasks

* Add background info funnel generator

* Add funnel generation tests

* Fix join_previous_step_on

* Add funnel example config
2023-10-12 14:05:08 -07:00
Anna Scholtz 3f79cc5151
Generate soft etl checks (#4268)
* Add markers to check cli command to differentiate warning from hard failures

* Fix CI issues

* Fix dag generation

* Incorporate Feedback

* Generate Airflow tasks for #fail and #warn checks

---------

Co-authored-by: Alekhya Kommasani <akommasani@mozilla.com>
Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com>
2023-09-13 10:22:39 -07:00
Alekhya 2e916eb856
DENG1381 - Add bqetl support for deprecation metadata (#4213)
* Support bq dataset deprecation process (metadata)

* Add bqetl metadata cli command

* Initial draft for adding deprecation support to bqetl

* Incorporate Anna's feedback

* Fix based on whd's feedback

* Fix ci issues

* Remove unnecessary logic from metadata.py

* Add dataset metadata yaml for ga_derived

* Ignore dirs that do not have dataset_metadata yaml

* Remove unwanted dataset metadata yamls

* Update bigquery_etl/cli/metadata.py

Co-authored-by: whd <whd@users.noreply.github.com>

---------

Co-authored-by: whd <whd@users.noreply.github.com>
2023-09-12 18:47:54 +00:00
Lucia 27262acdfd
Default DAG for bqetl queries (#4143)
* DENG-1314 Implement changes to bqetl and create default DAG.

* DENG-1314. Update Documentation.

* DENG-1314. Dummy query to enable generating DAG and run tests.

* DENG-1314. Update tests.

* Update bigquery_etl/cli/query.py

Raise exception when scheduling information is missing.

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>

* DENG-1314. Update tests.

* DS-3054. Update query creation to set bqetl_default as default value for --dag. Update tests.

* Default task and tests update.

* Default task and tests update.

* 3650 - Remove default DAG option, update DAG template comment & tests.

* 3650 - Condition for DAG warning.

* 3650 - Update docs.

* Clarification on sql/moz-fx-data-shared-prod/analysis/bqetl_default_task_v1/metadata.yaml

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update docs/cookbooks/creating_a_derived_dataset.md

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-08-29 14:32:52 +02:00
Alexander 05ab70070f
DENG-899 - Add json write functionality to bqetl schedule command (#4139)
* DENG-899 - Add json write functionality to bqetl schedule command

* Patch client.get_table so we don't need access
2023-08-15 12:57:38 -04:00
kik-kik b927ed22be
feat(DENG-949): Added `render` subcommand and `--dry-run` flag to the bqetl check command (#4045)
* added render subcommand to the bqetl check command

* added a dry_run flag to bqetl check run command

* added a test to make sure run command exits with status code 0

* added test for check render subcommand

* fixing linter checks

* attempting using an alternative way of testing the render command

* fixing render test by testing the _render() directly rather than the render cli wrapper

* removed dead test

* Apply suggestions from code review by ascholtz

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* fixed black and mypy errors

* fixed app_store_funnel_v1 check formatting

* reformatted tests checks

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-08-09 16:39:47 +02:00
Winnie Chan 91882dd150
DENG-806 Added backfill process cli command (#3936)
* Added backfill process command
2023-07-10 16:13:42 -07:00
Glenda Leonard c69fee0b5f
DENG-941 initial impl of check rendering and execution. (#3885)
* initial impl

* Updated based on PR feedback

* Moved check from query to separate command

* Expanded from --partition option to generic --parameter option

* Removed `query check` command (check moved to new command)

* Update bigquery_etl/cli/check.py

remove date param format check

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Removed 'parameter' parameter, everything is passed through ctx.args and then converted to a dict for Jinja rendering. There are no restrictions on ctx.args values.

* Merge error

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-06-13 19:31:59 +00:00
Winnie Chan b9d01ca959
DENG-990 Refactor backfill cli commands (#3924)
* Refactored backfill cli commands

* Adjusted validate command
2023-06-12 17:24:35 +00:00
Winnie Chan 58c96b4246
DENG-815 Add backfill info cli command (#3915)
* Added backfill info command

* Update bigquery_etl/cli/backfill.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* Fixed status click choice

* Added backfill str method

* Added new backfill utils files

* Update bigquery_etl/cli/backfill.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* Update bigquery_etl/cli/backfill.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* Removed status default

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2023-06-09 17:32:32 +00:00
Winnie Chan 071c53e4cb
DENG-803/805: Create & Validate backfill cli commands (#3760)
* Added backfill create and validate cli commands

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
2023-06-01 10:06:09 -07:00
Daniel Thorn a0d810275b
Remove java dependency in favor of sqlglot (#3755)
2023-05-17 14:56:42 -07:00
Lucia c34653778f
DENG-774 Change Control for active_users_aggregates (#3687)
* DENG-774 Add change control to active_users_aggregates and test.

* DENG-774 Add test coverage.

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-04-14 15:49:19 +02:00
Anna Scholtz 08b45a40fe
Jinja queries support (#3691)
* Support Jinja templating in query files

* Formatting for Jinja

* ./bqetl query render command

* Fix running templates
2023-03-30 11:00:12 -07:00
kik-kik 51def19185
Bug 1825545 - Revert "Support Jinja templating in query files (#3685)" (#3689)
Bug 1825545 - This reverts commit a1c51124ec.
2023-03-30 10:47:17 -04:00
Anna Scholtz a1c51124ec
Support Jinja templating in query files (#3685)
* Support Jinja templating in query files

* Formatting for Jinja

* ./bqetl query render command
2023-03-29 10:38:08 -07:00
Anna Scholtz 22e54ccb5f
[Bug 1812301] Publish only string typed labels (#3530)
* [Bug 1812301] Publish only string typed labels

* Document label publishing
2023-01-26 09:24:49 -08:00
Daniel Thorn c281400486
Enforce isort via pytest (#3384)
2022-11-30 11:45:05 -08:00