* added existing fxa tables to shredder config
* Apply suggestions from code review
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* removing some of the fxa tables from the config as suggested by srose
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Add skip-existing option to ./bqetl query initialize
* Handle initialization exceptions and refactor skip-existing check
* Refactoring of ./bqetl initialization
* Add --force option to ./bqetl initialize
* Update bigquery_etl/cli/query.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Update bigquery_etl/cli/query.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Update bigquery_etl/cli/query.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Update bigquery_etl/cli/query.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Update bigquery_etl/cli/query.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Handle Jinja whitespace control characters in `bqetl format`.
* Use default formatting for Jinja in `bigquery_usage_v2` ETL.
* Reformat `sql_generators/active_users/templates/mobile_checks.sql`.
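For context, a minimal sketch of the Jinja whitespace-control markers involved (plain jinja2 here, not bqetl's formatter): `{%-` strips whitespace before a tag and `-%}` strips it after, so the formatter has to recognize the extra `-` when tokenizing tags.
```python
from jinja2 import Template

# `{%-` removes the newline/indent before the for tag; `-%}` removes the
# newline after endfor.
template = Template("SELECT\n  {%- for f in fields %} {{ f }},{% endfor -%}\n1")
print(template.render(fields=["a", "b"]))  # SELECT a, b,1
```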
* DAG docs - fix broken links and add tags to docs
* change tests
* remove empty line
* fix typo
* fix second test template
* add if case for private-bigquery-etl
* Define `event_monitoring_live_v1` views in `view.sql` files.
So they get automatically deployed by the `bqetl_artifact_deployment.publish_views` Airflow task.
* Support materialized views in view naming validation.
* Handle `IF NOT EXISTS` in view naming validation.
* Use regular expression to extract view ID in view naming validation.
This simplifies the logic and avoids a sqlparse bug where it doesn't recognize the `MATERIALIZED` keyword.
* Update other view regular expressions to allow for materialized views.
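A hedged sketch of the kind of extraction described (the pattern is illustrative, not the repo's actual regex), showing how one expression can tolerate `OR REPLACE`, `MATERIALIZED`, `IF NOT EXISTS`, and backticks at once:
```python
import re

VIEW_ID_RE = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?(?:MATERIALIZED\s+)?VIEW\s+"
    r"(?:IF\s+NOT\s+EXISTS\s+)?`?(?P<view_id>[\w.-]+)`?",
    re.IGNORECASE,
)

sql = "CREATE MATERIALIZED VIEW IF NOT EXISTS `proj.dataset.event_monitoring_live_v1` AS SELECT 1"
match = VIEW_ID_RE.search(sql)
assert match and match.group("view_id") == "proj.dataset.event_monitoring_live_v1"
```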
* Skip backfills for queries without metadata.yaml
* Support date_partition_offset
* Fixed exclude, modified exception
* Add test for offset backfill
* Apply suggestions from code review
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* Formatting
---------
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* added firefox_ios_derived.clients_activation_v1 and corresponding view
* fixing a missing separator in firefox_ios_derived.clients_activation_v1 checks
* adding firefox_ios_derived.clients_activation_v1 to shredder configuration
* removed is_suspicious_device_client as it should not be there, thanks bani for pointing this out
* fixed black formatting error inside shredder/config.py
* applied bqetl formatting
* minor styling tweak as suggested by bani in PR#4631
* deleting fenix_derived/firefox_android_clients_v2, v1 will remain the active model
* removed fenix_derived.firefox_android_clients_v2 from shredder config
* Add support for assigning Airflow tasks to task groups
* Generate separate Airflow tasks for glean_usage
* Remove Airflow dependencies from old glean_usage tasks
* Glam - fix legacy windows & release probes' sample count going fwd
* Glam FOG accounts for sampling when calculating total_sample for windows & release probes
* fog - fix client count and sample count
* Add channel filtering for fog
* Fix checks to filter on partitions
* Don't print "missing checks file" on success
Previously, the statement that checks.sql files were
missing was printed on every execution of the `for`
statement, because an `else` clause on a `for` loop runs
whenever the loop completes without a `break`.
Instead, we want to print only when there are no files
(see the sketch below).
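A minimal self-contained illustration of the bug and the fix (paths are made up):
```python
checks_files = ["a/checks.sql", "b/checks.sql"]  # illustrative paths

for path in checks_files:
    pass  # run the checks...
else:
    # The bug: this runs even though files were found, because the loop
    # finished without `break`.
    print("this prints on every run")

if not checks_files:  # the fix: warn only when the list is actually empty
    print("no checks.sql files found")
```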
* Fill empty description
* Assign a friendly name if the table doesn't have one
* Update metadata tests
* Update bigquery_etl/metadata/parse_metadata.py
Co-authored-by: Alexander <anicholson@mozilla.com>
* update test again
---------
Co-authored-by: Alexander <anicholson@mozilla.com>
* Generate normal task dependencies from `depends_on` if the task is in the same DAG.
* Update `metadata.yaml` files to use `depends_on` rather than `upstream_dependencies`.
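A hedged sketch of that rule (task and DAG names are illustrative; the generated DAGs use ExternalTaskSensor for cross-DAG dependencies):
```python
# Decide how an upstream `depends_on` entry is wired into the DAG.
def upstream_reference(task_dag: str, upstream_dag: str, upstream_task: str) -> str:
    if task_dag == upstream_dag:
        # Same DAG: a normal task dependency is enough.
        return f"{upstream_task} >> current_task"
    # Different DAG: fall back to a sensor on the external task.
    return (
        f"ExternalTaskSensor(external_dag_id={upstream_dag!r}, "
        f"external_task_id={upstream_task!r})"
    )

print(upstream_reference("bqetl_main", "bqetl_main", "clients_daily__v6"))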
* DS-3054. Create functions to support running an initialization query for all sample_ids in parallel.
* DS-3054. Update _run_query function.
* DS-3054. Use _run_query and mapped values for initialization in parallel.
* DS-3054. Unify initialization to run in parallel and get sample_id range from metadata.
* DS-3054. Minimize formatting of the query template and remove the need to modify existing initialization queries. Validate whether a query should use a parallelized or regular update.
* DS-3054. Adding link to caveats.
* DS-3054. Update sample_id range for initialization.
* DS-3054. Use current implementation of run_query.
* DS-3054. Update using a parameter instead of initialization in metadata.
* DS-3054. DAG update with new parameter.
* Pass parameters before calling _run_query().
* Use `--append_table` in favour of `INSERT INTO`.
* DS-3054 Separate parallel and non-parallel init, plus some improvements.
---------
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
* Put assert UDFs in `mozfun` project.
* Tweak syntax in `assert.array_equals()` to avoid SQLGlot parsing error.
https://github.com/tobymao/sqlglot/issues/2348
* Fix SQL syntax error in `assert.struct_equals()` tests.
* Fix UDF dependency file path logic when deploying to stage.
* Change regular expressions in `parse_routine` module to allow quotes around routines' dataset and name.
* Respect sql_dir in dryrun skip
* Update bigquery_etl/dryrun.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Update bigquery_etl/dryrun.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Set sql_dir when using Schema.from_query_file()
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Fix publishing udfs that use backticks in identifiers
* Update bigquery_etl/routine/parse_routine.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* DENG-1314 Implement changes to bqetl and create default DAG.
* DENG-1314. Update Documentation.
* DENG-1314. Dummy query to enable generating DAG and run tests.
* DENG-1314. Update tests.
* Update bigquery_etl/cli/query.py
Raise exception when scheduling information is missing.
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
* DENG-1314. Update tests.
* DS-3054. Update query creation to set bqetl_default as default value for --dag. Update tests.
* Default task and tests update.
* 3650 - Remove default DAG option, update DAG template comment & tests.
* 3650 - Condition for DAG warning.
* 3650 - Update docs.
* Clarification on sql/moz-fx-data-shared-prod/analysis/bqetl_default_task_v1/metadata.yaml
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Update docs/cookbooks/creating_a_derived_dataset.md
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
---------
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Small tweaks to the CLI commands' comments / help display for data checks
* added usage docs to data_checks reference docs
* Apply suggestions from code review provided by scholtzan
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* added render subcommand to the bqetl check command
* added a dry_run flag to bqetl check run command
* added a test to make sure the run command exits with status code 0
* added test for check render subcommand
* fixing linter checks
* attempting an alternative way of testing the render command
* fixing the render test by testing _render() directly rather than the render CLI wrapper
* removed dead test
* Apply suggestions from code review by ascholtz
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* fixed black and mypy errors
* fixed app_store_funnel_v1 check formatting
* reformatted tests checks
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* added history and bookmarks fields
* adding automated corrections
* some auto schema updates but perhaps not all
* Update schemas
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Bug 1741487 - Rename url2 and related fields in stable views
This removes the following unpopulated fields from Glean views: `metrics.url`, `metrics.text`, `metrics.jwe`, and `metrics.labeled_rate`. If any of these metrics exists in the source table under a `2`-suffixed name, it is also aliased back to its original name (`url2` to `url`, and so on; see the sketch below).
Suffixed fields are still preserved until view consumers migrate.
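For illustration, a hedged sketch of the view shape this describes (the table name is hypothetical and the generator's exact output may differ):
```python
# Illustrative SQL only: drop the unpopulated field and alias the
# `2`-suffixed replacement back to its original name inside the struct.
VIEW_SQL = """
SELECT
  * REPLACE (
    (SELECT AS STRUCT metrics.* EXCEPT (url), metrics.url2 AS url) AS metrics
  )
FROM
  `moz-fx-data-shared-prod.org_mozilla_firefox_stable.metrics_v1`
"""
```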
* Remove redundant comma from generated sql
* Ignore missing fields in views if any of them were removed
* added a todo comment
* Added additional context around why we are excluding some of the non-suffixed fields and why we are aliasing to remove the `2` suffix from some fields
---------
Co-authored-by: Arkadiusz Komarzewski <akomarzewski@mozilla.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Add `synced_at` column to `stripe_subscriptions_changelog_v1`.
* Tweak `stripe_subscriptions_changelog_v1` tax rate and discount joins to only include those that existed when the change happened.
* Parse subscription metadata in `stripe_subscriptions_changelog_v1`.
* Add `stripe_external.invoice_line_item_v1` ETL.
* Add `stripe_subscriptions_revised_changelog_v1` ETL.
* Add `stripe_subscriptions_history_v2` ETL.
* Accommodate DQ checks in DAG generation
* Modify the tests to include dq check
* Generate dags to include bigquery_dq_check
* rename destination to source for dq check
* Add DQ check to download attribution dag
* Update bigquery_etl/query_scheduling/templates/airflow_dag.j2
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Update bigquery_etl/query_scheduling/generate_airflow_dags.py
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Set upstream check dependencies using upstream_dependencies
* Change bigquery_dq_check as per gcp.py utils
* remove sql_file_path in airflow jinja
* Fix download attribution dag
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* added a new table for the new nonprod FxA backend events, and an fxa_all_events_nonprod view to simulate the process we will need to follow for prod
* added date filters to the nonprod_fxa_all_events view as requested by akkomar and updated the metadata
* added the new nonprod_fxa_server_events_v1 table to dry run skip due to permissions
* improved the comment about deleting a view as requested by akkomar
* tweaked date filtering as requested by srose
* pulled nonprod_fxa schema from DENG-1006-fxa-log-fields
* added schema.yaml for nonprod_fxa_server_events_v1
* deleted init.sql and added clustering config to metadata.yaml instead
* added AS as requested by srose
* fixed yaml lint errors
* added the ability to pass end_date param into Airflow task
* updated nonprod_fxa queries and schema for fxa_server_events_v1 as requested by srose; this query also pulls data for stdout, which now has an end date
* regenerated bqetl_fxa_events DAG
* renamed fxa_log to fxa_server as agreed on with srose
* reverted merging of the stdout and server event etls due to incompatible schemas
* removed changes related to task level end_date
* removed date filter for stdout events
* undoing test changes
* added country to fxa_server_events_v1 schema
* tweaked selected ordering as requested by srose and updated comments and metadata.yaml
* Speed up schema update
* Sort and update schemas in parallel
* Update sql/moz-fx-data-shared-prod/telemetry_derived/clients_daily_v6/metadata.yaml
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
---------
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
* Create new Fenix attributable_clients table
Further updates to attributable clients
- Handle clients who were only _activated_ on that day
- Separate facts/dimensions
- Rename some things
- Add metadata about why a client is present
- Limit new_activations to just activated clients
- Rename client_count field
- Include submission_date in activation join
- Move to v2
* Add DAG
* Add schema file
* Move some joins to view; add initialization
1. Move attribution & activation joins to the view. This lets
us immediately access updates to those tables, rather than
re-materializing this table on changes there.
2. Add the capability to init from a query file. This uses
an `is_init` jinja function, which is only set to True
when run from `bqetl query initialize`.
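A minimal sketch of the pattern (rendered here with plain jinja2; in the repo the flag is provided by the tooling and is only True under `bqetl query initialize`):
```python
from jinja2 import Template

query = Template(
    """
SELECT *
FROM source_table
{% if is_init() %}
WHERE submission_date >= '2020-01-01'  -- full backfill on first run
{% else %}
WHERE submission_date = @submission_date  -- normal daily run
{% endif %}
"""
)
print(query.render(is_init=lambda: True))
```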
* Use dict for default template vars
* Add default for addl_templates
* Reformat files
* Update view
* Regenerate DAG
* Keep metadata field in view
* glam: Partition clients_histogram_aggregates by sample_id (has been running like this since April 3 from a different branch)
* glam: add description and eol to init
* add init.sql to missing tbls
* Add schema.yaml
* increase ci output timeout to 30m
* remove init.sql to prevent CI from trying to derive the schema from it and breaking
* Fix schema.yaml files
* Revert output timeout to default
* initial impl
* Updated based on PR feedback
* Moved check from query to separate command
* Expanded from --partition option to generic --parameter option
* Removed `query check` command (check moved to new command)
* Update bigquery_etl/cli/check.py
remove date param format check
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Removed the 'parameter' parameter; everything is passed through ctx.args and then converted to a dict for Jinja rendering (see the sketch below). There are no restrictions on ctx.args values.
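A hedged sketch of that ctx.args handling (the helper name and pairing rules are illustrative): Click collects the unknown trailing options, which are then turned into a dict for Jinja.
```python
def args_to_dict(args: list[str]) -> dict[str, str]:
    """Turn ['--submission_date=2023-01-01', '--foo', 'bar'] into a dict."""
    result, it = {}, iter(args)
    for arg in it:
        if "=" in arg:
            key, value = arg.split("=", 1)
        else:
            key, value = arg, next(it)
        result[key.lstrip("-")] = value
    return result

assert args_to_dict(["--submission_date=2023-01-01"]) == {"submission_date": "2023-01-01"}
```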
* Merge error
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
When using `bqetl query schema update` to create a new `schema.yaml` file, BigQuery returns the column schema properties in a sensible order (`name`, `type`, `mode`, `fields`), but our `schema.yaml` output has been sorting those properties alphabetically, which makes it much less readable.
Also, when using `bqetl query schema update` to update an existing `schema.yaml` file, this will now preserve whatever order the column schema properties were in.
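A hedged sketch of the reordering (the `description` key is an illustrative extra; the order of the core properties comes from the paragraph above):
```python
FIELD_KEY_ORDER = ["name", "type", "mode", "fields", "description"]

def ordered_field(field: dict) -> dict:
    """Emit known keys in BigQuery's natural order, then any others."""
    known = {k: field[k] for k in FIELD_KEY_ORDER if k in field}
    rest = {k: v for k, v in field.items() if k not in known}
    return {**known, **rest}

print(ordered_field({"mode": "NULLABLE", "type": "STRING", "name": "country"}))
# {'name': 'country', 'type': 'STRING', 'mode': 'NULLABLE'}
```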
* added support for --log-level to bqetl query command and updated print statements to be log statements
* now --log-level flag is a bqetl global flag
* fixing linter errors
* Update bigquery_etl/cli/__init__.py
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Update bigquery_etl/cli/__init__.py
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* fixed indentation of --log-level option
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Fix `bqetl stage` ID quoting.
Quoting the entire table ID breaks cases where an unaliased table name is used to qualify a column reference.
* Have `bqetl stage` preserve fully quoted references.
* Simplify regular expressions for fully quoted references.
* Compile all reference replacement regular expressions for performance.
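A hedged sketch of the optimization described above: build and compile every replacement pattern once at import time instead of recompiling inside the replacement loop (project names are illustrative).
```python
import re

PROD_TO_STAGE = {
    "moz-fx-data-shared-prod": "bigquery-etl-integration-test",
}
# Compiled once at module import, reused for every query file.
COMPILED = [
    (re.compile(re.escape(prod)), stage) for prod, stage in PROD_TO_STAGE.items()
]

def replace_project_refs(sql: str) -> str:
    for pattern, replacement in COMPILED:
        sql = pattern.sub(replacement, sql)
    return sql
```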
* Save current SubPlat ETL views logic in versioned ETLs (DENG-973).
* Add `incremental` labels to the new tables.
* List all CJMS ETLs to dryrun-skip rather than using `glob`.
The `glob` approach doesn't currently work well with the CI staging process.
* DENG-970 Only Glean in Focus Android view.
* DENG-970 CI fix
* DENG-970 CI failure fix. Related to issue 3889.
* Fix UDF dependencies deploy on stage
* DENG-970 Revert specific calling to dataset for UDF.
---------
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
Co-authored-by: Brad Ochocki <brad.ochocki@gmail.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Bug 1823627 - Normalize the channel based on probeinfo data in UNIONized views
* Handle fenix channel normalization for app pings
* Parallelize stage schema deploys
* Fix schema field order
---------
Co-authored-by: Jan-Erik Rediger <jrediger@mozilla.com>
* added apple_ads_derived for copying over apple_ad data from the fivetran dataset, and apple_ads views now read from it
* added bqetl_fivetran_apple_ads.py DAG responsible for copying apple_ads data from the fivetran project over to moz-fx-data-shared-prod
* now dryrun skips apple_ads_derived instead of apple_ads, as the query now accesses a restricted dataset
* added schema files for apple_ads_derived datasets
* added descriptions to schema.yaml files for apple_ads_derived namespace
* added dataset_metadata for apple_ads_derived to include a link to the dbt transformations
* fixed apple_ads view definitions
* removed application label and referenced_tables section inside metadata.yaml for apple_ads as requested by srose in PR#3847
* corrected source project for apple_ads views
* renamed apple_ads_derived to apple_ads_external
* added * to the apple_ads_external namespace name so it is skipped in the dryrun, due to integration test deployment
* made tweaks to apple_ads and apple_ads_external datasets/namespaces as requested by whd
* updated the apple_ads_external skip rule to the way it is meant to be defined; this will work once a fix is rolled out for dryrun
* fixed dag bqetl_fivetran_apple_ads description and updated the schedule to run once a day
* RS-722 Remove task_name from dag generation when it is not available.
* RS-722 Reformat files.
---------
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
* DENG-847 Remove the metadata validation for sql_generated files, because the validation needs to be implemented using a CODEOWNERS file in the main branch as well as SQL files. Currently the files exist in the generated-sql branch.
---------
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
* DENG-775 Added session_id to JOIN between GA data and stub_attr.stdout. Also expanded date range on GA session data to [download_date - 2 days, download_date + 1 day]
* Updated query to handle missing GA download_session_id. It effectively applies V1 logic to the MISSING_GA_CLIENT dl_tokens.
* DENG-774 Add change control to active_users_aggregates and test.
* DENG-774 Add test coverage.
---------
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
* Update domain metadata dag.
* Remove from triage with tags
* Remove telemetry-alerts email
* Add date formatting for monthly partition id
* Add support for `table_partition_format` in dag generation
* Don't add partition format if there's already a destination table
* use the correct name
* Add partition templates for all time partitioning types
* lint fixes
* more docs
* update all dags to include `table_partition_template` parameter.
* don't set if we have a partition offset
* don't add the parameter for the default 'day' partitioning scheme
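A hedged sketch of the mapping behind `table_partition_template` (the Airflow macros are illustrative): each BigQuery time-partitioning granularity needs its own date format for the `table$partition` decorator, and the default "day" scheme needs no explicit template.
```python
PARTITION_TEMPLATES = {
    "hour": "{{ dag_run.logical_date.strftime('%Y%m%d%H') }}",
    "month": "{{ dag_run.logical_date.strftime('%Y%m') }}",
    "year": "{{ dag_run.logical_date.strftime('%Y') }}",
}

def partition_template(granularity: str) -> str | None:
    # None for "day": the default scheme gets no explicit template.
    return PARTITION_TEMPLATES.get(granularity)
```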
* Initial table definitions for dl_token processing. Includes an update to the SQL pytest plugin to account for table names with date suffixes.
* Removed cluster reference and shortened description
* Added sql/moz-fx-data-marketing-prod/ga_derived/downloads_with_attribution_v1/query.sql to dryrun skip
* Added time_on_site
* Moved country_names sample test data file.
* Update bigquery_etl/pytest_plugin/sql.py
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
* Update sql/moz-fx-data-marketing-prod/ga_derived/downloads_with_attribution_v1/query.sql
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* Update sql/moz-fx-data-marketing-prod/ga_derived/downloads_with_attribution_v1/query.sql
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* Updated based on PR feedback. Added LEFT JOIN to ensure sessions without pageviews are not dropped.
* Set has_ga_download_event = null if exception=GA_UNRESOLVABLE
* Standardized logic for time_on_site
* - Added test for multiple downloads for 1 session
- Added detailed description of table.
* Updated to use mode_last_retain_nulls instead of ANY_VALUE
* Set pageviews, unique_pageviews = 0 if null.
* Added boolean additional_download_occurred to indicate if another download occurred in the same session.
---------
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* Fix stage identifiers - only wrap the project name in backticks
* Update bigquery_etl/cli/stage.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Update bigquery_etl/cli/stage.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Update bigquery_etl/cli/stage.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Update bigquery_etl/cli/stage.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* add bootstrap_percentile_ci UDF
* updates
Accidentally pushed an old version of the code.
* add qbinom.js
* remove trailing spaces
* Update sql/moz-fx-data-shared-prod/udf_js/bootstrap_percentile_ci/udf.sql
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* remove `numericSort` on `full_histogram`
The sorting step is computationally expensive on histograms with large numbers of elements, and is also unnecessary if we can ensure that the histograms themselves are sorted. Sorting a histogram is much cheaper.
* performance enhancements
A previous version of the code was running into memory issues with large histograms.
This commit aims to be more memory efficient.
- Instead of sorting expanded arrays, it only ever sorts histograms, which are orders of magnitude smaller (illustrated below).
- It only expands a single histogram (the histogram of binomial samples) into an array. This array has a fixed width (10k elements) for every input histogram, so it should have consistent memory performance across different metrics.
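Why sorting the histogram beats sorting the expanded sample (sketch in Python for illustration; the UDF itself is JavaScript): a histogram with millions of observations still has only a handful of buckets.
```python
histogram = {5: 2_500_000, 0: 4_000_000, 1: 3_000_000}  # bucket -> count

sorted_buckets = sorted(histogram.items())  # sorts 3 elements
# versus expanding first, which would sort ~9.5 million elements:
# sorted(k for k, n in histogram.items() for _ in range(n))
```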
* debug typos
* remove unnecessary code
* improve readability
Change some variable names in an attempt to make the code less confusing to read.
* handle histograms with zero counts
A few changes in this commit:
- Handle histograms with zero counts. I think this was implicitly handled in the previous version of the algorithm, but there wasn't a test for that case, so I didn't catch that the more space-efficient version of the algorithm wasn't handling it properly.
- Add a test case for histograms that have zero observations but are validly constructed.
- Minor formatting changes for readability.
* Formatting
* Fix tests
* make UDF work with normalized histograms
- Adds a new `scale` constant to simulate continuous values drawn from the discrete binomial distribution.
- Other changes to support working with non-integer count values while maintaining constant space/time complexity.
* fix failing tests
- use `assert.array_empty` instead of `assert.null` for empty array tests
- normalize all histogram values
- revert the histogram property `VALUES -> values` in the function definition so that the property can be found.
* Reformat
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Indent `WHEN` and `ELSE` clauses one level more than `CASE`.
* Indent `THEN` clauses one level more than the corresponding `WHEN` clause.
* Have the content of `WHEN`, `THEN`, and `ELSE` clauses start on the same line as the clause keyword.
* Allow an alias, comma, or dot right after a `CASE` statement's `END`.
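Taken together, one plausible rendering under these rules (expected formatter output, written as a Python string for illustration):
```python
FORMATTED = """
SELECT
  CASE
    WHEN status = 'active'
      THEN 1
    ELSE 0
  END AS status_code
FROM subscriptions
"""
# WHEN/ELSE sit one level inside CASE, THEN one level inside WHEN, each
# clause's content starts on the keyword's line, and the alias follows END.
```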
* Add and incrementally populate a table for google ads campaign cost metrics
* Register dag in dags.yaml
* make two strings match that apparently have to match
* Consistentify another thing in the dag
* Reformat a sql file
* Update sql/moz-fx-data-shared-prod/fenix_derived/google_ads_campaign_cost_breakdowns_v1/metadata.yaml
Add update dependency on upstream table
Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>
* Small adjustments apropos of review
* Attempt to fix verify-dags-up-to-date
* Evidently there are too many blank lines in the metadata file
* Stop dry running an access denied table
Also raise the line length limit on the linter, which otherwise prevented the needed change
* represent micros more accurately
* Update dags.yaml to include Frank as maintainer
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* rename ad_clicks
* Update sql/moz-fx-data-shared-prod/fenix_derived/google_ads_campaign_cost_breakdowns_v1/query.sql
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* Add conversions to columns of output table
* Group stats table by campaign id and date
* Add revenue generating ad clicks and disambiguate that number from marketing ad clicks
* SO IT TURNS OUT, campaigns with the same ID can have different names over time. This associates the appropriate names with each id so we can sum up metrics by campaign rather than by NAME of campaign.
* ./bqetl format /Users/chelseatroy/mozilla/bigquery-etl/sql/moz-fx-data-shared-prod/fenix_derived/google_ads_campaign_cost_breakdowns_v1/query.sql
* Document the fenix campaign identifier
* ./bqetl format
* ./bqetl dag generate bqetl_campaign_cost_breakdowns
* ./bqetl dag generate bqetl_org_mozilla_firefox_derived
Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* relay dataset and data tables are created
* added country_name columns to all tables
* reformat query files for circle ci error fixing
* Schema edited and dry run list updated
* removing status column in multiple places
* matching date_partition_offset setting with the table
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* keep the consistency of friendly_name
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* removing unnecessary comment
* subscriptions query updated
* renaming a CTE
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* subscriptions_v1 schema edited
* add "relay-phones" to be included for filtering
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
Using a poke interval of 5 seconds means only 7 Fivetran sensors polling for an hour would exceed Fivetran's rate limit of 5,000 API requests per hour. Increasing the poke interval to 30 seconds will allow for up to 41 Fivetran sensors polling for an hour.
And use the `airflow-provider-fivetran` package's new feature to pass the return value from the Fivetran operator to the Fivetran sensor via XCom so the sensor doesn't miss syncs that finish before it can check.
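The arithmetic, spelled out (numbers from the paragraph above):
```python
RATE_LIMIT = 5_000  # Fivetran API requests per hour

def polls_per_hour(poke_interval_seconds: int) -> int:
    return 3600 // poke_interval_seconds

assert 7 * polls_per_hour(5) == 5_040    # 7 sensors at 5s exceed the limit
assert 41 * polls_per_hour(30) == 4_920  # 41 sensors at 30s stay under it
assert 42 * polls_per_hour(30) == 5_040  # ...and a 42nd would exceed it
```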