bigquery-etl

Commit graph

Author	SHA1	Message	Date
Alexander	463dc15bf1	Support shared-prod UDFs (#4708 )	2023-12-14 13:45:13 -05:00
Daniel Thorn	b0bfc65052	DENG-965 - symbolicate and signaturize crash pings (#4642 )	2023-12-12 08:57:52 -08:00
Alexander	776c590db2	ci-fix Ignore dataset.update required permissions when dryrunning authorized views (#4681 ) * Refactor, add typehint * Add datasets.update clause denied for authorized views	2023-12-11 14:52:19 -05:00
Anna Scholtz	c31ae16efb	Revert "Define `event_monitoring_live_v1` views in `view.sql` files (#4576 )" (#4680 ) This reverts commit `2c4cc5eefe`.	2023-12-11 10:15:30 -08:00
Sean Rose	2c4cc5eefe	Define `event_monitoring_live_v1` views in `view.sql` files (#4576 ) * Define `event_monitoring_live_v1` views in `view.sql` files. So they get automatically deployed by the `bqetl_artifact_deployment.publish_views` Airflow task. * Support materialized views in view naming validation. * Handle `IF NOT EXISTS` in view naming validation. * Use regular expression to extract view ID in view naming validation. This simplifies the logic and avoids a sqlparse bug where it doesn't recognize the `MATERIALIZED` keyword. * Update other view regular expressions to allow for materialized views.	2023-12-08 11:54:02 -08:00
Sean Rose	308822d7cf	Have `bqetl query` commands fail if they don't find a matching query (#4662 ) * Have `bqetl query` commands fail if they don't find a matching query. * Update `test_run_query_no_query_file` test.	2023-12-07 16:57:11 -08:00
Alexander	f045e9d849	Support offset backfills, require metadata (#4627 ) * Skip backfills for queries without metadata.yaml * Support date_partition_offset * Fixed exclude, modified exception * Add test for offset backfill * Apply suggestions from code review Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com> * Formatting --------- Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>	2023-12-05 14:07:09 -05:00
kik-kik	076a0e0775	feat(DENG-2083): added firefox_ios_derived.clients_activation_v1 and corresponding view (#4631 ) * added firefox_ios_derived.clients_activation_v1 and corresponding view * fixing a missing seperator in firefox_ios_derived.clients_activation_v1 checks * adding firefox_ios_derived.clients_activation_v1 to shredder configuration * removed is_suspicious_device_client as it should not be there, thanks bani for pointing this out * fixed black formatting error inside shredder/config.py * applied bqetl formatting * minor styling tweak as suggested by bani in PR#4631	2023-12-05 12:42:39 +01:00
Eduardo Filho	0bf4c279d6	GLAM avoid scientific notation for big sample counts (#4647 ) * GLAM avoid scientific notation for big sample counts * Cast to bignumeric instead of numeric	2023-12-04 17:47:50 -05:00
Anna Scholtz	68ece978e0	Resolve correct task_id for tasks nested in a group (#4637 )	2023-12-01 11:38:59 -08:00
kik-kik	639381f13d	firefox_ios source added to shredder config (#4638 )	2023-12-01 17:48:56 +01:00
kik-kik	9409d2b6cb	feat(DENG-1774 / cancelled): deleting fenix_derived/firefox_android_clients_v2, v1 will remains the active model (#4610 ) * deleting fenix_derived/firefox_android_clients_v2, v1 will remain the active model * removed fenix_derived.firefox_android_clients_v2 from shredder config	2023-12-01 11:16:54 +01:00
Anna Scholtz	7087dbff30	Separate Airflow tasks for glean_usage (#4588 ) * Add support for assigning Airflow tasks to task groups * Generate separate Airflow tasks for glean_usage * Remove Airflow dependencies from old glean_usage tasks	2023-11-30 09:48:17 -08:00
Eduardo Filho	ec297972c6	Glam accounts for sampling when calculating sample_count for windows & release probes (#4581 ) * Glam - fix legacy windows & release probes' sample count going fwd * Glam FOG accounts for sampling when calculating total_sample for windows & release probes * fog - fix client count and sample count * Add channel filtering for fog	2023-11-23 17:06:20 -05:00
Lucia	fe2bf1d2de	DS-3361. Update documentation of initialize command. (#4592 ) Co-authored-by: Lucia Vargas <lvargas@mozilla.com>	2023-11-21 22:02:46 +01:00
Frank Bertsch	5cf8d30153	Add session date param; fix checks CLI bug (#4579 ) * Fix checks to filter on partitions * Don't print "missing checks file" on success Previously, the statement that checks.sql files were missing was printed on any execution of the for statement. ("else" clauses after "for"s execute after completion of the "for" clause). Instead, we want to print only when there are no files.	2023-11-17 15:33:23 -05:00
Linh Nguyen	c1c73e690e	Make sure that metadata `friendly_name` and `description` are not None (#4513 ) * Fill empty description * Assign a friendly name if the table doesn't have one * Update metadata tests * Update bigquery_etl/metadata/parse_metadata.py Co-authored-by: Alexander <anicholson@mozilla.com> * update test again --------- Co-authored-by: Alexander <anicholson@mozilla.com>	2023-11-17 11:48:11 -05:00
kik-kik	6e4c09a677	added fenix_derived.firefox_android_clients_v2 to shredder config (#4564 )	2023-11-16 11:26:59 +01:00
Sean Rose	e44e5ca705	Generate normal task dependencies from `depends_on` if the task is in the same DAG (#4569 ) * Generate normal task dependencies from `depends_on` if the task is in the same DAG. * Update `metadata.yaml` files to use `depends_on` rather than `upstream_dependencies`.	2023-11-14 16:06:00 -08:00
Lucia	894d42dde1	DS-3054. Support running an initialization query in parallel (#4322 ) * DS-3054. Create functions to support running an initialization query for all sample_ids in parallel. * DS-3054. Update _run_query function. * DS-3054. Use _run_query and mapped values for initialization in parallel. * DS-3054. Unify initialization to run in parallel and get sample_id range from metadata. * DS-3054. Minimize formatting of query template and remove need to modify existing initialization queries. Validate if a query should use parallelized or regular update. * DS-3054. Adding link to caveats. * DS-3054. Update sample_id range for initialization. * DS-3054. Use current implementation of run_query. * DS-3054. Update using a parameter instead of initialization in metadata. * DS-3054. DAG update with new parameter. * Pass parameters before calling _run_query(). * Use --append_tablein favour of INSERT INTO. * DS-3054 Separate parallel and non parallel init, plus some improvements. --------- Co-authored-by: Lucia Vargas <lvargas@mozilla.com>	2023-11-07 20:03:48 +01:00
Anna Scholtz	e7e7eaae06	Set depend_on_past=False for warn checks (#4526 )	2023-11-06 10:39:58 -08:00
kik-kik	0962ba65fe	prefixing schema error message inside dryrun to "ERROR" to make it easier to find when searching logs for cause of exit code 1 (#4522 )	2023-11-06 12:12:50 +01:00
Frank Bertsch	a271c024b2	Dont generate dags in bqetl query schedule command (#4517 )	2023-11-03 08:59:27 -07:00
Anna Scholtz	185f833f2a	Materialized views and aggregated tables for event monitoring (#4478 ) * WIP event monitoring * Add FxA custom events to view definition (#4483) * Add FxA custom events to view definition * Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql * Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql * Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql * Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql --------- Co-authored-by: Anna Scholtz <anna@scholtzan.net> * Move event monitoring to glean_usage generator * Add cross-app event monitoring view * Generate cross app monitoring * Simplyfy event monitoring aggregation --------- Co-authored-by: akkomar <akkomar@users.noreply.github.com>	2023-11-01 14:20:20 -07:00
Frank Bertsch	55c5d412c1	Allow running multiple checks (#4471 ) * Allow running multiple checks * Don't yield anything on no matches	2023-10-24 14:39:01 -04:00
Frank Bertsch	ac0af012c2	Add opt-in to running checks for backfill (#4455 )	2023-10-18 17:34:58 -04:00
akkomar	7a36416554	Set project in init jobs (#4453 ) This fixes https://github.com/mozilla/bigquery-etl/pull/4452	2023-10-18 16:58:18 +02:00
akkomar	0171f93596	Set project in init jobs (#4452 )	2023-10-18 16:04:09 +02:00
akkomar	c3c5ecffd4	Don't set destination table for init jobs (#4451 ) This reverts https://github.com/mozilla/bigquery-etl/pull/4193/files By convention all but two init.sql jobs use `CREATE TABLE` statement. Setting destination table on a job that runs these queries causes an `BadRequest: 400 Cannot set destination table in jobs with DDL statements` error as observed in [1]. Apart from removing setting of destination_table this fixes two init queries. [1] https://workflow.telemetry.mozilla.org/dags/copy_deduplicate/grid?dag_run_id=scheduled__2023-10-17T01%3A00%3A00%2B00%3A00&task_id=baseline_clients_first_seen&tab=logs	2023-10-18 14:45:22 +02:00
Frank Bertsch	164ba19abf	Glean usage checks (#4445 ) * WIP: Add checks for glean_usage * Ignore pycache in autogenerated click cmds * Move check to backfill command * Remove view checks	2023-10-17 17:03:41 -04:00
Sean Rose	4bbbc32a5b	Put assert UDFs in `mozfun` project (#4367 ) * Put assert UDFs in `mozfun` project. * Tweak syntax in `assert.array_equals()` to avoid SQLGlot parsing error. https://github.com/tobymao/sqlglot/issues/2348 * Fix SQL syntax error in `assert.struct_equals()` tests. * Fix UDF dependency file path logic when deploying to stage. * Change regular expressions in `parse_routine` module to allow quotes around routines' dataset and name.	2023-10-13 10:58:42 -07:00
Eduardo Filho	83569d8211	Add sampling to glam-fog (#4409 ) * Add sampling to glam-fog * Simplify count logic * Update bigquery_etl/glam/templates/clients_daily_histogram_aggregates_v1.sql Co-authored-by: Anna Scholtz <anna@scholtzan.net> * Update bigquery_etl/glam/templates/clients_daily_scalar_aggregates_v1.sql Co-authored-by: Anna Scholtz <anna@scholtzan.net> --------- Co-authored-by: Anna Scholtz <anna@scholtzan.net>	2023-10-13 09:24:04 -04:00
Anna Scholtz	35ae323487	Funnel generators POC (#4390 ) * Add funnel generation logic * Example funnel config * Fix funnel columns * funnel generation dimensions * Optimize segmenting generated funnels * Add funnel generation docs * Schedule generated funnels * Skip DAGs with no tasks * Add background info funnel generator * Add funnel generation tests * Fix join_previous_step_on * Add funnel example config	2023-10-12 14:05:08 -07:00
Anna Scholtz	61da5cca03	Respect sql_dir in dryrun skip (#4334 ) * Respect sql_dir in dryrun skip * Update bigquery_etl/dryrun.py Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com> * Update bigquery_etl/dryrun.py Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com> * Set sql_dir when using Schema.from_query_file() --------- Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>	2023-10-12 13:27:54 -07:00
Anna Scholtz	3a8c6a9426	Filter files with multiple suffixes in stage deploy (#4403 )	2023-10-10 15:02:54 -07:00
Daniel Thorn	cd3aaabb66	Remove main summary and main_v4 from shredder (#4388 )	2023-10-06 10:07:25 -07:00
Sean Rose	191eded481	Shred four more FxA tables. (#4376 ) * moz-fx-data-shared-prod.firefox_accounts_derived.events_daily_v1 * moz-fx-data-shared-prod.firefox_accounts_derived.funnel_events_source_v1 * moz-fx-data-shared-prod.firefox_accounts_derived.fxa_log_device_command_events_v1 * moz-fx-data-shared-prod.firefox_accounts_derived.fxa_log_device_command_events_v2	2023-10-03 13:18:43 -07:00
Mike Williams	124a8613cc	fix DENG-1091: automatically add triage/confidential to private DAGs (#4363 ) Co-authored-by: Marlene Hirose <92952117+Marlene-M-Hirose@users.noreply.github.com> Co-authored-by: lelilia	2023-09-29 13:15:35 -04:00
Curtis Morales	6e32c52e2c	Don't retry check tasks (#4359 ) * Don't retry check tasks * Update test * Fix one more test	2023-09-28 15:23:23 -04:00
Daniel Thorn	ea05e6c6dc	Bug 1852630 - Rename main_remainder_v4 to main_v5 (#4353 ) and point at new copy_deduplicate tasks for similar pings	2023-09-28 09:08:55 -07:00
Daniel Thorn	d0cc8dfbe8	Add main_v5 et al to shredder (#4352 )	2023-09-27 14:26:27 -07:00
akkomar	3ae03d6861	Update split main ping queries parent task (#4347 ) This is required after these queries were moved out of copy_deduplicate_all in https://github.com/mozilla/telemetry-airflow/pull/1822	2023-09-26 16:10:57 +02:00
Sean Rose	e04e314f46	Shred `firefox_accounts_derived.fxa_gcp_*_events_v1`. (#4341 )	2023-09-25 09:22:43 -07:00
Anna Scholtz	1b6e598c9e	Publish private-bigquery-etl DAGs to private-generated-sql (#4319 )	2023-09-19 08:18:32 -07:00
Anna Scholtz	3f79cc5151	Generate soft etl checks (#4268 ) * Add markers to check cli command to differentiate warning from hard failures * Fix CI issues * Fix dag generation * Incorporate Feedback * Generate Airflow tasks for #fail and #warn checks --------- Co-authored-by: Alekhya Kommasani <akommasani@mozilla.com> Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com>	2023-09-13 10:22:39 -07:00
Alekhya	2e916eb856	DENG1381 - Add bqetl support for deprecation metadata (#4213 ) * Support bq dataset deprecation process (metadata) * Add bqetl metadata cli command * Initial draft for adding deprecation support to bqetl * Incorporate Anna's feedback * Fix based on whd's feedback * Fix ci issues * Remove unnecessary logic from metadata.py * Add dataset metadata yaml for ga_derived * Ignore dirs that do not have dataset_metadata yaml * Remove unwanted dataset metadata yamls * Update bigquery_etl/cli/metadata.py Co-authored-by: whd <whd@users.noreply.github.com> --------- Co-authored-by: whd <whd@users.noreply.github.com>	2023-09-12 18:47:54 +00:00
Anna Scholtz	cb9eff55fb	Handle references to INFORMATION_SCHEMA when deploying to stage (#4233 )	2023-09-12 09:49:49 -07:00
Sean Rose	d33db5b00f	Don't quote wildcard tables twice when updating stage references. (#4227 )	2023-08-31 17:11:43 -07:00
Anna Scholtz	f1f552ef47	Fix publishing udfs that use backticks in identifiers (#4225 ) * Fix publishing udfs that use backticks in identifiers * Update bigquery_etl/routine/parse_routine.py Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com> --------- Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>	2023-08-31 16:00:44 -07:00
Alexander	3c7f95e314	Skip tables with all filtered backfill entries (#4217 )	2023-08-30 10:54:09 -04:00


1221 commits