Commit graph

1221 commits

Author SHA1 Message Date
Alexander 463dc15bf1
Support shared-prod UDFs (#4708) 2023-12-14 13:45:13 -05:00
Daniel Thorn b0bfc65052
DENG-965 - symbolicate and signaturize crash pings (#4642) 2023-12-12 08:57:52 -08:00
Alexander 776c590db2
ci-fix Ignore dataset.update required permissions when dryrunning authorized views (#4681)
* Refactor, add typehint
* Add datasets.update clause denied for authorized views
2023-12-11 14:52:19 -05:00
Anna Scholtz c31ae16efb
Revert "Define `event_monitoring_live_v1` views in `view.sql` files (#4576)" (#4680)
This reverts commit 2c4cc5eefe.
2023-12-11 10:15:30 -08:00
Sean Rose 2c4cc5eefe
Define `event_monitoring_live_v1` views in `view.sql` files (#4576)
* Define `event_monitoring_live_v1` views in `view.sql` files.

So they get automatically deployed by the `bqetl_artifact_deployment.publish_views` Airflow task.

* Support materialized views in view naming validation.

* Handle `IF NOT EXISTS` in view naming validation.

* Use regular expression to extract view ID in view naming validation.

This simplifies the logic and avoids a sqlparse bug where it doesn't recognize the `MATERIALIZED` keyword.

* Update other view regular expressions to allow for materialized views.
2023-12-08 11:54:02 -08:00
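The regular-expression approach described in the commit above can be sketched as follows. This is a minimal illustration, not the pattern used in the repository: the name `VIEW_ID_RE`, the helper `extract_view_id`, and the sample view names are all hypothetical. It shows how a single pattern can tolerate the optional `MATERIALIZED` and `IF NOT EXISTS` keywords without relying on a SQL parser.

```python
import re

# Hypothetical pattern: matches CREATE [OR REPLACE] [MATERIALIZED] VIEW
# [IF NOT EXISTS] followed by a two- or three-part identifier, with or
# without backticks around the whole ID or around each part.
VIEW_ID_RE = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?(?:MATERIALIZED\s+)?VIEW\s+"
    r"(?:IF\s+NOT\s+EXISTS\s+)?"
    r"`?(?P<view_id>[\w-]+(?:`?\.`?[\w-]+){1,2})`?",
    re.IGNORECASE,
)


def extract_view_id(sql):
    """Return the view ID from a CREATE VIEW statement, or None if absent."""
    match = VIEW_ID_RE.search(sql)
    if match is None:
        return None
    # Drop any backticks that separated the individual parts.
    return match.group("view_id").replace("`", "")
```

For example, `extract_view_id("CREATE MATERIALIZED VIEW IF NOT EXISTS `proj.ds.my_view` AS SELECT 1")` yields `proj.ds.my_view`, and a statement that does not create a view yields `None`.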
Sean Rose 308822d7cf
Have `bqetl query` commands fail if they don't find a matching query (#4662)
* Have `bqetl query` commands fail if they don't find a matching query.

* Update `test_run_query_no_query_file` test.
2023-12-07 16:57:11 -08:00
Alexander f045e9d849
Support offset backfills, require metadata (#4627)
* Skip backfills for queries without metadata.yaml

* Support date_partition_offset

* Fixed exclude, modified exception

* Add test for offset backfill

* Apply suggestions from code review

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>

* Formatting

---------

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
2023-12-05 14:07:09 -05:00
kik-kik 076a0e0775
feat(DENG-2083): added firefox_ios_derived.clients_activation_v1 and corresponding view (#4631)
* added firefox_ios_derived.clients_activation_v1 and corresponding view

* fixing a missing separator in firefox_ios_derived.clients_activation_v1 checks

* adding firefox_ios_derived.clients_activation_v1 to shredder configuration

* removed is_suspicious_device_client as it should not be there, thanks bani for pointing this out

* fixed black formatting error inside shredder/config.py

* applied bqetl formatting

* minor styling tweak as suggested by bani in PR#4631
2023-12-05 12:42:39 +01:00
Eduardo Filho 0bf4c279d6
GLAM avoid scientific notation for big sample counts (#4647)
* GLAM avoid scientific notation for big sample counts

* Cast to bignumeric instead of numeric
2023-12-04 17:47:50 -05:00
Anna Scholtz 68ece978e0
Resolve correct task_id for tasks nested in a group (#4637) 2023-12-01 11:38:59 -08:00
kik-kik 639381f13d
firefox_ios source added to shredder config (#4638) 2023-12-01 17:48:56 +01:00
kik-kik 9409d2b6cb
feat(DENG-1774 / cancelled): deleting fenix_derived/firefox_android_clients_v2, v1 will remain the active model (#4610)
* deleting fenix_derived/firefox_android_clients_v2, v1 will remain the active model

* removed fenix_derived.firefox_android_clients_v2 from shredder config
2023-12-01 11:16:54 +01:00
Anna Scholtz 7087dbff30
Separate Airflow tasks for glean_usage (#4588)
* Add support for assigning Airflow tasks to task groups

* Generate separate Airflow tasks for glean_usage

* Remove Airflow dependencies from old glean_usage tasks
2023-11-30 09:48:17 -08:00
Eduardo Filho ec297972c6
Glam accounts for sampling when calculating sample_count for windows & release probes (#4581)
* Glam - fix legacy windows & release probes' sample count going fwd

* Glam FOG accounts for sampling when calculating total_sample for windows & release probes

* fog - fix client count and sample count

* Add channel filtering for fog
2023-11-23 17:06:20 -05:00
Lucia fe2bf1d2de
DS-3361. Update documentation of initialize command. (#4592)
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-11-21 22:02:46 +01:00
Frank Bertsch 5cf8d30153
Add session date param; fix checks CLI bug (#4579)
* Fix checks to filter on partitions

* Don't print "missing checks file" on success

Previously, the statement that checks.sql files
were missing was printed on any execution of the for
statement. ("else" clauses after "for"s execute after
completion of the "for" clause).

Instead, we want to print only when there are no files.
2023-11-17 15:33:23 -05:00
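The `for`/`else` behaviour described in the commit above can be reproduced with a small sketch. The function names and messages here are hypothetical and are not taken from the repository; the point is only that an `else` clause attached to a `for` loop runs whenever the loop finishes without `break`, including when the loop body ran for every file.

```python
def report_buggy(check_files):
    """Mimics the described bug: the else branch runs after every full loop."""
    messages = []
    for path in check_files:
        messages.append(f"running {path}")
    else:
        # Executes whenever the loop ends without `break`,
        # even if files were found and processed.
        messages.append("missing checks file")
    return messages


def report_fixed(check_files):
    """Reports a missing file only when there is nothing to run."""
    messages = [f"running {path}" for path in check_files]
    if not check_files:
        messages.append("missing checks file")
    return messages
```

With one file present, `report_buggy` still appends the "missing checks file" message, while `report_fixed` appends it only for an empty list.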
Linh Nguyen c1c73e690e
Make sure that metadata `friendly_name` and `description` are not None (#4513)
* Fill empty description

* Assign a friendly name if the table doesn't have one

* Update metadata tests

* Update bigquery_etl/metadata/parse_metadata.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* update test again

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2023-11-17 11:48:11 -05:00
kik-kik 6e4c09a677
added fenix_derived.firefox_android_clients_v2 to shredder config (#4564) 2023-11-16 11:26:59 +01:00
Sean Rose e44e5ca705
Generate normal task dependencies from `depends_on` if the task is in the same DAG (#4569)
* Generate normal task dependencies from `depends_on` if the task is in the same DAG.

* Update `metadata.yaml` files to use `depends_on` rather than `upstream_dependencies`.
2023-11-14 16:06:00 -08:00
Lucia 894d42dde1
DS-3054. Support running an initialization query in parallel (#4322)
* DS-3054. Create functions to support running an initialization query for all sample_ids in parallel.

* DS-3054. Update _run_query function.

* DS-3054. Use _run_query and mapped values for initialization in parallel.

* DS-3054. Unify initialization to run in parallel and get sample_id range from metadata.

* DS-3054. Minimize formatting of query template and remove need to modify existing initialization queries. Validate if a query should use parallelized or regular update.

* DS-3054. Adding link to caveats.

* DS-3054. Update sample_id range for initialization.

* DS-3054. Use current implementation of run_query.

* DS-3054. Update using a parameter instead of initialization in metadata.

* DS-3054. DAG update with new parameter.

* Pass parameters before calling _run_query().

* Use --append_table in favour of INSERT INTO.

* DS-3054 Separate parallel and non parallel init, plus some improvements.

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-11-07 20:03:48 +01:00
Anna Scholtz e7e7eaae06
Set depend_on_past=False for warn checks (#4526) 2023-11-06 10:39:58 -08:00
kik-kik 0962ba65fe
prefixing schema error message inside dryrun to "ERROR" to make it easier to find when searching logs for cause of exit code 1 (#4522) 2023-11-06 12:12:50 +01:00
Frank Bertsch a271c024b2
Don't generate dags in bqetl query schedule command (#4517) 2023-11-03 08:59:27 -07:00
Anna Scholtz 185f833f2a
Materialized views and aggregated tables for event monitoring (#4478)
* WIP event monitoring

* Add FxA custom events to view definition (#4483)

* Add FxA custom events to view definition

* Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql

* Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql

* Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql

* Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Move event monitoring to glean_usage generator

* Add cross-app event monitoring view

* Generate cross app monitoring

* Simplify event monitoring aggregation

---------

Co-authored-by: akkomar <akkomar@users.noreply.github.com>
2023-11-01 14:20:20 -07:00
Frank Bertsch 55c5d412c1
Allow running multiple checks (#4471)
* Allow running multiple checks

* Don't yield anything on no matches
2023-10-24 14:39:01 -04:00
Frank Bertsch ac0af012c2
Add opt-in to running checks for backfill (#4455) 2023-10-18 17:34:58 -04:00
akkomar 7a36416554
Set project in init jobs (#4453)
This fixes https://github.com/mozilla/bigquery-etl/pull/4452
2023-10-18 16:58:18 +02:00
akkomar 0171f93596
Set project in init jobs (#4452) 2023-10-18 16:04:09 +02:00
akkomar c3c5ecffd4
Don't set destination table for init jobs (#4451)
This reverts https://github.com/mozilla/bigquery-etl/pull/4193/files

By convention, all but two init.sql jobs use a `CREATE TABLE` statement. Setting a destination table on a job that runs these queries causes a `BadRequest: 400 Cannot set destination table in jobs with DDL statements` error, as observed in [1].

Apart from no longer setting destination_table, this fixes two init queries.

[1] https://workflow.telemetry.mozilla.org/dags/copy_deduplicate/grid?dag_run_id=scheduled__2023-10-17T01%3A00%3A00%2B00%3A00&task_id=baseline_clients_first_seen&tab=logs
2023-10-18 14:45:22 +02:00
Frank Bertsch 164ba19abf
Glean usage checks (#4445)
* WIP: Add checks for glean_usage

* Ignore pycache in autogenerated click cmds

* Move check to backfill command

* Remove view checks
2023-10-17 17:03:41 -04:00
Sean Rose 4bbbc32a5b
Put assert UDFs in `mozfun` project (#4367)
* Put assert UDFs in `mozfun` project.

* Tweak syntax in `assert.array_equals()` to avoid SQLGlot parsing error.
  https://github.com/tobymao/sqlglot/issues/2348

* Fix SQL syntax error in `assert.struct_equals()` tests.

* Fix UDF dependency file path logic when deploying to stage.

* Change regular expressions in `parse_routine` module to allow quotes around routines' dataset and name.
2023-10-13 10:58:42 -07:00
Eduardo Filho 83569d8211
Add sampling to glam-fog (#4409)
* Add sampling to glam-fog

* Simplify count logic

* Update bigquery_etl/glam/templates/clients_daily_histogram_aggregates_v1.sql

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update bigquery_etl/glam/templates/clients_daily_scalar_aggregates_v1.sql

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-10-13 09:24:04 -04:00
Anna Scholtz 35ae323487
Funnel generators POC (#4390)
* Add funnel generation logic

* Example funnel config

* Fix funnel columns

* funnel generation dimensions

* Optimize segmenting generated funnels

* Add funnel generation docs

* Schedule generated funnels

* Skip DAGs with no tasks

* Add background info funnel generator

* Add funnel generation tests

* Fix join_previous_step_on

* Add funnel example config
2023-10-12 14:05:08 -07:00
Anna Scholtz 61da5cca03
Respect sql_dir in dryrun skip (#4334)
* Respect sql_dir in dryrun skip

* Update bigquery_etl/dryrun.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update bigquery_etl/dryrun.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Set sql_dir when using Schema.from_query_file()

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2023-10-12 13:27:54 -07:00
Anna Scholtz 3a8c6a9426
Filter files with multiple suffixes in stage deploy (#4403) 2023-10-10 15:02:54 -07:00
Daniel Thorn cd3aaabb66
Remove main summary and main_v4 from shredder (#4388) 2023-10-06 10:07:25 -07:00
Sean Rose 191eded481
Shred four more FxA tables. (#4376)
* moz-fx-data-shared-prod.firefox_accounts_derived.events_daily_v1
* moz-fx-data-shared-prod.firefox_accounts_derived.funnel_events_source_v1
* moz-fx-data-shared-prod.firefox_accounts_derived.fxa_log_device_command_events_v1
* moz-fx-data-shared-prod.firefox_accounts_derived.fxa_log_device_command_events_v2
2023-10-03 13:18:43 -07:00
Mike Williams 124a8613cc
fix DENG-1091: automatically add triage/confidential to private DAGs (#4363)
Co-authored-by: Marlene Hirose <92952117+Marlene-M-Hirose@users.noreply.github.com>
Co-authored-by: lelilia
2023-09-29 13:15:35 -04:00
Curtis Morales 6e32c52e2c
Don't retry check tasks (#4359)
* Don't retry check tasks

* Update test

* Fix one more test
2023-09-28 15:23:23 -04:00
Daniel Thorn ea05e6c6dc
Bug 1852630 - Rename main_remainder_v4 to main_v5 (#4353)
and point at new copy_deduplicate tasks for similar pings
2023-09-28 09:08:55 -07:00
Daniel Thorn d0cc8dfbe8
Add main_v5 et al to shredder (#4352) 2023-09-27 14:26:27 -07:00
akkomar 3ae03d6861
Update split main ping queries parent task (#4347)
This is required after these queries were moved out of copy_deduplicate_all in https://github.com/mozilla/telemetry-airflow/pull/1822
2023-09-26 16:10:57 +02:00
Sean Rose e04e314f46
Shred `firefox_accounts_derived.fxa_gcp_*_events_v1`. (#4341) 2023-09-25 09:22:43 -07:00
Anna Scholtz 1b6e598c9e
Publish private-bigquery-etl DAGs to private-generated-sql (#4319) 2023-09-19 08:18:32 -07:00
Anna Scholtz 3f79cc5151
Generate soft etl checks (#4268)
* Add markers to check cli command to differentiate warning from hard failures

* Fix CI issues

* Fix dag generation

* Incorporate Feedback

* Generate Airflow tasks for #fail and #warn checks

---------

Co-authored-by: Alekhya Kommasani <akommasani@mozilla.com>
Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com>
2023-09-13 10:22:39 -07:00
Alekhya 2e916eb856
DENG1381 - Add bqetl support for deprecation metadata (#4213)
* Support bq dataset deprecation process (metadata)

* Add bqetl metadata cli command

* Initial draft for adding deprecation support to bqetl

* Incorporate Anna's feedback

* Fix based on whd's feedback

* Fix ci issues

* Remove unnecessary logic from metadata.py

* Add dataset metadata yaml for ga_derived

* Ignore dirs that do not have dataset_metadata yaml

* Remove unwanted dataset metadata yamls

* Update bigquery_etl/cli/metadata.py

Co-authored-by: whd <whd@users.noreply.github.com>

---------

Co-authored-by: whd <whd@users.noreply.github.com>
2023-09-12 18:47:54 +00:00
Anna Scholtz cb9eff55fb
Handle references to INFORMATION_SCHEMA when deploying to stage (#4233) 2023-09-12 09:49:49 -07:00
Sean Rose d33db5b00f
Don't quote wildcard tables twice when updating stage references. (#4227) 2023-08-31 17:11:43 -07:00
Anna Scholtz f1f552ef47
Fix publishing udfs that use backticks in identifiers (#4225)
* Fix publishing udfs that use backticks in identifiers

* Update bigquery_etl/routine/parse_routine.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2023-08-31 16:00:44 -07:00
Alexander 3c7f95e314
Skip tables with all filtered backfill entries (#4217) 2023-08-30 10:54:09 -04:00