Commit graph

69 commits

Author SHA1 Message Date
akkomar a78effac1c
Don't generate glean_usage queries for moso_mastodon_android (#5071)
We added this app in https://bugzilla.mozilla.org/show_bug.cgi?id=1879155 but it got deprecated before the process was finished.
2024-02-20 18:44:28 +01:00
akkomar 8765c86240
Exclude GLAM views from dry run (#5052) 2024-02-15 13:53:16 +01:00
Eduardo Filho e6fdee2c57
Bug 1880270 Allow glam-prod project queries to execute (#5045)
* Allow glam-prod project queries to execute

* put back what was accidentally removed
2024-02-14 17:13:04 -05:00
Anna Scholtz 392c6315c9
Add event monitoring for VPN [DENG-2620] (#5007)
* Add event monitoring for VPN [DENG-2620]

* VPN support for event error monitoring
2024-02-12 15:20:33 -08:00
Marlene Hirose 373fa54c61
Deng 2579 gclid conversions v2 creation (#4972)
* initial commit of code for gclid_conversions_v2

* change owners to Marlene and Katie

* add gclid_conversions_v2 to Access Denied bqetl_project.yaml
2024-02-07 14:03:32 -08:00
Sean Rose 802da71a2c
Add ETLs and views for Google Search Console data (DENG-1733) (#4892)
* Add ETLs for historical Google Search Console data synced by Fivetran.

* Fix formatting of `CASE` subclauses like `WHEN` inside Jinja blocks.

* Add ETLs for current Google Search Console data exported directly to BigQuery.

* Add views for Google Search Console data.
2024-02-07 12:53:32 -08:00
Jan-Erik Rediger ab68b6fa04
events stream: Convert nested maps in metrics (#4964) 2024-02-06 14:21:00 +01:00
Anna Scholtz 138841d351
Package bqetl and publish to PyPI (#4917)
* pyproject.toml for bqetl

* Correctly resolve SQL generators from package

* CircleCI config to publish tagged versions to PyPI

* Get version from git tags
2024-02-05 09:04:04 -08:00
Lucia 89aff17297
DS-3104 Create version 2 of Clients Last Seen (#4236)
* DS-3104. Create query, metadata and schedule for clients_last_seen_v2. Update view clients_last_seen to use this version.

* Update metadata and formatting

* Add to dry-run skip

* Update metadata

---------

Co-authored-by: Alexander Nicholson <anicholson@mozilla.com>
2024-02-05 10:38:49 -05:00
Anna Scholtz 562544690f
Remove generated content from main (#4867)
* Remove generated content from main

* Fix file permissions and comments
2024-01-24 09:52:04 -08:00
Alessio Placitelli e067426d97
Enable the events stream table for more products (#4879)
* Enable the events stream table for more products

This enables the events stream table for the Glean Debug Ping Viewer and the Glean Dictionary, in the spirit of dogfooding the table internally a bit more.

* Update bqetl_project.yaml

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2024-01-24 17:51:19 +01:00
whd 575adc35e4
Remove defunct PBD views (#4858) 2024-01-19 09:08:02 -08:00
Sean Rose 9aea89370b
Add `fxa_delete_events_v2` ETL based on FxA logs from GCP (#4843)
* Add `fxa_delete_events_v2` ETL based on FxA logs from GCP.

* Add `fxa_delete_events` view combining `fxa_delete_events_v1` and `fxa_delete_events_v2` data.

* Use `fxa_delete_events` view for Shredder.

* Update sql/moz-fx-data-shared-prod/firefox_accounts_derived/fxa_delete_events_v2/metadata.yaml

---------

Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
2024-01-17 14:34:54 +01:00
Jan-Erik Rediger 1c7e4b35a4
Add a generator for events stream tables (#4655)
* Add a generator for events stream tables

Open questions:

* How does init work?
  * Is this manually triggered? How do we backfill to a certain date?
* Schema is defined in SQL query. How does this behave on changes in
  the future?
* Configuration: Right now inline in Python. Should we change this?

TODO:

* check table Schema

* Store category and name separately to help with filtering and clustering

* Concat into full event name using array to avoid NULL issues

* events stream: Read allowed apps from project configuration

* event stream: Cluster by event category

* Remove trailing commas

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update sql_generators/glean_usage/events_stream.py

* Update sql_generators/glean_usage/templates/events_stream_v1.metadata.yaml

* Update sql_generators/glean_usage/templates/events_stream_v1.query.sql

* Update sql_generators/glean_usage/templates/events_stream_v1.query.sql

* Update sql_generators/glean_usage/templates/events_stream_v1.query.sql

* Update sql_generators/glean_usage/templates/events_stream_v1.query.sql

* Update sql_generators/glean_usage/templates/events_stream_v1.query.sql

* Update sql_generators/glean_usage/common.py

* Update sql_generators/glean_usage/events_stream.py

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2024-01-16 10:56:11 -08:00
akkomar c3fa65a30e
Bug 1874697 - Unschedule deletion_request_volume_v1 (#4836)
This query was superseded by v2 added in https://github.com/mozilla/bigquery-etl/pull/4442 and is no longer working after payload_bytes_decoded tables have been removed.
2024-01-16 08:30:43 -08:00
Anna Scholtz 8dd0e09aa1
Event flows monitoring (#4744)
* Event flow monitoring

* Script query for getting flow source target pairs

* Cross app script for event flow monitoring

* Add event flows

* Improve timestamp handling for events

* Add handling for accounts to event_flow_monitoring

* Handle null categories in event flow monitoring

* Limit number of events in event flow monitoring
2024-01-11 11:43:18 -08:00
Marlene Hirose 16cfc1b399
add query.sql file using Jinja templating to loop through projects (#4779)
* add query.sql file using Jinja templating to loop through projects

* add region-us to the INFORMATION_SCHEMA.JOBS_BY_PROJECT table description

* remove python file

* add bigquery_usage_v2/query.sql to skip list of bqetl_project.yaml

* Update sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* remove Jinja format on and off

* change obscure alias 't1' to more explicit one 'jobs'

* change alias, remove arguments, referenced_tables, depends_on from metadata.yaml file

* update first CTE referenced_tables to reference_table

* fix UNNEST(referenced_table) to UNNEST(referenced_tables)

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2024-01-05 12:40:29 -08:00
Alexander fab7e04764
Remove dryrun from view validation in CI (#4774)
* Remove dryrun from view validation

* Remove access denied view validation skips
2024-01-04 12:06:02 -05:00
Sean Rose b965973e8c
Remove `glean_usage` generator skip config for VPN views. (#4775)
The views in question were removed in #3567.
2024-01-03 10:48:13 -08:00
akkomar 53baf0c9af
DENG-2228 Add table to monitor total numbers of Mozilla Accounts (#4746) 2023-12-28 13:49:24 +01:00
Alexander 1a07ce68d9
Skip all fxa_accounts (#4740) 2023-12-22 12:10:02 -05:00
Marlene Hirose 2a576e03d4
add data from desktop_cohort_daily_retention to cohort_daily_stats, n… (#4711)
* add data from desktop_cohort_daily_retention to cohort_daily_stats, normalize desktop normalized_app_name to 'Firefox Desktop'

* add fakespot_daily_events_rollup to bqetl_project skip list

* Update sql/moz-fx-data-shared-prod/telemetry/cohort_daily_statistics/view.sql

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* add new line at end of file

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2023-12-15 08:51:35 -08:00
Alekhya 8b3fe3f5f5
RS-805 - Fix normalized_app_name in mobile search tables (#4684)
* Draft PR for RS-805

* Split the udf to separate PR

* Fix review checker sql

* add schema.yaml to fix the CI error

* Fix CI error

* add mobile search aggregate to skip dry run
2023-12-14 02:45:55 +05:30
Anna Scholtz c31ae16efb
Revert "Define `event_monitoring_live_v1` views in `view.sql` files (#4576)" (#4680)
This reverts commit 2c4cc5eefe.
2023-12-11 10:15:30 -08:00
Sean Rose 2c4cc5eefe
Define `event_monitoring_live_v1` views in `view.sql` files (#4576)
* Define `event_monitoring_live_v1` views in `view.sql` files.

So they get automatically deployed by the `bqetl_artifact_deployment.publish_views` Airflow task.

* Support materialized views in view naming validation.

* Handle `IF NOT EXISTS` in view naming validation.

* Use regular expression to extract view ID in view naming validation.

This simplifies the logic and avoids a sqlparse bug where it doesn't recognize the `MATERIALIZED` keyword.

* Update other view regular expressions to allow for materialized views.
2023-12-08 11:54:02 -08:00
Alexander ef3a0a2470
Skip accounts_db.fxa_oauth_clients dryrun (#4671) 2023-12-08 10:39:15 -05:00
Alexander 43f46ce81f
Skip accounts_db.fxa_oauth_clients in view validation (#4667) 2023-12-07 17:36:12 -05:00
Anna Scholtz 7e36ba259e
Skip check for baseline_clients_last_seen for Fire TV (#4640) 2023-12-01 09:24:11 -08:00
akkomar e1a94c9e4a
SVCSE-1595 Setup import of tables from production FxA databases (#4597) 2023-11-24 17:44:38 +01:00
akkomar a47bcedc28
SVCSE-1595 Setup import of tables from staging FxA databases (#4578) 2023-11-23 17:45:15 +01:00
Frank Bertsch dc3864fbc4
Add conversion event; fix gclid conversions query (#4584)
* Add first_run conversion; use correct table names

* Ignore dryrun of query and view

* Remove HAVING clause; fix logical_or
2023-11-20 14:48:02 -05:00
Frank Bertsch 05fed88b07
Include GA intraday sessions tables (#4582)
* Include GA intraday sessions tables

* Update doc string on backfilling ga_sessions

* Don't dryrun stub_attribution view
2023-11-20 11:58:45 -05:00
Frank Bertsch 104ece82d9
Add derived stub attribution logs (#4557)
* Add derived stub attribution logs

This table keeps triplets from the stub attribution logs.
The triplet of (dl_token, ga_client_id, stub_session_id)
will only ever appear once here.

See the associated decision brief:
https://docs.google.com/document/d/1L4vOR0nCGawwSRPA9xiR8Hmu_8ozCGUecXAtBWmGGA0/edit

* Move stub attribution table to new dataset

In order to ensure limited access to the stub attribution service
data without significantly decreasing developer velocity, we
move these tables to a new dataset. That dataset has the defaults
we want for all stub attribution log data:
- Defaults to just read access to data-science/DUET workgroup
- No read/write access for DE

We will backfill via the bqetl_backfill DAG.

* Rename view

* Use correct dataset name in view

* Skip dryrun; no access
2023-11-17 16:36:48 -05:00
Anna Scholtz 185f833f2a
Materialized views and aggregated tables for event monitoring (#4478)
* WIP event monitoring

* Add FxA custom events to view definition (#4483)

* Add FxA custom events to view definition

* Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql

* Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql

* Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql

* Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Move event monitoring to glean_usage generator

* Add cross-app event monitoring view

* Generate cross app monitoring

* Simplify event monitoring aggregation

---------

Co-authored-by: akkomar <akkomar@users.noreply.github.com>
2023-11-01 14:20:20 -07:00
akkomar 2754d7d7c0
DENG-1879 Setup import of emails table from FxA prod CloudSQL (#4494) 2023-10-31 10:31:27 +01:00
akkomar 511894d181
DENG-1879 Setup import of emails table from FxA stage CloudSQL (#4493) 2023-10-31 09:12:56 +01:00
wil stuckey c6ffd9e1fd
Glean server knobs monitoring table (#4491)
* Glean server knobs monitoring table

* fix code gen and skip dry-run

* Remove view creation in query
2023-10-30 16:23:46 -05:00
wil stuckey fac37452ab
Update experiment export query to include feature ids and branch feature config values (#4477)
* Update experiment export query to include feature ids and branch feature
config value.

* Add view skip for broken view

* add skip to dry run as well
2023-10-26 05:11:06 -07:00
akkomar 66729aa702
FXA-6721 Setup import of accounts table from FxA production CloudSQL (#4423) 2023-10-25 09:50:25 +02:00
Sean Rose 4bbbc32a5b
Put assert UDFs in `mozfun` project (#4367)
* Put assert UDFs in `mozfun` project.

* Tweak syntax in `assert.array_equals()` to avoid SQLGlot parsing error.
  https://github.com/tobymao/sqlglot/issues/2348

* Fix SQL syntax error in `assert.struct_equals()` tests.

* Fix UDF dependency file path logic when deploying to stage.

* Change regular expressions in `parse_routine` module to allow quotes around routines' dataset and name.
2023-10-13 10:58:42 -07:00
Anna Scholtz 35ae323487
Funnel generators POC (#4390)
* Add funnel generation logic

* Example funnel config

* Fix funnel columns

* funnel generation dimensions

* Optimize segmenting generated funnels

* Add funnel generation docs

* Schedule generated funnels

* Skip DAGs with no tasks

* Add background info funnel generator

* Add funnel generation tests

* Fix join_previous_step_on

* Add funnel example config
2023-10-12 14:05:08 -07:00
kik-kik 79de048842
feat(DENG-1696): docker fxa admin server sanitized updating after gcp migration (#4400)
* added v2 of docker_fxa_admin_server_events

* updated the view to include the fields needed

* added schema for firefox_accounts_derived/docker_fxa_admin_server_sanitized_v1

* Update sql/moz-fx-data-shared-prod/firefox_accounts/docker_fxa_admin_server_sanitized/view.sql

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* added docker_fxa_admin_server_sanitized_v2 to dry_run skip list

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2023-10-10 21:23:49 +02:00
kik-kik 38f9c26e02
feat(DENG-1578): docker_fxa_customs_sanitized_v2 query added and view updated (#4315)
* added docker_fxa_customs_sanitized_v2 query to pull from the new fxa log table and updated the view to union v1 and v2

* updated schema file for docker_fxa_customs_sanitized_v2

* tweaks made as suggested by srose in PR#4315

* added docker_fxa_customs_sanitized_v2 to skip list due to permissions

* once again scheduling fxa.docker_fxa_customs_sanitized_v1 as AWS events appear to still be arriving

* added schema for docker_fxa_customs_sanitized_v1

* updated date filter to represent when we stopped receiving relevant events and descheduled v1
2023-09-29 10:51:36 +02:00
Lucia 6826cbe395
DS-2947 clients_first_seen_v2 (#3962)
* DENG-850 Add test setup.

* DS-2947. Create new dataset and tests for Firefox Desktop Clients.

* DS-2947. Update dataset name to clients_first_seen_v2.

* DS-2947. Dataset name to clients_first_seen_v2.

* DS-2947. Updating tests.

* DS-2947. Schema for clients_first_seen_v2.

* DS-2947. Tests update.

* Tests update

* Restore test files

* DS-2947. Get data from main and new profile ping. Get first dltoken and dlsource available. Update tests.

* DS-2947. Use main ping's submission_timestamp_min to find the earliest ping.

* Remove app_display_version as it is normalised in app_version. Update fields on a 7 day window. Retrieve data from the ping with the earliest NOT NULL value to remove NULLS when main ping is not available.

* Remove app_display_version as it is normalised in app_version. Update fields on a 7 day window. Retrieve data from the ping with the earliest NOT NULL value to remove NULLS when main ping is not available.

* Update schemas, remove duplicated columns from query and init. Adapt existing unit tests and add a unit test for 7-day window updates. Include scheduler in DAG bqetl_analytics_tables.

* Update to enable initialize from query.sql. Remove init.sql.

* Update DAGs dependencies.

* DAG bqetl_main_summary updated.

* Query and tests update to join with sample_id.

* Refactor metadata fields in query and tests.

* Schema and descriptions updated. Remove filter to query the existing table. Remove the DATETIME, the first_seen_date is equivalent.

* Column required to be explicit in the query to match the schema.

* Test fix.

* Tests tmp changes.

* Remove 7-day window update and update tests.

* Add second_seen_date to the query

* DS-3037 Add second_seen_date and tests.

* DS-3037 Add is_init to calculate second_seen_date.

* remove files in analysis dataset

* DS-3037 Add is_init to calculate second_seen_date. Formatting.

* DS-2986 Add initialize script. Change submission_timestamp_min to submission_date due to NULL values in that field.

* DENG-1314. Update metadata reported pings and tests in the query.

* DS-3054. Update bqetl initialize command and query to support parallel run.

* DS-3054. Update query to use submission_timestamp_min from main ping where available for precision in source ping for first_seen_date, add source ping of second_seen_date, get only first_seen date from new_profile and shutdown ping due to 16% clients with more than one new_profile ping. Add capability to run in parallel in bqetl. Update tests.

* DS-3054. Remove initialize.py.

* Reset unrelated formatting changes from this branch to match the main branch.

* Correct jira template.

* DS-2947. Update naming for attribution dltoken and dlsource.

* DS-2947. Update column names and tests and clarity for the initialization command.

* DS-2986. Create table with schema and metadata in command initialize.

* Document what is the result expected from each subquery.

* Documentation update.

* DS-3145. Include user agent and the source ping. Add query documentation.

* DS-3145. Update tests.

* DS-3146 Update logic to get attributes only from the ping that reports the first_seen_date, include locale, update the source for app_build_id and collect second_seen_date only from main ping.

* DS-3146 Update logic to get attributes only from the ping that reports the first_seen_date, include locale, update the source for app_build_id and collect second_seen_date only from main ping.

* DS-3054. Updates and save initialization.

* DS-3054. Table name required for DAG generation.

* DS-2947_implement_bigquery_changes_in_another_PR.

* DS-2947 Naming

* Add clients_first_seen_v2 to skip dry-run.

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-09-25 16:39:04 +02:00
akkomar e3208aeecc
FXA-6721 Setup import of accounts table from FxA stage CloudSQL (#4327)
* FXA-6721 Setup import of accounts table from FxA stage CloudSQL

* Fix typo
2023-09-22 15:27:07 +02:00
kik-kik 69592dab81
# feat(): updated fxa nonprod queries to be in line with production queries (#4297)
* updated fxa nonprod/staging queries to be in line with what production queries look like

* Apply suggestions from code review provided by srose

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* tweaks made as suggested by srose in PR#4297

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2023-09-21 16:54:17 +02:00
kik-kik 15b277c3d5
feat(DENG-1576): introducing fxa_log_device_command_events_v2 to include GCP based logs (#4308)
* introducing fxa_log_device_command_events_v2 to pull relevant logs from GCP log table

* updated the bqetl_fxa_events DAG

* correcting the source table

* added fxa_log_device_command_events_v2 to dry run skip list due to the source table permissions issue and added date filters to indicate timeframes for which events are included in both tables

* Update sql/moz-fx-data-shared-prod/firefox_accounts_derived/fxa_log_device_command_events_v2/query.sql

Co-authored-by: akkomar <akkomar@users.noreply.github.com>

* made changes as suggested by srose in PR#4308

---------

Co-authored-by: akkomar <akkomar@users.noreply.github.com>
2023-09-21 12:06:14 +02:00
akkomar b64aea8d1a
DENG-722 Handle events from FxA services migrated to new GCP environment (#4288)
Co-authored-by: kik-kik <kignasiak@mozilla.com>
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2023-09-13 11:49:29 -07:00
Sean Rose 52a97ad37d
Add `stripe_products_v1` and `stripe_plans_v1` ETLs (DENG-1448). (#4234) 2023-09-05 08:47:56 -07:00
Sean Rose 8f4a06a462
Add Stripe-based logical subscriptions ETLs (DENG-977) (#4166)
* Add `subscription_platform_derived.services_v1` ETL.

* Add `subscription_platform_derived.subplat_flow_events_v1` ETL.

* Add `subscription_platform_derived.subplat_attribution_impressions_v1` ETL.

* Add `mozilla_vpn_derived.users_attribution_v1` table.

* Add `subscription_platform_derived.stripe_logical_subscriptions_history_v1` ETL.

* Add `subscription_platform_derived.logical_subscriptions_history_v1` ETL.

* Add `subscription_platform_derived.daily_active_logical_subscriptions_v1` ETL.

* Add `subscription_platform_derived.monthly_active_logical_subscriptions_v1` ETL.

* Add `subscription_platform_derived.logical_subscription_events_v1` ETL.

* Add `subscription_platform.logical_subscriptions` view.

* Add `subscription_platform.daily_active_logical_subscriptions` view.

* Add `subscription_platform.monthly_active_logical_subscriptions` view.

* Add `subscription_platform.logical_subscription_events` view.
2023-08-18 14:07:05 -07:00