Commit graph

5854 commits

Author SHA1 Message Date
m-d-bowerman d9387b9e6c
add missing fields to pocket_summary and backfill 2024-08-23 06:54:47 -07:00
Chelsey Beck c8593c9d8f
updating query to match schema (#6091)
* updating query to match schema

* adding new columns to test

---------

Co-authored-by: akkomar <akkomar@users.noreply.github.com>
2024-08-23 12:07:19 +02:00
Alekhya 81a931ea67
Backfill search_aggregates_v8 table (#6065)
Co-authored-by: skahmann3 <16420065+skahmann3@users.noreply.github.com>
2024-08-22 21:26:11 -04:00
ksiegler1 f54d847725
Add account deletion funnel (#6089)
Co-authored-by: Kimberly Siegler <kimberlysiegler@Kimberlys-MacBook-Pro-2.local>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2024-08-22 14:53:24 -07:00
whd 345ed707dc
chore(ci): stop publishing to dockerhub (#5137) 2024-08-22 19:57:02 +00:00
Katie Windau 936a22379f
DENG-4568 add profile_group_id to sponsored_tiles_clients_daily_v1 (#6085) 2024-08-22 13:01:53 -05:00
Chelsey Beck efbfa05ebb
updating files according to workbook (#6086)
* updating files according to workbook

* formatting yaml

* reformatting

* adding s to sheet
2024-08-22 10:46:35 -07:00
Ben Wu 4d6990dbbe
Include tables with no expirations in table_partition_expirations (#6084) 2024-08-22 18:04:31 +01:00
Chelsey Beck 92b77d0049
adding tag for triage to record only (#6083) 2024-08-22 18:36:10 +02:00
akkomar bfc2eb1583
Allow field addition in events_stream tables with struct metrics (#6082) 2024-08-22 14:55:43 +02:00
Anna Scholtz 7271c771dd
Remove data-observability-dev as the datasets got removed from BigQuery (#6081)
* Remove data-observability-dev as the datasets got removed from BigQuery

* Remove bqetl_data_observability_test_data_copy
2024-08-21 16:40:50 -07:00
m-d-bowerman e21a96f60e
Add topic, Pocket, default ui fields (#6069) 2024-08-21 14:40:02 -07:00
Chelsey Beck 302e32b5f3
Adding python script to export bigquery table as json objects to GCS for Merino (#6058)
* adding python script to export bigquery table as json objects to GCS

* updating to delete files older than 3 days

* renaming directory to add version

* updating description

* reformatting and ignoring type on storage

* updating description

* moving script to query.py, adding exception, and adding .ndjson extension to temp file

* updating bucket name and adding file prefix

* adding days old as parameter
2024-08-21 12:16:34 -07:00
Katie Windau c1c690e3a6
DENG-4564 add profile_group_id to urlbar_clients_daily_v1 (#6079)
* DENG-4564 add profile_group_id to urlbar_clients_daily_v1

* DENG-4564 Add description for profile_group_id

* DENG-4564 add profile_group_id to the urlbar_clients_daily_v1 sql tests
2024-08-21 12:23:27 -05:00
Katie Windau b541625b7f
DENG-4554 - add profile_group_id to telemetry_derived.addons_v2 (#6078) 2024-08-20 15:33:39 -05:00
Katie Windau fb498e6e68
DENG-4546 - update events_v1 view and add schema.yaml for existing table so CI checks pass (#6077) 2024-08-20 13:58:18 -05:00
Ben Wu 03a42ae334
Sort datasets in table_partition_expirations sql generator (#6076) 2024-08-20 17:55:06 +01:00
akkomar 806fc84b13
Filter out overactive clients in all events_daily queries (#6074)
In https://github.com/mozilla/bigquery-etl/pull/2333 we started filtering out overactive clients from desktop events_daily query. Back then I opted for not adding this filter for Glean queries as their event counts were significantly lower than desktop.

We are now having `bqetl_event_rollup.mozilla_vpn_derived__events_daily__v1` failing with `Cannot query rows larger than 100MB limit.` error. We'll fix it by extending the `client_event_count` filter to all queries.

3M threshold seems safe and a good first value to try - I have tested this query with this threshold on `2024-08-18` and got the same number of rows in the output table as currently in production (10598).

I tested `2024-08-19` by running:
```
bqetl generate events_daily --use_cloud_function=False --output_dir=sql_test_events_daily
cat sql_test_events_daily/moz-fx-data-shared-prod/mozilla_vpn_derived/events_daily_v1/query.sql | bq query --project_id=moz-fx-data-shared-prod --parameter=submission_date:DATE:2024-08-19 --use_legacy_sql=false --max_rows=0 --dataset_id=mozdata:tmp --destination_table=akomar_vpn_events_daily_test --replace
```
2024-08-20 16:10:21 +02:00
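
For readers unfamiliar with the filter that commit 806fc84b13 describes, here is a minimal Python sketch of the idea: count events per client and drop every client whose count exceeds a threshold. It is an illustration only, not the SQL generated by `bqetl`. The function name `filter_overactive_clients`, the `client_id` key, and the "greater than" comparison are assumptions; only the 3M figure comes from the commit message.

```python
from collections import Counter
from typing import Iterable

# Threshold taken from the "3M" figure in the commit message.
MAX_CLIENT_EVENT_COUNT = 3_000_000


def filter_overactive_clients(
    events: Iterable[dict],
    max_count: int = MAX_CLIENT_EVENT_COUNT,
) -> list[dict]:
    """Drop all events from clients that sent more than max_count events.

    Hypothetical helper: each event is assumed to be a dict with a
    "client_id" key. Whether the real query uses > or >= is not stated
    in the commit message; > is assumed here.
    """
    events = list(events)
    # Count how many events each client contributed.
    counts = Counter(event["client_id"] for event in events)
    # Keep only events whose client stays at or below the threshold.
    return [e for e in events if counts[e["client_id"]] <= max_count]
```

With a small threshold such as `max_count=2`, a client with three events is removed entirely while a client with one event is kept, which mirrors how the production threshold removes only extreme outliers.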
Ryan VanderMeulen b7f47c3b2e
Add Windows 11 23H2 & 24H2 updates to aggregates (#5883)
Co-authored-by: Katie Windau <153020235+kwindau@users.noreply.github.com>
2024-08-20 04:12:17 -07:00
Katie Windau 244d97ef2b
DENG-4514 add profile group ID to clients_first_seen_v3 (#6072) 2024-08-19 11:35:12 -05:00
Katie Windau 36739bd0b5
DENG-4515 Add profile_group_id to search_clients_daily_v8 (#6064)
Co-authored-by: Marlene Hirose <92952117+Marlene-M-Hirose@users.noreply.github.com>
2024-08-19 07:04:40 -05:00
Marlene Hirose 331c1469ae
replace previous code that was the wrong code that was copied over from marketing_prod. Copy the right code over from marketing_prod to shared_prod (#6068) 2024-08-17 13:41:11 -07:00
Katie Windau af1e182c02
DENG-3096 Add 6 more countries to all 3 Cloudflare (#6070)
* DENG-3096 Added 6 more countries to pull browser usage data for

* DENG-3096 Add 6 more countries to device usage

* DENG-3096 Add 6 countries to OS usage
2024-08-16 19:18:37 +01:00
Alexander 4cfa504e49
chore(deploys): Catch and raise for missing dataset/project in table deploys (#6067)
* chore(deploys): Catch NotFound for missing dataset/project in table deploys

* Update bigquery_etl/deploy.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* update test exception text match

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2024-08-15 13:56:36 -04:00
Chelsey Beck 5a5087a031
DENG-3503 add braze subscription map v2 (#6066) 2024-08-15 09:59:26 -07:00
m-d-bowerman 435422c94e
Add topic selection fields to newtab_visits (#6004)
* Add topic selection fields to newtab_visits

* Remove duplicate from schema

* Add new metric to test schema

* Fix event name in test yaml

* Test yaml fix

* Add visit id to test case

---------

Co-authored-by: Chelsey Beck <64881557+chelseybeck@users.noreply.github.com>
2024-08-15 09:32:23 -07:00
Alekhya 32399c5c50
Fix search_aggregates_v8 schema (#6063)
* Fix search_aggregates_v8 schema

* Fix the order of columns in schema.yaml

* Fix schema.yaml format issue
2024-08-15 10:34:19 -04:00
Katie Windau 659d01bfb9
DENG-4515 fix cast null as string to match schema (#6062) 2024-08-15 07:09:19 -05:00
Alekhya 47a72c9f42
Add is_sap_monetizable and normalize the search engine (#5907)
* Add is_sap_monetizable to aggregates table

* Fix tests


* add is_acer_cohort


* Add normalized_default_search_engine column for mobile and desktop

* Fix test cases
2024-08-14 18:00:58 -04:00
Katie Windau ccd0f25c52
DENG-4515 - part 1, add column to schema (#6052) 2024-08-14 14:53:34 +01:00
Alekhya 9d2ba6e20d
Separate DAU metrics into `search_dau_aggregates` table (#6000)
* Separate DAU metrics

* Combine desktop + mobile DAU queries

* Fix CI failure

Fix CI failure

* Remove eligible markets, reduce group bys

* Update desktop DAU and revert new mobile bug

* Fix sql file format

---------

Co-authored-by: skahmann3 <16420065+skahmann3@users.noreply.github.com>
2024-08-13 20:52:38 -04:00
Katie Windau 69ec6a634f
DENG-3096 Add country names to the 3 cloudflare views (#6057) 2024-08-13 18:04:17 +01:00
Chelsey Beck dfb6bc7cbc
Mc 1256 adding filter on recommended at (#6056)
* adding filter on recommended_at

* adding filter on recommended at

* updating start date
2024-08-13 08:57:16 -07:00
Chelsey Beck 9db7d44c0b
Adding table for merino export to gcs (#6054)
* adding table for merino export to gcs

* updating dag name

* formatting sql

* Update sql/moz-fx-data-shared-prod/telemetry_derived/newtab_merino_extract_v1/query.sql

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>

* Update sql/moz-fx-data-shared-prod/telemetry_derived/newtab_merino_extract_v1/query.sql

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>

* removing save and dismiss events from filter

---------

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
2024-08-12 14:49:29 -07:00
Anna Scholtz 66f938bbf1
Revert "adding script to land files in GCS for merino migration (#6032)" (#6053)
This reverts commit 41e8c9b221.
2024-08-12 09:36:33 -07:00
Katie Windau fb064e8f4f
DENG-4481 Add profile_group_id to telemetry_derived.clients_first_see… (#6044) 2024-08-09 22:10:17 -05:00
Chelsey Beck 41e8c9b221
adding script to land files in GCS for merino migration (#6032)
* adding script to land files in GCS for merino migration

* formatting yaml

* formatting yaml again

* updating sql

* updating dag name

* adding dataset and table for merino exports

* updating check to add fail

* updating start date

* updating dag name
2024-08-09 16:52:40 -07:00
Sean Rose c4bdac3f32
Remove Google Search Console placeholder views from the `moz-fx-data-marketing-prod` project (DENG-1733). (#6028) 2024-08-09 16:05:15 -07:00
Sean Rose 7871d0be83
Add brand keyword patterns to `google_search_console.classify_site_query` UDF (DS-3756). (#6038) 2024-08-09 15:26:17 -07:00
Sean Rose dbdcffc4e1
Add `mozilla_vpn_derived.funnel_ga_to_subscriptions_v2` ETL using GA4 data (bug 1905989) (#6033)
* Add `mozilla_vpn_derived.funnel_ga_to_subscriptions_v2` ETL using GA4 data.

* Move `mozilla_vpn_derived.site_metrics_summary_v2` ETL to `bqetl_mozilla_vpn_site_metrics` DAG.

* Unschedule VPN ETLs that relied on GA3 data.

* Update `mozilla_vpn.funnel_ga_to_subscriptions` view to include GA4 data.

* Update `mozilla_vpn.site_metrics_summary` view to include GA4 data.

* Remove `mozilla_vpn.site_metrics_summary_v2` view.
2024-08-09 14:45:54 -07:00
Winnie Chan e0ecc25687
removed deletion date (#6048) 2024-08-09 14:15:44 -07:00
Katie Windau 9449eb446b
DENG-4491 part 1 - add profile_group_id to schema (#6050) 2024-08-09 14:17:30 -05:00
Marlene Hirose 0a181abc70
copy code from marketing-prod to shared-prod, refactor query to pull from shared-prod, change owner in metadata (#6049) 2024-08-09 11:57:38 -07:00
Marlene Hirose 55b5a15ca9
copy code from marketing-prod to shared-prod, refactor query to pull from shared-prod, change owner in metadata (#6046) 2024-08-09 11:26:14 -07:00
Marlene Hirose 35d0a34ab8
copy code from marketing-prod to shared-prod, refactor query to pull from shared-prod, change owner in metadata (#6039) 2024-08-09 10:36:17 -07:00
Marlene Hirose 5dfbb878dc
copy code from marketing-prod to shared-prod, refactor query to pull from shared-prod, change owner in metadata (#6045) 2024-08-09 10:14:21 -07:00
Marlene Hirose e91abe0a03
copy code from marketing-prod to shared-prod, refactor query to pull from shared-prod, change owner in metadata (#6037) 2024-08-09 09:31:55 -07:00
Katie Windau b07bb880bd
DENG-4482 update query for clients_last_seen_v2 to include profile_group_id (#6043) 2024-08-09 11:06:05 -05:00
Anna Scholtz 26bcdb0410
Remove parallelism parameter from dependency CLI command (#6041) 2024-08-08 21:57:47 -07:00
Anna Scholtz 72269ac955
Fix dependency record (#6040)
* Undo change on parallelizing dependency record

* fix flake8
2024-08-08 17:26:16 -07:00