* adding python script to export bigquery table as json objects to GCS
* updating to delete files older than 3 days
* renaming directory to add version
* updating description
* reformatting and ignoring type on storage
* updating description
* moving script to query.py, adding exception, and adding .ndjson extension to temp file
* updating bucket name and adding file prefix
* adding days old as parameter
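A minimal sketch of the retention step described above (deleting exported files older than a configurable number of days), assuming each listed object exposes a name and a last-updated timestamp. `BlobInfo`, `select_stale_blobs`, and the `export_` prefix are hypothetical names used for illustration, not the script's actual code. A real script would take the listing from the storage client and then delete the returned names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class BlobInfo:
    """Hypothetical stand-in for one entry of a bucket listing."""

    name: str
    updated: datetime


def select_stale_blobs(blobs, now, days_old=3, prefix=""):
    """Return names of blobs under `prefix` last updated more than `days_old` days before `now`."""
    cutoff = now - timedelta(days=days_old)
    return [
        blob.name
        for blob in blobs
        if blob.name.startswith(prefix) and blob.updated < cutoff
    ]


if __name__ == "__main__":
    now = datetime(2024, 8, 20, tzinfo=timezone.utc)
    listing = [
        BlobInfo("export_a.ndjson", now - timedelta(days=5)),
        BlobInfo("export_b.ndjson", now - timedelta(days=1)),
    ]
    # Only the names returned here would be passed on for deletion.
    print(select_stale_blobs(listing, now, days_old=3, prefix="export_"))
```

Keeping the selection separate from the delete call makes the age cutoff easy to test without touching a bucket.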
In https://github.com/mozilla/bigquery-etl/pull/2333 we started filtering out overactive clients from the desktop `events_daily` query. At the time I chose not to add this filter to the Glean queries because their event counts were significantly lower than desktop's.
`bqetl_event_rollup.mozilla_vpn_derived__events_daily__v1` is now failing with the error `Cannot query rows larger than 100MB limit.` We'll fix it by extending the `client_event_count` filter to all queries.
A 3M threshold seems safe and is a good first value to try: I tested this query with this threshold on `2024-08-18` and got the same number of rows in the output table as is currently in production (10598).
I tested `2024-08-19` by running:
```
bqetl generate events_daily --use_cloud_function=False --output_dir=sql_test_events_daily
cat sql_test_events_daily/moz-fx-data-shared-prod/mozilla_vpn_derived/events_daily_v1/query.sql | bq query --project_id=moz-fx-data-shared-prod --parameter=submission_date:DATE:2024-08-19 --use_legacy_sql=false --max_rows=0 --dataset_id=mozdata:tmp --destination_table=akomar_vpn_events_daily_test --replace
```
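To illustrate the idea behind the `client_event_count` filter, here is a rough Python sketch that drops every event belonging to a client whose event count for the day reaches a threshold. It is not the SQL used by the query: `drop_overactive_clients`, the `client_id` key, and the exclusive comparison at the threshold are assumptions made for the example.

```python
from collections import Counter


def drop_overactive_clients(events, max_events_per_client=3_000_000):
    """Keep only events from clients with fewer than `max_events_per_client` events.

    `events` is a list of dicts with a `client_id` key (a hypothetical shape).
    """
    counts = Counter(event["client_id"] for event in events)
    return [
        event
        for event in events
        if counts[event["client_id"]] < max_events_per_client
    ]


if __name__ == "__main__":
    sample = [{"client_id": "a"}] * 5 + [{"client_id": "b"}] * 2
    # With a toy threshold of 3, client "a" (5 events) is dropped entirely.
    print(drop_overactive_clients(sample, max_events_per_client=3))
```

Dropping the whole client, rather than truncating its events, keeps a single extremely active client from producing one oversized aggregated row.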
* DENG-3096 Add 6 more countries to the browser usage data pull
* DENG-3096 Add 6 more countries to device usage
* DENG-3096 Add 6 countries to OS usage
* chore(deploys): Catch NotFound for missing dataset/project in table deploys
* Update bigquery_etl/deploy.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* update test exception text match
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Add topic selection fields to newtab_visits
* Remove duplicate from schema
* Add new metric to test schema
* Fix event name in test yaml
* Test yaml fix
* Add visit id to test case
---------
Co-authored-by: Chelsey Beck <64881557+chelseybeck@users.noreply.github.com>
* Add is_sap_monetizable to aggregates table
* Fix tests
* add is_acer_cohort
* Add normalized_default_search_engine column for mobile and desktop
* Fix test cases
* Separate DAU metrics
* Combine desktop + mobile DAU queries
* Fix CI failure
* Remove eligible markets, reduce group bys
* Update desktop DAU and revert new mobile bug
* Fix sql file format
---------
Co-authored-by: skahmann3 <16420065+skahmann3@users.noreply.github.com>
* adding table for merino export to gcs
* updating dag name
* formatting sql
* Update sql/moz-fx-data-shared-prod/telemetry_derived/newtab_merino_extract_v1/query.sql
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* Update sql/moz-fx-data-shared-prod/telemetry_derived/newtab_merino_extract_v1/query.sql
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* removing save and dismiss events from filter
---------
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* adding script to land files in GCS for merino migration
* formatting yaml
* formatting yaml again
* updating sql
* updating dag name
* adding dataset and table for merino exports
* updating check to add fail
* updating start date
* updating dag name
* Add `mozilla_vpn_derived.funnel_ga_to_subscriptions_v2` ETL using GA4 data.
* Move `mozilla_vpn_derived.site_metrics_summary_v2` ETL to `bqetl_mozilla_vpn_site_metrics` DAG.
* Unschedule VPN ETLs that relied on GA3 data.
* Update `mozilla_vpn.funnel_ga_to_subscriptions` view to include GA4 data.
* Update `mozilla_vpn.site_metrics_summary` view to include GA4 data.
* Remove `mozilla_vpn.site_metrics_summary_v2` view.