Commit graph

2205 Commits

Author SHA1 Message Date
wil stuckey 36cd455d2a
Add `json.from_map` UDF (#4414) 2023-10-17 14:00:25 -05:00
Daniel Thorn b692735c80
DENG-476 - Update FoG decision ETL to reference main_v5 (#4436) 2023-10-17 07:14:10 -07:00
Daniel Thorn 40ffe7f0b6
DENG-476 - Update UDF to reference main pings instead of main_v4 (#4440) 2023-10-16 15:49:58 -07:00
Daniel Thorn 3b14b881f8
DENG-476 - Remove unnecessary references to main_v4 in addons ETL (#4434) 2023-10-16 15:13:37 -07:00
Daniel Thorn 065f4c78f9
DENG-476 - Update latest versions to reference main_v5 (#4441) 2023-10-16 14:56:43 -07:00
Daniel Thorn b63779faf1
DENG-476 - Update error aggregates to reference main_v5 (#4439) 2023-10-16 14:40:12 -07:00
Daniel Thorn 167d3a7a31
DENG-476 - Remove unnecessary reference to main_v4 from ssl ratios (#4437) 2023-10-16 14:17:25 -07:00
Daniel Thorn 4728884e3c
DS-2642 - Update tax address source in stripe payout report (#4422) 2023-10-13 13:16:12 -07:00
Daniel Thorn f41c85a37d
DS-2642 - Add tax amount to stripe payout report (#4421) 2023-10-13 11:56:00 -07:00
Sean Rose 4bbbc32a5b
Put assert UDFs in `mozfun` project (#4367)
* Put assert UDFs in `mozfun` project.

* Tweak syntax in `assert.array_equals()` to avoid SQLGlot parsing error.
  https://github.com/tobymao/sqlglot/issues/2348

* Fix SQL syntax error in `assert.struct_equals()` tests.

* Fix UDF dependency file path logic when deploying to stage.

* Change regular expressions in `parse_routine` module to allow quotes around routines' dataset and name.
2023-10-13 10:58:42 -07:00
kik-kik 109b9f0835
feat(DENG-1599): added ETL checks related to firefox_ios_derived.new_profile_activation_v2 (#4411)
* added ETL checks related to firefox_ios_derived.new_profile_activation_v2

* regenerated bqetl_firefox_ios DAG
2023-10-13 16:42:02 +02:00
Yashika Khurana 148d4eb539
feat: Cirrus monitor query (#4393)
* feat: Cirrus monitor query

* Update sql_generators/experiment_monitoring/templates/experiments_daily_active_clients_v1/query.sql

Co-authored-by: Mike Williams <102263964+mikewilli@users.noreply.github.com>

* feat: Monitor cirrus events

* fix: Use events

* Update sql/moz-fx-data-shared-prod/telemetry_derived/experiments_daily_active_clients_v1/query.sql

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update sql_generators/experiment_monitoring/templates/experiment_enrollment_aggregates_v1/query.sql

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update sql_generators/experiment_monitoring/templates/experiment_events_live_v1/init.sql

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update sql_generators/experiment_monitoring/templates/experiment_search_events_live_v1/init.sql

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update sql_generators/experiment_monitoring/templates/experiments_daily_active_clients_v1/query.sql

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update sql_generators/experiment_monitoring/templates/experiments_daily_active_clients_v1/query.sql

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update sql_generators/experiment_monitoring/templates/templating.yaml

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update sql_generators/experiment_monitoring/templates/templating.yaml

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update sql/moz-fx-data-shared-prod/monitor_cirrus_derived/experiment_search_events_live_v1/init.sql

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update sql/moz-fx-data-shared-prod/monitor_cirrus_derived/experiment_events_live_v1/init.sql

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* feat: Update view to table

* feat: Update view to table

* use live tables instead of stable for live events

---------

Co-authored-by: Mike Williams <102263964+mikewilli@users.noreply.github.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
Co-authored-by: Mike Williams <mwilliams@mozilla.com>
2023-10-12 14:46:26 -07:00
Frank Bertsch a5bcebc308
Add line numbers for clients_daily aggs (#4412) 2023-10-12 12:36:50 -04:00
Eduardo Filho a0d1e16101
Remove use_counters from queries and test file referencing main_v5 (#4410) 2023-10-12 11:16:52 -04:00
Eduardo Filho 2663e3d521
Migrate glam etl to main_v5 (#4389) 2023-10-11 11:45:18 -04:00
Daniel Thorn e202a94cab
Fix syntax in stripe.itemized_payout_reconciliation (#4401) 2023-10-10 13:05:00 -07:00
kik-kik 79de048842
feat(DENG-1696): docker fxa admin server sanitized updating after gcp migration (#4400)
* added v2 of docker_fxa_admin_server_events

* updated the view to include the fields needed

* added schema for firefox_accounts_derived/docker_fxa_admin_server_sanitized_v1

* Update sql/moz-fx-data-shared-prod/firefox_accounts/docker_fxa_admin_server_sanitized/view.sql

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* added docker_fxa_admin_server_sanitized_v2 to dry_run skip list

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2023-10-10 21:23:49 +02:00
Daniel Thorn b2a211eebc
DS-2642 - Add plan, product, and customer shipping info to stripe itemized report (#3998) 2023-10-10 10:33:31 -07:00
Lucia 6ee11d490a
Ds 2948 update second seen date (#4399)
* DS-3054. The second_seen_date depends on the first_seen_date.

* DS-3054. Add caveats to metadata.

* Improvement to second_seen_Dates calculation, two aggregations reduced.

* Follow-up for PR-4396 and second_seen_date calculation.

* Typo

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-10-09 19:10:32 +02:00
kik-kik bd1101393e
fixing firefox_ios funnel_retention checks config (#4398) 2023-10-09 18:22:54 +02:00
kik-kik 46ee8fb548
fixing firefox_ios app_store_funnel_v1 checks config (#4397) 2023-10-09 15:09:41 +02:00
Lucia 9a49db6256
DS-2948 Update second seen date (#4396)
* DS-3054. The second_seen_date depends on the first_seen_date.

* DS-3054. Add caveats to metadata.

* Improvement to second_seen_Dates calculation, two aggregations reduced.

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-10-09 13:30:27 +02:00
Mike Williams 62b2c8e0d4
remove unused deprecated field from branch (#4391) 2023-10-06 16:41:08 -04:00
kik-kik d95cf7e58b
feat(): renaming funnel_retention week_2 and week_4 and aggregates tables (#4386)
* renaming funnel_retention week_2 and week_4 to include clients (client level table) and removing aggregates from funnel_retention_week_4_aggregates_v1

* fixed typo in one of the retention queries

* made tweaks to field descriptions as suggested by lvargas
2023-10-06 15:46:37 +02:00
kik-kik 5fe800497f
moved out the is_suspicious_client_device filter from clients view to retention query (#4385) 2023-10-06 12:15:10 +02:00
Daniel Thorn e686965118
Remove hcm_clients in favor of clients_daily (#4375) 2023-10-05 12:31:39 -07:00
kik-kik 75bb1f531b
feat(): converting firefox ios derived app store funnel v1 to incremental job (#4381)
* modified firefox_ios_derived.app_store_funnel_v1 to be an incremental job

* regenerated bqetl_firefox_ios

* regenerated bqetl_analytics_aggregations

* fixing clustering settings for app_store_funnel_v1

* fixed app_store_funnel_v1 invalid query error

* renaming fields as requested/agreed

* updated description for the impressions field
2023-10-05 19:34:57 +02:00
Daniel Thorn d4038d0182
unschedule telemetry_derived.main_summary_v4 (#4360) 2023-10-05 10:18:59 -07:00
kik-kik b2a00f8d0d
feat(): adding firefox ios retention week 4 aggregates query (#4379)
* added firefox_ios funnel_retention_week_4_aggregates query

* small tweak for week_2 and week_4 retention checks to use partitioning field for determining subset of data to run the check on

* updated bqetl_firefox_ios DAG

* added a check to the retention tables to make sure the day diff between submission_date and first_seen_date is fixed

* fixed type in clustering settings for retention_week_4 aggregates query

* regenerated bqetl_analytics_aggregations DAG

* checked out bqetl_analtycsi_aggregations from main
2023-10-05 16:34:51 +02:00
Daniel Thorn de71eac14d
DENG-476 - Create derived table for HCM clients dashboards (#4372)
so that they no longer need to depend on main summary
2023-10-02 14:41:12 -07:00
kik-kik 3691e88952
Added jsonPayload.foundin to docker_fxa_customs_sanitized view (#4371) 2023-10-02 12:19:23 -07:00
kik-kik a63d690433
tweaks to partitioning settings for firefox_ios_derived.funnel_retention tables (#4370) 2023-10-02 19:36:32 +02:00
kik-kik cf4f8ab340
Added jsonPayload.errno to docker_fxa_customs_sanitized view (#4369) 2023-10-02 18:45:12 +02:00
kik-kik b97e0019cb
feat(DENG-1233): Queries to generate firefox_ios retention metrics (#4344)
* added queries for generating firefox_ios retention metrics and a view for access

* added repeat_first_month_user to the week 4 retention query as requested by soGaussian

* added clustering settings for both app_store_retention tables

* made some final tweaks to the firefox_ios retention queries

* renamed app_store_retention_* to funnel_retention_*

* small tweak to the checks.sql for funnel_retention_week_4 query to improve understanding of what we're doing

* small tweaks to the retention queries

* added additional filter to remove client anomaly

* updated the retention queries to use firefox_ios.firefox_ios_clients table

* funnel_retention tables now using firefox_ios_clients table instead of first seen and additional dimensions added

* added new retention fields to the retention view

* regenerated bqetl_firefox_ios

* regenerated bqetl_firefox_ios DAG

* regenerated bqetl_analytics_aggregations DAG

* fixing the firefox_ios funnel_retention view
2023-10-02 11:40:23 +02:00
kik-kik 1254894389
added checks for firefox_ios_clients_v1 (#4365) 2023-09-29 13:40:25 -04:00
kik-kik 97848ae49d
updating clustering settings for firefox_ios_clients_v1 table (#4364) 2023-09-29 12:48:28 -04:00
kik-kik 4f67e1324e
feat(): firefox_ios_clients_v1 table partitioning and clustering settings update + filtering out "suspicious ios" users (#4362)
* updated table partitioning and clustering settings + filtering out "suspicious ios" users as per bug-1846554

* rather than filtering out suspicious device clients, we have a flag field to easily filter them out and give us the ability to keep track of their numbers
2023-09-29 17:54:24 +02:00
kik-kik 38f9c26e02
feat(DENG-1578): docker_fxa_customs_sanitized_v2 query added and view updated (#4315)
* added docker_fxa_customs_sanitized_v2 query to pull from the new fxa log table and updated the view to union v1 and v2

* updated schema file for docker_fxa_customs_sanitized_v2

* tweaks made as suggested by srose in PR#4315

* added docker_fxa_customs_sanitized_v2 to skip list due to permissions

* once again scheduling fxa.docker_fxa_customs_sanitized_v1 as AWS events appear to still be arriving

* added schema for docker_fxa_customs_sanitized_v1

* updated date filter to represent when we stopped receiving relevant events and descheduled v1
2023-09-29 10:51:36 +02:00
Lucia ee3451390d
Prioritize ping type and first record for ping types with the same TIMESTAMP (#4354)
* Update query.sql

Prioritize ping types and only first record when pings have exact same timestamp.

* DS-3186. Get the first value returned by the subqueries, in case of same ping type and same timestamp.

* DS-3186. Update DAGs

* DS-3186. Update DAGs

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-09-28 20:02:37 +02:00
Daniel Thorn 8ba1195c2b
Override default main-like tables to pin v4 (#4276) 2023-09-28 09:50:46 -07:00
Daniel Thorn ea05e6c6dc
Bug 1852630 - Rename main_remainder_v4 to main_v5 (#4353)
and point at new copy_deduplicate tasks for similar pings
2023-09-28 09:08:55 -07:00
kik-kik b48d8806ca
small tweak to the uniqueness check (#4343) 2023-09-27 11:02:16 +02:00
Marlene Hirose 8c7dd57147
add date filter to submission_date in bigquery_usage_v2 part of the q… (#4348)
* add date filter to submission_date in biqguery_usage_v2 part of the query

* Update sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_tables_inventory_v1/query.py

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-09-26 14:03:57 -07:00
Marlene Hirose a9c4df8348
Update partitioning bigquery usage v2 (#4345)
* add submission_date to table, schema and update partition field to be date job is run

* Change field name from job_creation_date back to creation_date

* update description in schema.yaml, submission_date is when the Airflow DAG is run, not exactly when the table was refreshed
2023-09-25 10:44:32 -07:00
Lucia 6826cbe395
DS-2947 clients_first_seen_v2 (#3962)
* DENG-850 Add test setup.

* DS-2947. Create new dataset and tests for Firefox Desktop Clients.

* DS-2947. Update dataset name to clients_first_seen_v2.

* DS-2947. Dataset name to clients_first_seen_v2.

* DS-2947. Updating tests.

* DS-2947. Schema for clients_first_seen_v2.

* DS-2947. Tests update.

* Tests update

* Restore test files

* DS-2947. Get data from main and new profile ping. Get first dltoken and dlsource available. Update tests.

* DS-2947. Use main ping's submission_timestamp_min to find the earliest ping.

* Remove app_display_versin as it is normalised in app_version. Update fields on a 7 day window. Retrieve data from the ping with the earliest NOT NULL value to remove NULLS when main ping is not available.

* Remove app_display_versin as it is normalised in app_version. Update fields on a 7 day window. Retrieve data from the ping with the earliest NOT NULL value to remove NULLS when main ping is not available.

* Update schemas, remove duplicated columns from query and init. Adapt existing unit test and add unit test for 7-day window updates. Include scheduler in DAG bqetl_analytics_tables.

* Update to enable initialize from query.sql. Remove init.sql.

* Update DAGs dependencies.

* DAG bqetl_main_summary updated.

* Query and tests update to join with sample_id.

* Refactor metadata fields in query and tests.

* Schema and descriptions updated. Remove filter to query the existing table. Remove the DATETIME, the first_seen_date is equivalent.

* Column required to be explicit in the query to match the schema.

* Test fix.

* Tests tmp changes.

* Remove 7-day window update and update tests.

* Add second_seen_date to the query

* DS-3037 Add second_seen_date and tests.

* DS-3037 Add is_init to calculate second_seen_date.

* remove files in analysis dataset

* DS-3037 Add is_init to calculate second_seen_date. Formatting.

* DS-2986 Add initialize script. Change submission_timestamp_min to submission_date due to NULL values in that field.

* DENG-1314. Update metadata reported pings and tests in the query.

* DS-3054. Update bqetl initialize command and query to support parallel run.

* DS-3054. Update query to use submission_timestamp_min from main ping where available for precision in source ping for first_seen_date, add source ping of second_seen_date, get only first_seen date from new_profile and shutdown ping due to 16% clients with more than one new_profile ping. Add capability to run in parallel in bqetl. Update tests.

* DS-3054. Remove initialize.py.

* Reset unrelated formatting changes from this branch to match the main branch.

* Correct jira template.

* DS-2947. Update naming for attribution dltoken and dlsource.

* DS-2947. Update column names and tests and clarity for the initialization command.

* DS-2986. Create table with schema and metadata in command initialize.

* Document what is the result expected from each subquery.

* Documentation update.

* DS-3145. Include user agent and the source ping. Add query documentation.

* DS-3145. Update tests.

* DS-3146 Update logic to get attributes only from the ping that reports the first_seen_date, include locale, update the source for app_build_id and collect second_seen_date only from main ping.

* DS-3146 Update logic to get attributes only from the ping that reports the first_seen_date, include locale, update the source for app_build_id and collect second_seen_date only from main ping.

* DS-3054. Updates and save initialization.

* DS-3054. Table name required for DAG generation.

* DS-2947_implement_bigquery_changes_in_another_PR.

* DS-2947 Naming

* Add clients_first_seen_v2 to skip dry-run.

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-09-25 16:39:04 +02:00
Marlene Hirose 7040463033
add query_id and username columns to bigquery_usage_v2 table - joinin… (#4203)
* add query_id and username columns to bigquery_usage_v2 table - joining JOBS_BY_ORG and JOBS_BY_PROJECT tables

* change jobs_by_project query to use variable {project} instead of

* adjust formatting

* refactor query to use jobs_by_organization_derived table

* remove data ops from codeowner of bigquery_table_usage_v2

* add marlene as a code owner to the bigquery_table_usage_v2 table

* add marlene as owner to DAG for bq_table_usage_v2
2023-09-22 10:15:37 -07:00
akkomar e3208aeecc
FXA-6721 Setup import of accounts table from FxA stage CloudSQL (#4327)
* FXA-6721 Setup import of accounts table from FxA stage CloudSQL

* Fix typo
2023-09-22 15:27:07 +02:00
kik-kik c9cabdcedd
updated the app_store_funnel_v1 query to use country_name mapping table to fix countries ending up without a country value and updated ETL check making sure data is arriving (#4336) 2023-09-21 17:55:11 +02:00
kik-kik 69592dab81
# feat(): updated fxa nonprod queries updated to be in line with production queries (#4297)
* updated fxa nonprod/staging queries to be in line with what production queries look like

* Apply suggestions from code review provided by srose

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* tweaks made as suggested by srose in PR#4297

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2023-09-21 16:54:17 +02:00
kik-kik 2356bfeca7
added additional checks to firefox_ios_derived.app_store_funnel_v1 (#4330) 2023-09-21 12:30:52 +02:00