* Put assert UDFs in `mozfun` project.
* Tweak syntax in `assert.array_equals()` to avoid SQLGlot parsing error.
https://github.com/tobymao/sqlglot/issues/2348
* Fix SQL syntax error in `assert.struct_equals()` tests.
* Fix UDF dependency file path logic when deploying to stage.
* Change regular expressions in `parse_routine` module to allow quotes around routines' dataset and name.
* added v2 of docker_fxa_admin_server_events
* updated the view to include the fields needed
* added schema for firefox_accounts_derived/docker_fxa_admin_server_sanitized_v1
* Update sql/moz-fx-data-shared-prod/firefox_accounts/docker_fxa_admin_server_sanitized/view.sql
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* added docker_fxa_admin_server_sanitized_v2 to dry_run skip list
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* DS-3054. The second_seen_date depends on the first_seen_date.
* DS-3054. Add caveats to metadata.
* Improvement to second_seen_date calculation, two aggregations reduced.
* Follow-up for PR-4396 and second_seen_date calculation.
* Typo
---------
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
* DS-3054. The second_seen_date depends on the first_seen_date.
* DS-3054. Add caveats to metadata.
* Improvement to second_seen_date calculation, two aggregations reduced.
---------
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
* renaming funnel_retention week_2 and week_4 to include clients (client level table) and removing aggregates from funnel_retention_week_4_aggregates_v1
* fixed typo in one of the retention queries
* made tweaks to field descriptions as suggested by lvargas
* modified firefox_ios_derived.app_store_funnel_v1 to be an incremental job
* regenerated bqetl_firefox_ios
* regenerated bqetl_analytics_aggregations
* fixing clustering settings for app_store_funnel_v1
* fixed app_store_funnel_v1 invalid query error
* renaming fields as requested/agreed
* updated description for the impressions field
* added firefox_ios funnel_retention_week_4_aggregates query
* small tweak for week_2 and week_4 retention checks to use partitioning field for determining subset of data to run the check on
* updated bqetl_firefox_ios DAG
* added a check to the retention tables to make sure the day diff between submission_date and first_seen_date is fixed
* fixed typo in clustering settings for retention_week_4 aggregates query
* regenerated bqetl_analytics_aggregations DAG
* checked out bqetl_analytics_aggregations from main
* added queries for generating firefox_ios retention metrics and a view for access
* added repeat_first_month_user to the week 4 retention query as requested by soGaussian
* added clustering settings for both app_store_retention tables
* made some final tweaks to the firefox_ios retention queries
* renamed app_store_retention_* to funnel_retention_*
* small tweak to the checks.sql for funnel_retention_week_4 query to improve understanding of what we're doing
* small tweaks to the retention queries
* added additional filter to remove client anomaly
* updated the retention queries to use firefox_ios.firefox_ios_clients table
* funnel_retention tables now using firefox_ios_clients table instead of first seen and additional dimensions added
* added new retention fields to the retention view
* regenerated bqetl_firefox_ios
* regenerated bqetl_firefox_ios DAG
* regenerated bqetl_analytics_aggregations DAG
* fixing the firefox_ios funnel_retention view
* updated table partitioning and clustering settings + filtering out "suspicious ios" users as per bug-1846554
* rather than filtering out suspicious device clients, we have a flag field to easily filter them out and give us the ability to keep track of their numbers
* added docker_fxa_customs_sanitized_v2 query to pull from the new fxa log table and updated the view to union v1 and v2
* updated schema file for docker_fxa_customs_sanitized_v2
* tweaks made as suggested by srose in PR#4315
* added docker_fxa_customs_sanitized_v2 to skip list due to permissions
* once again scheduling fxa.docker_fxa_customs_sanitized_v1 as AWS events appear to still be arriving
* added schema for docker_fxa_customs_sanitized_v1
* updated date filter to represent when we stopped receiving relevant events and descheduled v1
* Update query.sql
Prioritize ping types and keep only the first record when pings have the exact same timestamp.
* DS-3186. Get the first value returned by the subqueries, in case of same ping type and same timestamp.
* DS-3186. Update DAGs
* DS-3186. Update DAGs
---------
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
* add date filter to submission_date in bigquery_usage_v2 part of the query
* Update sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_tables_inventory_v1/query.py
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* add submission_date to table, schema and update partition field to be date job is run
* Change field name from job_creation_date back to creation_date
* update description in schema.yaml, submission_date is when the Airflow DAG is run, not exactly when the table was refreshed
* DENG-850 Add test setup.
* DS-2947. Create new dataset and tests for Firefox Desktop Clients.
* DS-2947. Update dataset name to clients_first_seen_v2.
* DS-2947. Dataset name to clients_first_seen_v2.
* DS-2947. Updating tests.
* DS-2947. Schema for clients_first_seen_v2.
* DS-2947. Tests update.
* Tests update
* Restore test files
* DS-2947. Get data from main and new profile ping. Get first dltoken and dlsource available. Update tests.
* DS-2947. Use main ping's submission_timestamp_min to find the earliest ping.
* Remove app_display_version as it is normalised in app_version. Update fields on a 7 day window. Retrieve data from the ping with the earliest NOT NULL value to remove NULLS when main ping is not available.
* Remove app_display_version as it is normalised in app_version. Update fields on a 7 day window. Retrieve data from the ping with the earliest NOT NULL value to remove NULLS when main ping is not available.
* Update schemas, remove duplicated columns from query and init. Adapt existing unit test and add unit test for 7-day window updates. Include scheduler in DAG bqetl_analytics_tables.
* Update to enable initialize from query.sql. Remove init.sql.
* Update DAGs dependencies.
* DAG bqetl_main_summary updated.
* Query and tests update to join with sample_id.
* Refactor metadata fields in query and tests.
* Schema and descriptions updated. Remove filter to query the existing table. Remove the DATETIME, the first_seen_date is equivalent.
* Column required to be explicit in the query to match the schema.
* Test fix.
* Tests tmp changes.
* Remove 7-day window update and update tests.
* Add second_seen_date to the query
* DS-3037 Add second_seen_date and tests.
* DS-3037 Add is_init to calculate second_seen_date.
* remove files in analysis dataset
* DS-3037 Add is_init to calculate second_seen_date. Formatting.
* DS-2986 Add initialize script. Change submission_timestamp_min to submission_date due to NULL values in that field.
* DENG-1314. Update metadata reported pings and tests in the query.
* DS-3054. Update bqetl initialize command and query to support parallel run.
* DS-3054. Update query to use submission_timestamp_min from main ping where available for precision in source ping for first_seen_date, add source ping of second_seen_date, get only first_seen_date from new_profile and shutdown pings because 16% of clients have more than one new_profile ping. Add capability to run in parallel in bqetl. Update tests.
* DS-3054. Remove initialize.py.
* Reset unrelated formatting changes from this branch to match the main branch.
* Correct jira template.
* DS-2947. Update naming for attribution dltoken and dlsource.
* DS-2947. Update column names and tests and clarity for the initialization command.
* DS-2986. Create table with schema and metadata in command initialize.
* Document the expected result of each subquery.
* Documentation update.
* DS-3145. Include user agent and the source ping. Add query documentation.
* DS-3145. Update tests.
* DS-3146 Update logic to get attributes only from the ping that reports the first_seen_date, include locale, update the source for app_build_id and collect second_seen_date only from main ping.
* DS-3146 Update logic to get attributes only from the ping that reports the first_seen_date, include locale, update the source for app_build_id and collect second_seen_date only from main ping.
* DS-3054. Updates and save initialization.
* DS-3054. Table name required for DAG generation.
* DS-2947_implement_bigquery_changes_in_another_PR.
* DS-2947 Naming
* Add clients_first_seen_v2 to skip dry-run.
---------
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
* add query_id and username columns to bigquery_usage_v2 table - joining JOBS_BY_ORG and JOBS_BY_PROJECT tables
* change jobs_by_project query to use variable {project} instead of
* adjust formatting
* refactor query to use jobs_by_organization_derived table
* remove data ops from codeowner of bigquery_table_usage_v2
* add marlene as a code owner to the bigquery_table_usage_v2 table
* add marlene as owner to DAG for bq_table_usage_v2
* updated fxa nonprod/staging queries to be in line with what production queries look like
* Apply suggestions from code review provided by srose
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* tweaks made as suggested by srose in PR#4297
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>