* Add ETLs for historical Google Search Console data synced by Fivetran.
* Fix formatting of `CASE` subclauses like `WHEN` inside Jinja blocks.
* Add ETLs for current Google Search Console data exported directly to BigQuery.
* Add views for Google Search Console data.
* pyproject.toml for bqetl
* Correctly resolve SQL generators from package
* CircleCI config to publish tagged versions to PyPI
* Get version from git tags
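A minimal sketch of deriving a package version from the latest git tag. It assumes tags of the form `v<version>` and uses `git describe --tags --abbrev=0`. The helper names `normalize_tag` and `version_from_git_tags` are hypothetical, and this is not necessarily how bqetl's pyproject.toml actually resolves its version, which may rely on a build plugin instead.

```python
import subprocess


def normalize_tag(tag: str) -> str:
    """Strip whitespace and a single leading 'v' from a tag such as 'v2023.10.0'."""
    tag = tag.strip()
    return tag[1:] if tag.startswith("v") else tag


def version_from_git_tags(repo_dir: str = ".") -> str:
    """Return the most recent tag reachable from HEAD as a version string.

    Hypothetical helper. Raises subprocess.CalledProcessError if
    `git describe` fails, e.g. when the repository has no tags.
    """
    result = subprocess.run(
        ["git", "describe", "--tags", "--abbrev=0"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return normalize_tag(result.stdout)
```

Keeping the tag normalization separate from the subprocess call makes the parsing testable without a git checkout.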
* DS-3104. Create query, metadata, and schedule for clients_last_seen_v2. Update view clients_last_seen to use this version.
* Update metadata and formatting
* Add to dry-run skip
* Update metadata
---------
Co-authored-by: Alexander Nicholson <anicholson@mozilla.com>
* Enable the events stream table for more products
This enables the events stream table for the Glean Debug Ping Viewer and the Glean Dictionary, in the spirit of dogfooding the table internally a bit more.
* Update bqetl_project.yaml
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Add a generator for events stream tables
Open questions:
* How does init work?
* Is this manually triggered? How do we backfill to a certain date?
* The schema is defined in the SQL query. How does this behave when it changes in the future?
* Configuration: Right now inline in Python. Should we change this?
TODO:
* Check table schema
* Store category and name separately to help with filtering and clustering
* Concat into full event name using array to avoid NULL issues
* events stream: Read allowed apps from project configuration
* events stream: Cluster by event category
* Remove trailing commas
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Update sql_generators/glean_usage/events_stream.py
* Update sql_generators/glean_usage/templates/events_stream_v1.metadata.yaml
* Update sql_generators/glean_usage/templates/events_stream_v1.query.sql
* Update sql_generators/glean_usage/common.py
* Update sql_generators/glean_usage/events_stream.py
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* add query.sql file using Jinja templating to loop through projects
* add region-us to the INFORMATION_SCHEMA.JOBS_BY_PROJECT table description
* remove python file
* add bigquery_usage_v2/query.sql to skip list of bqetl_project.yaml
* Update sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* remove Jinja format on and off
* change obscure alias 't1' to more explicit one 'jobs'
* change alias, remove arguments, referenced_tables, and depends_on from metadata.yaml file
* update first CTE referenced_tables to reference_table
* fix UNNEST(referenced_table) to UNNEST(referenced_tables)
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* add data from desktop_cohort_daily_retention to cohort_daily_stats, normalize desktop normalized_app_name to 'Firefox Desktop'
* add fakespot_daily_events_rollup to bqetl_project skip list
* Update sql/moz-fx-data-shared-prod/telemetry/cohort_daily_statistics/view.sql
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* add new line at end of file
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Draft PR for RS-805
* Split the udf to separate PR
* Fix review checker sql
* add schema.yaml to fix the CI error
* Fix CI error
* add mobile search aggregate to skip dry run
* Define `event_monitoring_live_v1` views in `view.sql` files.
This way they get automatically deployed by the `bqetl_artifact_deployment.publish_views` Airflow task.
* Support materialized views in view naming validation.
* Handle `IF NOT EXISTS` in view naming validation.
* Use regular expression to extract view ID in view naming validation.
This simplifies the logic and avoids a sqlparse bug where it doesn't recognize the `MATERIALIZED` keyword.
* Update other view regular expressions to allow for materialized views.
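A minimal sketch of regex-based view ID extraction that tolerates the `MATERIALIZED` keyword and `IF NOT EXISTS`, the two cases the validation changes above address. The `VIEW_ID_RE` pattern and the `extract_view_id` helper are hypothetical illustrations, not the regular expression bqetl actually uses. The sketch only handles a single dotted ID that is either unquoted or wrapped in one pair of backticks.

```python
import re
from typing import Optional

# Hypothetical pattern, not bqetl's actual one. Matches:
# CREATE [OR REPLACE] [MATERIALIZED] VIEW [IF NOT EXISTS] <view_id>
VIEW_ID_RE = re.compile(
    r"""
    CREATE\s+
    (?:OR\s+REPLACE\s+)?
    (?:MATERIALIZED\s+)?
    VIEW\s+
    (?:IF\s+NOT\s+EXISTS\s+)?
    `?(?P<view_id>[\w.-]+)`?   # project IDs may contain hyphens
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_view_id(sql: str) -> Optional[str]:
    """Return the view ID from the first CREATE ... VIEW statement, or None."""
    match = VIEW_ID_RE.search(sql)
    return match.group("view_id") if match else None
```

A regular expression sidesteps tokenizer limitations such as sqlparse not recognizing `MATERIALIZED`, at the cost of not fully parsing the statement.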
* Add derived stub attribution logs
This table keeps triplets from the stub attribution logs.
The triplet of (dl_token, ga_client_id, stub_session_id)
will only ever appear once here.
See the associated decision brief:
https://docs.google.com/document/d/1L4vOR0nCGawwSRPA9xiR8Hmu_8ozCGUecXAtBWmGGA0/edit
* Move stub attribution table to new dataset
In order to ensure limited access to the stub attribution service
data without significantly decreasing developer velocity, we
move these tables to a new dataset. That dataset has the defaults
we want for all stub attribution log data:
- Defaults to read-only access for the data-science/DUET workgroup
- No read/write access for DE
We will backfill via the bqetl_backfill DAG.
* Rename view
* Use correct dataset name in view
* Skip dryrun; no access
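A minimal sketch of the stub attribution rule described above, where each (dl_token, ga_client_id, stub_session_id) triplet is kept only once. It assumes an in-memory set of previously stored triplets. The `StubAttributionRow` shape and the `new_triplets` helper are hypothetical. The real table is built in BigQuery SQL, and this Python sketch only illustrates the invariant.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class StubAttributionRow(NamedTuple):
    """Hypothetical log row shape; only the triplet fields matter here."""
    dl_token: str
    ga_client_id: str
    stub_session_id: str
    submission_date: str


Triplet = Tuple[str, str, str]


def new_triplets(
    rows: Iterable[StubAttributionRow], seen: Set[Triplet]
) -> Iterator[StubAttributionRow]:
    """Yield only rows whose triplet has not been seen before.

    `seen` is updated in place, so duplicates within the same batch
    are also dropped after their first occurrence.
    """
    for row in rows:
        key = (row.dl_token, row.ga_client_id, row.stub_session_id)
        if key in seen:
            continue
        seen.add(key)
        yield row
```

Checking against both earlier loads and the current batch is what makes "only ever appear once" hold across the whole table, not just within a single day.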
* Put assert UDFs in `mozfun` project.
* Tweak syntax in `assert.array_equals()` to avoid SQLGlot parsing error.
https://github.com/tobymao/sqlglot/issues/2348
* Fix SQL syntax error in `assert.struct_equals()` tests.
* Fix UDF dependency file path logic when deploying to stage.
* Change regular expressions in `parse_routine` module to allow quotes around routines' dataset and name.
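A minimal sketch of a routine-name pattern that accepts optional backticks around a routine's dataset and name. It assumes a `dataset.name` reference with no project prefix. `ROUTINE_RE` and `parse_routine_id` are hypothetical names, not the actual `parse_routine` module API.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: each of dataset and name may carry its own backticks,
# and a single pair of backticks around "dataset.name" also matches.
ROUTINE_RE = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?(?:TEMP(?:ORARY)?\s+)?(?:FUNCTION|PROCEDURE)\s+"
    r"`?(?P<dataset>\w+)`?\.`?(?P<name>\w+)`?",
    re.IGNORECASE,
)


def parse_routine_id(sql: str) -> Optional[Tuple[str, str]]:
    """Return (dataset, name) for the first routine definition, or None."""
    match = ROUTINE_RE.search(sql)
    return (match.group("dataset"), match.group("name")) if match else None
```

Making each backtick optional independently covers `` `assert`.`array_equals` ``, `` `assert.array_equals` ``, and the unquoted `assert.array_equals` forms with one pattern.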
* added v2 of docker_fxa_admin_server_events
* updated the view to include the fields needed
* added schema for firefox_accounts_derived/docker_fxa_admin_server_sanitized_v1
* Update sql/moz-fx-data-shared-prod/firefox_accounts/docker_fxa_admin_server_sanitized/view.sql
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* added docker_fxa_admin_server_sanitized_v2 to dry_run skip list
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* added docker_fxa_customs_sanitized_v2 query to pull from the new fxa log table and updated the view to union v1 and v2
* updated schema file for docker_fxa_customs_sanitized_v2
* tweaks made as suggested by srose in PR#4315
* added docker_fxa_customs_sanitized_v2 to skip list due to permissions
* once again scheduling fxa.docker_fxa_customs_sanitized_v1 as AWS events appear to still be arriving
* added schema for docker_fxa_customs_sanitized_v1
* updated date filter to represent when we stopped receiving relevant events and descheduled v1
* DENG-850 Add test setup.
* DS-2947. Create new dataset and tests for Firefox Desktop Clients.
* DS-2947. Update dataset name to clients_first_seen_v2.
* DS-2947. Dataset name to clients_first_seen_v2.
* DS-2947. Updating tests.
* DS-2947. Schema for clients_first_seen_v2.
* DS-2947. Tests update.
* Tests update
* Restore test files
* DS-2947. Get data from the main and new_profile pings. Get the first available dltoken and dlsource. Update tests.
* DS-2947. Use main ping's submission_timestamp_min to find the earliest ping.
* Remove app_display_version as it is normalised in app_version. Update fields on a 7-day window. Retrieve data from the ping with the earliest NOT NULL value to remove NULLs when the main ping is not available.
* Update schemas, remove duplicated columns from query and init. Adapt existing unit tests and add a unit test for 7-day window updates. Include scheduler in DAG bqetl_analytics_tables.
* Update to enable initialization from query.sql. Remove init.sql.
* Update DAGs dependencies.
* DAG bqetl_main_summary updated.
* Query and tests update to join with sample_id.
* Refactor metadata fields in query and tests.
* Schema and descriptions updated. Remove filter to query the existing table. Remove the DATETIME column, as first_seen_date is equivalent.
* Column required to be explicit in the query to match the schema.
* Test fix.
* Temporary test changes.
* Remove 7-day window update and update tests.
* Add second_seen_date to the query
* DS-3037 Add second_seen_date and tests.
* DS-3037 Add is_init to calculate second_seen_date.
* remove files in analysis dataset
* DS-3037 Add is_init to calculate second_seen_date. Formatting.
* DS-2986 Add initialize script. Change submission_timestamp_min to submission_date due to NULL values in that field.
* DENG-1314. Update metadata reported pings and tests in the query.
* DS-3054. Update bqetl initialize command and query to support parallel run.
* DS-3054. Update query to use submission_timestamp_min from the main ping, where available, for precision in the source ping of first_seen_date. Add the source ping of second_seen_date. Get only first_seen_date from the new_profile and shutdown pings, because 16% of clients have more than one new_profile ping. Add capability to run in parallel in bqetl. Update tests.
* DS-3054. Remove initialize.py.
* Reset unrelated formatting changes from this branch to match the main branch.
* Correct Jira template.
* DS-2947. Update naming for attribution dltoken and dlsource.
* DS-2947. Update column names and tests, and clarify the initialization command.
* DS-2986. Create table with schema and metadata in command initialize.
* Document the expected result of each subquery.
* Documentation update.
* DS-3145. Include user agent and the source ping. Add query documentation.
* DS-3145. Update tests.
* DS-3146 Update logic to get attributes only from the ping that reports the first_seen_date, include locale, update the source for app_build_id and collect second_seen_date only from main ping.
* DS-3054. Updates and save initialization.
* DS-3054. Table name required for DAG generation.
* DS-2947. Implement BigQuery changes in another PR.
* DS-2947 Naming
* Add clients_first_seen_v2 to skip dry-run.
---------
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
* updated fxa nonprod/staging queries to be in line with what production queries look like
* Apply suggestions from code review provided by srose
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* tweaks made as suggested by srose in PR#4297
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* introducing fxa_log_device_command_events_v2 to pull relevant logs from GCP log table
* updated the bqetl_fxa_events DAG
* correcting the source table
* added fxa_log_device_command_events_v2 to the dry-run skip list due to the source table permissions issue, and added date filters to indicate the timeframes for which events are included in both tables
* Update sql/moz-fx-data-shared-prod/firefox_accounts_derived/fxa_log_device_command_events_v2/query.sql
Co-authored-by: akkomar <akkomar@users.noreply.github.com>
* made changes as suggested by srose in PR#4308
---------
Co-authored-by: akkomar <akkomar@users.noreply.github.com>