bigquery-etl/sql_generators/glean_usage
Ben Wu 59e49ea36c
Remove hardcoded dataset in baseline clients last seen check (#6514)
* Remove hardcoded dataset in baseline clients last seen check

* Remove extra .

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

2024-11-18 20:38:11 +00:00
templates Remove hardcoded dataset in baseline clients last seen check (#6514) 2024-11-18 20:38:11 +00:00
README.md Remove telemetry_derived init.sql files (#5342) 2024-04-10 15:36:30 -07:00
__init__.py Pass ID token to dryrun instances to speed things up (#6019) 2024-08-08 12:38:43 -07:00
baseline_clients_daily.py Remove telemetry_derived init.sql files (#5342) 2024-04-10 15:36:30 -07:00
baseline_clients_first_seen.py Pass ID token to dryrun instances to speed things up (#6019) 2024-08-08 12:38:43 -07:00
baseline_clients_last_seen.py Remove telemetry_derived init.sql files (#5342) 2024-04-10 15:36:30 -07:00
clients_last_seen_joined.py Remove telemetry_derived init.sql files (#5342) 2024-04-10 15:36:30 -07:00
common.py Bug 1905938 Support events with no metrics in glean_usage generator (#6358) 2024-10-16 15:30:28 -07:00
event_error_monitoring.py Bug 1905938 Support events with no metrics in glean_usage generator (#6358) 2024-10-16 15:30:28 -07:00
event_flow_monitoring.py Remove telemetry_derived init.sql files (#5342) 2024-04-10 15:36:30 -07:00
event_monitoring_live.py Fix skipping of apps in glean_usage (#6191) 2024-09-12 15:29:35 -07:00
events_stream.py Bug 1905938 Support events with no metrics in glean_usage generator (#6358) 2024-10-16 15:30:28 -07:00
events_unnested.py Bug 1905938 Support events with no metrics in glean_usage generator (#6358) 2024-10-16 15:30:28 -07:00
glean_app_ping_views.py Bug 1920544 Create view to union firefox desktop crashes (#6257) 2024-09-27 17:48:44 +00:00
metrics_clients_daily.py Remove telemetry_derived init.sql files (#5342) 2024-04-10 15:36:30 -07:00
metrics_clients_last_seen.py Remove telemetry_derived init.sql files (#5342) 2024-04-10 15:36:30 -07:00

README.md

Glean Usage

This generator creates the following queries for Glean applications:

  • baseline_clients_daily: A daily aggregate of baseline pings per client_id
  • baseline_clients_first_seen: Captures the earliest server date that we observe a particular client in the baseline table.
  • baseline_clients_last_seen: Captures activity history of each client in 28-day windows for each submission date based on baseline pings.
  • clients_last_seen_joined: Joins baseline and metrics views
  • events_unnested: A view of unnested events
  • metrics_clients_daily: Daily per-client aggregates on top of metrics pings
  • metrics_clients_last_seen: Window over the previous 28 days of the clients metrics daily table
  • App-specific views for Glean pings: A view for each ping that points to the stable ping table for the release channel of each Glean application

Depending on the query, queries are generated for per-app_id datasets, per-app datasets, or both.

For example, for Fenix, queries are generated for each app_id (org_mozilla_firefox, org_mozilla_fenix_nightly, org_mozilla_fennec_aurora, org_mozilla_firefox_beta, org_mozilla_fenix) and write their results to tables in the associated dataset. Additionally, queries write results to the app dataset fenix, which essentially UNIONs the results of the per-app_id datasets.
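The relationship between the per-app_id datasets and the app dataset can be sketched in a few lines of Python. This is an illustration only: union_per_app_id is a hypothetical helper, not part of the generator, and it assumes the combination is a plain UNION ALL over identically named tables.

```python
# Hypothetical helper for illustration; not part of bigquery-etl.
FENIX_APP_IDS = [
    "org_mozilla_firefox",
    "org_mozilla_fenix_nightly",
    "org_mozilla_fennec_aurora",
    "org_mozilla_firefox_beta",
    "org_mozilla_fenix",
]


def union_per_app_id(app_ids, table):
    """Build one SELECT per app_id dataset and join them with UNION ALL."""
    selects = [f"SELECT * FROM `{app_id}.{table}`" for app_id in app_ids]
    return "\nUNION ALL\n".join(selects)


# SQL text that an app-level "fenix" view could be built from (assumption).
fenix_sql = union_per_app_id(FENIX_APP_IDS, "baseline_clients_daily")
```

The real templates may select explicit columns or add a channel column; the sketch only shows the shape of the per-app result.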

Tables for new Glean apps are generated automatically during nightly table deployment runs.

Adding Queries

To add a query, add a class derived from GleanTable. Each class lives in its own Python file inside this directory, and both the file and the class are named after the query they generate.

The GleanTable class has a few parameters and methods that can be overridden inside the derived classes to customize the generation.

The following parameters can be set, and in some cases must be set, in the __init__ method of the new class:

  • [required] target_table_id: name of the target table results are written to by the query
  • [required] prefix: the general prefix of the query to get related derived tables and views
  • per_app_id_enabled: default = True; If set to True the query will be generated for each app_id dataset.
  • per_app_enabled: default = True; If set to True the query will be generated for app datasets.
  • cross_channel_template: default = "cross_channel.view.sql"; File name of the template used to join data from different channels of the same app. Used when generating per-app queries.
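How the two flags affect the set of target datasets can be sketched as follows. planned_targets is a hypothetical function written for this README, not something the generator exposes, and it assumes the flags act as independent switches.

```python
# Hypothetical illustration of per_app_id_enabled / per_app_enabled.
def planned_targets(app_name, app_ids, per_app_id_enabled=True, per_app_enabled=True):
    """Return the datasets a query would be generated for."""
    targets = []
    if per_app_id_enabled:
        # One generated query per app_id dataset.
        targets.extend(app_ids)
    if per_app_enabled:
        # One additional query for the app dataset.
        targets.append(app_name)
    return targets


both = planned_targets("fenix", ["org_mozilla_firefox", "org_mozilla_fenix"])
only_app_ids = planned_targets(
    "fenix", ["org_mozilla_firefox", "org_mozilla_fenix"], per_app_enabled=False
)
```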

Each query depends on a set of templates that need to be added to the templates/ directory:

  • <target_table_id>.query.sql: Template of the generated query
  • <target_table_id>.metadata.yaml: Template for the metadata that gets added alongside the generated query
  • <target_table_id>.view.sql: Template for the user-facing view to expose the data written by the generated query
  • dataset_metadata.yaml and derived_dataset_metadata.yaml: Templates for the generated dataset_metadata.yaml, reused across all queries
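The naming convention above can be checked with a small helper before adding a new query. This is a sketch: expected_templates is a hypothetical name, and the default templates directory path is an assumption about where the script is run from.

```python
from pathlib import Path

# Suffixes taken from the per-query template list above.
QUERY_TEMPLATE_SUFFIXES = ("query.sql", "metadata.yaml", "view.sql")


def expected_templates(target_table_id, templates_dir="templates"):
    """Return the per-query template paths implied by target_table_id."""
    base = Path(templates_dir)
    return [base / f"{target_table_id}.{suffix}" for suffix in QUERY_TEMPLATE_SUFFIXES]


def missing_templates(target_table_id, templates_dir="templates"):
    """Return the expected template paths that do not exist yet."""
    return [p for p in expected_templates(target_table_id, templates_dir) if not p.exists()]


template_names = [p.name for p in expected_templates("baseline_clients_daily_v1")]
```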

Additional, query-specific templates or config files used during the generation process can also be added to the templates/ directory.

The GleanTable class calls two methods that can be overridden by the query-specific classes:

  • generate_per_app_id(self, project_id, baseline_table, output_dir=None): This method is for generating the per-app_id queries
  • generate_per_app(self, project_id, app_info, output_dir=None): This method is for generating the per-app queries
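A minimal sketch of a query-specific class follows. To stay self-contained it uses a stand-in GleanTable that returns a description string instead of rendering templates; the real base class lives in common.py and does more. ExampleClientsDaily, its target table name, and skipped_tables are hypothetical, and only a subset of the parameters is set.

```python
class GleanTable:
    """Stand-in for the real base class, reduced to what this sketch needs."""

    def __init__(self):
        self.target_table_id = ""
        self.per_app_id_enabled = True
        self.per_app_enabled = True

    def generate_per_app_id(self, project_id, baseline_table, output_dir=None):
        # The real method renders templates; here we only describe the work.
        return f"per-app_id {self.target_table_id} from {baseline_table} in {project_id}"

    def generate_per_app(self, project_id, app_info, output_dir=None):
        return f"per-app {self.target_table_id} in {project_id}"


class ExampleClientsDaily(GleanTable):
    """Hypothetical query class; file and class are named after the query."""

    def __init__(self):
        super().__init__()
        self.target_table_id = "example_clients_daily_v1"
        self.per_app_enabled = False  # only per-app_id queries are wanted
        self.skipped_tables = {"org_mozilla_example_stable.baseline_v1"}

    def generate_per_app_id(self, project_id, baseline_table, output_dir=None):
        # Custom step: skip selected tables, otherwise defer to the base class.
        if baseline_table in self.skipped_tables:
            return None
        return super().generate_per_app_id(
            project_id, baseline_table, output_dir=output_dir
        )


table = ExampleClientsDaily()
result = table.generate_per_app_id("example-project", "org_mozilla_fenix_stable.baseline_v1")
```

Overriding only one of the two methods leaves the other with its default behavior, which is usually what a query-specific class wants.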

Customizing the Generated Queries

In some cases it is necessary to use a custom, manually written query instead of a generated one, for example when the query logic differs for a single app_id.

For these cases, a query can be added in the sql/ directory. Queries added there that have the same name as a generated query will not be overwritten.
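The "do not overwrite" rule can be pictured as a simple existence check. This is a sketch under assumptions: should_generate is a hypothetical function, and the sql/<project>/<dataset>/<table>/query.sql layout is assumed here rather than stated above.

```python
from pathlib import Path


def should_generate(sql_dir, project_id, dataset, table):
    """Return False when a manually written query already exists."""
    # Assumed layout: <sql_dir>/<project_id>/<dataset>/<table>/query.sql
    manual_query = Path(sql_dir) / project_id / dataset / table / "query.sql"
    return not manual_query.exists()
```

With this check, placing a query.sql under the matching path in sql/ is enough to keep the generator from replacing it.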