From adaa9ab52e7b9a02905e603d87587c2b3df8322d Mon Sep 17 00:00:00 2001
From: Anna Scholtz
Date: Mon, 24 Oct 2022 14:08:39 -0700
Subject: [PATCH] Add docs for glean_usage

---
 sql_generators/glean_usage/README.md | 60 ++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)
 create mode 100644 sql_generators/glean_usage/README.md

diff --git a/sql_generators/glean_usage/README.md b/sql_generators/glean_usage/README.md
new file mode 100644
index 0000000000..d76aaeb433
--- /dev/null
+++ b/sql_generators/glean_usage/README.md
@@ -0,0 +1,60 @@
+# Glean Usage
+
+This generator creates the following queries for Glean applications:
+* `baseline_clients_daily`: A daily aggregate of baseline pings per `client_id`
+* `baseline_clients_first_seen`: Captures the earliest server date that we observe a particular client in the
+  baseline table.
+* `baseline_clients_last_seen`: Captures activity history of each client in 28-day windows for each submission date based on baseline pings.
+* `clients_last_seen_joined`: Joins baseline and metrics views
+* `events_unnested`: A view of unnested events
+* `metrics_clients_daily`: Daily per-client aggregates on top of metrics pings
+* `metrics_clients_last_seen`: Window over the previous 28 days of the clients metrics daily table
+* App-specific views for Glean pings: a pointer to the main view to the stable ping table for the release channel of each Glean application
+
+Depending on the query, results are generated per `app_id` dataset, per app, or both.
+
+For example, for Fenix this means that a query is generated for each `app_id` (`org_mozilla_firefox`, `org_mozilla_fenix_nightly`, `org_mozilla_fennec_aurora`, `org_mozilla_firefox_beta`, `org_mozilla_fenix`), writing its results to tables in the associated dataset. Additionally, queries write results to the app dataset `fenix`, which essentially `UNION`s the results of the per-`app_id` datasets.
+
+## Adding Glean Apps
+
+In order for queries to get generated, a Glean app needs to be added manually to the [`ALLOWED_APPS` list](https://github.com/mozilla/bigquery-etl/blob/2a2a14d9e1e7444034c93706a464346f29eaae30/sql_generators/glean_usage/__init__.py#L42).
+Queries for new Glean apps need to be generated and deployed manually:
+
+```
+> ./bqetl glean_usage generate
+> ./bqetl query schema deploy _derived.*
+```
+
+## Adding Queries
+
+Each query is generated by adding a corresponding class that is derived from [`GleanTable`](https://github.com/mozilla/bigquery-etl/blob/main/sql_generators/glean_usage/common.py#L137). For each of these classes a separate Python file is created inside this directory. The Python file and class are named after the query that they generate.
+
+The `GleanTable` class has a few parameters and methods that can be overridden inside the derived classes to customize the generation.
+
+The available parameters can, and in some cases _need_ to, be set in the `__init__` method of the new class definition:
+* [required] `target_table_id`: name of the target table that the query writes its results to
+* [required] `prefix`: the general prefix of the query to get related derived tables and views
+* `no_init`: default = `False`; If set to `True` the generator will not create an `init.sql` file for the query. Otherwise an `init.sql` will be generated and a template needs to be provided. The init templates will need to be named like `.init.sql`.
+* `per_app_id_enabled`: default = `True`; If set to `True` the query will be generated for each `app_id`-dataset.
+* `per_app_enabled`: default = `True`; If set to `True` the query will be generated for `app`-datasets
+* `cross_channel_template`: default = `"cross_channel.view.sql"`; File name of the template used to join data from different channels of the same app. Used when generating per-app queries.
+
+Each query depends on a couple of templates that need to be added to the `templates/` directory:
+* `.query.sql`: Template of the generated query
+* `.metadata.yaml`: Template for the metadata that gets added alongside the generated query
+* `.view.sql`: Template for the user-facing view to expose the data written by the generated query
+* `dataset_metadata.yaml` and `derived_dataset_metadata.yaml`: Templates for the `dataset_metadata.yaml` files, reused across all queries
+
+Additional, query-specific templates or config files used during the generation process can also be added to the `templates/` directory.
+
+The `GleanTable` class calls two methods that can be overridden by the query-specific classes:
+* `generate_per_app_id(self, project_id, baseline_table, output_dir=None)`: Generates the per-`app_id` queries
+* `generate_per_app(self, project_id, app_info, output_dir=None)`: Generates the per-app queries
+
+
+## Customizing the Generated Queries
+
+In some cases it is necessary to use a custom, manually written query instead of a generated one, for example if the query logic is different for a single `app_id`.
+
+For these cases, a query can be added in the `sql/` directory. Queries that have been added there and have the same name as a generated query will not be overwritten.
+
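
The no-overwrite rule described under "Customizing the Generated Queries" can be sketched roughly as follows. This is only an illustration under stated assumptions, not the generator's actual code: the function name `write_generated_query` and the `<sql_dir>/<dataset>/<table>/query.sql` layout are hypothetical.

```python
from pathlib import Path


def write_generated_query(sql_dir: Path, dataset: str, table: str, sql: str) -> bool:
    """Write a generated query unless a file with the same name already exists.

    Hypothetical helper: returns True if the file was written and False if an
    existing (for example manually written) query was left untouched.
    """
    # Assumed layout: <sql_dir>/<dataset>/<table>/query.sql
    target = sql_dir / dataset / table / "query.sql"
    if target.exists():
        # A query with the generated name is already present, so keep it.
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(sql)
    return True
```

With this shape, calling the helper a second time for the same dataset and table leaves the first file in place, which mirrors how a manually written query would survive a later generator run.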