Add docs for glean_usage

2022-10-24 14:08:39 -07:00 · 2022-10-24 14:08:39 -07:00 · adaa9ab52e
--- a/sql_generators/glean_usage/README.md
+++ b/sql_generators/glean_usage/README.md
@ -0,0 +1,60 @@
+# Glean Usage
+
+This generator creates the following queries for Glean applications:
+* `baseline_clients_daily`: A daily aggregate of baseline pings per `client_id`
+* `baseline_clients_first_seen`: Captures the earliest server date on which a particular client is observed in the baseline table.
+* `baseline_clients_last_seen`: Captures activity history of each client in 28-day windows for each submission date based on baseline pings.
+* `clients_last_seen_joined`: Joins baseline and metrics views
+* `events_unnested`: A view of unnested events
+* `metrics_clients_daily`: Daily per-client aggregates on top of metrics pings
+* `metrics_clients_last_seen`: Window over the previous 28 days of the clients metrics daily table
+* App-specific views for Glean pings: for each Glean application, the main view points to the stable ping table for the release channel
+
+Depending on the specific query, queries are generated for per-`app_id` datasets, for per-app datasets, or for both.
+
+For example, for datasets related to Fenix, this means that for each `app_id` (`org_mozilla_firefox`, `org_mozilla_fenix_nightly`, `org_mozilla_fennec_aurora`, `org_mozilla_firefox_beta`, `org_mozilla_fenix`), queries are generated that write their results to tables in the associated dataset. Additionally, queries write results to the app dataset `fenix`, which essentially `UNION`s the results of the per-`app_id` datasets.
+
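The relationship between per-`app_id` datasets and the app dataset can be sketched as follows. This is an illustration only, not the generator's actual template: the `APP_DATASETS` mapping and the `union_query` helper are hypothetical names, and the table reference format is an assumption. BigQuery requires an explicit `UNION ALL` or `UNION DISTINCT`, so the sketch uses `UNION ALL`.

```python
# Hypothetical mapping from an app name to its per-app_id datasets.
# The dataset names are the Fenix ones listed above.
APP_DATASETS = {
    "fenix": [
        "org_mozilla_firefox",
        "org_mozilla_fenix_nightly",
        "org_mozilla_fennec_aurora",
        "org_mozilla_firefox_beta",
        "org_mozilla_fenix",
    ]
}


def union_query(app_name: str, table: str) -> str:
    """Build a query that combines one table across all per-app_id datasets."""
    selects = [
        f"SELECT * FROM `{dataset}.{table}`"  # assumed reference format
        for dataset in APP_DATASETS[app_name]
    ]
    return "\nUNION ALL\n".join(selects)


print(union_query("fenix", "baseline_clients_daily"))
```
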
+## Adding Glean Apps
+
+For queries to be generated, a Glean app must be added manually to the [`ALLOWED_APPS` list](https://github.com/mozilla/bigquery-etl/blob/2a2a14d9e1e7444034c93706a464346f29eaae30/sql_generators/glean_usage/__init__.py#L42).
+Queries for new Glean apps need to be generated and deployed manually:
+
+```
+> ./bqetl glean_usage generate
+> ./bqetl query schema deploy <glean-app-dataset-name>_derived.*
+```
+
+## Adding Queries
+
+Each query is generated by adding a corresponding class that is derived from [`GleanTable`](https://github.com/mozilla/bigquery-etl/blob/main/sql_generators/glean_usage/common.py#L137). Each of these classes lives in its own Python file inside this directory. The Python file and class are named after the query that they generate.
+
+The `GleanTable` class has a few parameters and methods that can be overridden inside the derived classes to customize the generation.
+
+The following parameters can be set in the `__init__` method of the new class definition; the ones marked as required _must_ be set:
+* [required] `target_table_id`: name of the target table results are written to by the query
+* [required] `prefix`: the general prefix of the query, used to look up related derived tables and views (for example, `first_seen`)
+* `no_init`: default = `False`; If set to `True`, the generator will not create an `init.sql` file for the query. Otherwise an `init.sql` will be generated and a template needs to be provided. The init template needs to be named `<target_table_id>.init.sql`.
+* `per_app_id_enabled`: default = `True`; If set to `True`, the query will be generated for each `app_id` dataset.
+* `per_app_enabled`: default = `True`; If set to `True`, the query will be generated for app datasets.
+* `cross_channel_template`: default = `"cross_channel.view.sql"`; File name of the template used to join data from different channels of the same app. Used when generating per-app queries.
+
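A minimal sketch of a derived class that sets some of these parameters is shown below. The `GleanTable` defined here is a simplified stand-in that only models the defaults documented above, not the real base class, and `ExampleClientsDaily` and its table name are hypothetical. The second required parameter is omitted for brevity.

```python
class GleanTable:
    """Simplified stand-in for the real base class (documented defaults only)."""

    def __init__(self):
        self.target_table_id = ""
        self.no_init = False
        self.per_app_id_enabled = True
        self.per_app_enabled = True
        self.cross_channel_template = "cross_channel.view.sql"


class ExampleClientsDaily(GleanTable):
    """Hypothetical query class; it would live in its own Python file."""

    def __init__(self):
        super().__init__()
        # Required: the table the generated query writes to (hypothetical name).
        self.target_table_id = "example_clients_daily_v1"
        # Optional overrides: skip init.sql and only generate per-app_id queries.
        self.no_init = True
        self.per_app_enabled = False


table = ExampleClientsDaily()
print(table.target_table_id, table.no_init, table.per_app_enabled)
```

Parameters that are not overridden, such as `cross_channel_template`, keep the defaults set by the base class.
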
+Each query depends on a couple of templates that need to be added to the `templates/` directory:
+* `<target_table_id>.query.sql`: Template of the generated query
+* `<target_table_id>.metadata.yaml`: Template for the metadata that gets added alongside the generated query
+* `<target_table_id>.view.sql`: Template for the user-facing view to expose the data written by the generated query
+* `dataset_metadata.yaml` and `derived_dataset_metadata.yaml`: Templates for the `dataset_metadata.yaml` files, reused across all queries
+
+Additional query-specific templates or config files used during the generation process can also be added to the `templates/` directory.
+
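The naming convention above lends itself to a quick check before running the generator. The helpers below are hypothetical and not part of `glean_usage`; they follow the file names exactly as listed in this README and assume an init template is only needed when `no_init` is `False`.

```python
from pathlib import Path


def expected_templates(target_table_id: str, no_init: bool = False) -> list[str]:
    """Return the per-query template file names described in this README."""
    names = [
        f"{target_table_id}.query.sql",
        f"{target_table_id}.metadata.yaml",
        f"{target_table_id}.view.sql",
    ]
    if not no_init:
        # An init template is only required when init.sql gets generated.
        names.append(f"{target_table_id}.init.sql")
    return names


def missing_templates(
    templates_dir: Path, target_table_id: str, no_init: bool = False
) -> list[str]:
    """List the expected templates that do not exist in `templates_dir`."""
    return [
        name
        for name in expected_templates(target_table_id, no_init)
        if not (templates_dir / name).exists()
    ]


print(missing_templates(Path("templates"), "example_clients_daily"))
```
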
+The `GleanTable` class calls two methods that can be overridden by the query-specific classes:
+* `generate_per_app_id(self, project_id, baseline_table, output_dir=None)`: This method is for generating the per-`app_id` queries
+* `generate_per_app(self, project_id, app_info, output_dir=None)`: This method is for generating the per-app queries
+
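As a rough sketch of how such an override might look, the example below skips generation for one dataset and otherwise defers to the base class. The `GleanTable` here is a stub that only mirrors the two method signatures listed above; the real methods render templates and write files rather than returning strings. `ExampleTable`, `SKIPPED_DATASETS`, and the format of `baseline_table` are assumptions made for illustration.

```python
class GleanTable:
    """Stub mirroring the documented method signatures; not the real class."""

    def generate_per_app_id(self, project_id, baseline_table, output_dir=None):
        return f"per-app_id query for {baseline_table} in {project_id}"

    def generate_per_app(self, project_id, app_info, output_dir=None):
        return f"per-app query in {project_id}"


class ExampleTable(GleanTable):
    """Hypothetical query class that customizes per-app_id generation."""

    # Hypothetical dataset prefix for which no query should be generated.
    SKIPPED_DATASETS = ("org_mozilla_example",)

    def generate_per_app_id(self, project_id, baseline_table, output_dir=None):
        # Assumes the dataset name appears inside the baseline table reference.
        if any(name in baseline_table for name in self.SKIPPED_DATASETS):
            return None
        return super().generate_per_app_id(project_id, baseline_table, output_dir)


table = ExampleTable()
print(table.generate_per_app_id("my-project", "my-project.org_mozilla_firefox_stable.baseline_v1"))
print(table.generate_per_app_id("my-project", "my-project.org_mozilla_example_stable.baseline_v1"))
```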
+
+## Customizing the Generated Queries
+
+In some cases it is necessary to use a custom, manually written query instead of a generated one, for example when the query logic is different for a single `app_id`.
+
+For these cases, a query can be added in the `sql/` directory. Queries that have been added there and have the same name as a generated query will not be overwritten.
+
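The "will not be overwritten" behavior can be pictured with the sketch below. It is not the generator's actual implementation: `write_generated_query` is a hypothetical helper, and the directory layout in the usage part is made up for the demonstration.

```python
import tempfile
from pathlib import Path


def write_generated_query(path: Path, sql: str) -> bool:
    """Write generated SQL unless a file already exists at `path`.

    Returns True if the file was written, False if an existing file was kept.
    """
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(sql)
    return True


# Usage: the second write is skipped because the file already exists.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "org_mozilla_example_derived" / "example_v1" / "query.sql"
    first = write_generated_query(target, "SELECT 1")   # no file yet, so it is written
    target.write_text("-- custom query")                # simulate a manually written query
    second = write_generated_query(target, "SELECT 2")  # existing file is kept
    kept = target.read_text()

print(first, second, kept)
```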