Anna Scholtz 2022-10-24 14:08:39 -07:00
Parent 5e361f0b96
Commit adaa9ab52e
1 changed file: 60 additions and 0 deletions

@ -0,0 +1,60 @@
# Glean Usage
This generator produces the following queries and views for Glean applications:
* `baseline_clients_daily`: A daily aggregate of baseline pings per `client_id`
* `baseline_clients_first_seen`: Captures the earliest server date on which a particular client is observed in the baseline table.
* `baseline_clients_last_seen`: Captures activity history of each client in 28-day windows for each submission date based on baseline pings.
* `clients_last_seen_joined`: Joins baseline and metrics views
* `events_unnested`: A view of unnested events
* `metrics_clients_daily`: Daily per-client aggregates on top of metrics pings
* `metrics_clients_last_seen`: Window over the previous 28 days of the clients metrics daily table
* App-specific views for Glean pings: for each Glean application, a view that points to the stable ping table of its release channel

Depending on the query, output is generated for per-`app_id` datasets, for per-app datasets, or for both.
For example, Fenix has several `app_id`s: `org_mozilla_firefox`, `org_mozilla_fenix_nightly`, `org_mozilla_fennec_aurora`, `org_mozilla_firefox_beta` and `org_mozilla_fenix`. For each of them, queries are generated that write their results to tables in the associated dataset. Additionally, queries write results to the app dataset `fenix`, which is essentially a `UNION` of the results in the per-`app_id` datasets.
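
To make the per-`app_id` versus per-app distinction concrete, here is a minimal sketch of how a per-app query could stack the same table from every per-`app_id` dataset. The helper `union_per_app_id_tables`, the table name, and the use of `UNION ALL` are illustrative assumptions, not the generator's actual code; in the generator this step is rendered from a template.

```python
def union_per_app_id_tables(app_ids, table):
    """Build one query that stacks `table` from every per-app_id derived dataset."""
    selects = [
        # Assumes each app_id has a `<app_id>_derived` dataset containing `table`.
        f"SELECT * FROM `{app_id}_derived.{table}`"
        for app_id in app_ids
    ]
    return "\nUNION ALL\n".join(selects)


# The app_ids listed above for Fenix.
fenix_app_ids = [
    "org_mozilla_firefox",
    "org_mozilla_fenix_nightly",
    "org_mozilla_fennec_aurora",
    "org_mozilla_firefox_beta",
    "org_mozilla_fenix",
]

# "baseline_clients_daily" is a placeholder table name for illustration.
fenix_query = union_per_app_id_tables(fenix_app_ids, "baseline_clients_daily")
print(fenix_query)
```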
## Adding Glean Apps
In order for queries to get generated, a Glean app needs to be added manually to the [`ALLOWED_APPS` list](https://github.com/mozilla/bigquery-etl/blob/2a2a14d9e1e7444034c93706a464346f29eaae30/sql_generators/glean_usage/__init__.py#L42).
Queries for new Glean apps need to be generated and deployed manually:
```
> ./bqetl glean_usage generate
> ./bqetl query schema deploy <glean-app-dataset-name>_derived.*
```
## Adding Queries
Each query is generated by adding a corresponding class that is derived from [`GleanTable`](https://github.com/mozilla/bigquery-etl/blob/main/sql_generators/glean_usage/common.py#L137). For each of these classes a separate Python file is created inside this directory. The Python file and class are named after the query that they generate.
The `GleanTable` class has a few parameters and methods that derived classes can override to customize the generation.
The following parameters can, and in some cases _must_, be set in the `__init__` method of the new class:
* [required] `target_table_id`: name of the target table results are written to by the query
* [required] `prefix`: the general prefix of the query, used to look up related derived tables and views
* `no_init`: default = `False`; If set to `True`, the generator will not create an `init.sql` file for the query. Otherwise an `init.sql` file is generated and a template for it needs to be provided. Init templates need to be named `<target_table_id>.init.sql`.
* `per_app_id_enabled`: default = `True`; If set to `True` the query will be generated for each `app_id`-dataset.
* `per_app_enabled`: default = `True`; If set to `True` the query will be generated for `app`-datasets.
* `cross_channel_template`: default = `"cross_channel.view.sql"`; File name of the template used to join data from different channels of the same app. Used when generating per-app queries.
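
As a rough sketch of how these parameters are set, the example below defines a query class that is only generated for per-`app_id` datasets and has no `init.sql`. The `GleanTable` class here is a simplified stand-in that holds only the defaults documented above, so that the snippet runs on its own; the real base class lives in `common.py`. `ExampleClientsDaily` and its table name are hypothetical, and any other required parameters from the list above are set in `__init__` the same way.

```python
class GleanTable:
    """Simplified stand-in for the real base class (assumption, for illustration)."""

    def __init__(self):
        self.target_table_id = ""
        self.no_init = False
        self.per_app_id_enabled = True
        self.per_app_enabled = True
        self.cross_channel_template = "cross_channel.view.sql"


class ExampleClientsDaily(GleanTable):
    """Hypothetical query class; would live in its own file named after the query."""

    def __init__(self):
        super().__init__()
        self.target_table_id = "example_clients_daily"  # placeholder table name
        self.no_init = True            # no init.sql, so no init template is needed
        self.per_app_enabled = False   # only generate for per-app_id datasets


table = ExampleClientsDaily()
print(table.target_table_id, table.per_app_id_enabled, table.per_app_enabled)
```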
Each query depends on several templates that need to be added to the `templates/` directory:
* `<target_table_id>.query.sql`: Template of the generated query
* `<target_table_id>.metadata.yaml`: Template for the metadata that gets added alongside the generated query
* `<target_table_id>.view.sql`: Template for the user-facing view to expose the data written by the generated query
* `dataset_metadata.yaml` and `derived_dataset_metadata.yaml`: Templates for the `dataset_metadata.yaml` files, reused across all queries
Additional, query-specific templates or config files used during the generation process can also be added to the `templates/` directory.
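
The naming scheme above can be checked before running the generator. The following is a small sketch, not part of the generator itself: `expected_templates` and `missing_templates` are hypothetical helpers that derive the per-query template names from `target_table_id` and report which ones are absent from a templates directory.

```python
from pathlib import Path


def expected_templates(target_table_id, no_init=False):
    """Return the per-query template file names described above."""
    names = [
        f"{target_table_id}.query.sql",
        f"{target_table_id}.metadata.yaml",
        f"{target_table_id}.view.sql",
    ]
    if not no_init:
        # An init template is only needed when an init.sql file is generated.
        names.append(f"{target_table_id}.init.sql")
    return names


def missing_templates(templates_dir, target_table_id, no_init=False):
    """List expected template files that do not exist in `templates_dir`."""
    templates_dir = Path(templates_dir)
    return [
        name
        for name in expected_templates(target_table_id, no_init)
        if not (templates_dir / name).is_file()
    ]


# "example_clients_daily" is a placeholder table id.
print(expected_templates("example_clients_daily"))
```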
The `GleanTable` class calls two methods that can be overridden by the query-specific classes:
* `generate_per_app_id(self, project_id, baseline_table, output_dir=None)`: This method is for generating the per-`app_id` queries
* `generate_per_app(self, project_id, app_info, output_dir=None)`: This method is for generating the per-app queries
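
Overriding one of these methods could look like the sketch below, which skips generation for selected datasets and otherwise defers to the base class. The `GleanTable` here is a simplified stand-in with the two signatures listed above; its return values exist only to make the behavior visible. `ExampleTable`, `SKIPPED_DATASETS`, and the format of `baseline_table` are assumptions for illustration.

```python
class GleanTable:
    """Simplified stand-in; the real methods render templates and write files."""

    def generate_per_app_id(self, project_id, baseline_table, output_dir=None):
        return f"per-app_id query for {baseline_table} in {project_id}"

    def generate_per_app(self, project_id, app_info, output_dir=None):
        return f"per-app query in {project_id}"


class ExampleTable(GleanTable):
    """Hypothetical query class that opts some datasets out of generation."""

    # Placeholder dataset name, not a real Glean app.
    SKIPPED_DATASETS = {"org_mozilla_example_nightly"}

    def generate_per_app_id(self, project_id, baseline_table, output_dir=None):
        # Assumes the dataset name appears inside the baseline table reference.
        if any(dataset in baseline_table for dataset in self.SKIPPED_DATASETS):
            return None  # nothing is generated for skipped datasets
        return super().generate_per_app_id(project_id, baseline_table, output_dir)


example = ExampleTable()
skipped = example.generate_per_app_id(
    "example-project", "example-project.org_mozilla_example_nightly_stable.baseline_v1"
)
generated = example.generate_per_app_id(
    "example-project", "example-project.org_mozilla_example_stable.baseline_v1"
)
print(skipped, generated)
```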
## Customizing the Generated Queries
In some cases it is necessary to use a custom, manually written query instead of a generated one, for example if the query logic is different for a single `app_id`.
For these cases, a query can be added in the `sql/` directory. Queries that have been added there and have the same name as a generated query will not be overwritten.
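
The "will not be overwritten" behavior can be pictured with the sketch below. It is only an illustration of the idea under stated assumptions: `write_generated_query` is a hypothetical helper, the `<dataset>/<table>/query.sql` layout is assumed, and the real generator may decide which queries to skip in a different way.

```python
import tempfile
from pathlib import Path


def write_generated_query(sql_dir, dataset, table, sql):
    """Write a generated query unless a query file already exists at that path.

    Returns True if the file was written, False if an existing file was kept.
    """
    target = Path(sql_dir) / dataset / table / "query.sql"
    if target.exists():
        return False  # a manually written query takes precedence
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(sql)
    return True


with tempfile.TemporaryDirectory() as sql_dir:
    # The first write stands in for a manually added query, the second for the generator.
    first = write_generated_query(sql_dir, "example_derived", "example_table", "SELECT 1")
    second = write_generated_query(sql_dir, "example_derived", "example_table", "SELECT 2")
    kept = (Path(sql_dir) / "example_derived" / "example_table" / "query.sql").read_text()

print(first, second, kept)
```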