Add docs for glean_usage
Parent: 5e361f0b96
Commit: adaa9ab52e
@ -0,0 +1,60 @@
# Glean Usage

This generator creates the following queries for Glean applications:
* `baseline_clients_daily`: A daily aggregate of baseline pings per `client_id`
* `baseline_clients_first_seen`: Captures the earliest server date on which a particular client is observed in the baseline table.
* `baseline_clients_last_seen`: Captures the activity history of each client in 28-day windows for each submission date, based on baseline pings.
* `clients_last_seen_joined`: Joins baseline and metrics views
* `events_unnested`: A view of unnested events
* `metrics_clients_daily`: Daily per-client aggregates on top of metrics pings
* `metrics_clients_last_seen`: Window over the previous 28 days of the clients metrics daily table
* App-specific views for Glean pings: a view that points to the stable ping table for the release channel of each Glean application

Depending on the query, output is generated for per-`app_id` datasets, per-app datasets, or both.

For example, for datasets related to Fenix, queries are generated for each `app_id` (`org_mozilla_firefox`, `org_mozilla_fenix_nightly`, `org_mozilla_fennec_aurora`, `org_mozilla_firefox_beta`, `org_mozilla_fenix`) and write their results to tables in the associated dataset. Additionally, queries write results to the app dataset `fenix`, which essentially `UNION`s the results of the per-`app_id` datasets.
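The relationship between the per-`app_id` datasets and the app dataset can be pictured with a small sketch. The helper `union_per_app_id` is hypothetical and not part of bigquery-etl. It assumes that each per-`app_id` dataset is named after its `app_id`, and it uses `UNION ALL` for illustration; the actual generated SQL comes from the templates and may differ.

```python
# Hypothetical illustration; not the generator's real code.
FENIX_APP_IDS = [
    "org_mozilla_firefox",
    "org_mozilla_fenix_nightly",
    "org_mozilla_fennec_aurora",
    "org_mozilla_firefox_beta",
    "org_mozilla_fenix",
]


def union_per_app_id(app_ids, table):
    """Build a query that combines one table across all per-app_id datasets."""
    # Assumption: the dataset name equals the app_id.
    selects = [f"SELECT * FROM `{app_id}.{table}`" for app_id in app_ids]
    return "\nUNION ALL\n".join(selects)


# Roughly what a view in the `fenix` app dataset would select from.
fenix_query = union_per_app_id(FENIX_APP_IDS, "baseline_clients_daily")
print(fenix_query)
```

Each `app_id` contributes one `SELECT`, so adding a channel to the app only adds one more branch to the union.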
## Adding Glean Apps

In order for queries to get generated, a Glean app needs to be added manually to the [`ALLOWED_APPS` list](https://github.com/mozilla/bigquery-etl/blob/2a2a14d9e1e7444034c93706a464346f29eaae30/sql_generators/glean_usage/__init__.py#L42).
Queries for new Glean apps need to be generated and deployed manually:

```
> ./bqetl glean_usage generate
> ./bqetl query schema deploy <glean-app-dataset-name>_derived.*
```
## Adding Queries

Each query is generated by adding a corresponding class that is derived from [`GleanTable`](https://github.com/mozilla/bigquery-etl/blob/main/sql_generators/glean_usage/common.py#L137). For each of these classes, a separate Python file is created inside this directory. The Python file and class are named after the query that they generate.

The `GleanTable` class has a few parameters and methods that can be overridden inside the derived classes to customize the generation.

The available parameters can, and in some cases _need_ to, be set in the `__init__` method of the new class definition:
* [required] `target_table_id`: name of the target table that the query writes its results to
* [required] `prefix`: the general prefix of the query, used to get related derived tables and views
* `no_init`: default = `False`; If set to `True`, the generator will not create an `init.sql` file for the query. Otherwise an `init.sql` will be generated and a template needs to be provided. The init templates need to be named like `<target_table_id>.init.sql`.
* `per_app_id_enabled`: default = `True`; If set to `True`, the query will be generated for each `app_id` dataset.
* `per_app_enabled`: default = `True`; If set to `True`, the query will be generated for `app` datasets.
* `cross_channel_template`: default = `"cross_channel.view.sql"`; File name of the template used to join data from different channels of the same app. Used when generating per-app queries.
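As a rough sketch, a new query class might set these parameters as shown below. The `GleanTable` defined here is only a simplified stand-in so that the example runs on its own; the real base class lives in `common.py` and does more. The class name, table name, and prefix value are hypothetical, and the attribute name `prefix` for the second required parameter is an assumption.

```python
class GleanTable:
    """Simplified stand-in for the real base class (assumption)."""

    def __init__(self):
        # Defaults as described in the parameter list above.
        self.target_table_id = ""
        self.prefix = ""
        self.no_init = False
        self.per_app_id_enabled = True
        self.per_app_enabled = True
        self.cross_channel_template = "cross_channel.view.sql"


class ExampleClientsDailyTable(GleanTable):
    """Hypothetical query class, kept in its own Python file."""

    def __init__(self):
        super().__init__()
        self.target_table_id = "example_clients_daily_v1"  # required
        self.prefix = "example_daily"  # required; attribute name assumed
        self.no_init = True  # skip generating an init.sql
        self.per_app_enabled = False  # only per-app_id datasets


table = ExampleClientsDailyTable()
```

Parameters that are not set explicitly, such as `per_app_id_enabled` and `cross_channel_template`, keep their defaults.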
Each query depends on a few templates that need to be added to the `templates/` directory:
* `<target_table_id>.query.sql`: Template of the generated query
* `<target_table_id>.metadata.yaml`: Template for the metadata that gets added alongside the generated query
* `<target_table_id>.view.sql`: Template for the user-facing view that exposes the data written by the generated query
* `dataset_metadata.yaml` and `derived_dataset_metadata.yaml`: Templates for the `dataset_metadata.yaml` files, reused across all queries
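The naming convention above can be summarized in a short sketch. The function `required_templates` is hypothetical and only restates the file names listed in this document; it is not part of the generator, and the table name in the usage line is made up.

```python
def required_templates(target_table_id, no_init=False):
    """List the template file names expected in templates/ for one query."""
    names = [
        f"{target_table_id}.query.sql",
        f"{target_table_id}.metadata.yaml",
        f"{target_table_id}.view.sql",
        # Shared across all queries.
        "dataset_metadata.yaml",
        "derived_dataset_metadata.yaml",
    ]
    if not no_init:
        # An init template is only needed when init.sql gets generated.
        names.append(f"{target_table_id}.init.sql")
    return names


templates = required_templates("example_clients_daily_v1")
```

A list like this can be compared against the contents of `templates/` before running the generator to spot missing files early.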
Additional, query-specific templates or config files used during the generation process can also be added to the `templates/` directory.

The `GleanTable` class calls two methods that can be overridden by the query-specific classes:
* `generate_per_app_id(self, project_id, baseline_table, output_dir=None)`: This method is for generating the per-`app_id` queries
* `generate_per_app(self, project_id, app_info, output_dir=None)`: This method is for generating the per-app queries
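A minimal sketch of overriding one of these methods follows. The base class here is a stand-in that only records calls, whereas the real `GleanTable` renders templates and writes files. The subclass, the skipped table name, and the format of `baseline_table` are assumptions made for illustration.

```python
class GleanTable:
    """Stand-in base class (assumption): records what would be generated."""

    def __init__(self):
        self.generated = []

    def generate_per_app_id(self, project_id, baseline_table, output_dir=None):
        self.generated.append(("app_id", baseline_table))

    def generate_per_app(self, project_id, app_info, output_dir=None):
        self.generated.append(("app", app_info))


class ExampleTable(GleanTable):
    """Hypothetical query class that skips one per-app_id dataset."""

    SKIPPED = {"org_mozilla_example_stable.baseline_v1"}

    def generate_per_app_id(self, project_id, baseline_table, output_dir=None):
        if baseline_table in self.SKIPPED:
            return  # nothing is generated for this table
        # Fall back to the default behavior for everything else.
        super().generate_per_app_id(project_id, baseline_table, output_dir=output_dir)


example = ExampleTable()
example.generate_per_app_id("example-project", "org_mozilla_example_stable.baseline_v1")
example.generate_per_app_id("example-project", "org_mozilla_other_stable.baseline_v1")
```

Delegating to `super()` keeps the default generation logic in one place, so the override only carries the query-specific exception.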
## Customizing the Generated Queries

In some cases it is necessary to use a custom, manually written query instead of a generated one, for example when the query logic is different for a single `app_id`.

For these cases, a query can be added in the `sql/` directory. Queries that have been added there and are named like the generated queries will not be overwritten.
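The "will not be overwritten" behavior can be pictured as follows. This is a sketch of the idea only: the function `write_unless_custom` and the directory layout are hypothetical, and the generator's actual mechanism may differ.

```python
import tempfile
from pathlib import Path


def write_unless_custom(sql_dir, relative_path, generated_sql):
    """Write a generated query unless a file with that name already exists."""
    target = Path(sql_dir) / relative_path
    if target.exists():
        return False  # keep the manually written query
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(generated_sql)
    return True


# Demonstration in a temporary directory standing in for sql/.
with tempfile.TemporaryDirectory() as sql_dir:
    path = "example_project/example_app_derived/example_v1/query.sql"
    first = write_unless_custom(sql_dir, path, "SELECT 1")
    second = write_unless_custom(sql_dir, path, "SELECT 2")
    kept = (Path(sql_dir) / path).read_text()
```

In this sketch the second call leaves the existing file untouched, which mirrors how a custom query in `sql/` takes precedence over a generated one with the same name.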