Docs for new `bqetl_project.yaml` (#4018)

* Add ConfigLoader and move dry run skip to bqetl_project.yaml

* Update bqetl configuration docs
Anna Scholtz, 2023-07-10 09:50:22 -07:00, committed by GitHub
Parent 9b5c04a7bb
Commit d5a6dc97f4
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 71 additions and 0 deletions


@ -8,6 +8,10 @@ Running some commands, for example to create or query tables, will [require Mozi
Follow the [Quick Start](https://github.com/mozilla/bigquery-etl#quick-start) to set up bigquery-etl and the bqetl CLI.
## Configuration
`bqetl` can be configured via the `bqetl_project.yaml` file. See [Configuration](https://mozilla.github.io/bigquery-etl/reference/configuration/) for the available configuration options.
## Commands
To list all available commands in the bqetl CLI:


@ -0,0 +1,67 @@
# Configuration
The behaviour of `bqetl` can be configured via the `bqetl_project.yaml` file. For example, this file specifies which queries are skipped during dry runs and which views are not published, and it contains various other settings.
The general structure of `bqetl_project.yaml` is as follows:
```yaml
dry_run:
  function: https://us-central1-moz-fx-data-shared-prod.cloudfunctions.net/bigquery-etl-dryrun
  test_project: bigquery-etl-integration-test
  skip:
    - sql/moz-fx-data-shared-prod/account_ecosystem_derived/desktop_clients_daily_v1/query.sql
    - sql/**/apple_ads_external*/**/query.sql
    # - ...
views:
  skip_validation:
    - sql/moz-fx-data-test-project/test/simple_view/view.sql
    - sql/moz-fx-data-shared-prod/mlhackweek_search/events/view.sql
    - sql/moz-fx-data-shared-prod/**/client_deduplication/view.sql
    # - ...
  skip_publishing:
    - activity_stream/tile_id_types/view.sql
    - pocket/pocket_reach_mau/view.sql
    # - ...
  non_user_facing_suffixes:
    - _derived
    - _external
    # - ...
schema:
  skip_update:
    - sql/moz-fx-data-shared-prod/mozilla_vpn_derived/users_v1/schema.yaml
    # - ...
  skip_prefixes:
    - pioneer
    - rally
routines:
  skip_publishing:
    - sql/moz-fx-data-shared-prod/udf/main_summary_scalars/udf.sql
formatting:
  skip:
    - bigquery_etl/glam/templates/*.sql
    - sql/moz-fx-data-shared-prod/telemetry/fenix_events_v1/view.sql
    - stored_procedures/safe_crc32_uuid.sql
    # - ...
```
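The `skip` lists above mix exact file paths with glob patterns such as `sql/**/apple_ads_external*/**/query.sql`. As a minimal sketch, assuming the patterns are relative to the repository root and that `**` should span any number of directories, they could be expanded into concrete file paths with Python's standard `glob` module. The helper `expand_skip_patterns` is hypothetical and only illustrates the idea; it is not taken from the bigquery-etl codebase.

```python
import glob
import os


def expand_skip_patterns(patterns, root="."):
    """Hypothetical helper: expand skip patterns into matching relative file paths.

    Assumes each pattern is relative to `root` (e.g. the repository root).
    """
    skipped = set()
    for pattern in patterns:
        # recursive=True lets "**" match zero or more directory levels.
        for path in glob.glob(os.path.join(root, pattern), recursive=True):
            # Report paths relative to root so they compare cleanly with repo paths.
            skipped.add(os.path.relpath(path, root))
    return skipped
```

Note that in this sketch an exact path that does not exist on disk simply produces no match, so stale entries in a skip list are silently ignored rather than reported.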
## Accessing configurations
`ConfigLoader` can be used within the bigquery_etl tooling codebase to access configuration parameters. `bqetl_project.yaml` is loaded automatically by `ConfigLoader`, and parameters can be accessed via its `get()` method:
```python
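from bigquery_etl.config import ConfigLoader

# Nested keys are passed as separate string arguments.
skipped_formatting = ConfigLoader.get("formatting", "skip", fallback=[])
dry_run_function = ConfigLoader.get("dry_run", "function", fallback=None)
# A single key returns the whole `schema` section as a dict.
schema_config_dict = ConfigLoader.get("schema")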
```
`ConfigLoader.get()` accepts multiple string arguments that together form the path to a value stored in the nested configuration structure; for example, `get("dry_run", "function")` reads the `function` key under `dry_run`. An optional `fallback` value is returned if the configuration parameter is not set.
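To illustrate how a variadic key path with a `fallback` can resolve a nested value, here is a minimal sketch built on a plain dictionary. `NestedConfig` is a hypothetical stand-in, not the real `ConfigLoader`; it assumes the YAML has already been parsed into a dict, and raising `KeyError` when a key is missing and no fallback is given is a design choice of this sketch, not documented `ConfigLoader` behaviour.

```python
from typing import Any

# Sentinel so that fallback=None can be distinguished from "no fallback given".
_MISSING = object()


class NestedConfig:
    """Hypothetical stand-in for ConfigLoader: looks up values in a nested dict."""

    def __init__(self, data: dict):
        # In practice this dict would come from parsing bqetl_project.yaml
        # (e.g. with a YAML library); here it is passed in directly.
        self._data = data

    def get(self, *keys: str, fallback: Any = _MISSING) -> Any:
        node: Any = self._data
        for key in keys:
            if isinstance(node, dict) and key in node:
                # Descend one level into the nested structure.
                node = node[key]
            elif fallback is not _MISSING:
                return fallback
            else:
                # Sketch's choice: fail loudly when no fallback was provided.
                raise KeyError(".".join(keys))
        return node
```

With `cfg = NestedConfig({"dry_run": {"function": "https://example"}})`, `cfg.get("dry_run", "function")` returns the URL string, while `cfg.get("formatting", "skip", fallback=[])` returns the empty list.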
## Adding configuration parameters
New configuration parameters can be added directly to `bqetl_project.yaml`. Because `ConfigLoader.get()` accepts arbitrary key paths, new parameters can be referenced right away without changing or updating `ConfigLoader` itself.
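The benefit of generic key-path lookup can be seen in a short sketch: code that reads a new parameter with a `fallback` keeps working before the parameter exists and picks up the value as soon as it is added, with no change to the lookup logic. The `lookup` function and the `my_new_tool` section are hypothetical and stand in for `ConfigLoader.get()` and a real new setting; the sketch also assumes the YAML file has already been parsed into a dict.

```python
from functools import reduce
from typing import Any


def lookup(config: dict, *keys: str, fallback: Any = None) -> Any:
    """Hypothetical generic lookup: walk `keys` through nested dicts."""
    try:
        return reduce(lambda node, key: node[key], keys, config)
    except (KeyError, TypeError):
        # KeyError: a key is absent; TypeError: a non-dict value was indexed.
        return fallback


# Parsed configuration before the hypothetical "my_new_tool" section exists.
config = {"dry_run": {"test_project": "bigquery-etl-integration-test"}}
before = lookup(config, "my_new_tool", "max_retries", fallback=1)  # fallback used

# Simulate adding `my_new_tool: {max_retries: 3}` to bqetl_project.yaml.
config["my_new_tool"] = {"max_retries": 3}
after = lookup(config, "my_new_tool", "max_retries", fallback=1)  # new value read
```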