* Parallelize dependency graph
* Use GCP API to get table schema when not using cloud function
* Reuse GCP credentials
* Update dependency tests
* Remove print
* Limit `derived_view_schemas` SQL generator to actual view directories.
* Fix the `derived_view_schemas` SQL generator to get the view schemas by dry-running their latest SQL and/or from their latest `schema.yaml` file. Getting the schema from the currently deployed view wasn't appropriate because it wouldn't reflect the latest view code.
* Rename `View.view_schema` to `View.schema`.
* Change `View` so its schema dry-runs use the cloud function (CI doesn't have permission to run dry-run queries directly).
* Apply partition column filters in view dry-run queries when possible for speed/efficiency.
* Don't allow missing fields to prevent view schema enrichment.
* Only copy column descriptions during view schema enrichment.
* Only try enriching view schemas from their reference table `schema.yaml` files if those files actually exist.
* Change `main_1pct` view to select directly from the `main_remainder_1pct_v1` table, so the `derived_view_schemas` SQL generator can detect the partition column to use and successfully dry-run the view to determine its schema.
* Formalize the order in which `bqetl generate all` runs the SQL generators.
* Have `bqetl generate all` run `derived_view_schemas` last, in case other SQL generators create derived views.
* Fix `Schema._traverse()` to only recurse if both fields are records.
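
The schema enrichment rules above (tolerate missing fields, copy only descriptions, recurse only when both fields are records) can be sketched roughly as follows. This is not the actual `Schema._traverse()` code; `enrich_descriptions` and the dict-based field layout are hypothetical names and shapes assumed for illustration.

```python
def enrich_descriptions(fields, reference_fields):
    """Copy descriptions from reference_fields into fields, matching by name.

    Hypothetical helper: both arguments are lists of dicts shaped like
    BigQuery schema fields (`name`, `type`, optional `description`, `fields`).
    """
    reference_by_name = {f["name"]: f for f in reference_fields}
    for field in fields:
        reference = reference_by_name.get(field["name"])
        if reference is None:
            # A missing field should not prevent enriching the others.
            continue
        if "description" in reference:
            # Only the description is copied, never the type or mode.
            field["description"] = reference["description"]
        # Recurse only if both sides are records; a RECORD paired with a
        # scalar has no nested fields to compare.
        if field.get("type") == "RECORD" and reference.get("type") == "RECORD":
            enrich_descriptions(field.get("fields", []), reference.get("fields", []))
    return fields


view_schema = [
    {"name": "client_id", "type": "STRING"},
    {"name": "metadata", "type": "RECORD", "fields": [{"name": "geo", "type": "STRING"}]},
    {"name": "extra", "type": "STRING"},
]
table_schema = [
    {"name": "client_id", "type": "STRING", "description": "Client identifier."},
    {"name": "metadata", "type": "STRING", "description": "Serialized metadata."},
]
enriched = enrich_descriptions(view_schema, table_schema)
```

In this example `metadata` is a record in the view but a string in the reference table, so its description is copied while its nested `geo` field is left untouched, and `extra` is skipped because the reference has no such field.
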
* Bump black from 23.10.1 to 24.1.1
Bumps [black](https://github.com/psf/black) from 23.10.1 to 24.1.1.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/23.10.1...24.1.1)
---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
* Reformat files with black to fix dependabot update.
* Reformat with black 24.1.1. Update test dag with required space.
* Update test dags.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add BigQuery schema conversion util
* Update bigquery_etl/schema/__init__.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Bug 1741487 - Rename url2 and related fields in stable views
This removes the following unpopulated fields from Glean views: `metrics.url`, `metrics.text`, `metrics.jwe`, and `metrics.labeled_rate`. If any of these metrics exists in the source table under a `2`-suffixed name, it is also aliased to its original name (`url2` to `url`, and so on).
Suffixed fields are still preserved until view consumers migrate.
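
A minimal sketch of the aliasing rule described above, assuming the view's select list is built from the names of the metric types present in the source table. The helper `metrics_select_list` and the flat `metrics.<name>` expressions are hypothetical; the SQL the generator actually emits may be structured differently.

```python
# Unpopulated metric types that are dropped from the view under their
# original names (taken from the description above).
UNPOPULATED = ("url", "text", "jwe", "labeled_rate")


def metrics_select_list(source_fields):
    """Build select expressions for the `metrics` struct of a view.

    Hypothetical helper: `source_fields` is a list of metric type names
    found in the source table.
    """
    expressions = []
    for name in source_fields:
        if name in UNPOPULATED:
            # Skip the unpopulated original field.
            continue
        # Everything else, including `2`-suffixed fields, is kept as-is.
        expressions.append(f"metrics.{name}")
    for original in UNPOPULATED:
        suffixed = f"{original}2"
        if suffixed in source_fields:
            # Expose the suffixed field under its original name as well.
            expressions.append(f"metrics.{suffixed} AS {original}")
    return expressions


select_list = metrics_select_list(["string", "url", "url2", "text2"])
```

Here `url` is dropped, `url2` and `text2` are kept, and both are additionally exposed as `url` and `text`.
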
* Remove redundant comma from generated sql
* Ignore missing fields in views if any of them were removed
* Add a TODO comment
* Add context on why some of the non-suffixed fields are excluded and why aliasing removes the `2` suffix from some fields
---------
Co-authored-by: Arkadiusz Komarzewski <akomarzewski@mozilla.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
When using `bqetl query schema update` to create a new `schema.yaml` file, BigQuery returns the column schema properties in a sensible order (`name`, `type`, `mode`, `fields`), but our `schema.yaml` output has been sorting those properties alphabetically, which makes it much less readable.
Also, when `bqetl query schema update` updates an existing `schema.yaml` file, it now preserves whatever order the column schema properties were in.
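
The ordering behaviour can be sketched as below. This is an illustration under stated assumptions, not the code in this change: `order_properties` and `PREFERRED_ORDER` are hypothetical names, and fields are modelled as plain dicts.

```python
# Order BigQuery returns column schema properties in (per the text above).
PREFERRED_ORDER = ("name", "type", "mode", "fields")


def order_properties(field, existing=None):
    """Return a copy of `field` with its keys in a readable order.

    If `existing` (the same column from a current schema file) is given,
    its key order is reused; otherwise PREFERRED_ORDER applies. Keys not
    covered by the template keep their relative order at the end.
    """
    template = list(existing) if existing else list(PREFERRED_ORDER)
    ordered_keys = [key for key in template if key in field]
    ordered_keys += [key for key in field if key not in ordered_keys]
    ordered = {key: field[key] for key in ordered_keys}
    if "fields" in ordered:
        # Match nested columns by name so their existing order is reused too.
        existing_children = {
            child["name"]: child for child in (existing or {}).get("fields", [])
        }
        ordered["fields"] = [
            order_properties(child, existing_children.get(child["name"]))
            for child in ordered["fields"]
        ]
    return ordered


column = {
    "mode": "NULLABLE",
    "fields": [{"type": "STRING", "name": "geo"}],
    "name": "metadata",
    "type": "RECORD",
}
fresh = order_properties(column)
updated = order_properties(column, {"type": "RECORD", "name": "metadata"})
```

Python dicts keep insertion order, so serializing the result only preserves that order if the serializer is told not to sort keys (with PyYAML, for example, `yaml.dump(..., sort_keys=False)`).
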
* Comment out print to reduce Netlify build logging by 84%.
* Omit Maven transfer progress from Netlify build logs.
* Omit wget transfer progress from Netlify build logs.