Commit graph

39 commits

Author SHA1 Message Date
Anna Scholtz 381107db71
Revert #6418 (#6429) 2024-10-31 15:41:23 +00:00
Anna Scholtz 2add865249
Speed up schema updates (#6418)
* Parallelize dependency graph

* Use GCP API to get table schema when not using cloud function

* Reuse GCP credentials

* Update dependency tests

* Remove print
2024-10-30 14:52:05 +00:00
Ben Wu 66443eee29
Bug 1920544 Create view to union firefox desktop crashes (#6257) 2024-09-27 17:48:44 +00:00
Anna Scholtz 0dda2651a4
Fall back to running query to get view schema (#6035) 2024-08-08 16:32:28 -07:00
Anna Scholtz 7024b6976e
Pass ID token to dryrun instances to speed things up (#6019)
* Pass ID token to dryrun instances to speed things up

* Parallelize metadata and dependency generation

* Use table schema from dryrun function
2024-08-08 12:38:43 -07:00
Ben Wu de40bdfe93
Use updated table object in view publish (#5973) 2024-07-25 11:56:45 -04:00
Sean Rose 4b8574f3bd
Fix bugs in `derived_view_schemas` SQL generator (#5592)
* Limit `derived_view_schemas` SQL generator to actual view directories.

* Fix the `derived_view_schemas` SQL generator to get the view schemas by dry-running their latest SQL and/or from their latest `schema.yaml` file.  Getting the schema from the currently deployed view wasn't appropriate because it wouldn't reflect the latest view code.

* Rename `View.view_schema` to `View.schema`.

* Change `View` so its schema dry-runs use the cloud function (CI doesn't have permission to run dry-run queries directly).

* Apply partition column filters in view dry-run queries when possible for speed/efficiency.

* Don't allow missing fields to prevent view schema enrichment.

* Only copy column descriptions during view schema enrichment.

* Only try enriching view schemas from their reference table `schema.yaml` files if those files actually exist.

* Change `main_1pct` view to select directly from the `main_remainder_1pct_v1` table, so the `derived_view_schemas` SQL generator can detect the partition column to use and successfully dry-run the view to determine its schema.

* Formalize the order `bqetl generate all` runs the SQL generators in.

* Have `bqetl generate all` run `derived_view_schemas` last, in case other SQL generators create derived views.

* Fix `Schema._traverse()` to only recurse if both fields are records.
2024-06-07 19:05:36 -07:00
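
The enrichment rules listed in this commit (copy only descriptions, tolerate missing fields, recurse only when both fields are records) can be illustrated with a minimal sketch. This is not the repository's `Schema._traverse()` implementation. The function names `is_record` and `copy_descriptions` and the plain-dict field representation are assumptions made for illustration.

```python
from typing import Any, Dict, List

Field = Dict[str, Any]


def is_record(field: Field) -> bool:
    """Return True if the field is a nested record (hypothetical helper)."""
    return str(field.get("type", "")).upper() in ("RECORD", "STRUCT")


def copy_descriptions(target: List[Field], source: List[Field]) -> None:
    """Copy missing descriptions from source fields to target fields in place.

    Fields are matched by name. Only the description is copied, and a
    field that is absent from the source is skipped instead of failing.
    """
    source_by_name = {field["name"]: field for field in source}
    for field in target:
        match = source_by_name.get(field["name"])
        if match is None:
            continue  # a missing field does not block enrichment
        if "description" in match and "description" not in field:
            field["description"] = match["description"]
        # Recurse only when BOTH sides are records; a type mismatch
        # (for example STRING vs RECORD) stops the descent.
        if is_record(field) and is_record(match):
            copy_descriptions(field.get("fields", []), match.get("fields", []))
```

Checking both sides before descending avoids treating a scalar field as if it had nested `fields` when the view and the reference table disagree on a column's type.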
Sean Rose 5ee7b8cdc7
Pass the query's project to the dryrun cloud function (#4904)
So we don't have to rely on the dryrun cloud function using the `moz-fx-data-shared-prod` project by default.
2024-03-07 15:37:17 -08:00
Lucia 84ee88e2b9
Dependabot/pip/black 24.1.1 fix (#5027)
* Bump black from 23.10.1 to 24.1.1

Bumps [black](https://github.com/psf/black) from 23.10.1 to 24.1.1.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/23.10.1...24.1.1)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Reformat files with black to fix dependabot update.

* Reformat with black 24.1.1. Update test dag with required space.

* Update test dags.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-19 15:27:34 +01:00
Sean Rose 20da21c16b
Ignore stable table schemas without `bq_dataset_family` and `bq_table` metadata. (#5037)
The `glean/glean` schema now has `mozPipelineMetadata`, so we need to be more specific to continue excluding it.
2024-02-14 11:10:06 +01:00
Alexander a5c6c91bb1
Add BigQuery schema conversion util (#5034)
* Add BigQuery schema conversion util

* Update bigquery_etl/schema/__init__.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2024-02-13 16:46:46 -05:00
Anna Scholtz b0a1a32246
Speed up generate-sql (#4921)
* Speed up glean_usage generator

* Refactoring
2024-01-31 12:08:29 -08:00
Anna Scholtz 03357769cc
Move view, schema and remaining configs to bqetl_project.yaml (#4051)
* Move view configs to bqetl_project.yaml

* Move schema config to bqetl_project.yaml

* Move docs config to bqetl_project.yaml

* Replace remaining configs
2023-07-11 13:10:57 -07:00
kik-kik 9b5c04a7bb
bug(1741487): Rename url2 and related fields in stable views (#4029)
* Bug 1741487 - Rename url2 and related fields in stable views

This removes the following unpopulated fields from Glean views: `metrics.url`, `metrics.text`, `metrics.jwe`, and `metrics.labeled_rate`. If any of these metrics exist in the source table under `2`-suffixed name, it is also aliased to its original name (`url2` to `url` and so on).
Suffixed fields are still preserved until view consumers migrate.

* Remove redundant comma from generated sql

* Ignore missing fields in views if any of them were removed

* Added a TODO comment

* Added additional context around why we are excluding some of the non-suffixed fields and why we are aliasing some fields to remove the `2` suffix

---------

Co-authored-by: Arkadiusz Komarzewski <akomarzewski@mozilla.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-07-10 09:31:15 -07:00
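
The aliasing rule described in this commit can be sketched as follows. This is a simplified illustration, not the generator's actual code or SQL output. The function name `metrics_select_items`, the `UNPOPULATED` constant, and the flat `metrics.<name>` select items are assumptions.

```python
from typing import Iterable, List

# Unpopulated metric fields named in the commit message.
UNPOPULATED = ("url", "text", "jwe", "labeled_rate")


def metrics_select_items(source_fields: Iterable[str]) -> List[str]:
    """Build select items for the metric fields found in a source table.

    Unpopulated originals are dropped, `2`-suffixed fields are kept, and
    each such suffixed field is also exposed under its original name.
    """
    items: List[str] = []
    for name in source_fields:
        if name in UNPOPULATED:
            continue  # drop the unpopulated original field
        items.append(f"metrics.{name}")  # suffixed fields stay available
        if name.endswith("2") and name[:-1] in UNPOPULATED:
            items.append(f"metrics.{name} AS {name[:-1]}")
    return items
```

For example, a source table with `string`, `url`, `url2`, and `text2` would yield `metrics.string`, `metrics.url2`, `metrics.url2 AS url`, `metrics.text2`, and `metrics.text2 AS text`.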
Sean Rose faf4dc8269
Dryrun date param fixes (#3942)
* Always rewrite dryrun date query params as `submission_date`.

* Quote date partition column in dryrun to get schema.
2023-06-14 21:31:18 +00:00
Sean Rose b7b7c23913
Preserve the order of column schema properties in `schema.yaml` files. (#3923)
When using `bqetl query schema update` to create a new `schema.yaml` file, BigQuery returns the column schema properties in a sensible order (`name`, `type`, `mode`, `fields`), but our `schema.yaml` output has been sorting those properties alphabetically which makes it much less readable.

Also, when using `bqetl query schema update` to update an existing `schema.yaml` file, this will now preserve whatever order the column schema properties were in.
2023-06-09 16:15:40 +00:00
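
The ordering behaviour described in this commit can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the code in `bigquery_etl/schema`. The helper name `order_properties` and the `CANONICAL_ORDER` constant are hypothetical.

```python
from typing import Any, Dict, Optional

# Order in which BigQuery returns column schema properties,
# per the commit message.
CANONICAL_ORDER = ("name", "type", "mode", "fields")


def order_properties(
    column: Dict[str, Any], existing: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return a copy of `column` with its properties in a stable order.

    If `existing` (the column as currently written in the schema file)
    is given, its key order wins. Otherwise the canonical order is used.
    Keys not covered by the preferred order keep their relative position
    at the end.
    """
    preferred = list(existing) if existing is not None else list(CANONICAL_ORDER)
    ordered = {key: column[key] for key in preferred if key in column}
    for key, value in column.items():
        ordered.setdefault(key, value)  # append anything not yet placed
    return ordered
```

The serializer must also leave key order alone for this to have any effect. With PyYAML, for instance, that means calling `yaml.dump(..., sort_keys=False)`, since sorting keys alphabetically is its default.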
Anna Scholtz 30d7507f02
[Bug 1712332] Generate UNIONed app ping views (#3545)
* Generate UNIONed app ping views

* Address review feedback
2023-02-01 14:00:06 -08:00
Anna Scholtz cd48f7c09c
Pass --use_cloud_function to get_schema_from_table (#3442) 2022-12-08 16:45:30 -08:00
Anna Scholtz 6f795228e5 bqetl update all query schemas and deploy new tables 2022-12-06 17:01:35 -08:00
Sean Rose 32ab51ba83
Only merge descriptions from Glean stable table schemas to views (#3391) 2022-12-01 09:50:01 -08:00
Sean Rose 65224e4399
Reduce logging to speed up Netlify CI builds (bug 1761292) (#2920)
* Comment out print to reduce Netlify build logging by 84%.
* Omit Maven transfer progress from Netlify build logs.
* Omit wget transfer progress from Netlify build logs.
2022-04-28 16:12:16 -07:00
Anna Scholtz cf966d2280 Move stable view generation into separate module 2022-01-05 12:26:52 -08:00
Daniel Thorn 233f0c24ab
Add dryrun options to more bqetl commands (#2121) 2021-06-14 12:52:21 -07:00
Anna Scholtz 03d4b1d39f Handle exceptions and float type when adding descriptions to stable views 2021-06-14 10:38:58 -07:00
Anna Scholtz 8809da61d7 Add field descriptions to stable views 2021-06-14 10:38:58 -07:00
Daniel Thorn 063cf808a0
Fix schema validation for CREATE TABLE statements (#2074) 2021-05-25 14:47:53 -07:00
Daniel Thorn 3c8894fdf1
Make schema validation part of dryrun (#2069) 2021-05-25 14:53:09 -04:00
Anna Scholtz 14175f5426 Update all downstream dependencies 2021-05-19 12:51:11 -07:00
Anna Scholtz 8f9e9d5286 Update derived_from schema 2021-05-19 12:51:11 -07:00
Anna Scholtz 8d9d42576d Ensure correct directory structure for temporary SQL query files 2021-05-07 12:30:04 -07:00
Anna Scholtz f18d19fd5f Update workflow docs 2021-05-07 10:30:02 -07:00
Anna Scholtz c9f3d79b26 Update downstream dependency schemas 2021-05-07 10:30:02 -07:00
Anna Scholtz 74a759c942 Remove submission_date field from experiment_search_aggregates_v1 2021-03-10 12:41:13 -08:00
Daniel Thorn a190e18264
Automatically sort python imports (#1840) 2021-02-24 17:11:52 -05:00
Anna Scholtz de644c906e Refactor schema file handling 2021-02-19 09:34:15 -08:00
Anna Scholtz 47dbfe545d Update query schema CLI 2021-02-19 09:34:15 -08:00
Anna Scholtz b46034cd97 Add schema tests 2021-02-19 09:34:15 -08:00
Anna Scholtz a79cf7a4e2 Schema abstraction 2021-02-19 09:34:15 -08:00
Anna Scholtz 1605dd0368 Preliminary query schema functionality 2021-02-19 09:34:15 -08:00