bigquery-etl

Commit graph

Author	SHA1	Message	Date
Anna Scholtz	381107db71	Revert #6418 (#6429)	2024-10-31 15:41:23 +00:00
Anna Scholtz	2add865249	Speed up schema updates (#6418) * Parallelize dependency graph * Use GCP API to get table schema when not using cloud function * Reuse GCP credentials * Update dependency tests * Remove print	2024-10-30 14:52:05 +00:00
Ben Wu	66443eee29	Bug 1920544 Create view to union firefox desktop crashes (#6257)	2024-09-27 17:48:44 +00:00
Anna Scholtz	0dda2651a4	Fall back to running query to get view schema (#6035)	2024-08-08 16:32:28 -07:00
Anna Scholtz	7024b6976e	Pass ID token to dryrun instances to speed things up (#6019) * Pass ID token to dryrun instances to speed things up * Parallelize metadata and dependency generation * Use table schema from dryrun function	2024-08-08 12:38:43 -07:00
Ben Wu	de40bdfe93	Use updated table object in view publish (#5973)	2024-07-25 11:56:45 -04:00
Sean Rose	4b8574f3bd	Fix bugs in `derived_view_schemas` SQL generator (#5592) * Limit `derived_view_schemas` SQL generator to actual view directories. * Fix the `derived_view_schemas` SQL generator to get the view schemas by dry-running their latest SQL and/or from their latest `schema.yaml` file. Getting the schema from the currently deployed view wasn't appropriate because it wouldn't reflect the latest view code. * Rename `View.view_schema` to `View.schema`. * Change `View` so its schema dry-runs use the cloud function (CI doesn't have permission to run dry-run queries directly). * Apply partition column filters in view dry-run queries when possible for speed/efficiency. * Don't allow missing fields to prevent view schema enrichment. * Only copy column descriptions during view schema enrichment. * Only try enriching view schemas from their reference table `schema.yaml` files if those files actually exist. * Change `main_1pct` view to select directly from the `main_remainder_1pct_v1` table, so the `derived_view_schemas` SQL generator can detect the partition column to use and successfully dry-run the view to determine its schema. * Formalize the order `bqetl generate all` runs the SQL generators in. * Have `bqetl generate all` run `derived_view_schemas` last, in case other SQL generators create derived views. * Fix `Schema._traverse()` to only recurse if both fields are records.	2024-06-07 19:05:36 -07:00
Sean Rose	5ee7b8cdc7	Pass the query's project to the dryrun cloud function (#4904) So we don't have to rely on the dryrun cloud function using the `moz-fx-data-shared-prod` project by default.	2024-03-07 15:37:17 -08:00
Lucia	84ee88e2b9	Dependabot/pip/black 24.1.1 fix (#5027) * Bump black from 23.10.1 to 24.1.1 Bumps [black](https://github.com/psf/black) from 23.10.1 to 24.1.1. - [Release notes](https://github.com/psf/black/releases) - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) - [Commits](https://github.com/psf/black/compare/23.10.1...24.1.1) --- updated-dependencies: - dependency-name: black dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * Reformat files with black to fix dependabot update. * Reformat with black 24.1.1. Update test dag with required space. * Update test dags. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-02-19 15:27:34 +01:00
Sean Rose	20da21c16b	Ignore stable table schemas without `bq_dataset_family` and `bq_table` metadata. (#5037) The `glean/glean` schema now has `mozPipelineMetadata`, so we need to be more specific to continue excluding it.	2024-02-14 11:10:06 +01:00
Alexander	a5c6c91bb1	Add BigQuery schema conversion util (#5034) * Add BigQuery schema conversion util * Update bigquery_etl/schema/__init__.py Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com> --------- Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>	2024-02-13 16:46:46 -05:00
Anna Scholtz	b0a1a32246	Speed up generate-sql (#4921) * Speed up glean_usage generator * Refactoring	2024-01-31 12:08:29 -08:00
Anna Scholtz	03357769cc	Move view, schema and remaining configs to bqetl_project.yaml (#4051) * Move view configs to bqetl_project.yaml * Move schema config to bqetl_project.yaml * Move docs config to bqetl_project.yaml * Replace remaining configs	2023-07-11 13:10:57 -07:00
kik-kik	9b5c04a7bb	bug(1741487): Rename url2 and related fields in stable views (#4029) * Bug 1741487 - Rename url2 and related fields in stable views This removes the following unpopulated fields from Glean views: `metrics.url`, `metrics.text`, `metrics.jwe`, and `metrics.labeled_rate`. If any of these metrics exist in the source table under `2`-suffixed name, it is also aliased to its original name (`url2` to `url` and so on). Suffixed fields are still preserved until view consumers migrate. * Remove redundant comma from generated sql * Ignore missing fields in views if any of them were removed * added a todo comment * Added additional context around why we are excluding some of the non-suffixed fields and why alising to remove suffix 2 from some fields --------- Co-authored-by: Arkadiusz Komarzewski <akomarzewski@mozilla.com> Co-authored-by: Anna Scholtz <anna@scholtzan.net>	2023-07-10 09:31:15 -07:00
Sean Rose	faf4dc8269	Dryrun date param fixes (#3942) * Always rewrite dryrun date query params as `submission_date`. * Quote date partition column in dryrun to get schema.	2023-06-14 21:31:18 +00:00
Sean Rose	b7b7c23913	Preserve the order of column schema properties in `schema.yaml` files. (#3923) When using `bqetl query schema update` to create a new `schema.yaml` file, BigQuery returns the column schema properties in a sensible order (`name`, `type`, `mode`, `fields`), but our `schema.yaml` output has been sorting those properties alphabetically which makes it much less readable. Also, when using `bqetl query schema update` to update an existing `schema.yaml` file, this will now preserve whatever order the column schema properties were in.	2023-06-09 16:15:40 +00:00
Anna Scholtz	30d7507f02	[Bug 1712332] Generate UNIONed app ping views (#3545) * Generate UNIONed app ping views * Address review feedback	2023-02-01 14:00:06 -08:00
Anna Scholtz	cd48f7c09c	Pass --use_cloud_function to get_schema_from_table (#3442)	2022-12-08 16:45:30 -08:00
Anna Scholtz	6f795228e5	bqetl update all query schemas and deploy new tables	2022-12-06 17:01:35 -08:00
Sean Rose	32ab51ba83	Only merge descriptions from Glean stable table schemas to views (#3391)	2022-12-01 09:50:01 -08:00
Sean Rose	65224e4399	Reduce logging to speed up Netlify CI builds (bug 1761292) (#2920) * Comment out print to reduce Netlify build logging by 84%. * Omit Maven transfer progress from Netlify build logs. * Omit wget transfer progress from Netlify build logs.	2022-04-28 16:12:16 -07:00
Anna Scholtz	cf966d2280	Move stable view generation into separate module	2022-01-05 12:26:52 -08:00
Daniel Thorn	233f0c24ab	Add dryrun options to more bqetl commands (#2121)	2021-06-14 12:52:21 -07:00
Anna Scholtz	03d4b1d39f	Handle exceptions and float type when adding descriptions to stable views	2021-06-14 10:38:58 -07:00
Anna Scholtz	8809da61d7	Add field descriptions to stable views	2021-06-14 10:38:58 -07:00
Daniel Thorn	063cf808a0	Fix schema validation for CREATE TABLE statements (#2074)	2021-05-25 14:47:53 -07:00
Daniel Thorn	3c8894fdf1	Make schema validation part of dryrun (#2069)	2021-05-25 14:53:09 -04:00
Anna Scholtz	14175f5426	Update all downstream dependencies	2021-05-19 12:51:11 -07:00
Anna Scholtz	8f9e9d5286	Update derived_from schema	2021-05-19 12:51:11 -07:00
Anna Scholtz	8d9d42576d	Ensure correct directory structure for temporary SQL query files	2021-05-07 12:30:04 -07:00
Anna Scholtz	f18d19fd5f	Update workflow docs	2021-05-07 10:30:02 -07:00
Anna Scholtz	c9f3d79b26	Update downstream dependency schemas	2021-05-07 10:30:02 -07:00
Anna Scholtz	74a759c942	Remove submission_date field from experiment_search_aggregates_v1	2021-03-10 12:41:13 -08:00
Daniel Thorn	a190e18264	Automatically sort python imports (#1840)	2021-02-24 17:11:52 -05:00
Anna Scholtz	de644c906e	Refactor schema file handling	2021-02-19 09:34:15 -08:00
Anna Scholtz	47dbfe545d	Update query schema CLI	2021-02-19 09:34:15 -08:00
Anna Scholtz	b46034cd97	Add schema tests	2021-02-19 09:34:15 -08:00
Anna Scholtz	a79cf7a4e2	Schema abstraction	2021-02-19 09:34:15 -08:00
Anna Scholtz	1605dd0368	Preliminary query schema functionality	2021-02-19 09:34:15 -08:00

39 commits