Commit Graph

1440 Commits

Author SHA1 Message Date
Anna Scholtz 4bffcd2652
Fix renaming test files on stage deploys (#5275) 2024-03-25 12:14:46 -07:00
Alexander 18205eabcb
feat(managed-backfills): Add date range to standardize backfill dates, support unpartitioned table backfills, align backfill date partitions for backfill complete (#5256)
* feat(managed-backfills): add date range iterator to standardize backfill date ranges

* refactor: query backfill to use new date range module

* feat(managed-backfills): DENG-1285 - Support completing backfills for unpartitioned tables and align partitions for offset table backfills
2024-03-25 11:28:56 -04:00
Winnie Chan 325d982f31
DENG-1019: Cleaned backfill validations (#5248)
* Cleaned backfill validations
2024-03-22 14:54:13 -07:00
Ben Wu 0bc2dd1cba
Avoid table stage deploy when there are no queries (#5269) 2024-03-22 08:28:49 -07:00
Anna Scholtz a27aa10f84
Speed up stage deploys (#5262) 2024-03-21 14:31:22 -07:00
Alexander ccf9bf910c
fix: ignore mypy type in stripe script (#5259) 2024-03-21 14:10:04 -04:00
Anna Scholtz 54dcb424ac
Remove skipping DAG validation in CI (#5238) 2024-03-19 12:57:49 -07:00
Winnie Chan 33f9017c75
DENG-2823: Added deprecate cli command (#5219)
* Added deprecate cli command

* Fixed typo

* Fixed failed tests

* Fixed deletion date label

* Update bigquery_etl/metadata/parse_metadata.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Fixed deletion date

* Fixed arguments optional

* Added return back

* Added invalid deletion date test

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2024-03-19 11:17:32 -07:00
Winnie Chan f89d0522c8
DENG-802 Changed backfill cli commands (#5217)
* Fixed backfill cli commands

* Fixed status values

* Update bigquery_etl/cli/backfill.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* Update bigquery_etl/cli/backfill.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* Reversed conditions

* Changed to singular backfill

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-03-18 11:39:03 -07:00
Alexander 8fc842d5ae
Write backfills json even if no backfills to process (#5209) 2024-03-14 12:04:42 -04:00
Alexander 47196d3ba3
DENG-2950 - Support getting scheduled backfills that need processing as well (#5170) 2024-03-08 12:42:35 -05:00
Winnie Chan 69f8b357c1
DENG1481: Added dry run in processing (#5134)
* Added dry run in processing

* Update bigquery_etl/cli/backfill.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* Removed deploy when dry run

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-03-08 07:49:15 -08:00
Sean Rose 5ee7b8cdc7
Pass the query's project to the dryrun cloud function (#4904)
So we don't have to rely on the dryrun cloud function using the `moz-fx-data-shared-prod` project by default.
2024-03-07 15:37:17 -08:00
Anna Scholtz 70a355a0dd
Require authentication for dry run function and run gcloud auth when … (#5171)
* Require authentication for dry run function and run gcloud auth when not logged in

* authenticate step in CI, remove interactive gcloud auth

* Skip dryrun for ltv_state_values_v2

* Refactor skip_fork in CI, clarify login requirements
2024-03-06 15:06:29 -08:00
Alexander 027eb69562
DENG-2950 - Rename drafting -> initiate, modify json output of scheduled command (#5164)
* Update json write for scheduled commands to include date and watchers for DAG

* Rename drafting to initiate
2024-03-05 09:12:47 -05:00
Sean Rose 1efeaef01b
Change Fivetran Airflow operators to run synchronously so the task concurrency limit is respected. (#5144) 2024-02-29 14:10:38 -08:00
Winnie Chan 6c501a620c
Deng-2845: Remove default deprecated false (#5118)
* Removed deprecated field

* Removed deprecated false in metadata yaml

* Fixed test
2024-02-27 10:33:50 -08:00
Alexander 101e8e4543
Preempt IndexError when unable to parse view content (#5114) 2024-02-26 11:57:41 -05:00
Winnie Chan 2387660ab7
DENG-822: Validate workgroups in backfills (#5081)
* Updated workgroup validations

* Fixed indentation in config.yml

* removed sys exit

Co-authored-by: Alexander <anicholson@mozilla.com>

* raise value error

Co-authored-by: Alexander <anicholson@mozilla.com>

* added workgroup constant

* Fixed value error

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2024-02-23 12:54:38 -08:00
Sean Rose 5f9512fa01
Output DAG tasks for external dependencies first (#5100)
So that if any of the ETLs use `depends_on` to manually depend on one of those external dependencies the external task variable will be guaranteed to exist before it's referenced in the `set_upstream()` call.
2024-02-23 11:09:42 -08:00
Ben Wu 1a1188214b
Bug 1868244 Publish views if dry run schema differs from deployed (#5057) 2024-02-22 10:59:04 -05:00
Sean Rose 54f4a6df70
Limit Fivetran task concurrency so it only tries to run one sync per connector at a time. (#5083) 2024-02-21 11:55:42 -08:00
Winnie Chan cee2479a0f
Catch all exceptions (#5079) 2024-02-21 10:38:34 -08:00
kik-kik 3ba5b32bb8
Following srose suggestion and removing the usage of fivetran sensor to avoid double wait / blocking (#5080) 2024-02-21 19:28:16 +01:00
kik-kik 5572947730
DENG-2492 update column metadata (#5074) (#5015)
Co-authored-by: Katie Windau <153020235+kwindau@users.noreply.github.com>
2024-02-21 18:16:51 +01:00
Winnie Chan 14491fca27
Catch table not found exception (#5059) 2024-02-20 08:48:00 -08:00
Lucia 84ee88e2b9
Dependabot/pip/black 24.1.1 fix (#5027)
* Bump black from 23.10.1 to 24.1.1

Bumps [black](https://github.com/psf/black) from 23.10.1 to 24.1.1.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/23.10.1...24.1.1)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Reformat files with black to fix dependabot update.

* Reformat with black 24.1.1. Update test dag with required space.

* Update test dags.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-19 15:27:34 +01:00
Sean Rose 25e4c224a0
Support waiting for tables/partitions to exist before running ETLs (#5039)
* Implement `depends_on_tables_existing` and `depends_on_table_partitions_existing` scheduling metadata.

* Refactor repeated timedelta validation logic into `validate_timedelta_string` utility function.

* Replace Google Search Console empty-check ETLs with `depends_on_table_partitions`.
2024-02-15 14:38:52 -08:00
Sean Rose 20da21c16b
Ignore stable table schemas without `bq_dataset_family` and `bq_table` metadata. (#5037)
The `glean/glean` schema now has `mozPipelineMetadata`, so we need to be more specific to continue excluding it.
2024-02-14 11:10:06 +01:00
Alexander a5c6c91bb1
Add BigQuery schema conversion util (#5034)
* Add BigQuery schema conversion util

* Update bigquery_etl/schema/__init__.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2024-02-13 16:46:46 -05:00
Frank Bertsch 2d407f7e93
GROWTH-101 - Update gclid_conversions view to 1-row per conversion (#4612)
* Update gclid_conversions view to 1-row per conversion

* Fully qualify table
2024-02-12 14:16:59 -08:00
Winnie Chan 8ec7516157
Issue 4135: Added publish metadata cli command (#5011)
* Added publish metadata cli command

* Removed publish metadata script
2024-02-12 11:12:14 -08:00
Sean Rose 27f15163a1
Fix `bqetl query schema deploy` to find script ETLs specified as `{dataset}.{table}`. (#5004) 2024-02-09 10:17:06 -08:00
kik-kik 9049398c36
adding firefox_ios_derived.retention_clients to shredder config (#4993) 2024-02-09 12:14:00 +01:00
Sean Rose 802da71a2c
Add ETLs and views for Google Search Console data (DENG-1733) (#4892)
* Add ETLs for historical Google Search Console data synced by Fivetran.

* Fix formatting of `CASE` subclauses like `WHEN` inside Jinja blocks.

* Add ETLs for current Google Search Console data exported directly to BigQuery.

* Add views for Google Search Console data.
2024-02-07 12:53:32 -08:00
Anna Scholtz 138841d351
Package bqetl and publish to PyPI (#4917)
* pyproject.toml for bqetl

* Correctly resolve SQL generators from package

* CircleCI config to publish tagged versions to PyPI

* Get version from git tags
2024-02-05 09:04:04 -08:00
Anna Scholtz a4c7b0ab40
Remove gke_command usages (#4900) 2024-02-05 08:40:34 -08:00
Alexander f36e75ab2b
Revert "Restrict derived view schema generation to views with upstream schema…" (#4941)
This reverts commit f5ee129b63.
2024-02-01 12:08:51 -05:00
Anna Scholtz b0a1a32246
Speed up generate-sql (#4921)
* Speed up glean_usage generator

* Refactoring
2024-01-31 12:08:29 -08:00
Sean Rose 86c0eca325
Update `bqetl format` to not have an extra blank line after Jinja expressions (#4899)
* Update `bqetl format` to not have an extra blank line after Jinja expressions.

* Update `bqetl format` to add a blank line before `#fail` and `#warn` comments.
2024-01-30 09:35:12 -08:00
Anna Scholtz 073b1f050d
Fix dataset deprecation metadata (#4874)
* Update dataset workgroup_access when deprecated: true

* Update deprecation metadata tests

* Add metadata.yaml files in telemetry_derived for tables that are managed through other tooling

* Deprecate telemetry_derived datasets
2024-01-26 22:03:54 +00:00
Sean Rose 1b9750cdfb
Don't deploy ETLs with `destination_table: null` set in their scheduling metadata. (#4897) 2024-01-26 12:21:45 -08:00
Sean Rose f49ca466f9
Ignore missing SQL generators. (#4896) 2024-01-26 11:55:16 -08:00
Frank Bertsch 401b8e7351
Use state_values_v2 for Android LTV pipeline (#4887)
* Don't try to write existing view files

* Use state_values_v2 for client ad click predictions

* Normalize countries in client_ltv

* Don't get view if unavailable

* Add test for new version of existing table

* Fully qualify tables in view defn
2024-01-26 13:56:39 -05:00
Anna Scholtz eb8c0fb9e2
Run schema updates for new tables using is_init() (#4890) 2024-01-25 14:47:27 -08:00
Sean Rose 9201c870ce
Make `bqetl query initialize` work for ETLs using `is_init()` in non-default projects. (#4882) 2024-01-24 13:24:02 -08:00
Sean Rose a70b2aa689
Support symlinks (#4881)
* Avoid using `Path.glob()` or `Path.rglob()` for recursive file searches.

Because they don't currently support following symlinks (they will in Python 3.13).

* Specify `followlinks=True` as necessary when calling `os.walk()`.
2024-01-24 13:02:43 -08:00
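The symlink handling described in the commit above (#4881) can be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper name `find_files` and its parameters are hypothetical. It only assumes what the commit message states, namely that `os.walk()` is called with `followlinks=True` instead of relying on `Path.glob()` or `Path.rglob()`.

```python
import os
from pathlib import Path
from typing import Iterator


def find_files(root: Path, filename: str) -> Iterator[Path]:
    """Yield every file named `filename` under `root` (hypothetical helper).

    followlinks=True makes os.walk() descend into symlinked directories,
    which it skips by default.
    """
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        if filename in filenames:
            yield Path(dirpath) / filename
```

Note that `followlinks=True` can loop forever if a symlink points back at one of its own parent directories, so a sketch like this is only safe on trees without such cycles.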
Anna Scholtz b0387fb7de
Remove referenced_tables usages (#4834)
* Remove referenced_tables usages

* Resolve check dependencies when generating DAGs

* Add tests for checks automatically resolving dependencies

* Incorporate feedback for removing referenced_tables

* Use depends_on for empty_checks

* More depends_on and fixes
2024-01-22 12:41:05 -08:00
Sean Rose 0530c1dc81
Fix `verify-format-sql` CI check not reporting SQL formatting issues (#4827)
* Verify the format of the original SQL, not the generated SQL.

The generated SQL gets reformatted by `bqetl query render`.

* Format all SQL.

* Quote column names containing Jinja expressions to prevent `bqetl format` causing invalid SQL.

* Adjust indentation of some comments to align with the formatted SQL.

* Refactor final `SELECT` in `telemetry_derived.clients_first_seen_v2` to work better with `bqetl format` SQL formatting.

* Fix trailing line comments breaking inline block formatting.

* Fix leading whitespace before Jinja comments not being preserved.

* Add `schema.yaml` for `firefox_ios_derived.baseline_clients_yearly_v1`.

So the `deploy-changes-to-stage` CI can work for the downstream `firefox_ios.baseline_clients_yearly` view.

* Add `schema.yaml` for `firefox_accounts_derived/fxa_users_services_daily_v1`.

So the `dry-run-sql` CI can work for the downstream `firefox_accounts_derived.fxa_users_services_last_seen_v1` ETL.

* Correct `schema.yaml` and `init.sql` for `firefox_accounts_derived.fxa_users_last_seen_v1`.

So the `dry-run-sql` CI can work for the downstream `firefox_accounts_derived.fxa_users_last_seen_v1` ETL.

* Fully qualify table reference in `init.sql` for `firefox_accounts_derived.fxa_users_last_seen_v1`.

So the table dependency will get detected by the `deploy-changes-to-stage` CI to deploy it so the `dry-run-sql` CI can work for the `init.sql` file.

* Improve `JinjaComment` inheritance and docstring.

* Implement `Line.ends_with_line_comment` property and refactor `inline_block_format()`.
2024-01-22 11:48:08 -08:00
Alexander befe468aea
Use rich for backfill CLI (#4866) 2024-01-22 13:47:01 -05:00
Alexander f5ee129b63
Restrict derived view schema generation to views with upstream schema files and directly copy reference schemas for simple views. (#4848)
* Refactor
* Copy reference schema directly if it's available
* Refactor default view code
2024-01-19 16:24:50 -05:00
Sean Rose a912c28c68
Fix `bqetl stage` to create parent dataset for stored procedures. (#4863) 2024-01-19 12:46:35 -08:00
kik-kik 5c6f1429fb
feat(DENG-1590): added existing fxa tables to shredder config (#4851)
* added existing fxa tables to shredder config

* Apply suggestions from code review

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* removing some of the fxa tables from the config as suggested by srose

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2024-01-19 12:56:28 +01:00
Anna Scholtz ece50f6d2c
Fix duplicate wait_for tasks in public data JSON DAG (#4849)
Co-authored-by: Katie Windau <153020235+kwindau@users.noreply.github.com>
2024-01-18 08:55:53 -06:00
kik-kik 573d5e2658
added firefox_ios_derived.firefox_ios_clients_v1 to shredder config (#4852) 2024-01-18 15:33:10 +01:00
kik-kik 0bfc394689
added fenix client funnels to shredder config (#4833) 2024-01-18 11:48:48 +01:00
Alexander fe62e09781
Remove remaining mentions of no_partition (#4803) 2024-01-17 10:38:47 -05:00
Sean Rose 9aea89370b
Add `fxa_delete_events_v2` ETL based on FxA logs from GCP (#4843)
* Add `fxa_delete_events_v2` ETL based on FxA logs from GCP.

* Add `fxa_delete_events` view combining `fxa_delete_events_v1` and `fxa_delete_events_v2` data.

* Use `fxa_delete_events` view for Shredder.

* Update sql/moz-fx-data-shared-prod/firefox_accounts_derived/fxa_delete_events_v2/metadata.yaml

---------

Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
2024-01-17 14:34:54 +01:00
Sean Rose 3062b502f9
Escape underscores in `LIKE` patterns (#4810) 2024-01-11 17:21:24 -08:00
Sean Rose 1d1addb86c
Indent join conditions (#4223)
* Indent join conditions.

* Put parentheses around some `BETWEEN ... AND ...` join conditions.
2024-01-11 15:50:26 -08:00
Anna Scholtz 826e1881c0
Add skip-existing option to ./bqetl query initialize (#4792)
* Add skip-existing option to ./bqetl query initialize

* Handle initialization exceptions and refactor skip-existing check

* Refactoring of ./bqetl initialization

* Add --force option to ./bqetl initialize

* Update bigquery_etl/cli/query.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update bigquery_etl/cli/query.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update bigquery_etl/cli/query.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update bigquery_etl/cli/query.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update bigquery_etl/cli/query.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2024-01-10 11:00:54 -08:00
Alexander 6c5e59634e
Support backfilling unpartitioned tables and non-date query parameters (#4769)
* Initial commit

* Support non-date parameters and formatting
2024-01-10 13:37:33 -05:00
Sean Rose 7bc55cfc8b
Handle Jinja whitespace control characters in `bqetl format` (#4784)
* Handle Jinja whitespace control characters in `bqetl format`.

* Use default formatting for Jinja in `bigquery_usage_v2` ETL.

* Reformat `sql_generators/active_users/templates/mobile_checks.sql`.
2024-01-10 10:07:21 -08:00
Leli beea0fd9e9
follow up for #4777 (#4778)
* DAG docs - fix broken links and add tags to docs

* change tests

* remove empty line

* fix typo

* fix second test template

* add if case for private-bigquery-etl
2024-01-04 20:15:10 +01:00
Leli 84e1188b15
DAG docs - fix broken links and add tags to docs (#4777) 2024-01-04 18:31:36 +01:00
Alexander fab7e04764
Remove dryrun from view validation in CI (#4774)
* Remove dryrun from view validation

* Remove access denied view validation skips
2024-01-04 12:06:02 -05:00
Alexander e0996c20cd
Bqetl on rich-cli 💸 (#3775) 2024-01-03 11:30:54 -05:00
Alexander 7a80984757
DENG-1193 Deprecate generated dataset docs (#4657) 2023-12-15 12:17:26 -05:00
Jan-Erik Rediger 1a49bda54c
Add `syndication` as possible metadata field (#4715)
91acdfce70 added this, which in turn broke
(at least) doc generation.
2023-12-15 11:26:03 +01:00
Alexander 463dc15bf1
Support shared-prod UDFs (#4708) 2023-12-14 13:45:13 -05:00
Daniel Thorn b0bfc65052
DENG-965 - symbolicate and signaturize crash pings (#4642) 2023-12-12 08:57:52 -08:00
Alexander 776c590db2
ci-fix Ignore dataset.update required permissions when dryrunning authorized views (#4681)
* Refactor, add typehint
* Add datasets.update clause denied for authorized views
2023-12-11 14:52:19 -05:00
Anna Scholtz c31ae16efb
Revert "Define `event_monitoring_live_v1` views in `view.sql` files (#4576)" (#4680)
This reverts commit 2c4cc5eefe.
2023-12-11 10:15:30 -08:00
Sean Rose 2c4cc5eefe
Define `event_monitoring_live_v1` views in `view.sql` files (#4576)
* Define `event_monitoring_live_v1` views in `view.sql` files.

So they get automatically deployed by the `bqetl_artifact_deployment.publish_views` Airflow task.

* Support materialized views in view naming validation.

* Handle `IF NOT EXISTS` in view naming validation.

* Use regular expression to extract view ID in view naming validation.

This simplifies the logic and avoids a sqlparse bug where it doesn't recognize the `MATERIALIZED` keyword.

* Update other view regular expressions to allow for materialized views.
2023-12-08 11:54:02 -08:00
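The regular-expression approach mentioned in the commit above (#4576) might look roughly like the sketch below. The pattern `VIEW_RE` and the function `extract_view_id` are assumptions made for illustration and are not taken from the repository; the sketch only shows how one expression can tolerate `OR REPLACE`, `MATERIALIZED`, and `IF NOT EXISTS` while pulling out the view ID.

```python
import re
from typing import Optional

# Hypothetical pattern: optional OR REPLACE / MATERIALIZED / IF NOT EXISTS,
# then a two- or three-part identifier, optionally wrapped in backticks.
VIEW_RE = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?(?:MATERIALIZED\s+)?VIEW\s+"
    r"(?:IF\s+NOT\s+EXISTS\s+)?`?(?P<view_id>[\w-]+(?:\.[\w-]+){1,2})`?",
    re.IGNORECASE,
)


def extract_view_id(sql: str) -> Optional[str]:
    """Return the view ID from a CREATE VIEW statement, or None if absent."""
    match = VIEW_RE.search(sql)
    return match.group("view_id") if match else None
```

This sketch does not handle identifiers whose parts are quoted individually (for example `` `project`.`dataset`.`view` ``); a real implementation would need to decide how to treat those.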
Sean Rose 308822d7cf
Have `bqetl query` commands fail if they don't find a matching query (#4662)
* Have `bqetl query` commands fail if they don't find a matching query.

* Update `test_run_query_no_query_file` test.
2023-12-07 16:57:11 -08:00
Alexander f045e9d849
Support offset backfills, require metadata (#4627)
* Skip backfills for queries without metadata.yaml

* Support date_partition_offset

* Fixed exclude, modified exception

* Add test for offset backfill

* Apply suggestions from code review

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>

* Formatting

---------

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
2023-12-05 14:07:09 -05:00
kik-kik 076a0e0775
feat(DENG-2083): added firefox_ios_derived.clients_activation_v1 and corresponding view (#4631)
* added firefox_ios_derived.clients_activation_v1 and corresponding view

* fixing a missing separator in firefox_ios_derived.clients_activation_v1 checks

* adding firefox_ios_derived.clients_activation_v1 to shredder configuration

* removed is_suspicious_device_client as it should not be there, thanks bani for pointing this out

* fixed black formatting error inside shredder/config.py

* applied bqetl formatting

* minor styling tweak as suggested by bani in PR#4631
2023-12-05 12:42:39 +01:00
Eduardo Filho 0bf4c279d6
GLAM avoid scientific notation for big sample counts (#4647)
* GLAM avoid scientific notation for big sample counts

* Cast to bignumeric instead of numeric
2023-12-04 17:47:50 -05:00
Anna Scholtz 68ece978e0
Resolve correct task_id for tasks nested in a group (#4637) 2023-12-01 11:38:59 -08:00
kik-kik 639381f13d
firefox_ios source added to shredder config (#4638) 2023-12-01 17:48:56 +01:00
kik-kik 9409d2b6cb
feat(DENG-1774 / cancelled): deleting fenix_derived/firefox_android_clients_v2, v1 will remain the active model (#4610)
* deleting fenix_derived/firefox_android_clients_v2, v1 will remain the active model

* removed fenix_derived.firefox_android_clients_v2 from shredder config
2023-12-01 11:16:54 +01:00
Anna Scholtz 7087dbff30
Separate Airflow tasks for glean_usage (#4588)
* Add support for assigning Airflow tasks to task groups

* Generate separate Airflow tasks for glean_usage

* Remove Airflow dependencies from old glean_usage tasks
2023-11-30 09:48:17 -08:00
Eduardo Filho ec297972c6
Glam accounts for sampling when calculating sample_count for windows & release probes (#4581)
* Glam - fix legacy windows & release probes' sample count going fwd

* Glam FOG accounts for sampling when calculating total_sample for windows & release probes

* fog - fix client count and sample count

* Add channel filtering for fog
2023-11-23 17:06:20 -05:00
Lucia fe2bf1d2de
DS-3361. Update documentation of initialize command. (#4592)
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-11-21 22:02:46 +01:00
Frank Bertsch 5cf8d30153
Add session date param; fix checks CLI bug (#4579)
* Fix checks to filter on partitions

* Don't print "missing checks file" on success

Previously, the statement that checks.sql files
were missing was printed on any execution of the for
statement. ("else" clauses after "for"s execute after
completion of the "for" clause).

Instead, we want to print only when there are no files.
2023-11-17 15:33:23 -05:00
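The `for`/`else` behaviour that the commit above (#4579) describes can be reproduced with a small sketch. Both function names and the message strings are hypothetical and only illustrate the pattern: the `else` branch of a `for` loop runs whenever the loop finishes without `break`, so it also runs after files were processed successfully.

```python
from typing import Iterable, List


def report_with_for_else(files: Iterable[str]) -> List[str]:
    """Buggy pattern: the `else` branch runs after every loop without `break`."""
    messages = []
    for name in files:
        messages.append(f"running {name}")
    else:
        # Reached even when files were found and processed.
        messages.append("missing checks file")
    return messages


def report_fixed(files: Iterable[str]) -> List[str]:
    """Fixed pattern: report missing files only when nothing was processed."""
    messages = [f"running {name}" for name in files]
    if not messages:
        messages.append("missing checks file")
    return messages
```

With a single input file, `report_with_for_else` returns both the "running" line and the spurious "missing checks file" line, while `report_fixed` returns only the "running" line. With no input, both return just the "missing checks file" line.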
Linh Nguyen c1c73e690e
Make sure that metadata `friendly_name` and `description` are not None (#4513)
* Fill empty description

* Assign a friendly name if the table doesn't have one

* Update metadata tests

* Update bigquery_etl/metadata/parse_metadata.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* update test again

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2023-11-17 11:48:11 -05:00
kik-kik 6e4c09a677
added fenix_derived.firefox_android_clients_v2 to shredder config (#4564) 2023-11-16 11:26:59 +01:00
Sean Rose e44e5ca705
Generate normal task dependencies from `depends_on` if the task is in the same DAG (#4569)
* Generate normal task dependencies from `depends_on` if the task is in the same DAG.

* Update `metadata.yaml` files to use `depends_on` rather than `upstream_dependencies`.
2023-11-14 16:06:00 -08:00
Lucia 894d42dde1
DS-3054. Support running an initialization query in parallel (#4322)
* DS-3054. Create functions to support running an initialization query for all sample_ids in parallel.

* DS-3054. Update _run_query function.

* DS-3054. Use _run_query and mapped values for initialization in parallel.

* DS-3054. Unify initialization to run in parallel and get sample_id range from metadata.

* DS-3054. Minimize formatting of query template and remove need to modify existing initialization queries. Validate if a query should use parallelized or regular update.

* DS-3054. Adding link to caveats.

* DS-3054. Update sample_id range for initialization.

* DS-3054. Use current implementation of run_query.

* DS-3054. Update using a parameter instead of initialization in metadata.

* DS-3054. DAG update with new parameter.

* Pass parameters before calling _run_query().

* Use --append_table in favour of INSERT INTO.

* DS-3054 Separate parallel and non parallel init, plus some improvements.

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-11-07 20:03:48 +01:00
Anna Scholtz e7e7eaae06
Set depend_on_past=False for warn checks (#4526) 2023-11-06 10:39:58 -08:00
kik-kik 0962ba65fe
prefixing schema error message inside dryrun to "ERROR" to make it easier to find when searching logs for cause of exit code 1 (#4522) 2023-11-06 12:12:50 +01:00
Frank Bertsch a271c024b2
Dont generate dags in bqetl query schedule command (#4517) 2023-11-03 08:59:27 -07:00
Anna Scholtz 185f833f2a
Materialized views and aggregated tables for event monitoring (#4478)
* WIP event monitoring

* Add FxA custom events to view definition (#4483)

* Add FxA custom events to view definition

* Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql

* Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql

* Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql

* Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Move event monitoring to glean_usage generator

* Add cross-app event monitoring view

* Generate cross app monitoring

* Simplify event monitoring aggregation

---------

Co-authored-by: akkomar <akkomar@users.noreply.github.com>
2023-11-01 14:20:20 -07:00
Frank Bertsch 55c5d412c1
Allow running multiple checks (#4471)
* Allow running multiple checks

* Don't yield anything on no matches
2023-10-24 14:39:01 -04:00
Frank Bertsch ac0af012c2
Add opt-in to running checks for backfill (#4455) 2023-10-18 17:34:58 -04:00
akkomar 7a36416554
Set project in init jobs (#4453)
This fixes https://github.com/mozilla/bigquery-etl/pull/4452
2023-10-18 16:58:18 +02:00
akkomar 0171f93596
Set project in init jobs (#4452) 2023-10-18 16:04:09 +02:00
akkomar c3c5ecffd4
Don't set destination table for init jobs (#4451)
This reverts https://github.com/mozilla/bigquery-etl/pull/4193/files

By convention, all but two init.sql jobs use a `CREATE TABLE` statement. Setting a destination table on a job that runs these queries causes a `BadRequest: 400 Cannot set destination table in jobs with DDL statements` error, as observed in [1].

Apart from removing setting of destination_table this fixes two init queries.

[1] https://workflow.telemetry.mozilla.org/dags/copy_deduplicate/grid?dag_run_id=scheduled__2023-10-17T01%3A00%3A00%2B00%3A00&task_id=baseline_clients_first_seen&tab=logs
2023-10-18 14:45:22 +02:00
Frank Bertsch 164ba19abf
Glean usage checks (#4445)
* WIP: Add checks for glean_usage

* Ignore pycache in autogenerated click cmds

* Move check to backfill command

* Remove view checks
2023-10-17 17:03:41 -04:00
Sean Rose 4bbbc32a5b
Put assert UDFs in `mozfun` project (#4367)
* Put assert UDFs in `mozfun` project.

* Tweak syntax in `assert.array_equals()` to avoid SQLGlot parsing error.
  https://github.com/tobymao/sqlglot/issues/2348

* Fix SQL syntax error in `assert.struct_equals()` tests.

* Fix UDF dependency file path logic when deploying to stage.

* Change regular expressions in `parse_routine` module to allow quotes around routines' dataset and name.
2023-10-13 10:58:42 -07:00
Eduardo Filho 83569d8211
Add sampling to glam-fog (#4409)
* Add sampling to glam-fog

* Simplify count logic

* Update bigquery_etl/glam/templates/clients_daily_histogram_aggregates_v1.sql

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update bigquery_etl/glam/templates/clients_daily_scalar_aggregates_v1.sql

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-10-13 09:24:04 -04:00
Anna Scholtz 35ae323487
Funnel generators POC (#4390)
* Add funnel generation logic

* Example funnel config

* Fix funnel columns

* funnel generation dimensions

* Optimize segmenting generated funnels

* Add funnel generation docs

* Schedule generated funnels

* Skip DAGs with no tasks

* Add background info funnel generator

* Add funnel generation tests

* Fix join_previous_step_on

* Add funnel example config
2023-10-12 14:05:08 -07:00
Anna Scholtz 61da5cca03
Respect sql_dir in dryrun skip (#4334)
* Respect sql_dir in dryrun skip

* Update bigquery_etl/dryrun.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Update bigquery_etl/dryrun.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

* Set sql_dir when using Schema.from_query_file()

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2023-10-12 13:27:54 -07:00
Anna Scholtz 3a8c6a9426
Filter files with multiple suffixes in stage deploy (#4403) 2023-10-10 15:02:54 -07:00
Daniel Thorn cd3aaabb66
Remove main summary and main_v4 from shredder (#4388) 2023-10-06 10:07:25 -07:00
Sean Rose 191eded481
Shred four more FxA tables. (#4376)
* moz-fx-data-shared-prod.firefox_accounts_derived.events_daily_v1
  * moz-fx-data-shared-prod.firefox_accounts_derived.funnel_events_source_v1
  * moz-fx-data-shared-prod.firefox_accounts_derived.fxa_log_device_command_events_v1
  * moz-fx-data-shared-prod.firefox_accounts_derived.fxa_log_device_command_events_v2
2023-10-03 13:18:43 -07:00
Mike Williams 124a8613cc
fix DENG-1091: automatically add triage/confidential to private DAGs (#4363)
Co-authored-by: Marlene Hirose <92952117+Marlene-M-Hirose@users.noreply.github.com>
Co-authored-by: lelilia
2023-09-29 13:15:35 -04:00
Curtis Morales 6e32c52e2c
Don't retry check tasks (#4359)
* Don't retry check tasks

* Update test

* Fix one more test
2023-09-28 15:23:23 -04:00
Daniel Thorn ea05e6c6dc
Bug 1852630 - Rename main_remainder_v4 to main_v5 (#4353)
and point at new copy_deduplicate tasks for similar pings
2023-09-28 09:08:55 -07:00
Daniel Thorn d0cc8dfbe8
Add main_v5 et al to shredder (#4352) 2023-09-27 14:26:27 -07:00
akkomar 3ae03d6861
Update split main ping queries parent task (#4347)
This is required after these queries were moved out of copy_deduplicate_all in https://github.com/mozilla/telemetry-airflow/pull/1822
2023-09-26 16:10:57 +02:00
Sean Rose e04e314f46
Shred `firefox_accounts_derived.fxa_gcp_*_events_v1`. (#4341) 2023-09-25 09:22:43 -07:00
Anna Scholtz 1b6e598c9e
Publish private-bigquery-etl DAGs to private-generated-sql (#4319) 2023-09-19 08:18:32 -07:00
Anna Scholtz 3f79cc5151
Generate soft etl checks (#4268)
* Add markers to check cli command to differentiate warning from hard failures

* Fix CI issues

* Fix dag generation

* Incorporate Feedback

* Generate Airflow tasks for #fail and #warn checks

---------

Co-authored-by: Alekhya Kommasani <akommasani@mozilla.com>
Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com>
2023-09-13 10:22:39 -07:00
Alekhya 2e916eb856
DENG1381 - Add bqetl support for deprecation metadata (#4213)
* Support bq dataset deprecation process (metadata)

* Add bqetl metadata cli command

* Initial draft for adding deprecation support to bqetl

* Incorporate Anna's feedback

* Fix based on whd's feedback

* Fix ci issues

* Remove unnecessary logic from metadata.py

* Add dataset metadata yaml for ga_derived

* Ignore dirs that do not have dataset_metadata yaml

* Remove unwanted dataset metadata yamls

* Update bigquery_etl/cli/metadata.py

Co-authored-by: whd <whd@users.noreply.github.com>

---------

Co-authored-by: whd <whd@users.noreply.github.com>
2023-09-12 18:47:54 +00:00
Anna Scholtz cb9eff55fb
Handle references to INFORMATION_SCHEMA when deploying to stage (#4233) 2023-09-12 09:49:49 -07:00
Sean Rose d33db5b00f
Don't quote wildcard tables twice when updating stage references. (#4227) 2023-08-31 17:11:43 -07:00
Anna Scholtz f1f552ef47
Fix publishing udfs that use backticks in identifiers (#4225)
* Fix publishing udfs that use backticks in identifiers

* Update bigquery_etl/routine/parse_routine.py

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>

---------

Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
2023-08-31 16:00:44 -07:00
Alexander 3c7f95e314
Skip tables with all filtered backfill entries (#4217) 2023-08-30 10:54:09 -04:00
Lucia 27262acdfd
Default DAG for bqetl queries (#4143)
* DENG-1314 Implement changes to bqetl and create default DAG.

* DENG-1314. Update Documentation.

* DENG-1314. Dummy query to enable generating DAG and run tests.

* DENG-1314. Update tests.

* Update bigquery_etl/cli/query.py

Raise exception when scheduling information is missing.

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>

* DENG-1314. Update tests.

* DS-3054. Update query creation to set bqetl_default as default value for --dag. Update tests.

* Default task and tests update.

* Default task and tests update.

* 3650 - Remove default DAG option, update DAG template comment & tests.

* 3650 - Condition for DAG warning.

* 3650 - Update docs.

* Clarification on sql/moz-fx-data-shared-prod/analysis/bqetl_default_task_v1/metadata.yaml

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update docs/cookbooks/creating_a_derived_dataset.md

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-08-29 14:32:52 +02:00
kik-kik 7afc4c44f1
docs(DENG-960): bqetl data checks cli docs (#4200)
* Small tweaks made to the cli cmds comments / help display for data checks

* added usage docs to data_checks reference docs

* Apply suggestions from code review provided by scholtzan

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-08-24 17:24:23 +02:00
Lucia 7d5f17c1aa
Add destination table when initializing query. If not added, data is initialized in a temporary table. (#4193)
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-08-16 18:26:11 +02:00
Alexander 05ab70070f
DENG-899 - Add json write functionality to bqetl schedule command (#4139)
* DENG-899 - Add json write functionality to bqetl schedule command

* Patch client.get_table so we don't need access
2023-08-15 12:57:38 -04:00
Winnie Chan cb0cad35e7
Added firefox_android_clients_v1 to shredder (#4141)
Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>
2023-08-14 09:55:20 -07:00
Anna Scholtz 647ff690f7
[DENG-1107] Correctly resolve upstream dependencies when using checks.sql (#4079) 2023-08-09 15:11:01 -07:00
kik-kik b927ed22be
feat(DENG-949): Added `render` subcommand and `--dry-run` flag to the bqetl check command (#4045)
* added render subcommand to the bqetl check command

* added a dry_run flag to bqetl check run command

* added a test to make sure run command exits with status code 0

* added test for check render subcommand

* fixing linter checks

* attempting using an alternative way of testing the render command

* fixing render test by testing the _render() directly rather than the render cli wrapper

* removed dead test

* Apply suggestions from code review by ascholtz

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* fixed black and mypy errors

* fixed app_store_funnel_v1 check formatting

* reformatted tests checks

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-08-09 16:39:47 +02:00
Linh Nguyen 147b2dbf37
Include metadata check in view publishing (#4159)
* Include metadata check in view publishing

* Address review
2023-08-07 10:24:56 -04:00
Daniel Thorn 98cb7bb013
Implement simple generic active subscriptions table for KPIs (#4149) 2023-08-02 12:26:54 -07:00
Winnie Chan 4f6870d260
Fixed backfill query cli destination table name (#4123)
* Fixed destination table

* Added check for destination table
2023-07-28 09:50:10 -07:00
Winnie Chan 36359804ef
Added fenix_derived tables to shredder (#4137)
* Added activations to shredder
2023-07-28 08:29:48 -07:00
Lucia e88bfaa441
Enable using more definitions e.g. macros in scheduling parameters. (#4136)
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-07-27 18:09:56 +02:00
Linh Nguyen 64f8599e4d
GLAM: update FOG min total_users (#4122) 2023-07-25 14:46:07 -04:00
Linh Nguyen 0a1689d6cf
Increase FOG client count minimum filter (#4104) 2023-07-21 11:54:40 -04:00
betling 99e4072018
Betling history bookmarks (#4092)
* added history and bookmarks fields

* adding automated corrections

* some auto schema updates but perhaps not all

* Update schemas

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-07-19 14:02:51 -04:00
Linh Nguyen bb2e48fa3b
Remove buildhub2 filter in GLAM templates (#4088) 2023-07-18 14:37:27 -04:00
Anna Scholtz e0cf9de09a
Fix schema updates (#4086) 2023-07-17 11:52:59 -07:00
Alexander f9ff8022d8
Publish dag name as a label (#4084) 2023-07-17 11:58:20 -04:00
Winnie Chan 97cb4117ad
DENG-807 Added backfill complete cli command (#4040)
* Added complete command

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2023-07-13 15:44:41 -07:00
Anna Scholtz d66bb2a8d9
Convert non_user_facing_dataset_suffixes to tuple when loading from bqetl_project.yaml (#4066) 2023-07-12 11:12:01 -07:00
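A plausible motivation for the tuple conversion in d66bb2a8d9 (an assumption; the commit message does not state it) is that `str.endswith` accepts a string or a tuple of strings but raises `TypeError` for a list, and YAML loaders return sequences as lists. A minimal sketch, with hypothetical suffix values and a hypothetical helper name:

```python
# Hypothetical values; the real list is configured in bqetl_project.yaml.
suffixes_from_yaml = ["_derived", "_external", "_bi"]


def is_non_user_facing(dataset: str, suffixes) -> bool:
    """Return True when the dataset name ends with any of the suffixes."""
    # str.endswith accepts a str or a tuple of str, never a list,
    # so the list loaded from YAML is converted first.
    return dataset.endswith(tuple(suffixes))


print(is_non_user_facing("telemetry_derived", suffixes_from_yaml))
print(is_non_user_facing("telemetry", suffixes_from_yaml))
```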
akkomar 4ed032cceb
Set ConfigLoader's project directory on module initialization (#4062) 2023-07-12 17:45:08 +02:00
Anna Scholtz 5c0748cf79
Add missing / for generating docs (#4055) 2023-07-11 15:55:01 -07:00
Anna Scholtz 3f9181c6e1
Fix getting skipped routines from config (#4054) 2023-07-11 14:23:59 -07:00
Anna Scholtz 03357769cc
Move view, schema and remaining configs to bqetl_project.yaml (#4051)
* Move view configs to bqetl_project.yaml

* Move schema config to bqetl_project.yaml

* Move docs config to bqetl_project.yaml

* Replace remaining configs
2023-07-11 13:10:57 -07:00
Anna Scholtz 8d72cfa9fe
Move routine config to bqetl_project.yaml (#4038) 2023-07-11 10:52:48 -07:00
Glenda Leonard b71e25bc77
Removed checks.sql from dryrun. (#4050) 2023-07-11 11:45:49 -04:00
Daniel Thorn 6137048eeb
DS-2642 - Import stripe itemized tax report (#3999) 2023-07-10 17:22:18 -07:00
Winnie Chan 91882dd150
DENG-806 Added backfill process cli command (#3936)
* Added backfill process command
2023-07-10 16:13:42 -07:00
Anna Scholtz 3a61fd34bb
Move format skip files to bqetl_project.yaml (#4033) 2023-07-10 10:10:47 -07:00
kik-kik 9b5c04a7bb
bug(1741487): Rename url2 and related fields in stable views (#4029)
* Bug 1741487 - Rename url2 and related fields in stable views

This removes the following unpopulated fields from Glean views: `metrics.url`, `metrics.text`, `metrics.jwe`, and `metrics.labeled_rate`. If any of these metrics exist in the source table under `2`-suffixed name, it is also aliased to its original name (`url2` to `url` and so on).
Suffixed fields are still preserved until view consumers migrate.

* Remove redundant comma from generated sql

* Ignore missing fields in views if any of them were removed

* added a todo comment

* Added additional context around why we are excluding some of the non-suffixed fields and why we are aliasing to remove the `2` suffix from some fields

---------

Co-authored-by: Arkadiusz Komarzewski <akomarzewski@mozilla.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-07-10 09:31:15 -07:00
Anna Scholtz b3efbf3c88
Pass date partition parameters to check tasks (#4032) 2023-07-07 12:36:38 -07:00
Anna Scholtz dc482ad8d5
[DENG-948] Macro support for data checks (#3993)
* Add support for check macros

* Add min_rows() check macro

* Add is_unique() check macro

* Add in_range() check macro

* Update ssl_ratios DAG

* Add test for macro checks

* Fix rendering
2023-07-06 14:36:59 -07:00
Anna Scholtz d9bda0df7e
Add ConfigLoader and move dry run skip to bqetl_project.yaml (#4000)
* Add ConfigLoader and move dry run skip to bqetl_project.yaml

* format tests
2023-07-06 10:42:29 -07:00
Anna Scholtz 3286508bc5
Update view metadata in a single update_table() operation (#4017) 2023-07-04 10:55:55 -07:00
Sean Rose 352cffedb8
Add `stripe_subscriptions_history_v2` ETL (DENG-974) (#4009)
* Add `synced_at` column to `stripe_subscriptions_changelog_v1`.

* Tweak `stripe_subscriptions_changelog_v1` tax rate and discount joins to only include those that existed when the change happened.

* Parse subscription metadata in `stripe_subscriptions_changelog_v1`.

* Add `stripe_external.invoice_line_item_v1` ETL.

* Add `stripe_subscriptions_revised_changelog_v1` ETL.

* Add `stripe_subscriptions_history_v2` ETL.
2023-06-30 14:18:31 -07:00
Anna Scholtz c3ebf87ccb
Ensure all necessary parameters are passed to DQ checks (#4005) 2023-06-29 13:40:52 -07:00
Linh Nguyen 7c90d5f8e7
Publish view metadata (#3909) 2023-06-29 16:28:17 -04:00
Alekhya 9d8e7087ec
Add top_sites and quick_suggest views to skip dryrun (#4004) 2023-06-29 10:39:27 -05:00
Alekhya 01333782b1
DENG 946 - Update DAG generation to include ETL checks (#3969)
* Accommodate dq checks in dag generation

* Modify the tests to include dq check

* Generate dags to include bigquery_dq_check

* rename destination to source for dq check

* Add DQ check to download attribution dag

* Update bigquery_etl/query_scheduling/templates/airflow_dag.j2

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update bigquery_etl/query_scheduling/generate_airflow_dags.py

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Set upstream check dependencies using upstream_dependencies

* Change bigquery_dq_check as per gcp.py utils

* remove sql_file_path in airflow jinja

* Fix download attribution dag

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-06-28 13:50:41 -04:00
Curtis Morales b7b1b835ba
Add trigger_rule as an option for generated airflow tasks (#3772)
* Add trigger_rule as an option for generated airflow tasks

* Add test

* Move trigger rule options to enum and add to documentation
2023-06-27 13:58:52 -04:00
Sean Rose 01e982a366
Shred `firefox_accounts_derived.fxa_stdout_events_v1`. (#3982) 2023-06-23 15:02:01 -07:00
Leli ca2c84e9b5
Add to bqetl docs on query run (#3879)
* change examples for bqetl query run and remove weird indentation from code examples

* actually upload the correct file

* note on order of parameters
2023-06-22 13:52:45 +00:00
kik-kik 6eb0647238
feat(DENG-722): preparing for fxa AWS to GCP migration (nonprod) (#3882)
* added a new table for the new nonprod fxa backend events and a fxa_all_events_nonprod view to simulate the process we will need to follow for prod

* added date filters to the nonprod_fxa_all_events view as requested by akkomar and updated the metadata

* added the new nonprod_fxa_server_events_v1 table to dry run skip due to permissions

* improved the comment about deleting a view as requested by akkomar

* tweaked date filtering as requested by srose

* pulled nonprod_fxa schema from DENG-1006-fxa-log-fields

* added schema.yaml for nonprod_fxa_server_events_v1

* deleted init.sql and added clustering config to metadata.yaml instead

* added AS as requested by srose

* fixed yaml lint errors

* added the ability to pass end_date param into Airflow task

* updated nonprod_fxa queries and schema for fxa_server_events_v1 as requested by srose, this query also pulls data for stdout which now has end date

* regenerated bqetl_fxa_events DAG

* renamed fxa_log to fxa_server as agreed on with srose

* reverted merging of the stdout and server event etls due to incompatible schemas

* removed changes related to task level end_date

* removed date filter for stdout events

* undoing test changes

* added country to fxa_server_events_v1 schema

* tweaked selected ordering as requested by srose and updated comments and metadata.yaml
2023-06-21 10:09:28 +00:00
Winnie Chan 0192e9a542
DENG-1021 Added destination table param to query commands (#3951)
* Added destination param

* Updated deploy help
2023-06-20 16:17:50 +00:00
Anna Scholtz 03d55819dc
[Bug 1821767] Speed up table deploys and schema updates (#3967)
* Speed up schema update

* Speed up schema update

* Sort and update schemas in parallel

* Update sql/moz-fx-data-shared-prod/telemetry_derived/clients_daily_v6/metadata.yaml

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>

---------

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
2023-06-19 23:27:45 +00:00
Frank Bertsch 1418e2018b
Fixes to Android attributable_clients (#3802)
* Create new Fenix attributable_clients table

Further updates to attributable clients

- Handle clients who were only _activated_ on that day
- Separate facts/dimensions
- Rename some things
- Add metadata about why a client is present
- Limit new_activations to just activated clients
- Rename client_count field
- Include submission_date in activation join
- Move to v2

* Add DAG

* Add schema file

* Move some joins to view; add initialization

1. Move attribution & activation joins to the view. This lets
   us immediately access updates to those tables, rather than
   re-materializing this table on changes there.
2. Add the capability to init from a query file. This uses
   an `is_init` jinja function, which is only set to True
   when run from `bqetl query initialize`.

* Use dict for default template vars

* Add default for addl_templates

* Reformat files

* Update view

* Regenerate DAG

* Keep metadata field in view
2023-06-16 19:57:57 +00:00
Sean Rose b636f20235
Don't dryrun the `subscription_platform_derived.stripe_subscriptions_changelog_v1` ETL. (#3958)
The CI account doesn't have permission to access the tables in the `stripe_external` dataset.
2023-06-16 18:07:18 +00:00
Sean Rose a50e36ddc8
Format Jinja blocks like SQL blocks. (#3952) 2023-06-15 23:42:28 +00:00
Glenda Leonard 953529a3a5
Process subsequent checks for a table if a prior check fails for that table (#3943)
* Process subsequent checks for a table if a prior check fails for that table.

* Updated to use sqlparse to parse checks.sql.
2023-06-15 21:12:08 +00:00
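The behaviour described in 953529a3a5, where later checks for a table still run after an earlier one fails, can be sketched roughly as follows. This is not the repository's implementation: `run_all_checks` and the example checks are hypothetical, and each check is modelled as a plain callable that raises on failure.

```python
from typing import Callable, Dict, List


def run_all_checks(checks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every check and return the names of the ones that failed."""
    failed = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:
            # Record the failure and keep going so later checks still run.
            print(f"check {name} failed: {exc}")
            failed.append(name)
    return failed


def failing_check() -> None:
    raise ValueError("row count below minimum")


def passing_check() -> None:
    return None


failed = run_all_checks({"min_rows": failing_check, "is_unique": passing_check})
print(failed)
```

A caller would typically exit with a non-zero status when the returned list is non-empty, so the failure is still reported after all checks have run.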
Anna Scholtz 25a20bdfbf
Regex for matching UDF names (#3949) 2023-06-15 18:40:16 +00:00
Sean Rose faf4dc8269
Dryrun date param fixes (#3942)
* Always rewrite dryrun date query params as `submission_date`.

* Quote date partition column in dryrun to get schema.
2023-06-14 21:31:18 +00:00
Linh Nguyen d82acc1856
Simplify GLAM template for getting the latest version (#3933)
* Simplify GLAM template for getting the latest version

* Add comment about using buildhub2 data for Fenix
2023-06-14 20:03:01 +00:00
Eduardo Filho d9c68a48d1
glam: Partition clients_histogram_aggregates by sample_id (#3868)
* glam: Partition clients_histogram_aggregates by sample_id (has been running like this since April 3 from a different branch)

* glam: add description and eol to init

* glam: Partition clients_histogram_aggregates by sample_id (has been running like this since April 3 from a different branch)

* glam: add description and eol to init

* add init.sql to missing tbls

* Add schema.yaml

* increase ci output timeout to 30m

* remove init.sql to prevent ci from trying to derive schema from it and break

* Fix schema.yaml files

* Revert output timeout to default
2023-06-14 16:37:59 +00:00
Glenda Leonard c69fee0b5f
DENG-941 initial impl of check rendering and execution. (#3885)
* initial impl

* Updated based on PR feedback

* Moved check from query to separate command

* Expanded from --partition option to generic --parameter option

* Removed `query check` command (check moved to new command)

* Update bigquery_etl/cli/check.py

remove date param format check

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Removed 'parameter' parameter, everything is passed through ctx.args and then converted to a dict for Jinja rendering.  There are no restrictions on ctx.args values.

* Merge error

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-06-13 19:31:59 +00:00
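The commit above (c69fee0b5f) says that extra command-line arguments arrive through `ctx.args` and are converted to a dict for Jinja rendering. The sketch below shows one way such a conversion could look; `args_to_dict` is a hypothetical name, and the accepted forms (`--key=value` and `--key value`) are an assumption rather than the repository's actual parsing rules.

```python
from typing import Dict, List, Optional


def args_to_dict(args: List[str]) -> Dict[str, str]:
    """Turn ['--a=1', '--b', '2'] into {'a': '1', 'b': '2'}."""
    params: Dict[str, str] = {}
    pending: Optional[str] = None  # option still waiting for its value
    for arg in args:
        if arg.startswith("--"):
            key, sep, value = arg[2:].partition("=")
            if sep:
                params[key] = value
                pending = None
            else:
                params[key] = ""  # stays empty if no value follows
                pending = key
        elif pending is not None:
            params[pending] = arg
            pending = None
        # Bare positional values without a preceding option are ignored.
    return params


print(args_to_dict(["--submission_date=2023-06-13", "--min_rows", "10"]))
```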
Daniel Thorn 65365226b5
Don't deduplicate query arguments (#3935) 2023-06-13 17:34:00 +00:00
kik-kik b2a06b8779
If --parameter is passed, set the use_legacy_sql option to False by default, and call bq with query by default if it is not explicitly passed in the bqetl query run command (#3922) 2023-06-13 08:53:03 +00:00
Winnie Chan b9d01ca959
DENG-990 Refactor backfill cli commands (#3924)
* Refactored backfill cli commands

* Adjusted validate command
2023-06-12 17:24:35 +00:00
Sean Rose 02afdfb443
Ignore comments when detecting dependency table names. (#3927)
Otherwise the names of unaliased table references followed by a comment will incorrectly include the comment as part of the name.
2023-06-10 20:20:02 +00:00
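The idea behind 02afdfb443, removing comments before looking for table names, can be illustrated with a small regex-based sketch. It is not the repository's parser: the patterns, the function name, and the table names are hypothetical, and the approach deliberately ignores comment markers that appear inside string literals.

```python
import re
from typing import List

# Line comments (-- and #) and block comments (/* ... */).
COMMENT_RE = re.compile(r"--[^\n]*|#[^\n]*|/\*.*?\*/", re.DOTALL)
# Very rough: the identifier that follows FROM or JOIN.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([`\w.\-]+)", re.IGNORECASE)


def referenced_tables(sql: str) -> List[str]:
    """Return table names referenced after FROM/JOIN, ignoring comments."""
    # Replace each comment with a space so neighbouring tokens do not merge.
    without_comments = COMMENT_RE.sub(" ", sql)
    return [name.strip("`") for name in TABLE_RE.findall(without_comments)]


sql = """
SELECT *
FROM my_dataset.events -- unaliased reference followed by a comment
JOIN `my-project.my_dataset.lookup_v1` /* lookup table */ USING (id)
"""
print(referenced_tables(sql))
```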
Winnie Chan 58c96b4246
DENG-815 Add backfill info cli command (#3915)
* Added backfill info command

* Update bigquery_etl/cli/backfill.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* Fixed status click choice

* Added backfill str method

* Added new backfill utils files

* Update bigquery_etl/cli/backfill.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* Update bigquery_etl/cli/backfill.py

Co-authored-by: Alexander <anicholson@mozilla.com>

* Removed status default

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2023-06-09 17:32:32 +00:00
Sean Rose b7b7c23913
Preserve the order of column schema properties in `schema.yaml` files. (#3923)
When using `bqetl query schema update` to create a new `schema.yaml` file, BigQuery returns the column schema properties in a sensible order (`name`, `type`, `mode`, `fields`), but our `schema.yaml` output has been sorting those properties alphabetically which makes it much less readable.

Also, when using `bqetl query schema update` to update an existing `schema.yaml` file, this will now preserve whatever order the column schema properties were in.
2023-06-09 16:15:40 +00:00
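The ordering behaviour described in b7b7c23913 can be sketched with plain dicts, which preserve insertion order in Python 3.7+. `merge_preserving_order` is a hypothetical helper, not the code from the commit: keys already present keep their existing position, and new keys are appended in the order they arrive.

```python
from typing import Any, Dict


def merge_preserving_order(
    existing: Dict[str, Any], updated: Dict[str, Any]
) -> Dict[str, Any]:
    """Take values from `updated` but keep the key order of `existing`."""
    # Keys the existing field already had, in their existing order.
    merged = {key: updated[key] for key in existing if key in updated}
    # Keys that are new in `updated`, appended in incoming order.
    for key, value in updated.items():
        merged.setdefault(key, value)
    return merged


existing_field = {"name": "client_id", "type": "STRING", "mode": "NULLABLE"}
updated_field = {
    "description": "Client identifier",
    "mode": "NULLABLE",
    "name": "client_id",
    "type": "STRING",
}
print(list(merge_preserving_order(existing_field, updated_field)))
```

When such a dict is written out with PyYAML, passing `sort_keys=False` to `yaml.dump` (available since PyYAML 5.1) keeps this order instead of sorting the keys alphabetically.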
Curtis Morales eb02488f34
Fix google sheets metadata and change from "google_sheet" to "google_sheets" for consistency with google (#3914) 2023-06-07 19:25:31 +00:00
Linh Nguyen d6a55664d0
Revert "Simplify GLAM template for getting latest versions (#3880)" (#3908)
This reverts commit 8ad45a0592.
2023-06-06 18:41:03 +00:00
kik-kik 71e7201e65
feat(): added support for `--log-level` to bqetl query command and using logging instead of print() (#3891)
* added support for --log-level to bqetl query command and updated print statements to be log statements

* now --log-level flag is a bqetl global flag

* fixing linter errors

* Update bigquery_etl/cli/__init__.py

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update bigquery_etl/cli/__init__.py

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* fixed indentation of --log-level option

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-06-06 09:16:41 +00:00
Sean Rose 1d3030e698
Remove ZetaSQL kludges. (#3898)
ZetaSQL was removed in #3755.
2023-06-05 18:03:07 +00:00
Sean Rose 1a527d743e
Fix `bqetl stage` table ID quoting (#3899)
* Fix `bqetl stage` ID quoting.

Quoting the entire table ID breaks cases where an unaliased table name is used to qualify a column reference.

* Have `bqetl stage` preserve fully quoted references.

* Simplify regular expressions for fully quoted references.

* Compile all reference replacement regular expressions for performance.
2023-06-05 16:24:14 +00:00
Linh Nguyen 8ad45a0592
Simplify GLAM template for getting latest versions (#3880)
* Simplify GLAM latest version template

* Use buildhub2 table instead

---------

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-05 14:23:21 +00:00
Alexander 5330dd19da
Move schema and initialization logic for firefox_android_clients_v1 to metadata (#3893)
* Move schema and initializing logic for firefox_android_clients_v1 to metadata

* bqetl query schema update

* Stage table on init.sql change as well
2023-06-05 13:14:03 +00:00
Sean Rose c70a17144a
Save current SubPlat ETL views logic in versioned ETLs (DENG-973) (#3883)
* Save current SubPlat ETL views logic in versioned ETLs (DENG-973).

* Add `incremental` labels to the new tables.

* List all CJMS ETLs to dryrun-skip rather than using `glob`.

The `glob` approach doesn't currently work well with the CI staging process.
2023-06-02 19:53:49 +00:00
Lucia dd4789c8aa
DENG-970 Only Glean in Focus Android view. (#3877)
* DENG-970 Only Glean in Focus Android view.

* DENG-970 Only Glean in Focus Android view.

* DENG-970 Only Glean in Focus Android view.

* DENG-970 Only Glean in Focus Android view.

* DENG-970 Only Glean in Focus Android view.

* DENG-970 Only Glean in Focus Android view.

* DENG-970 CI fix

* DENG-970 CI failure fix. Related to issue 3889.

* Fix UDF dependencies deploy on stage

* DENG-970 Revert specific calling to dataset for UDF.

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
Co-authored-by: Brad Ochocki <brad.ochocki@gmail.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-06-02 16:37:40 +00:00
Sean Rose f52700dcfe
Format transaction statements properly (#3892)
* Format transaction statements properly.

* Test transaction statement formatting.
2023-06-02 16:36:11 +00:00
Alexander 19bcffa8f7
During stage don't rename test dependencies that have already been renamed (#3890) 2023-06-02 15:13:38 +00:00
skahmann3 3c1cbf5a98
[RS-727] Add Sponsored Tiles server-side fill rate telemetry (#3872)
* Create table

* Create a view for sponsored_tiles_ad_req_fill

* skip the view from deploy stage CI check

* Delete metadata.yaml

* delete query.sql

---------

Co-authored-by: Alekhya Kommasani <akommasani@mozilla.com>
Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com>
2023-06-01 13:56:58 -04:00
Winnie Chan 071c53e4cb
DENG-803/805: Create & Validate backfill cli commands (#3760)
* Added backfill create and validate cli command

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
2023-06-01 10:06:09 -07:00
Marlene Hirose c08f21c2d5
add csv recognition to tooling (#3881)
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-06-01 09:45:44 -07:00
Sean Rose 3f4f5a7f94
Increase task name limit from 62 characters to 250 characters (#3876)
The 62 character limit was due to a Kubernetes pod label limit, which has been worked around as of Airflow 2.0.1.
2023-06-01 09:13:38 -07:00
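A length check like the one adjusted in 3f4f5a7f94 might look like the sketch below. Only the two limits (62 before, 250 after) come from the commit message; the constant and function names are hypothetical.

```python
MAX_TASK_NAME_LENGTH = 250  # the commit raised this from 62


def validate_task_name(task_name: str) -> str:
    """Return the task name unchanged, or raise if it is empty or too long."""
    if not task_name:
        raise ValueError("task name must not be empty")
    if len(task_name) > MAX_TASK_NAME_LENGTH:
        raise ValueError(
            f"task name is {len(task_name)} characters long; "
            f"the limit is {MAX_TASK_NAME_LENGTH}"
        )
    return task_name


print(validate_task_name("telemetry_derived__clients_daily__v6"))
```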
Anna Scholtz 94d28a329f
Review for #3787 (#3791)
* Bug 1823627 - Normalize the channel based on probeinfo data in UNIONized views

* Handle fenix channel normalization for app pings

* Parallelize stage schema deploys

* Fix schema field order

---------

Co-authored-by: Jan-Erik Rediger <jrediger@mozilla.com>
2023-06-01 07:22:27 -07:00
kik-kik 4af8068912
added new fxa tables to shredder config (#3871) 2023-06-01 14:07:21 +02:00
kik-kik 46daa24670
feat(DENG-789): making apple ads data accessible (#3847)
* added apple_ads_derived for copying over apple_ad data from the fivetran dataset, and apple_ads views now read from it

* added bqetl_fivetran_apple_ads.py DAG responsible for copying apple_ads data from the fivetran project over to moz-fx-data-shared-prod

* now dryrun skips apple_ads_derived instead of apple_ads as the query now accesses restricted dataset

* added schema files for apple_ads_derived datasets

* added descriptions to schema.yaml files for apple_ads_derived namespace

* added dataset_metadata for apple_ads_derived to include a link to the dbt transformations

* fixed apple_ads view definitions

* removed application label and referenced_tables section inside metadata.yaml for apple_ads as requested by srose in PR#3847

* corrected source project for apple_ads views

* renamed apple_ads_derived to apple_ads_external

* added * to apple_ads_external namespace name to skip in the dryrun due to integration test deployment

* made tweaks to apple_ads and apple_ads_external datasets/namespaces as requested by whd

* updated apple_ads_external skip rule to the way it is meant to be defined, this will work once a fix is rolled out for dryrun

* fixed dag bqetl_fivetran_apple_ads description and updated the schedule to run once a day
2023-05-26 17:32:19 +02:00
Lucia 38731a440b
RS-722 Remove task name from printed message in DAG generation (#3859)
* RS-722 Remove task_name from dag generation when it is not available.

* RS-722 Reformat files.

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-05-25 12:07:41 -04:00
Anna Scholtz 46397f42b5
Deploy UDF references to stage for views (#3849) 2023-05-23 13:10:40 -07:00
Leli f17ba7ad25
add input and output parameters to udf and stored procedure documentation (#3843)
* rename udf_functions to mozfun_doc_functions and add edgecases

* refactor generate_mozfun_docs

* add parameters to stored procedures

* bolden input and output
2023-05-23 18:20:15 +02:00