* Define `event_monitoring_live_v1` views in `view.sql` files.
This way they are deployed automatically by the `bqetl_artifact_deployment.publish_views` Airflow task.
* Support materialized views in view naming validation.
* Handle `IF NOT EXISTS` in view naming validation.
* Use regular expression to extract view ID in view naming validation.
This simplifies the logic and avoids a sqlparse bug where it doesn't recognize the `MATERIALIZED` keyword.
* Update other view regular expressions to allow for materialized views.
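The regular-expression approach described above could look roughly like the following sketch. The pattern and the `extract_view_id` helper are illustrative assumptions, not the code that was actually committed; they only show how a single expression can tolerate `OR REPLACE`, `MATERIALIZED`, and `IF NOT EXISTS` without relying on a SQL parser.

```python
import re

# Hypothetical pattern for illustration; the repository's actual expression may differ.
VIEW_ID_RE = re.compile(
    r"""
    CREATE\s+
    (?:OR\s+REPLACE\s+)?          # optional OR REPLACE
    (?:MATERIALIZED\s+)?          # optional MATERIALIZED keyword
    VIEW\s+
    (?:IF\s+NOT\s+EXISTS\s+)?     # optional IF NOT EXISTS
    `?(?P<view_id>[\w-]+(?:`?\.`?[\w-]+){1,2})`?
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_view_id(sql):
    """Return the view ID from a CREATE VIEW statement, or None if not found."""
    match = VIEW_ID_RE.search(sql)
    if match is None:
        return None
    # Drop backticks so fully and partially quoted IDs compare equal.
    return match.group("view_id").replace("`", "")
```

For example, `extract_view_id("CREATE MATERIALIZED VIEW IF NOT EXISTS `p.d.v` AS SELECT 1")` would yield `p.d.v` under these assumptions.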
* Skip backfills for queries without metadata.yaml
* Support date_partition_offset
* Fixed exclude, modified exception
* Add test for offset backfill
* Apply suggestions from code review
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* Formatting
---------
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* added firefox_ios_derived.clients_activation_v1 and corresponding view
* fixing a missing separator in firefox_ios_derived.clients_activation_v1 checks
* adding firefox_ios_derived.clients_activation_v1 to shredder configuration
* removed is_suspicious_device_client as it should not be there, thanks bani for pointing this out
* fixed black formatting error inside shredder/config.py
* applied bqetl formatting
* minor styling tweak as suggested by bani in PR#4631
* deleting fenix_derived/firefox_android_clients_v2, v1 will remain the active model
* removed fenix_derived.firefox_android_clients_v2 from shredder config
* Add support for assigning Airflow tasks to task groups
* Generate separate Airflow tasks for glean_usage
* Remove Airflow dependencies from old glean_usage tasks
* GLAM - fix legacy windows & release probes' sample count going forward
* Glam FOG accounts for sampling when calculating total_sample for windows & release probes
* fog - fix client count and sample count
* Add channel filtering for fog
* Fix checks to filter on partitions
* Don't print "missing checks file" on success
Previously, the message that `checks.sql` files were
missing was printed on every execution of the `for`
statement, because an `else` clause after a `for` loop
runs whenever the loop completes without a `break`.
Instead, we want to print only when there are no files.
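
The `for`/`else` pitfall described in this commit can be reproduced with a minimal sketch. The function names and message strings below are hypothetical and only mirror the behaviour described, not the actual code that was changed.

```python
def report_checks_buggy(files):
    """Mimic the old behaviour: the message is emitted even when files exist."""
    messages = []
    for path in files:
        messages.append(f"running {path}")
    else:
        # Runs whenever the loop finishes without `break`,
        # including after every file was processed successfully.
        messages.append("missing checks file")
    return messages


def report_checks_fixed(files):
    """Emit the message only when there are no files at all."""
    messages = [f"running {path}" for path in files]
    if not files:
        messages.append("missing checks file")
    return messages
```

With one file present, the buggy version still reports a missing checks file, while the fixed version reports it only for an empty list.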
* Fill empty description
* Assign a friendly name if the table doesn't have one
* Update metadata tests
* Update bigquery_etl/metadata/parse_metadata.py
Co-authored-by: Alexander <anicholson@mozilla.com>
* update test again
---------
Co-authored-by: Alexander <anicholson@mozilla.com>
* Generate normal task dependencies from `depends_on` if the task is in the same DAG.
* Update `metadata.yaml` files to use `depends_on` rather than `upstream_dependencies`.
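As a rough illustration of the same-DAG rule above, the sketch below separates `depends_on` references into plain in-DAG dependencies and references to other DAGs. The `split_dependencies` helper and the `task_id`/`dag_name` dictionary keys are assumptions made for this example; the real DAG generation code may be structured differently.

```python
def split_dependencies(dag_name, depends_on):
    """Split task references into same-DAG task IDs and cross-DAG references.

    `depends_on` is assumed to be a list of dicts with `task_id` and
    `dag_name` keys (an assumption for this sketch).
    """
    same_dag = []
    external = []
    for ref in depends_on:
        if ref["dag_name"] == dag_name:
            # Same DAG: can become a normal upstream task dependency.
            same_dag.append(ref["task_id"])
        else:
            # Different DAG: needs some cross-DAG mechanism instead.
            external.append((ref["dag_name"], ref["task_id"]))
    return same_dag, external
```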
* DS-3054. Create functions to support running an initialization query for all sample_ids in parallel.
* DS-3054. Update _run_query function.
* DS-3054. Use _run_query and mapped values for initialization in parallel.
* DS-3054. Unify initialization to run in parallel and get sample_id range from metadata.
* DS-3054. Minimize formatting of query template and remove need to modify existing initialization queries. Validate if a query should use parallelized or regular update.
* DS-3054. Adding link to caveats.
* DS-3054. Update sample_id range for initialization.
* DS-3054. Use current implementation of run_query.
* DS-3054. Update using a parameter instead of initialization in metadata.
* DS-3054. DAG update with new parameter.
* Pass parameters before calling _run_query().
* Use --append_table in favour of INSERT INTO.
* DS-3054 Separate parallel and non-parallel init, plus some improvements.
---------
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
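
The parallel initialization described in the DS-3054 commits can be pictured with the following sketch, which runs one query per `sample_id` through a thread pool. Everything here is an assumption for illustration: the `@sample_id` placeholder, the `initialize_in_parallel` helper, the default range, and the injected `run_query` callable stand in for the real `_run_query` and its parameters.

```python
from concurrent.futures import ThreadPoolExecutor


def render_for_sample_id(query_template, sample_id):
    """Substitute a hypothetical @sample_id placeholder with a concrete value."""
    return query_template.replace("@sample_id", str(sample_id))


def initialize_in_parallel(query_template, run_query, sample_ids=range(100), max_workers=10):
    """Run the initialization query once per sample_id, several at a time.

    `run_query` is any callable that takes the rendered SQL; the default
    sample_id range is an assumption for this sketch.
    """
    queries = [render_for_sample_id(query_template, s) for s in sample_ids]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with sample_ids.
        return list(pool.map(run_query, queries))
```

Passing the query runner in as a callable keeps the sketch testable without a BigQuery client; a thread pool is a reasonable fit because each call mostly waits on the remote job.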
* Put assert UDFs in `mozfun` project.
* Tweak syntax in `assert.array_equals()` to avoid SQLGlot parsing error.
https://github.com/tobymao/sqlglot/issues/2348
* Fix SQL syntax error in `assert.struct_equals()` tests.
* Fix UDF dependency file path logic when deploying to stage.
* Change regular expressions in `parse_routine` module to allow quotes around routines' dataset and name.
* Respect sql_dir in dryrun skip
* Update bigquery_etl/dryrun.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Update bigquery_etl/dryrun.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Set sql_dir when using Schema.from_query_file()
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
* Fix publishing udfs that use backticks in identifiers
* Update bigquery_etl/routine/parse_routine.py
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
---------
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>