* Add statements to generate glam queries for fenix
* Use newlines in single string for multiple products
* Move glam generation into generate_sql script
* Add documentation on ignoring target project
* Add migration script for joining against first seen table
* Update logic for is_new_profile
* Update templates to use DDL with partitioning/clustering
* Fix output of migrate tables to backfill-8
* Add instructions for backfilling
* Fix linting errors
* Add first_seen_date and related test fixtures
* Use is_new_profile instead of baseline_first_seen
* Update view for baseline_clients_first_seen
* Fix yamllint issues
* Set is_new_profile when submission matches first seen
* Include AS in table alias
* Nit: capitalize AS
* Update bigquery_etl/glean_usage/templates/baseline_clients_daily_v1.sql
Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
* Update bigquery_etl/glean_usage/templates/baseline_clients_daily_v1.sql
Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
* Update clustering specification
Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
* Move argument parser into shared function
* Move shared main entrypoint into common
* Update example script to include other usage queries
* Commit generated queries for example usage queries
* Parallelize generation of example queries
* Add docstring
* Remove ios example queries for daily and last seen
* Fix pydocstyle linting
* Add update_example_glean_usage to CI
* Add initial boilerplate for clients_first_seen
* Remove submission_timestamp as a field
* [wip] Join data against legacy fennec id if applicable
* Remove user facing view
* Revert "Remove user facing view"
This reverts commit a728a7882170eadad5413c7a7046c0f38297bb87.
* Add flag for fennec_id
* Update logic to limit rows in partitions to submission_date
* Add all sql in glean_usage to format ignores
* Separate init and query
* Add default encoders for testing sql
* Add test for initialization of baseline clients first seen in fenix
* Update query to update over previous history
* Add test for aggregation
* Add generated sql and tests for simple baseline clients first seen
* Add dry-run exceptions for clients first seen tables
* Add clients first seen to generated sql
* Update bigquery_etl/glean_usage/templates/baseline_clients_first_seen.metadata.yaml
Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
* Update bigquery_etl/glean_usage/templates/baseline_clients_first_seen.metadata.yaml
Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
* Group by sample id instead of min
* Add submission_date as baseline first seen date
Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
* Use dataset labels to speed up stable view generation
Builds on new dry run affordance from
https://github.com/mozilla/bigquery-etl/pull/1858
We also remove the `--no-dry-run` option, since only a single dry run
is now needed and stable view generation completes in less than 2 seconds.
* Add CI task to push content to generated-sql branch
Fixes #1742
The
[`generated-sql`](https://github.com/mozilla/bigquery-etl/tree/generated-sql)
branch now exists and you can browse the contents. See, for example,
[telemetry.main](https://github.com/mozilla/bigquery-etl/tree/generated-sql/sql/moz-fx-data-shared-prod/telemetry/main)
Follow-ups for which I'll file issues:
- This doesn't currently publish the generated Glean baseline ETL queries
and views; we'll need to update that logic to use probe-scraper metadata
rather than listing tables in BigQuery (due to creds) to integrate it.
- Docs publishing should reference this generated content rather
* Script for validating view definitions
* Add SKIP list for view validation
* Add view validation step to CI
* Regex for validating referenced tables in view definitions
* Add glean_usage ETL generation to generate_all_views
The new `generate_all_views` script is intended to replace `generate_views`
as the entrypoint for Jenkins. Its usage is demonstrated in the
`generate_and_publish_views` script.
This supports the move to user queries in the `mozdata` project.
* Add --user-facing-only
Co-authored-by: whd <whd@users.noreply.github.com>
* Experiment enrollment aggregates hourly
* Experiment enrollments recents query
* Add execution_delay support for tasks
* Experiment enrollment aggregates base query
* Schedule experiment enrollment cumulative population estimate and active population
* Experiment enrollment monitoring queries as views
* Script for exporting experiment monitoring data to GCS
* Export experiment monitoring data script aggregating data of longer running experiments
* Parallelize experiment monitoring data export
* init.sql for experiment enrollment monitoring queries
* Use Airflow ds_format macro for hourly destination table
* Use Airflow macros for experiments monitoring hourly execution delay
* experiment_enrollment_cumulative_population_estimate as query instead of view
* Fix referenced tables in enrollment_aggregates_hourly metadata and add comment
* Simplify cumulative population estimate query
* Add script to determine query dependencies
* Add schemas and folders for minimal test
* Add schema for geckoview_versions
* Add query params to each query
* Update schema for new queries
* Remove main from bootstrap file
* Add dataset prefix to schemas
* Add failing test for clients_histogram_aggregates
It turns out that the dependency resolution I'm using to autogenerate
the schemas is ignoring the views. I actually want to keep the views
around. The tables also all need to be prefixed with the dataset name or
they won't be inserted into the sql query correctly.
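A minimal sketch of the prefixing described above, not the code in this
change: `qualify_table` is a hypothetical helper, and the table names are
illustrative only. It assumes that any name containing a dot already
carries its dataset.

```python
def qualify_table(name: str, dataset: str) -> str:
    """Return `dataset.table`, leaving already-qualified names untouched."""
    # Assumption: a name that contains a dot is already dataset-qualified.
    if "." in name:
        return name
    return f"{dataset}.{name}"


# Illustrative inputs: one bare table name and one already-qualified name.
tables = ["clients_histogram_aggregates", "glam_etl.latest_versions"]
qualified = [qualify_table(t, "glam_etl") for t in tables]
print(qualified)
```

Leaving qualified names untouched keeps the helper safe to run twice over
the same list.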
* Add successful test for clients histogram aggregates
* Add minimal tests for clients_scalar_aggregates
* Remove skeleton files for views (no test support for views)
* Add tests for latest versions
* Add tests for scalar bucket counts that pass
* Add scalar bucket counts
* Add test for scalar percentiles
* Add test for histogram bucket counts
* Add passing test for probe counts
* Add test for histogram percentiles
* Add tests for extract counts
* Update readme
* Add data for scalar percentiles test
* Fix linting errors
* Fix mypy issues with tests module
* Name it data instead of tests.*.data
* Ignore mypy on tests directory
* Remove mypy section
* Remove extra line in pytest
* Try pytest invocation of mypy-scripts-are-modules
* Run mypy outside of pytest
* Use exec on pytest instead of mypy
* Update tests/sql/glam-fenix-dev/glam_etl/bootstrap.py
Co-authored-by: Ben Wu <benjaminwu124@gmail.com>
* Update tests/sql/glam-fenix-dev/glam_etl/README.md
Co-authored-by: Ben Wu <benjaminwu124@gmail.com>
* Document bootstrap in documentation
* Use artificial range for histogram_percentiles
* Simplify parameters for scalar probe counts
* Simplify tests for histogram probe counts
* Add test for incremental histogram aggregates
* Update scalar percentile counts to count distinct client ids
* Update readme for creating a new test
* Use unordered list for sublist
* Use --ignore-glob for pytest to avoid data files
Co-authored-by: Ben Wu <benjaminwu124@gmail.com>
* Resolve generated sql to glam-fenix-dev and change output in sql/ dir
* Add new script for testing glam-fenix queries
* Add generated sql for version control
* Use variables correctly in bash
* Remove latest versions from UDF
* Update test to generate minimum set of tables for nightly
* Commit generated queries for testing
* Cast only if not glob
* Ignore dryrun and publish view for glam-fenix-dev
* Fix linting error
* Update comments
* Use DST_PROJECT consistently in scripts
* Update comments
* Update script/glam/test/test_glean_org_mozilla_fenix_glam_nightly
Co-authored-by: Ben Wu <benjaminwu124@gmail.com>
* Update script/glam/generate_and_run_desktop_sql
Co-authored-by: Ben Wu <benjaminwu124@gmail.com>
* Make versions to keep configurable
* Replace app_version with app_build_id in nightly
* Add jsonschema as a requirement
* Filter based on build date instead of version for nightly
* Add script for comparing the output of two branches
* Add option for specifying the bucket in export
* Cast build_id to integer
* Remove latest versions from histogram aggregates
* Format logical_app_id
* Use @submission_date parameter in latest versions
* Add glam cli for listing processed app ids
* Make backfill scripts more consistent
* Add export to glam glean cli
* Add pandas dependency
* Add black format of glam-cli
* Commit hashes based on bigquery-etl container
* Fix various linting issues
* Be stricter with is_logical matching
* Fix more linting issues
* Add scripts for backfilling and exporting all fenix aggregates
* Update script/glam/export_glean_all_fenix
Co-authored-by: Ben Wu <benjaminwu124@gmail.com>
* Add views for logical app ids
* Add new generated sql
* Update generate_glean_sql script to handle logical apps
* Update logical app view for partitiontime
* Make sure to generate view for all of the app ids
* Update last versions to be logical app id agnostic
* Add formatting for black
* Fix linting error
* Update bigquery_etl/glam/generate.py
Co-authored-by: Ben Wu <benjaminwu124@gmail.com>
* Add "all" option to STAGE
* Add new metrics added since last PR
Co-authored-by: Ben Wu <benjaminwu124@gmail.com>