15 Commits

Author SHA1 Message Date
Anthony Miyaguchi e13ee14db7
Set build_date to null instead of defaulting to glob (#1145)
2020-07-08 14:07:03 -07:00
Anthony Miyaguchi 684380d37e
Add build date to glam_etl templates (#1109)
* Add updates to the glam fenix etl

* Update extract functions to include build date

* Add build_date to exports
2020-07-07 14:14:14 -07:00
Anthony Miyaguchi 1f7c218729
[glam-etl] Use transpose logic for fenix extracts (#1011)
2020-05-28 11:57:55 -07:00
Anthony Miyaguchi 2e68006911
[glam-etl] Use static combinations and avoid double counting (#1002)
* Use compact notation for static combinations

* Avoid double counting by moving cubing into bucket counts

* Modify udf_merged_user_data for reuse outside of clients aggregates

* Duplicate rows from all combos by client id
2020-05-28 11:53:03 -07:00
Anthony Miyaguchi afe7732fc8
Bug 1639345 - Fix ping type using static name from table id (#999)
* Add literal for ping type and remove channel filter for fenix

* Add static ping-type and remove channel filter

* Add docstring for utility function
2020-05-21 13:12:52 -07:00
Anthony Miyaguchi a46197d268
Change naming convention of sql/glam_etl (#912)
* Rename all queries to use org_mozilla_fenix__

* Change all references within jobs to use new prefix

* Update run_glam_sql for new naming convention

* Rename probe counts and histogram counts

* Rename generate_fenix_sql to generate_sql

* Fix lint error and add new views to publish_view ignore

* Add updated clients daily queries

* Add consistent usage of header for better BigQuery job logs

* Add parameters for product to generate SQL for.

* Generate schemas in parallel

* Rename generate_sql to generate_glean_sql
2020-05-21 11:54:36 -07:00
Anthony Miyaguchi 82a6a5f687
Fenix exports for GLAM (#870)
* Add views for extracting to glam

* wip: Add export script

* Rename extract queries and don't run them

* Add user counts

* Add generated sql

* Update extract queries to rely on current project

* Fix optional day partitioning

* Fix extraction to glam-fenix-dev project

* Add globs for ignoring dryrun and format

* Reorder columns in probe count extract

* Filter on null channel and version

* Do not print header

* Refactor datacube groupings and fix scalar_percentiles

* Rename extract tables

* Convert user counts into a view to avoid needless materialization

* Rename client probe counts to probe counts

* Update publish_views so job does not fail
2020-04-14 11:45:59 -07:00
Anthony Miyaguchi 6e3351272e
Cleanup GLAM etl queries for Fenix (#851)
* Rename scalar_aggregates_incremental and remove telemetry

* Remove ping-type and telemetry variables

* Add initial files for final views

* Add glam templates to format ignore

* Add blacked scalar_percentiles

* Generate view for scalar aggregates

* Add generated view for daily histogram aggregate view

* Add probe counts to generated views

* Generalize writing out queries

* Add a latest_versions to generate

* Add histogram_percentiles to generate

* Add scalar bucket counts to generate

* Add histogram bucket counts to generate

* Add clients scalar aggregates to generate

* Move probe counts to generate

* Add scalar percentiles to generate

* Add clients histogram aggregates to generate

* Use generate within generate_fenix_sql script

* Fix probe_counts view and bucket counts

* Rename template for scalar bucket counts

* Fix client probe counts view

* Remove irrelevant telemetry where clause

* Add docstrings and shorten lines

* Add probe counts view to dryrun and publish_views ignores

* Rename udf for merged user data

* Use python3
2020-04-03 14:17:32 -07:00
Anthony Miyaguchi 33160eea96
Add histogram percentiles for Fenix data into GLAM (#829)
* Add initial histogram_percentiles

* Update metric_type with histogram_type suffix

* Add generated SQL (backwards incompatible)

* Add body for copy of histogram_percentiles_v1

* Update histogram_percentiles with Glean specific metrics

* Add histogram_percentiles module

* Uncomment histogram percentiles

* Add generated SQL

* Add template to ignore section of format_sql

* Add histogram percentiles to ignore of dryrun

* Move udf into persistent_udf directory

* Rewrite udf_js.glean_percentile

* Add generated SQL
2020-03-25 10:17:21 -07:00
Anthony Miyaguchi 4f0080559a
Add histograms to fenix glam etl (#766)
* Add initial template for histogram aggregates

* Factor out common functions and get all distributions

* Add viable query for histogram aggregates

* Add more efficient aggregation

* Update header and update comment

* Add code to generate clients daily histograms

* Add queries for generated sql

* Return non-zero exit code when histograms not found

* Delete empty queries to reduce data scanned

* Add non-zero exit code for scalars if probes are not found

* Sort histograms for stable output

* Add view for histogram aggregates

* Add initial sql for histogram aggregates

* Format template scripts

* Add mostly reformatted sql for aggregates

* Update histogram aggregates before adding statements

* Fix up details for daily aggregation

* Add completed histograms template

* Add code to generate clients histogram aggregates

* Add init for clients histogram aggregates

* Remove sample_id from set of attributes

* Add sections to run generated sql

* Add generated sql

* Remove extra latest_version columns

* Fix many small issues during first draft of sql

* Fix clients histogram aggregates

* Add initial modification to probe counts

* Add histogram bucket counts

* Add option to generate histogram probe counts

* Update generate_fenix_sql for histograms

* Add generated sql

* Update run_fenix_sql

* Fix bucket counts

* Update source table for probe counts

* Add missing ping_type to histograms

* Add first,last,num buckets

* Update probe counts so it succeeds

* Add mozilla_schema_generator to dependencies

* Add metadata from probe-info for custom distributions

* Update probe counts with metadata for custom distributions

* Add UDF for generating functional buckets

* Add proper bucketing by including range_max of measures

* Format histogram udfs

* Add updated templates to skip

* Add new queries to dryrun ignore

* Add view to the publish ignore list

* Fix python linting

* Remove old comments from probe counts

* Do not count metadata from custom distributions twice

* Remove sum from histogram aggregates

* Add generated SQL

* Add sample_id to histograms earlier in pipeline

* Add generated SQL

* Add comments to functional bucketing for metrics
2020-03-18 13:53:28 -07:00
Anthony Miyaguchi 4e773ba6eb
Simplify scalar aggregates for glam-fenix etl (#767)
* Update daily aggregates to run all scalars in a single query

* Update generate and run script for new scalar aggregates

* Update generated sql (and view)

* Fix linting

* Update SKIP for format
2020-02-26 11:22:20 -08:00
Anthony Miyaguchi 4f2d5dd51d
Add script for initial scheduling (#757)
* Update run script to be more generic

* Update run script with parameters and avoid reusing destination

* Add view for clients_daily_scalar_aggregates

* Add new view to dryrun

* Remove old comment
2020-02-21 11:13:20 -08:00
Anthony Miyaguchi d0b71bcefd
End-to-end Fenix scalar aggregates (#743)
* Refactor render into a separate function

* Add variables for source and destination tables

* Add support for aggregating glean pings

* Add render_init along with --init option

* Add partition clause and add proper init file

* Add attributes_type to the template

* Update clients_scalar_aggregates_v1 with dataset.table

* Add command for generating init for fenix scalars aggregates

* Add queries for fenix_clients_scalar_aggregates_v1

* Update partitioning in init

* Update glam scripts for scalar aggregates

* Update version to only include valid versions

* Add generated sql

* Add --quiet flag

* Add notes

* Fix linting and CI errors

* Ignore glam_etl in dryrun

* Add initial template files that have been formatted

* Update generated queries

* Add metric counts for histogram and scalars

* Update metric_counts_v1 for scalars only

* Add formatted version of telemetry_derived/clients_scalar_bucket_counts_v1

* Add module for generating metric bucketing

* Refactor generate_fenix_sql for skipping stages

* Add templates to format SKIP

* Fix trailing whitespace

* Add option to generate fenix bucket/probe counts

* Add initial bucket/probe counts sql for fenix

* Sort attributes for stable query generation

* Refactor bucketing logic

* Add scalar_metric_types variable

* Add argument parser and glean variables

* Update scalar bucket counts for glean

* Update run_fenix_sql with bucket counts

* Fix invalid syntax

* Do not aggregate booleans as a scalar

* Add scalar_metric_types to metric_counts_v1

* Add argparser and change source tablename to scalar

* Update fenix_clients_scalar_probe_counts_v1

* Remove first_bucket

* Add scalar_probe_counts to run script

* Removing first_bucket requires changing where clause conditional

* Get grouping attributes correct

* Give columns stable ordering

* Add correct query (that is too complex)

* Reduce number of combinations

* Simplify logic for null values

* Cast booleans instead of when clause

* Format

* Rename files to avoid confusion

* Add initial scalar_percentiles

* Add initial files for scalar_percentiles

* Add scalar_percentiles for fenix

* Add scalar_percentiles to run script

* Add problematic files to SKIP in format and dryrun

* Add installation ping

* Fix missing merge item

* Add missing newlines

* Reduce set of grouped attributes

* Factor out boolean_metric_types
2020-02-19 13:43:53 -08:00
Anthony Miyaguchi 0d892cba4e
Add scalar aggregates from clients daily scalar aggregates for Fenix (#735)
* Refactor render into a separate function

* Add variables for source and destination tables

* Add support for aggregating glean pings

* Add render_init along with --init option

* Add partition clause and add proper init file

* Add attributes_type to the template

* Update clients_scalar_aggregates_v1 with dataset.table

* Add command for generating init for fenix scalars aggregates

* Add queries for fenix_clients_scalar_aggregates_v1

* Update partitioning in init

* Update glam scripts for scalar aggregates

* Update version to only include valid versions

* Add generated sql

* Add --quiet flag

* Add notes

* Fix linting and CI errors

* Ignore glam_etl in dryrun

* Add latest_versions template

* Add generated code for latest versions

* Update header

* Add latest versions to run script

* Update version filter using fenix_latest_versions_v1
2020-02-19 10:51:22 -08:00
Anthony Miyaguchi f32f866129
Bug 1610983 - Add clients daily scalar aggregates for GLAM in Fenix (#724)
* Add copy of clients_daily_scalar_aggregates for fenix

* Change table to Fenix metrics ping and modify columns

* Modify get_scalar_probes to fetch the relevant list of metrics

* Remove logic for keyed booleans

* Add valid generated SQL for scalars

* Generate valid keyed_scalars

* Factor out attributes into reusable string

* Use the bigquery-etl formatter

* Add `--no-parameterize` flag for debugging in console

* Add option for table_id

* Add comma conditionally

* Add script to run against all Glean pings in dataset

* Move scripts into appropriate locations

* Use stable tables as source for generate script

* Report glean metric types instead of scalar/keyed-scalar

* Fix linting

* Add script to generate sql for each table in org_mozilla_fenix

* Add generated sql

* Rename script for running etl in testing environment

* Update run script to use generated sql

* Fix missing --table-id parameter

* Update header comment in script

* Update generated sql

* Add ping_type to list of attributes

* Update generated schemas
2020-02-06 14:01:24 -08:00