bigquery-etl


Author	SHA1	Message	Date
Anthony Miyaguchi	e13ee14db7	Set build_date to null instead of defaulting to glob (#1145)	2020-07-08 14:07:03 -07:00
Anthony Miyaguchi	684380d37e	Add build date to glam_etl templates (#1109) * Add updates to the glam fenix etl * Update extract functions to include build date * Add build_date to exports	2020-07-07 14:14:14 -07:00
Anthony Miyaguchi	1f7c218729	[glam-etl] Use transpose logic for fenix extracts (#1011)	2020-05-28 11:57:55 -07:00
Anthony Miyaguchi	2e68006911	[glam-etl] Use static combinations and avoid double counting (#1002) * Use compact notation for static combinations * Avoid double counting by moving cubing into bucket counts * Modify udf_merged_user_data for reuse outside of clients aggregates * Duplicate rows from all combos by client id	2020-05-28 11:53:03 -07:00
Anthony Miyaguchi	afe7732fc8	Bug 1639345 - Fix ping type using static name from table id (#999) * Add literal for ping type and remove channel filter for fenix * Add static ping-type and remove channel filter * Add docstring for utility function	2020-05-21 13:12:52 -07:00
Anthony Miyaguchi	a46197d268	Change naming convention of sql/glam_etl (#912) * Rename all queries to use org_mozilla_fenix__ * Change all references within jobs to use new prefix * Update run_glam_sql for new naming convention * Rename probe counts and histogram counts * Rename generate_fenix_sql to generate_sql * Fix lint error and add new views to publish_view ignore * Add updated clients daily queries * Add consistent usage of header for better BigQuery job logs * Add parameters for product to generate SQL for. * Generate schemas in parallel * Rename generate_sql to generate_glean_sql	2020-05-21 11:54:36 -07:00
Anthony Miyaguchi	82a6a5f687	Fenix exports for GLAM (#870) * Add views for extracting to glam * wip: Add export script * Rename extract queries and don't run them * Add user counts * Add generated sql * Update extract queries to rely on current project * Fix optional day partitioning * Fix extraction to glam-fenix-dev project * Add globs for ignoring dryrun and format * Reorder columns in probe count extract * Filter on null channel and version * Do not print header * Refactor datacube groupings and fix scalar_percentiles * Rename extract tables * Convert user counts into a view to avoid needless materialization * Rename client probe counts to probe counts * Update publish_views so job does not fail	2020-04-14 11:45:59 -07:00
Anthony Miyaguchi	6e3351272e	Cleanup GLAM etl queries for Fenix (#851) * Rename scalar_aggregates_incremental and remove telemetry * Remove ping-type and telemetry variables * Add initial files for final views * Add glam templates to format ignore * Add blacked scalar_percentiles * Generate view for scalar aggregates * Add generated view for daily histogram aggregate view * Add probe counts to generated views * Generalize writing out queries * Add a latest_versions to generate * ADd histogram_percentiles to generate * Add scalar bucket counts to generate * Add histogram bucket counts to generate * Add clients scalar aggregates to generate * Move probe counts to generate * Add scalar percentiles to generate * Add clients histogram aggregates to generate * Use generate within generate_fenix_sql script * Fix probe_counts view and bucket counts * Rename template for scalar bucket counts * Fix client probe counts view * Remove irrelevant telemetry where clause * Add docstrings and shorten lines * Add probe counts view to dryrun and publish_views ignores * Rename udf for merged user data * Use python3	2020-04-03 14:17:32 -07:00
Anthony Miyaguchi	33160eea96	Add histogram percentiles for Fenix data into GLAM (#829) * Add initial histogram_percentiles * Update metric_type with histogram_type suffix * Add generated SQL (backwards incompatible) * Add body for copy of histogram_percentiles_v1` * Update histogram_percentiles with Glean specific metrics * Add histogram_percentiles module * Uncomment histogram percentiles * Add generated SQL * Add template to ignore section of format_sql * Add histogram percentiles to ignore of dryrun * Move udf into persistent_udf directory * Rewrite udf_js.glean_percentile * Add generated SQL	2020-03-25 10:17:21 -07:00
Anthony Miyaguchi	4f0080559a	Add histograms to fenix glam etl (#766) * Add initial template for histogram aggregates * Factor out common functions and get all distributions * Add viable query for histogram aggregates * Add more efficient aggregation * Update header and update comment * Add code to generate clients daily histograms * Add queries for generated sql * Return non-zero exit code when histograms not found * Delete empty queries to reduce data scanned * Add non-zero exit code for scalars if probes are not found * Sort histograms for stable output * Add view for histogram aggregates * Add initial sql for histogram aggregates * Format template scripts * Add mostly reformatted sql for aggregates * Update histogram aggregates before adding statements * Fix up details for daily aggregation * Add completed histograms template * Add code to generate clients histogram aggregates * Add init for clients histogram aggregates * Remove sample_id from set of attributes * Add sections to run generated sql * Add generated sql * Remove extra latest_version columns * Fix many small issues during first draft of sql * Fix clients histogram aggregates * Add initial modification to probe counts * Add histogram bucket counts * Add option to generate histogram probe counts * Update generated_fenix_sql for histograms * Add generated sql * Update run_fenix_sql * Fix bucket counts * Update source table for probe counts * Add missing ping_type to histograms * Add first,last,num buckets * Update probe counts so it succeeds * Add mozilla_schema_generator to dependencies * Add metadata from probe-info for custom distributions * Update probe counts with metadata for custom distributions * Add UDF for generating functional buckets * Add proper bucketing by including range_max of measures * Format histogram udfs * Add updated templates to skip * Add new queries to dryrun ignore * Add view to the publish ignore list * Fix python linting * Remove old comments from probe counts * Do not count metadata from custom distributions twice * Remove sum from histogram aggregates * Add generated SQL * Add sample_id to histograms earlier in pipeline * Add generated SQL * Add comments to functional bucketing for metrics	2020-03-18 13:53:28 -07:00
Anthony Miyaguchi	4e773ba6eb	Simplify scalar aggregates for glam-fenix etl (#767) * Update daily aggregates to run all scalars in a single query * Update generate and run script for new scalar aggregates * Update generated sql (and view) * Fix linting * Update SKIP for format	2020-02-26 11:22:20 -08:00
Anthony Miyaguchi	4f2d5dd51d	Add script for intitial scheduling (#757) * Update run script to be more generic * Update run script with parameters and avoid reusing destination * Add view for clients_daily_scalar_aggregates * Add new view to dryrun * Remove old comment	2020-02-21 11:13:20 -08:00
Anthony Miyaguchi	d0b71bcefd	End-to-end Fenix scalar aggregates (#743) * Refactor render into a separate function * Add variables for source and destination tables * Add support for aggregating glean pings * Add render_init along with --init option * Add partition clause and add proper init file * Add attributes_type to the template * Update clients_scalar_aggregates_v1 with dataset.table * Add command for generating init for fenix scalars aggregates * Add queries for fenix_clients_scalar_aggregates_v1 * Update partititioning in init * Update glam scripts for scalar aggregates * Update version to only include valid versions * Add generated sql * Add --quiet flag * Add notes * Fix linting and CI errors * Ignore glam_etl in dryrun * Add initial template files that have been formatted * Update generated queries * Add metric counts for histogram and scalars * Update metric_counts_v1 for scalars only * Add formatted version of telemetry_derived/clients_scalar_bucket_counts_v1 * Add module for generating metric bucketing * Refactor generate_fenix_sql for skipping stages * Add templates to format SKIP * Fix trailing whitespace * Add option to generate fenix bucket/probe counts * Add initial bucket/probe counts sql for fenix * Sort attributes for stable query generation * Refactor bucketing logic * Add scalar_metric_types variable * Add argument parser and glean variables * Update scalar bucket counts for glean * Update run_fenix_sql with bucket counts * Fix invalid syntax * Do not aggregate booleans as a scalar * Add scalar_metric_types to metric_counts_v1 * Add argparser and change source tablename to scalar * Update fenix_clients_scalar_probe_counts_v1 * Remove first_bucket * Add scalar_probe_counts to run script * Removing first_bucket requires changing where clause conditional * Get grouping attributes correct * Give columns stable ordering * Add correct query (that is too complex) * Reduce number of combinations * Simplify logic for null values * Cast booleans instead of when clause * Format * Rename files to avoid confusion * Add initial scalar_percentiles * Add initial files for scalar_percentiles * Add scalar_percentiles for fenix * Add scalar_percentiles to run script * Add problematic files to SKIP in format and dryrun * Add installation ping * Fix missing merge item * Add missing newlines * Reduce set of grouped attributes * Factor out boolean_metric_types	2020-02-19 13:43:53 -08:00
Anthony Miyaguchi	0d892cba4e	Add scalar aggregates from clients daily scalar aggregates for Fenix (#735) * Refactor render into a separate function * Add variables for source and destination tables * Add support for aggregating glean pings * Add render_init along with --init option * Add partition clause and add proper init file * Add attributes_type to the template * Update clients_scalar_aggregates_v1 with dataset.table * Add command for generating init for fenix scalars aggregates * Add queries for fenix_clients_scalar_aggregates_v1 * Update partititioning in init * Update glam scripts for scalar aggregates * Update version to only include valid versions * Add generated sql * Add --quiet flag * Add notes * Fix linting and CI errors * Ignore glam_etl in dryrun * Add latest_versions template * Add generated code for latest versions * Update header * Add latest versions to run script * Update version filter using fenix_latest_versions_v1	2020-02-19 10:51:22 -08:00
Anthony Miyaguchi	f32f866129	Bug 1610983 - Add clients daily scalar aggregates for GLAM in Fenix (#724) * Add copy of clients_daily_scalar_aggregates for fenix * Change table to Fenix metrics ping and modify columns * Modify get_scalar_probes to fetch the relevant list of metrics * Remove logic for keyed booleans * Add valid generated SQL for scalars * Generate valid keyed_scalars * Factor out attributes into reusable string * Use the bigquery-etl formatter * Add `--no-parameterize` flag for debugging in console * Add option for table_id * Add comma conditionally * Add script to run against all Glean pings in dataset * Move scripts into appropriate locations * Use stable tables as source for generate script * Report glean metric types instead of scalar/keyed-scalar * Fix linting * Add script to generate sql for each table in org_mozilla_fenix * Add generated sql * Rename script for running etl in testing environment * Update run script to use generated sql * Fix missing --table-id parameter * Update header comment in script * Update generated sql * Add ping_type to list of attributes * Update generated schemas	2020-02-06 14:01:24 -08:00

15 commits