bigquery-etl

Commit Graph

Author	SHA1	Message	Date
Winnie Chan	61b9b2ffe3	DENG-772: Fenix population with first_session ping and min seq (#4090 ) * Changed to min seq and capture null seq	2023-07-28 08:55:39 -07:00
Eduardo Filho	25d4ab4042	Bug 1844886: Add non-norm cols to glam scalar tbls (#4111 ) * Add non-norm cols to scalar tbls * Add missing schema to scalar_percentiles_v1 * Add missing schema to scalar_percentiles_v1	2023-07-25 11:13:20 -04:00
Eduardo Filho	144a508ee6	Glam non norm agg (#3873 ) * glam: Partition clients_histogram_aggregates by sample_id (has been running like this since April 3 from a different branch) * glam: Non normalized aggregations to legacy histograms * glam: add non-normalized aggs to probe counts extract * glam: add init.sql to relevant tbls for non-norm aggs * glam: ignore dryrun histogram_percentiles * glam: add description and eol to init * glam: Partition clients_histogram_aggregates by sample_id (has been running like this since April 3 from a different branch) * glam: Non normalized aggregations to legacy histograms * glam: add non-normalized aggs to probe counts extract * glam: add init.sql to relevant tbls for non-norm aggs * glam: ignore dryrun histogram_percentiles * glam: add description and eol to init * fix schema files * fix clients_histogram_probe_counts schema * remove another init.sql * fix dryrun ignore order * fix table name * change dryrun ignore order to try avoiding fenix for being on path * another change in dryrun * Move glam queries from dryrun to bqetl_project.yaml to ignore * add tbl deps on tests	2023-07-19 17:01:58 -04:00
Rebecca BurWei	4ffdb8484a	Add has_adblocker_addon to search_clients_daily (#3558 ) * feat: adblocker addons field * Update sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v8/query.sql Co-authored-by: Curtis Morales <cmorales@mozilla.com> * fix: use private table * fix: where clause for private table * Reference static addons table * Switch to new monetization_blocking_addons table * Drop project name in reference to monetization_blocking_addons * Don't dry-run search_clients_daily * Add has_adblocker_addons to search_clients_daily_v8 tests --------- Co-authored-by: Curtis Morales <cmorales@mozilla.com>	2023-07-19 12:04:38 -04:00
Sean Rose	a4fdf3c65e	Include new fields for SubPlat in FxA events views (DENG-1006) (#3926 ) * Include new fields for SubPlat in `fxa_content_auth_stdout_events`. * Include new fields for SubPlat in `nonprod_fxa_content_auth_stdout_events`. * Include new fields for SubPlat in `fxa_all_events`. * Move new `time` column to be by the other timestamp columns. * Keep `subscribed_plan_ids` as a string so it's accessible in Looker. * Add `schema.yaml` files for FxA events ETLs. So the tables can be successfully staged for CI for downstream ETLs/views to pass. * Fully qualify view in `fxa_users_daily_v1` to try to get test to pass. * Rename `time` column `event_time`. * Include new fields for SubPlat in `nonprod_fxa_all_events`. --------- Co-authored-by: Daniel Thorn <dthorn@mozilla.com>	2023-06-23 15:41:13 +00:00
Alexander	8423c7ad2e	Use baseline_clients_daily instead of ping and first_seen for fenix_android_clients (#3910 ) * Change source for first_seen and baseline to baseline_clients_daily * Edit tests and schemas * Update to fenix.baseline_clients_daily	2023-06-08 14:50:14 +00:00
Alexander	505c895f62	GROWTH-41 Add last_seen columns to firefox_android_clients (#3863 ) * Added last_reported columns * Fixed tests * Added missing locale field	2023-06-05 16:37:59 +00:00
Lucia	cbe42ab9a9	Deng 850 firefox android clients reported ping (#3789 ) * DENG-850 Retrieve FALSE instead of NULL in the in metadata when there isn't first_session or metrics ping. * DENG-850 Unitest for no first session ping. * DENG-850 syntax fix * DENG-850 Tests for first session ping, and no baseline ping. * DENG-850 Tests suite. * DENG-850 YAML fixes. * DENG-850 Adjustment to the case of reported first_session and metrics ping. The unitests are adjusted to get the value for reported pings. * DENG-850 Add sample id to test. --------- Co-authored-by: Lucia Vargas <lvargas@mozilla.com>	2023-05-26 12:50:08 +02:00
Alexander	db604e4b3d	DENG-796 - newtab_visits (#3762 ) Add new table - Newtab Visits	2023-05-25 09:32:25 -04:00
Glenda Leonard	a31072d408	DENG-775 downloads_with_attribution_v2 (#3716 ) * DENG-775 Added session_id to JOIN between GA data and stub_attr.stdout. Also expanded date range on GA session data to [download_date - 2 days, download_date + 1 day] * Updated query to handle missing GA download_session_id. It effectively applies V1 logic to the MISSING_GA_CLIENT dl_tokens.	2023-04-26 14:47:34 -04:00
Anna Scholtz	48d8c7603d	Metric hub integration - rewrite SSL ratios to use metrics (#3698 ) * Add metrics.data_source() * Rewrite SSL ratios to use metrics * Fix docs formatting	2023-04-04 15:41:44 -07:00
Glenda Leonard	b13f45bc63	DENG-658 - Initial table definitions for dl_token processing. (#3644 ) * Initial table definitions for dl_token processing. Includes update to sql pytest_plugin to account for tablenames with date suffixes. * Removed cluster reference and shortened description * Added sql/moz-fx-data-marketing-prod/ga_derived/downloads_with_attribution_v1/query.sql to dryrun skip * Added time_on_site * Moved country_names sample test data file. * Update bigquery_etl/pytest_plugin/sql.py Co-authored-by: Daniel Thorn <dthorn@mozilla.com> * Update sql/moz-fx-data-marketing-prod/ga_derived/downloads_with_attribution_v1/query.sql Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com> * Update sql/moz-fx-data-marketing-prod/ga_derived/downloads_with_attribution_v1/query.sql Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com> * Updated based on PR feedback. Added LEFT JOIN to ensure sessions without pageviews are not dropped. * Set has_ga_download_event = null if exception=GA_UNRESOLVABLE * Standardized logic for time_on_site * - Added test for multiple downloads for 1 session - Added detailed description of table. * Updated to use mode_last_retain_nulls instead of ANY_VALUE * Set pageviews, unique_pageviews = 0 if null. * Added boolean additional_download_occurred to indicate if another download occurred in the same session. --------- Co-authored-by: Daniel Thorn <dthorn@mozilla.com> Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>	2023-03-23 16:45:58 -04:00
kik-kik	ed1317cb7d	bug(): added org_mozilla_firefox_derived/client_deduplication_v1 to dry run skip (#3667 ) * added org_mozilla_firefox_derived/client_deduplication_v1 to dry run skip * fixing fxa_users_daily_v1 test	2023-03-16 15:42:04 +00:00
Rebecca BurWei	26366152b9	RS-595 (#3518 ) * feat: new field in search clients daily - is_sap_monetizable * Added column to tests --------- Co-authored-by: Alexander Nicholson <anicholson@mozilla.com>	2023-03-10 12:38:52 -05:00
Alexander	60c85e7c54	Revert CI changes for private UDFs and add stub documentation - DENG-735 (#3652 ) * Revert "CI fixes for supporting private UDFs in bigquery-etl - DENG-735 (#3631)" This reverts commit `edcfe758f7`. * Added stub UDF for monetized_search * Add docs for using a private internal UDF	2023-03-10 11:41:46 -05:00
Alexander	edcfe758f7	CI fixes for supporting private UDFs in bigquery-etl - DENG-735 (#3631 ) * Minimize stub normalize_search_engine UDF and usage in search_clients_last_seen tests * Move sql tests downstream of private-generate-sql and copy UDFs into sql-dir for tests	2023-03-09 16:54:42 -05:00
m-d-bowerman	69c0c1d5fc	[DS-2566] Add sidebar search probes to clients_daily tables (#3629 ) * Add sidebar search probes to clients_daily tables * update downstream schemas * Format fix * Add new field to test * Update schemas with hist fields * Remove duplicated field from schema --------- Co-authored-by: Glenda Leonard <75265513+gleonard-m@users.noreply.github.com>	2023-03-08 18:14:58 -05:00
Leli	05975e88ff	Fivetran remove dev (#3619 ) * remove dev destination * fivetran - daily_connector_costs restructure CTEs for clearer GROUP BY	2023-03-02 13:00:48 +01:00
Leli	5aebf9cae9	fivetran costs change tables (#3599 ) * fivetran costs change tables * fivetran costs - incorporate code review * fivetran costs - change monthly_costs back to coalesce * fivetran costs - fix query	2023-02-27 19:26:21 +01:00
Leli	5108c2d307	add daily_active_rows to fivetran_costs (#3570 ) * add daily_active_rows to fivetran_costs * regenerate dag * regenerate dag with black 23.1.0	2023-02-08 13:45:49 +01:00
Daniel Thorn	ac053c326a	Update dependencies missed by dependabot (#3566 )	2023-02-06 12:14:32 -08:00
Curtis Morales	570f3bbab3	Fix urlbar_clients_daily_v1 (#3549 ) * Fix urlbar_clients_daily_v1 * Rename test file * Fix new name * Rename one more test case file	2023-02-01 10:57:57 -05:00
Leli	d161c02ef1	Fix Fivetran Costs calculations (#3547 ) * Add fivetran_costs * Add fivetran_costs - adding schemas to all tables and adressing other suggestions * Add fivetran_costs - renaming tables, adding to the schema * Add fivetran_costs - adding fivetran-dev * Add fivetran_costs - adding tests * Add fivetran_costs - adding tests * Add fivetran_costs - adding tests * Add fivetran_costs - adding tests * implementing suggestions * rerun dag creation * fixing the tests * fixing errors * change position of rounding	2023-02-01 13:30:09 +01:00
Leli	8b0158c8dc	Add fivetran_costs (#3509 ) * Add fivetran_costs * Add fivetran_costs - adding schemas to all tables and adressing other suggestions * Add fivetran_costs - renaming tables, adding to the schema * Add fivetran_costs - adding fivetran-dev * Add fivetran_costs - adding tests * Add fivetran_costs - adding tests * Add fivetran_costs - adding tests * Add fivetran_costs - adding tests * implementing suggestions * rerun dag creation	2023-01-31 16:27:29 +01:00
kik-kik	794ae7b22e	Update sql/moz-fx-data-shared-prod/firefox_accounts_derived/funnel_events_source_v1/query.sql Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>	2022-12-15 13:11:07 +01:00
Chelsea Troy	a1a155a22f	Add urlbar_persisted to query for daily search client table (#3349 ) * Add urlbar_persisted to query for daily search client table * Add column to schema (not backward breaking) * Address test expectations * Update schemata and queries for companion tables * Make adjustments that Alex identified in the PR to make sure the new fields get ingested properly * Run schema update for clients_daily_v6	2022-12-05 15:33:43 -05:00
Anna Scholtz	a05e0dd80c	Update event_aggregates_v1 tests	2022-11-17 08:34:36 -08:00
Alekhya	56983f3e8d	Add Glean iOS Focus and Klar to search metrics (#3285 ) correct the column for default search engine add tests and ios to views	2022-10-18 13:52:53 -04:00
Frank Bertsch	ce992ca411	Bug 1791580 - Add qualified use to clients_last_seen (#3232 ) * Bug 1791580 - Add qualified use to clients_last_seen * Add new fields for private/normal browsing URIs * Update clients_last_seen_joined schema * Reformat view * Update and extend tests	2022-10-11 09:27:40 -04:00
Alexander	588d468dc8	Hoist schemas in SQL tests up to table dir (#3145 )	2022-08-17 13:11:24 -04:00
wil stuckey	4034b1a93d	Update table declarations to include session_id, sequence_no in sanitized and impression tables (#3092 ) Co-authored-by: whd <whd@users.noreply.github.com>	2022-08-09 16:40:09 +00:00
Nan Jiang	2b67d988d7	CONSVC 1898: Add suggest_data_sharing_enabled to event_aggregates (#3116 )	2022-07-26 16:55:49 -04:00
Alexander	f99f112336	Android Focus search ETL - DO-824, Bug 1749833 (#2682 ) Added glean data for Focus on Android to `mobile_search_clients_daily_v1`	2022-07-26 16:25:32 -04:00
Nan Jiang	8aa9596ff4	CONSVC 1813: Include iOS data into Contextual Services derived dataset (#3049 )	2022-06-30 11:05:04 -04:00
wil stuckey	a78127a050	#1775029 Update the suggest_impression_sanitized_v3 query (#3041 ) * Update the suggest_impression_sanitized_v3 query * exclude region and country when preparing for the join * filter `impressions.request_id` to non null to drop queries without a corresponding impression. * Add tests? Haven't figured out bootstraping issues on my M1 yet so not sure how well these will work. TO CI! * Swap the left and right and remove the conditional on the final join * Align expectations * Will this fix the tests? tune in to find out. * Fix expectations AGAIN * Update based on review comments and formatter changes	2022-06-23 15:50:16 +00:00
Anna Scholtz	2f5c6ac41a	Generate ExternalTaskMarkers for Airflow downstream dependencies	2022-06-22 11:05:25 -07:00
akkomar	ceda6dd35f	Use approximate client count in GLAM scalar_percentiles_v1 (#3039 ) This is a follow-up to https://github.com/mozilla/bigquery-etl/pull/3037 which unblocked `scalar_bucket_counts_v1`. `scalar_percentiles_v1` uses the same source table (`clients_scalar_aggregates_v1`) and started failing today with the same error (disk/memory limits exceeded for shuffle operations). `APPROX_COUNT_DISTINCT` used here runs HLL under the hood. The reason for using it here is that we can't split the aggregation here into two stages as in the aforementioned PR due to quantiles calculation. I have run this query locally and confirmed that it works.	2022-06-21 10:55:08 -04:00
Nan Jiang	97f676cc8e	CONSVC 1800: Add os to the Contextual Services derived dataset (#3027 ) * CONSVC 1800: Add os to the Contextual Services derived dataset * Review fixes Co-authored-by: Jeff Klukas <jklukas@mozilla.com>	2022-06-16 23:31:17 +00:00
Nan Jiang	e4b180dbbc	Bug 1757768 (follow-up): Fix test failures (#3017 )	2022-06-13 21:34:04 +00:00
Nan Jiang	44399cd7af	Bug 1757768: add match_type to contextual services derived dataset (#2897 ) * Bug 1757768: add match_type to contextual services derived dataset * f test	2022-06-08 17:52:41 +00:00
Rebecca BurWei	997708a74f	Add country to urlbar_clients_daily (#3009 ) * Add country to urlbar_clients_daily	2022-06-03 15:26:59 -04:00
Jeff Klukas	3db4633376	CONSVC-1681 Add mobile data to contextual services event_aggregates (#2805 ) * CONSVC-1681 Add mobile data to contextual services event_aggregates See https://mozilla-hub.atlassian.net/browse/CONSVC-1681 * Use 'phone' instead of 'mobile' * Update init.sql * Commentary on filter * Aggregation test update * Update overactive filter test * Dry run exemptions * Update sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_v1/query.sql * format	2022-03-31 10:53:29 -04:00
Rebecca BurWei	78885a77dd	Add experiments field from clients_daily to urlbar_clients_daily (#2814 ) * Add experiments field * Updated urlbar_clients_daily schema * Updated tests Co-authored-by: Jeff Klukas <jklukas@mozilla.com> Co-authored-by: Alexander Nicholson <anicholson@mozilla.com>	2022-03-23 16:48:16 -04:00
Alekhya	a76cd01efa	add minimum client count for fenix (#2642 ) add minimum client count for fenix add minimum client count for fenix add minimum client count for fenix add minimum client count for fenix	2022-01-12 11:49:59 -05:00
Alekhya	4b178bc49b	Minimum client counts (#2628 ) * added minimum client count for desktop added minimum client count for desktop added minimum client count for desktop added minimum client count for desktop added minimum client count for desktop * Update the sql query * Updated the total_users > 375 than 100	2022-01-06 14:50:29 -05:00
Jeff Klukas	59e16919aa	Lowercase and trim in suggest_impressions_sanitized_v2 (#2581 ) This better matches the current client behavior for matching. We're currently getting `<disallowed>` in results and results with uppercase letters. I don't think preserving these differences has analytical value, and it makes the results harder to work with.	2021-12-16 08:16:19 -05:00
Jeff Klukas	ec7e68f213	ROAD-85 Simple sanitization job for Merino logs (#2522 ) * ROAD-85 Simple sanitization job for Merino logs See https://mozilla-hub.atlassian.net/browse/ROAD-85 This uses the adM allowlist of queries for sanitization, so can be expressed entirely in a single query. Future iterations will involve python logic and will likely need to be held elsewhere. * Separate external query and query to copy data into shared-prod	2021-12-14 16:18:24 -05:00
Alekhya	2f1413fee1	Revert "correcting minimum client count - desktop and fenix (#2544 )" (#2566 ) This reverts commit `5b743090b4`.	2021-12-10 10:15:52 -05:00
Alekhya	5b743090b4	correcting minimum client count - desktop and fenix (#2544 ) * correcting minimum client count - desktop and fenix * corrected test cases for desktop * corrected the join for desktop	2021-12-06 10:14:42 -05:00
Alexander Nicholson	48d3ac3f60	Bug 1742183 Added iOS probes to mobile_search_clients_daily (#2526 )	2021-11-25 17:19:49 -05:00
Jeff Klukas	1a87c497c0	Bug 1737374 Add new Suggest data prefs to clients_daily etc (#2486 ) * Bug 1737374 Add new Suggest data prefs to clients_daily etc See https://bugzilla.mozilla.org/show_bug.cgi?id=1737374#c3 * format sql * Add code comments about the new prefs	2021-11-10 09:47:27 -05:00
Alexander Nicholson	85d994bcd1	Bug 1738132 - Added handoff sources to search_clients_daily (#2477 )	2021-11-04 12:09:08 -04:00
Rebecca BurWei	074090b329	add provider for quicksuggest tables (#2442 ) * add provider for quicksuggest tables * update test * update test * update test * update test * update test * update test Co-authored-by: Jeff Klukas <jklukas@mozilla.com>	2021-10-20 15:37:08 -04:00
Rebecca BurWei	1d3ec900d6	add provider to event_aggregates (#2433 ) * add provider to event_aggregates * unknown provider for quicksuggest * Formatting * Update tests * Allow field addition Co-authored-by: Jeff Klukas <jklukas@mozilla.com>	2021-10-19 14:52:01 +00:00
Jeff Klukas	20721db140	Bug 1734464 Add clients_daily field for onboarding-choice (#2410 ) * Bug 1734464 Add clients_daily field for onboarding-choice See https://bugzilla.mozilla.org/show_bug.cgi?id=1734464 * Add missing field in final query of urlbar_clients_daily	2021-10-08 20:40:19 +00:00
Will Lachance	92bbc84aca	Bug 1731093 - Generate events_daily for other Fenix variants (#2351 ) Firefox for Android beta and nightly channels.	2021-10-04 15:53:08 +00:00
XuanLuo	f1b9dff761	Add channel as a dimension in search_aggregates (#2363 )	2021-09-27 04:55:27 -07:00
Jeff Klukas	c5353ca5bf	Pbd monitoring (#2367 ) * Add PBD monitoring tables * Refactor to harmonize and use metadata Co-authored-by: Wesley Dawson <whd@mozilla.com>	2021-09-22 16:20:09 -04:00
Alexander Nicholson	491e9da342	Bug 1731277 Cast position key to integer (#2349 )	2021-09-20 12:35:26 -04:00
Jeff Klukas	9b71cce148	Add authorized view for ctxtsvc and add search_terms dataset (#2328 ) * Add authorized view for ctxtsvc and add search_terms dataset * Add a search_terms_derived dataset definition	2021-09-20 09:50:52 -04:00
Jeff Klukas	579520f3c4	Bug 1729970 Add quicksuggest prefs to urlbar_clients_daily (#2336 ) * Bug 1729970 Add quicksuggest prefs to urlbar_clients_daily Follow-up to https://github.com/mozilla/bigquery-etl/pull/2332 * Update test	2021-09-15 20:00:01 +00:00
Alexander Nicholson	8899b9a392	Fix for Bug 1729084 - Add UDF to remove outliers (#2321 ) Added replace_outlier_values_with_zero UDF and use it to replace the values in the keyed scalar metrics with 0 if they pass a threshold value. Also renamed some function params, added test and fixed an off-by-1 error in index->position transform	2021-09-13 09:52:41 -04:00
Ben Wu	d7c904f96e	Bug 1711797 Add access point search probes to search clients daily (#2253 )	2021-08-18 17:03:13 -04:00
Ben Wu	dbf25769cd	Replace normalize search engine with stub implementation (#2258 )	2021-08-12 16:30:13 +00:00
Linh Nguyen	e42b2faa25	Use the most recent bucket ranges in glam for categorical histograms (fixes #2220 ) (#2223 )	2021-07-27 17:05:39 -07:00
Ben Wu	56bb9a542e	Bug 1673976 - Add ios glean to search tables with version filter (#2219 )	2021-07-26 20:31:32 +00:00
Ben Wu	9d063e4108	Bug 1716074 - Derive search clients daily from clients daily (#2127 )	2021-07-19 14:30:35 +00:00
Jeff Klukas	cc288f5dc0	Remove test_aggregation case for clients_daily_v6 (#2117 ) Motivated in particular by https://github.com/mozilla/bigquery-etl/pull/2115 where new changes ended up making the test case too complex to run. The difficulty of updating this test case is outweighing the safety benefit at this point, so we are removing, but leaving a pointer in case we want to reestablish the test in the future.	2021-06-11 14:02:46 -04:00
Ben Wu	9ea4319563	Add missing search sources to clients daily (#2090 )	2021-06-03 16:51:45 -04:00
Anthony Miyaguchi	5b362e289f	Fix empty result sets for incremental core clients first seen table (#2057 ) * Add tests for core_clients_first_seen_init * Add failing test for core clients first seen * Fix issues with core clients first seen * Keep left join * Update sql/moz-fx-data-shared-prod/telemetry_derived/core_clients_first_seen_v1/query.sql Co-authored-by: Jeff Klukas <jklukas@mozilla.com> Co-authored-by: Jeff Klukas <jklukas@mozilla.com>	2021-05-24 17:11:11 -04:00
Jeff Klukas	0e524168a9	Test fixup	2021-05-24 08:48:45 -04:00
Ben Wu	15de5cc6d8	Temporarily remove glean ios from mobile search (#2040 )	2021-05-19 08:35:38 -07:00
Ben Wu	e48271f3cf	Bug 1673976 - Add glean ios search probes to mobile search tables (#1974 )	2021-05-13 11:47:57 -04:00
Ben Wu	856912d723	Filter out high user click counts for adm derived table (#2010 )	2021-05-06 16:04:33 -04:00
Jeff Klukas	2c8f455162	Bug 1709595 Add new attribution experiment fields to clients_daily (#2004 ) * Implement schema.yaml for the clients_daily chain * Bug 1709595 Add new attribution experiment fields to clients_daily * Add schema for clients_daily_joined_v1 * yamllint * DAG update * Update tests * Add --force option	2021-05-05 15:17:13 -04:00
Ben Wu	5d3a2ac75c	Bug 1708486 - Create aggregate table for contextual services (#2001 )	2021-05-05 11:32:20 -07:00
Frank Bertsch	aea2312a66	Add regression test for bug 1707921	2021-04-27 12:42:38 -04:00
Jeff Klukas	b0013088c4	Bug 1707640 - Calculate sample_id in baseline_clients_first_seen (#1978 ) * Bug 1707640 - Calculate sample_id in baseline_clients_first_seen See https://bugzilla.mozilla.org/show_bug.cgi?id=1707640#c8 * Also update query.sql * Remove redundancy in init.sql * Fixup test	2021-04-26 15:13:34 -04:00
Jeff Klukas	6b2dbec0c0	Add first_seen_date to core_clients_daily and last seen (#1962 ) * Add first_seen_date to core_clients_daily and last seen Supports KPI work for iOS and Focus apps. See https://docs.google.com/document/d/1-sifTuu3lWd5umvaUmncFrdBIK6eKVTPzmGDLv6GDak/edit?ts=6078667e# * Update tests * Add new_profiles to mobile_usage * Make sure is_new_profile reflects only current day * Remove is_new_profile from core_clients_last_seen This field could be confusing. If we do `COUNTIF(is_new_profile)`, we'll overcount since a client that appears on a single day will continue to appear in clients_last_seen with is_new_profile=True carried over from the original day of observation. * Remove is_new_profile from core_clients_last_seen query * bugfixes * DAG change	2021-04-20 09:49:01 -04:00
Anna Scholtz	54864c33c3	Add is_taskbar_pinned and launch_method to clients_daily	2021-04-19 10:22:24 -07:00
Anthony Miyaguchi	871270f2c4	[DS-1424] Join baseline clients daily with first seen table (#1946 ) * Add first_seen_date and related test fixtures * Use is_new_profile instead of baseline_first_seen * Update view for baseline_clients_first_seen * Fix yamllint issues * Set is_new_profile when submission matches first seen * Include AS in table alias * Nit: capitalize AS * Update bigquery_etl/glean_usage/templates/baseline_clients_daily_v1.sql Co-authored-by: Jeff Klukas <jklukas@mozilla.com> * Update bigquery_etl/glean_usage/templates/baseline_clients_daily_v1.sql Co-authored-by: Jeff Klukas <jklukas@mozilla.com> * Update clustering specification Co-authored-by: Jeff Klukas <jklukas@mozilla.com>	2021-04-12 12:29:57 -07:00
Anthony Miyaguchi	459f64576c	Add baseline clients daily test (#1941 ) * Update table_name_from_baseline to strip project * Remove project ids from query to facilitate testing * Rewrite require_partition_filter in tests * Add basic tests for baseline clients daily	2021-04-08 08:39:28 -04:00
Anthony Miyaguchi	1503a7fa89	[DS-1424] Implementation of mobile clients first seen (#1934 ) * Add initial boilerplate for clients_first_seen * Remove submission_timestamp as a field * [wip] Join data against legacy fennec id if applicable * Remove user facing view * Revert "Remove user facing view" This reverts commit a728a7882170eadad5413c7a7046c0f38297bb87. * Add flag for fennec_id * Update logic to limit rows in partitions to submission_date * Add all sql in glean_usage to format ignores * Separate init and query * Add default encoders for testing sql * Add test for initialization of baseline clients first seen in fenix * Update query to update over previous history * Add test for aggregation * Add generated sql and tests for simple baseline clients first seen * Add dry-run exceptions for clients first seen tables * Add clients first seen to generated sql * Update bigquery_etl/glean_usage/templates/baseline_clients_first_seen.metadata.yaml Co-authored-by: Jeff Klukas <jklukas@mozilla.com> * Update bigquery_etl/glean_usage/templates/baseline_clients_first_seen.metadata.yaml Co-authored-by: Jeff Klukas <jklukas@mozilla.com> * Group by sample id instead of min * Add submission_date as baseline first seen date Co-authored-by: Jeff Klukas <jklukas@mozilla.com>	2021-04-05 11:36:39 -07:00
Ben Wu	bfc4980f5d	Bug 1698578 - Parse fenix tagged ad click keys (#1906 )	2021-03-31 16:05:35 -04:00
Daniel Thorn	dfeea39ac5	Enforce more yaml lint rules (#1878 )	2021-03-09 17:25:01 -05:00
Arkadiusz Komarzewski	409dd87451	Bug 1695073 - search_aggregates: handle organic search (#1851 )	2021-02-26 17:48:40 +01:00
Sunah Suh	69ddb69787	Bug 1693141: handle engine suffixes in search scalar keys (#1829 ) * Convert test data to yaml and add ad_click/search_with_ads scalar data * Convert expected data to yaml * Fix expected test results * Add new columns in test_experiments * Update sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v8/query.sql Co-authored-by: Ben Wu <benjaminwu124@gmail.com>	2021-02-25 13:48:34 -05:00
Daniel Thorn	a190e18264	Automatically sort python imports (#1840 )	2021-02-24 17:11:52 -05:00
Ben Wu	0ced85af5a	Remove segment limit in mobile search in-content source (#1810 )	2021-02-18 12:57:49 -05:00
Frank Bertsch	dcb8405920	Move tests to event_types_history	2021-02-10 17:03:02 -05:00
Frank Bertsch	9115862493	Add event_types query generation	2021-02-10 17:03:02 -05:00
Daniel Thorn	acac30a6fb	Fix null handling for structs in clients daily (#1667 )	2021-01-13 13:26:07 -08:00
Daniel Thorn	61f1a85897	Produce clients_daily from main_v4 (#1519 )	2021-01-12 14:15:57 -08:00
Anthony Miyaguchi	ce9fe86ed2	Fix #1587 - fix inconsistent range_min and range_max in bucket counts (#1591 ) * Fix egregious double counting in scalar bucket counts * Update for newer version of black * Update scalar bucket count test to account for combinations * Update minimal test for histogram bucket counts * Add test for multiple clients in histogram aggregates * Remove deduplicated cte in histogram bucket counts * Use count distinct for client counts to be explicit	2020-12-04 14:47:45 -08:00
Ben Wu	b50a95944d	Separate queries on clients_scalar_aggregates by app_version (#1594 )	2020-12-03 14:26:35 -05:00
Anthony Miyaguchi	4234c40040	Add minimal set of tests for GLAM Fenix queries (#1488 ) * Add script to determine query dependencies * Add schemas and folders for minimal test * Add schema for geckoview_versions * Add query params to each query * Update schema for new queries * Remove main from bootstrap file * Add dataset prefix to schemas * Add failing test for clients_histogram_aggregates It turns out that the dependency resolution I'm using for autogenerate the schemas is ignoring the views. I actually want to keep the views around. The tables also all need to be prefixed with the dataset name or they won't be inserted into the sql query correctly. * Add successful test for clients histogram aggregates * Add minimal tests for clients_scalar_aggregates * Remove skeleton files for views (no test support for views) * Add tests for latest versions * Add tests for scalar bucket counts that passes * Add scalar bucket counts * Add test for scalar percentiles * Add test for histogram bucket counts * Add passing test for probe counts * Add test for histogram percentiles * Add tests for extract counts * Update readme * Add data for scalar percentiles test * Fix linting errors * Fix mypy issues with tests module * Name it data instead of tests..data Ignore mypy on tests directory * Remove mypy section * Remove extra line in pytest * Try pytest invocation of mypy-scripts-are-modules * Run mypy outside of pytest * Use exec on pytest instead of mypy * Update tests/sql/glam-fenix-dev/glam_etl/bootstrap.py Co-authored-by: Ben Wu <benjaminwu124@gmail.com> * Update tests/sql/glam-fenix-dev/glam_etl/README.md Co-authored-by: Ben Wu <benjaminwu124@gmail.com> * Document bootstrap in documentation * Use artificial range for histogram_percentiles * Simplify parameters for scalar probe counts * Simplify tests for histogram probe counts * Add test for incremental histogram aggregates * Update scalar percentile counts to count distinct client ids * Update readme for creating a new test * Use unorded list for sublist * Use --ignore-glob for pytest to avoid data files Co-authored-by: Ben Wu <benjaminwu124@gmail.com>	2020-12-01 17:11:45 -08:00
Jeff Klukas	603fec3850	Bug 1677609 Join clients_first_seen into clients_last_seen (#1561 ) * Bug 1677609 Join clients_first_seen into clients_last_seen Several folks on DS report that they have been getting great value from clients_first_seen, as the first_seen_date there is a much more stable way to define new profiles compared to using profile_created_date from pings. Currently, using first_seen_date requires doing a join between these two tables. This PR adds that join to the clients_last_seen query itself to make this workflow more efficient. I'd like to get this merged before we proceed with the backfill discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1677609 This change has a few operational implications. Most importantly, it makes clients_last_seen dependent on clients_first_seen, so those queries can no longer proceed in parallel. `clients_first_seen` takes on average 10 minutes to run, so we'll be delaying all ETL downstream of `clients_last_seen` by about 10 minutes, which seems acceptable. It also adds some mental complexity to the model. The extra join does not appear to significantly slow down the `clients_last_seen` query itself; it scans about 15% more data and consumes about 15% more slot time. I expect the performance is dominated by the existing join between clients_daily and the previous day of clients_last_seen.	2020-11-30 09:28:53 -05:00
Jeff Klukas	24207636dd	Bug 1677609 Add core active fields to clients_last_seen (#1560 ) * Bug 1677609 Add core active fields to clients_last_seen See https://bugzilla.mozilla.org/show_bug.cgi?id=1677609 This adds just the new underlying bit pattern fields that will need to be backfilled, and these will be hidden from users initially. After the backfill is complete, we will update the view to include these fields along with the various fields derived from them. We include days_visited_10_uri_bits which was not explicitly requested in the context of this bug, but was proposed as part of the prototype feature_usage table (https://github.com/mozilla/bigquery-etl/pull/1193); it may be useful for future comparisons. * Update tests to match new logic	2020-11-17 14:15:37 -05:00
Sunah Suh	813a485d2d	Bug 1673404: Add searchmode scalars to clients_daily and search_clients_daily (#1549 ) Add searchmode scalars to clients_daily and search_clients_daily	2020-11-13 15:59:20 -06:00
Rhys	1ace0fe2b7	Ran YAMLlint on all yaml files and resolved linting issues (fixes #1297 ) (#1481 ) * "Ran YAMLlint on all yaml files" * "Moved product info metadata table to README file" * "Reformatted yaml lists" * "Updated line breaks so script runs" * "Updated line breaks so script runs" * "Undid line breaks" * "Created custom config file" * "Removed base document id" * "Undid line breaks" * "Reformatted code" * "Trimmed whitespace" * "Undid line break" * "Introduced newline" * "Trimmed whitespace" * "Added yamillint to config file" * "Added yamllint to config file" * "Moved up yamllint test" * "Trimmed whitespace" * "Trimmed whitespace" * "Trimmed whitespace" * "Trimmed whitespace" * "Removing hyphen to fix CI error" * "Indentation to remove CI error" * "Included yamllint install in build run" * "Added yamllint in requirements.txt and .in file" * "Moved install yamllint step to its own stage" * "Updated yamllint test" * "Updated circleci step" * "Reformatted code" * "Added yamllint to circleci steps" * "Added checkout block to yamllint step" * "Trimmed whitespace" * "Undid yamllint step" * "Specified directory name for yamllint test" * "Fixed yamlint errors" * "Fixed yamllint errors" * "Fixed yamllint errors" * "Fixed yamllint errors" * "Ignore pathway in linting" * "Added ignore venv pathway during linting" * "Updated ignore block" * "Updated ignore block" * "Removed ignore block" * "Updated ignore block" * "Indented base as a list" * "Indented base item" * Update tests/sql/moz-fx-data-shared-prod/search_derived/mobile_search_clients_last_seen_v1/test_day_bit_shifting/expect.yaml Co-authored-by: Anthony Miyaguchi <acmiyaguchi@gmail.com> * "Resolved linting errors" * "Referenced tables put back on same line" * "Fixed linting error" * Update sql/moz-fx-data-shared-prod/account_ecosystem_derived/fxa_logging_users_daily_v1/metadata.yaml Co-authored-by: Anthony Miyaguchi <acmiyaguchi@gmail.com> * "Fixed linting error" Co-authored-by: Anthony Miyaguchi <acmiyaguchi@gmail.com>	2020-10-29 17:24:55 -07:00
Sunah Suh	c8d0136694	Bug 1671517: Add event counts to clients daily (#1455 )	2020-10-20 13:29:21 -05:00
Anthony Miyaguchi	349dff3ca2	Add table to determine Fenix nightly mapping of builds to geckoview versions (#1419 ) * Add initial incremental query for geckoview build dates * Add initial tests for incremental query (WIP) * Add files for initial tests * Rework query so it doesn't fail during tests * Fix schema so queries run * Add passing test for init * Add test for query aggregation * Add metadata file for scheduling the query * Move scripts from fenix_nightly to fenix * Remove scheduling * Add document strings. * Change dataset reference and indent comments correctly * Remove init and address feedback * remove init file * make query idempotent by appending window to each submission_date * rename n_builds to n_pings * reduce window size from 30 days to 14 days * avoid use of subqueries * Update tests for query * Fix tests * Add failing test for 100 * Fix query so it work across fx100 boundary * Add linting fixes	2020-10-16 11:57:23 -07:00
Anna Scholtz	93bc51ba5e	Move queries to right directories	2020-10-05 12:59:58 -07:00
Anna Scholtz	87f1a4e19f	Update tests	2020-10-05 12:59:58 -07:00

204 Commits