
104 Commits

Author SHA1 Message Date
Jeff Klukas 1a87c497c0
Bug 1737374 Add new Suggest data prefs to clients_daily etc (#2486)
* Bug 1737374 Add new Suggest data prefs to clients_daily etc

See https://bugzilla.mozilla.org/show_bug.cgi?id=1737374#c3

* format sql

* Add code comments about the new prefs
2021-11-10 09:47:27 -05:00
Alexander Nicholson 85d994bcd1
Bug 1738132 - Added handoff sources to search_clients_daily (#2477) 2021-11-04 12:09:08 -04:00
Rebecca BurWei 074090b329
add provider for quicksuggest tables (#2442)
* add provider for quicksuggest tables

* update test

* update test

* update test

* update test

* update test

* update test

Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
2021-10-20 15:37:08 -04:00
Rebecca BurWei 1d3ec900d6
add provider to event_aggregates (#2433)
* add provider to event_aggregates

* unknown provider for quicksuggest

* Formatting

* Update tests

* Allow field addition

Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
2021-10-19 14:52:01 +00:00
Jeff Klukas 20721db140
Bug 1734464 Add clients_daily field for onboarding-choice (#2410)
* Bug 1734464 Add clients_daily field for onboarding-choice

See https://bugzilla.mozilla.org/show_bug.cgi?id=1734464

* Add missing field in final query of urlbar_clients_daily
2021-10-08 20:40:19 +00:00
Will Lachance 92bbc84aca
Bug 1731093 - Generate events_daily for other Fenix variants (#2351)
Firefox for Android beta and nightly channels.
2021-10-04 15:53:08 +00:00
XuanLuo f1b9dff761
Add channel as a dimension in search_aggregates (#2363) 2021-09-27 04:55:27 -07:00
Jeff Klukas c5353ca5bf
Pbd monitoring (#2367)
* Add PBD monitoring tables

* Refactor to harmonize and use metadata


Co-authored-by: Wesley Dawson <whd@mozilla.com>
2021-09-22 16:20:09 -04:00
Alexander Nicholson 491e9da342
Bug 1731277 Cast position key to integer (#2349) 2021-09-20 12:35:26 -04:00
Jeff Klukas 9b71cce148
Add authorized view for ctxtsvc and add search_terms dataset (#2328)
* Add authorized view for ctxtsvc and add search_terms dataset

* Add a search_terms_derived dataset definition
2021-09-20 09:50:52 -04:00
Jeff Klukas 579520f3c4
Bug 1729970 Add quicksuggest prefs to urlbar_clients_daily (#2336)
* Bug 1729970 Add quicksuggest prefs to urlbar_clients_daily

Follow-up to https://github.com/mozilla/bigquery-etl/pull/2332

* Update test
2021-09-15 20:00:01 +00:00
Alexander Nicholson 8899b9a392
Fix for Bug 1729084 - Add UDF to remove outliers (#2321)
Added replace_outlier_values_with_zero UDF and use it to
replace the values in the keyed scalar metrics with 0 if
they pass a threshold value.
Also renamed some function params, added test and fixed
an off-by-1 error in index->position transform
2021-09-13 09:52:41 -04:00
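The outlier handling described in #2321 can be illustrated with a small Python sketch. This is not the actual `replace_outlier_values_with_zero` UDF, which runs in BigQuery. The function name, the dictionary representation of a keyed scalar, and the reading of "pass a threshold" as "strictly greater than the threshold" are all assumptions made for illustration.

```python
from typing import Dict


def replace_outliers_with_zero(keyed_scalar: Dict[str, int], threshold: int) -> Dict[str, int]:
    """Return a copy of a keyed scalar where values above `threshold` become 0.

    Hypothetical analogue of the idea behind the replace_outlier_values_with_zero
    UDF; assumes "pass a threshold" means strictly greater than it.
    """
    return {key: 0 if value > threshold else value for key, value in keyed_scalar.items()}


if __name__ == "__main__":
    # Illustrative keys and values only, not real telemetry.
    counts = {"engine-a:urlbar": 3, "engine-b:urlbar": 5000}
    print(replace_outliers_with_zero(counts, threshold=1000))
    # {'engine-a:urlbar': 3, 'engine-b:urlbar': 0}
```

Zeroing an outlier, rather than dropping the key, keeps the set of keys unchanged for downstream aggregation.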
Ben Wu d7c904f96e
Bug 1711797 Add access point search probes to search clients daily (#2253) 2021-08-18 17:03:13 -04:00
Ben Wu dbf25769cd
Replace normalize search engine with stub implementation (#2258) 2021-08-12 16:30:13 +00:00
Linh Nguyen e42b2faa25
Use the most recent bucket ranges in glam for categorical histograms (fixes #2220) (#2223) 2021-07-27 17:05:39 -07:00
Ben Wu 56bb9a542e
Bug 1673976 - Add ios glean to search tables with version filter (#2219) 2021-07-26 20:31:32 +00:00
Ben Wu 9d063e4108
Bug 1716074 - Derive search clients daily from clients daily (#2127) 2021-07-19 14:30:35 +00:00
Jeff Klukas cc288f5dc0
Remove test_aggregation case for clients_daily_v6 (#2117)
Motivated in particular by https://github.com/mozilla/bigquery-etl/pull/2115
where new changes ended up making the test case too complex to run.

The difficulty of updating this test case is outweighing the safety benefit
at this point, so we are removing it, but leaving a pointer in case we want to
reestablish the test in the future.
2021-06-11 14:02:46 -04:00
Ben Wu 9ea4319563
Add missing search sources to clients daily (#2090) 2021-06-03 16:51:45 -04:00
Anthony Miyaguchi 5b362e289f
Fix empty result sets for incremental core clients first seen table (#2057)
* Add tests for core_clients_first_seen_init

* Add failing test for core clients first seen

* Fix issues with core clients first seen

* Keep left join

* Update sql/moz-fx-data-shared-prod/telemetry_derived/core_clients_first_seen_v1/query.sql

Co-authored-by: Jeff Klukas <jklukas@mozilla.com>

Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
2021-05-24 17:11:11 -04:00
Jeff Klukas 0e524168a9 Test fixup 2021-05-24 08:48:45 -04:00
Ben Wu 15de5cc6d8
Temporarily remove glean ios from mobile search (#2040) 2021-05-19 08:35:38 -07:00
Ben Wu e48271f3cf
Bug 1673976 - Add glean ios search probes to mobile search tables (#1974) 2021-05-13 11:47:57 -04:00
Ben Wu 856912d723
Filter out high user click counts for adm derived table (#2010) 2021-05-06 16:04:33 -04:00
Jeff Klukas 2c8f455162
Bug 1709595 Add new attribution experiment fields to clients_daily (#2004)
* Implement schema.yaml for the clients_daily chain

* Bug 1709595 Add new attribution experiment fields to clients_daily

* Add schema for clients_daily_joined_v1

* yamllint

* DAG update

* Update tests

* Add --force option
2021-05-05 15:17:13 -04:00
Ben Wu 5d3a2ac75c
Bug 1708486 - Create aggregate table for contextual services (#2001) 2021-05-05 11:32:20 -07:00
Frank Bertsch aea2312a66 Add regression test for bug 1707921 2021-04-27 12:42:38 -04:00
Jeff Klukas b0013088c4
Bug 1707640 - Calculate sample_id in baseline_clients_first_seen (#1978)
* Bug 1707640 - Calculate sample_id in baseline_clients_first_seen

See https://bugzilla.mozilla.org/show_bug.cgi?id=1707640#c8

* Also update query.sql

* Remove redundancy in init.sql

* Fixup test
2021-04-26 15:13:34 -04:00
Jeff Klukas 6b2dbec0c0
Add first_seen_date to core_clients_daily and last seen (#1962)
* Add first_seen_date to core_clients_daily and last seen

Supports KPI work for iOS and Focus apps.
See https://docs.google.com/document/d/1-sifTuu3lWd5umvaUmncFrdBIK6eKVTPzmGDLv6GDak/edit?ts=6078667e#

* Update tests

* Add new_profiles to mobile_usage

* Make sure is_new_profile reflects only current day

* Remove is_new_profile from core_clients_last_seen

This field could be confusing.

If we do `COUNTIF(is_new_profile)`,
we'll overcount since a client that appears on a single day will continue
to appear in clients_last_seen with is_new_profile=True carried over from the
original day of observation.

* Remove is_new_profile from core_clients_last_seen query

* bugfixes

* DAG change
2021-04-20 09:49:01 -04:00
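The overcounting described in the `core_clients_last_seen` note of #1962 can be reproduced with a small simulation. This is a hypothetical Python model, not bigquery-etl code. A client observed on a single day keeps a row in the last-seen table on later days, and the carried-over `is_new_profile=True` is counted again on each of those days. Comparing `first_seen_date` with `submission_date` counts the client only once. The row type, the helper function, and the number of retained days are assumptions.

```python
from datetime import date, timedelta
from typing import List, NamedTuple


class LastSeenRow(NamedTuple):
    submission_date: date
    client_id: str
    first_seen_date: date
    is_new_profile: bool  # carried over from the day the client was last observed


def rows_for_single_day_client(client_id: str, seen_on: date, days_retained: int) -> List[LastSeenRow]:
    """Hypothetical last-seen rows for a client observed only on `seen_on`.

    The client keeps appearing for `days_retained` days, and every row repeats the
    values observed on `seen_on`, including is_new_profile=True.
    """
    return [
        LastSeenRow(seen_on + timedelta(days=offset), client_id, seen_on, True)
        for offset in range(days_retained)
    ]


if __name__ == "__main__":
    rows = rows_for_single_day_client("client-a", date(2021, 4, 1), days_retained=5)
    naive = sum(row.is_new_profile for row in rows)  # analogous to COUNTIF(is_new_profile)
    accurate = sum(row.first_seen_date == row.submission_date for row in rows)
    print(naive, accurate)  # 5 1
```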
Anna Scholtz 54864c33c3 Add is_taskbar_pinned and launch_method to clients_daily 2021-04-19 10:22:24 -07:00
Anthony Miyaguchi 871270f2c4
[DS-1424] Join baseline clients daily with first seen table (#1946)
* Add first_seen_date and related test fixtures

* Use is_new_profile instead of baseline_first_seen

* Update view for baseline_clients_first_seen

* Fix yamllint issues

* Set is_new_profile when submission matches first seen

* Include AS in table alias

* Nit: capitalize AS

* Update bigquery_etl/glean_usage/templates/baseline_clients_daily_v1.sql

Co-authored-by: Jeff Klukas <jklukas@mozilla.com>

* Update bigquery_etl/glean_usage/templates/baseline_clients_daily_v1.sql

Co-authored-by: Jeff Klukas <jklukas@mozilla.com>

* Update clustering specification

Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
2021-04-12 12:29:57 -07:00
Anthony Miyaguchi 459f64576c
Add baseline clients daily test (#1941)
* Update table_name_from_baseline to strip project

* Remove project ids from query to facilitate testing

* Rewrite require_partition_filter in tests

* Add basic tests for baseline clients daily
2021-04-08 08:39:28 -04:00
Anthony Miyaguchi 1503a7fa89
[DS-1424] Implementation of mobile clients first seen (#1934)
* Add initial boilerplate for clients_first_seen

* Remove submission_timestamp as a field

* [wip] Join data against legacy fennec id if applicable

* Remove user facing view

* Revert "Remove user facing view"

This reverts commit a728a7882170eadad5413c7a7046c0f38297bb87.

* Add flag for fennec_id

* Update logic to limit rows in partitions to submission_date

* Add all sql in glean_usage to format ignores

* Separate init and query

* Add default encoders for testing sql

* Add test for initialization of baseline clients first seen in fenix

* Update query to update over previous history

* Add test for aggregation

* Add generated sql and tests for simple baseline clients first seen

* Add dry-run exceptions for clients first seen tables

* Add clients first seen to generated sql

* Update bigquery_etl/glean_usage/templates/baseline_clients_first_seen.metadata.yaml

Co-authored-by: Jeff Klukas <jklukas@mozilla.com>

* Update bigquery_etl/glean_usage/templates/baseline_clients_first_seen.metadata.yaml

Co-authored-by: Jeff Klukas <jklukas@mozilla.com>

* Group by sample id instead of min

* Add submission_date as baseline first seen date

Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
2021-04-05 11:36:39 -07:00
Ben Wu bfc4980f5d
Bug 1698578 - Parse fenix tagged ad click keys (#1906) 2021-03-31 16:05:35 -04:00
Daniel Thorn dfeea39ac5
Enforce more yaml lint rules (#1878) 2021-03-09 17:25:01 -05:00
Arkadiusz Komarzewski 409dd87451
Bug 1695073 - search_aggregates: handle organic search (#1851) 2021-02-26 17:48:40 +01:00
Sunah Suh 69ddb69787
Bug 1693141: handle engine suffixes in search scalar keys (#1829)
* Convert test data to yaml and add ad_click/search_with_ads scalar data

* Convert expected data to yaml

* Fix expected test results

* Add new columns in test_experiments

* Update sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v8/query.sql

Co-authored-by: Ben Wu <benjaminwu124@gmail.com>
2021-02-25 13:48:34 -05:00
Daniel Thorn a190e18264
Automatically sort python imports (#1840) 2021-02-24 17:11:52 -05:00
Ben Wu 0ced85af5a
Remove segment limit in mobile search in-content source (#1810) 2021-02-18 12:57:49 -05:00
Frank Bertsch dcb8405920 Move tests to event_types_history 2021-02-10 17:03:02 -05:00
Frank Bertsch 9115862493 Add event_types query generation 2021-02-10 17:03:02 -05:00
Daniel Thorn acac30a6fb
Fix null handling for structs in clients daily (#1667) 2021-01-13 13:26:07 -08:00
Daniel Thorn 61f1a85897
Produce clients_daily from main_v4 (#1519) 2021-01-12 14:15:57 -08:00
Anthony Miyaguchi ce9fe86ed2
Fix #1587 - fix inconsistent range_min and range_max in bucket counts (#1591)
* Fix egregious double counting in scalar bucket counts

* Update for newer version of black

* Update scalar bucket count test to account for combinations

* Update minimal test for histogram bucket counts

* Add test for multiple clients in histogram aggregates

* Remove deduplicated cte in histogram bucket counts

* Use count distinct for client counts to be explicit
2020-12-04 14:47:45 -08:00
Ben Wu b50a95944d
Separate queries on clients_scalar_aggregates by app_version (#1594) 2020-12-03 14:26:35 -05:00
Anthony Miyaguchi 4234c40040
Add minimal set of tests for GLAM Fenix queries (#1488)
* Add script to determine query dependencies

* Add schemas and folders for minimal test

* Add schema for geckoview_versions

* Add query params to each query

* Update schema for new queries

* Remove main from bootstrap file

* Add dataset prefix to schemas

* Add failing test for clients_histogram_aggregates

It turns out that the dependency resolution I'm using to autogenerate
the schemas is ignoring the views. I actually want to keep the views
around. The tables also all need to be prefixed with the dataset name or
they won't be inserted into the sql query correctly.

* Add successful test for clients histogram aggregates

* Add minimal tests for clients_scalar_aggregates

* Remove skeleton files for views (no test support for views)

* Add tests for latest versions

* Add tests for scalar bucket counts that passes

* Add scalar bucket counts

* Add test for scalar percentiles

* Add test for histogram bucket counts

* Add passing test for probe counts

* Add test for histogram percentiles

* Add tests for extract counts

* Update readme

* Add data for scalar percentiles test

* Fix linting errors

* Fix mypy issues with tests module

* Name it data instead of tests.*.data

* Ignore mypy on tests directory

* Remove mypy section

* Remove extra line in pytest

* Try pytest invocation of mypy-scripts-are-modules

* Run mypy outside of pytest

* Use exec on pytest instead of mypy

* Update tests/sql/glam-fenix-dev/glam_etl/bootstrap.py

Co-authored-by: Ben Wu <benjaminwu124@gmail.com>

* Update tests/sql/glam-fenix-dev/glam_etl/README.md

Co-authored-by: Ben Wu <benjaminwu124@gmail.com>

* Document bootstrap in documentation

* Use artificial range for histogram_percentiles

* Simplify parameters for scalar probe counts

* Simplify tests for histogram probe counts

* Add test for incremental histogram aggregates

* Update scalar percentile counts to count distinct client ids

* Update readme for creating a new test

* Use unordered list for sublist

* Use --ignore-glob for pytest to avoid data files

Co-authored-by: Ben Wu <benjaminwu124@gmail.com>
2020-12-01 17:11:45 -08:00
Jeff Klukas 603fec3850
Bug 1677609 Join clients_first_seen into clients_last_seen (#1561)
* Bug 1677609 Join clients_first_seen into clients_last_seen

Several folks on DS report that they have been getting great value from
clients_first_seen, as the first_seen_date there is a much more stable way
to define new profiles compared to using profile_created_date from pings.

Currently, using first_seen_date requires doing a join between these two tables.
This PR adds that join to the clients_last_seen query itself to make this
workflow more efficient. I'd like to get this merged before we proceed with
the backfill discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1677609

This change has a few operational implications. Most importantly, it makes
clients_last_seen dependent on clients_first_seen, so those queries can no
longer proceed in parallel. `clients_first_seen` takes on average 10 minutes
to run, so we'll be delaying all ETL downstream of `clients_last_seen` by
about 10 minutes, which seems acceptable. It also adds some mental complexity
to the model.

The extra join does not appear to significantly slow down the
`clients_last_seen` query itself; it scans about 15% more data and consumes
about 15% more slot time.
I expect the performance is dominated by the existing join between
clients_daily and the previous day of clients_last_seen.
2020-11-30 09:28:53 -05:00
Jeff Klukas 24207636dd
Bug 1677609 Add core active fields to clients_last_seen (#1560)
* Bug 1677609 Add core active fields to clients_last_seen

See https://bugzilla.mozilla.org/show_bug.cgi?id=1677609

This adds just the new underlying bit pattern fields that will need to be
backfilled, and these will be hidden from users initially.
After the backfill is complete, we will update the view to include these
fields along with the various fields derived from them.

We include days_visited_10_uri_bits which was not explicitly requested in
the context of this bug, but was proposed as part of the prototype feature_usage
table (https://github.com/mozilla/bigquery-etl/pull/1193); it may be useful
for future comparisons.

* Update tests to match new logic
2020-11-17 14:15:37 -05:00
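The "bit pattern fields" mentioned in #1560 rely on a day-shifting idea that can be sketched as follows. This assumes a 28-bit integer in which the lowest bit marks activity on the current day, and each daily update shifts the previous value left by one. The helper names are hypothetical and are not part of bigquery-etl's UDF library.

```python
BITS = 28  # assumed window length in days
MASK = (1 << BITS) - 1


def shift_bits(prev_bits: int, active_today: bool) -> int:
    """Age yesterday's pattern by one day and record today's activity in bit 0."""
    return ((prev_bits << 1) | int(active_today)) & MASK


def active_days_in_last(bits: int, n_days: int) -> int:
    """Count active days among the most recent n_days (the lowest n_days bits)."""
    return bin(bits & ((1 << n_days) - 1)).count("1")


if __name__ == "__main__":
    bits = 0
    for active in [True, False, True]:  # oldest day first
        bits = shift_bits(bits, active)
    print(format(bits, "b"), active_days_in_last(bits, 7))  # 101 2
```

Masking after the shift discards activity older than the window, so a single integer per client summarizes the whole window.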
Sunah Suh 813a485d2d
Bug 1673404: Add searchmode scalars to clients_daily and search_clients_daily (#1549)
Add searchmode scalars to clients_daily and search_clients_daily
2020-11-13 15:59:20 -06:00
Rhys 1ace0fe2b7
Ran YAMLlint on all yaml files and resolved linting issues (fixes #1297) (#1481)
* "Ran YAMLlint on all yaml files"

* "Moved product info metadata table to README file"

* "Reformatted yaml lists"

* "Updated line breaks so script runs"

* "Updated line breaks so script runs"

* "Undid line breaks"

* "Created custom config file"

* "Removed base document id"

* "Undid line breaks"

* "Reformatted code"

* "Trimmed whitespace"

* "Undid line break"

* "Introduced newline"

* "Trimmed whitespace"

* "Added yamillint to config file"

* "Added yamllint to config file"

* "Moved up yamllint test"

* "Trimmed whitespace"

* "Trimmed whitespace"

* "Trimmed whitespace"

* "Trimmed whitespace"

* "Removing hyphen to fix CI error"

* "Indentation to remove CI error"

* "Included yamllint install in build run"

* "Added yamllint in requirements.txt and .in file"

* "Moved install yamllint step to its own stage"

* "Updated yamllint test"

* "Updated circleci step"

* "Reformatted code"

* "Added yamllint to circleci steps"

* "Added checkout block to yamllint step"

* "Trimmed whitespace"

* "Undid yamllint step"

* "Specified directory name for yamllint test"

* "Fixed yamllint errors"

* "Fixed yamllint errors"

* "Fixed yamllint errors"

* "Fixed yamllint errors"

* "Ignore pathway in linting"

* "Added ignore venv pathway during linting"

* "Updated ignore block"

* "Updated ignore block"

* "Removed ignore block"

* "Updated ignore block"

* "Indented base as a list"

* "Indented base item"

* Update tests/sql/moz-fx-data-shared-prod/search_derived/mobile_search_clients_last_seen_v1/test_day_bit_shifting/expect.yaml

Co-authored-by: Anthony Miyaguchi <acmiyaguchi@gmail.com>

* "Resolved linting errors"

* "Referenced tables put back on same line"

* "Fixed linting error"

* Update sql/moz-fx-data-shared-prod/account_ecosystem_derived/fxa_logging_users_daily_v1/metadata.yaml

Co-authored-by: Anthony Miyaguchi <acmiyaguchi@gmail.com>

* "Fixed linting error"

Co-authored-by: Anthony Miyaguchi <acmiyaguchi@gmail.com>
2020-10-29 17:24:55 -07:00
Sunah Suh c8d0136694
Bug 1671517: Add event counts to clients daily (#1455) 2020-10-20 13:29:21 -05:00
Anthony Miyaguchi 349dff3ca2
Add table to determine Fenix nightly mapping of builds to geckoview versions (#1419)
* Add initial incremental query for geckoview build dates

* Add initial tests for incremental query (WIP)

* Add files for initial tests

* Rework query so it doesn't fail during tests

* Fix schema so queries run

* Add passing test for init

* Add test for query aggregation

* Add metadata file for scheduling the query

* Move scripts from fenix_nightly to fenix

* Remove scheduling

* Add document strings.

* Change dataset reference and indent comments correctly

* Remove init and address feedback

* remove init file
* make query idempotent by appending window to each submission_date
* rename n_builds to n_pings
* reduce window size from 30 days to 14 days
* avoid use of subqueries

* Update tests for query

* Fix tests

* Add failing test for 100

* Fix query so it works across fx100 boundary

* Add linting fixes
2020-10-16 11:57:23 -07:00
Anna Scholtz 93bc51ba5e Move queries to right directories 2020-10-05 12:59:58 -07:00
Anna Scholtz 87f1a4e19f Update tests 2020-10-05 12:59:58 -07:00