Commit graph

1393 commits

Author SHA1 Message Date
Anthony Miyaguchi b77b542743
Replace GLAM temp functions with persistent functions (#1523)
* Replace GLAM temp functions with persistent functions

* Add generated sql

* Fix typo in udf name

* Add missing files and fully qualify udfs

* Add missing namespace

* Namespace even more things

* format sql
2020-11-05 13:42:09 -08:00
Jeff Klukas 2f738eb4f0
Add bits_from_offsets UDF (#1513)
* Add bits_from_offsets UDF

This is relevant to the emerging "clients_all_time" work drafted in
https://github.com/mozilla/bigquery-etl/pull/1480

I'm proposing we add this under `udf` rather than in `mozfun` because I'm not
yet certain about the naming. If we want to have additional functionality to
support all-time bit patterns, I would like to have those organized under a
single `mozfun` namespace, and it's not clear yet what the interface should
look like.

This function on its own should be enough to empower a new DS workflow for
experimenting with new usage definitions before committing them to clients_daily
and clients_last_seen (to be documented).
2020-11-04 11:31:25 -05:00
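The commit message above names `bits_from_offsets` but does not spell out its signature. The following is a minimal Python sketch of the idea the name suggests: turning a list of day offsets into a bit pattern where bit `i` is set when offset `i` is present. The integer return type, the error handling, and the helper `offsets_from_bits` are assumptions made for illustration; the actual UDF is written in SQL and its types and edge-case behavior may differ.

```python
from typing import Iterable, List


def bits_from_offsets(offsets: Iterable[int]) -> int:
    """Return an integer with bit i set for every offset i in `offsets`.

    Illustrative only: the real UDF lives in BigQuery SQL and may use a
    different return type and different handling of invalid input.
    """
    bits = 0
    for offset in offsets:
        if offset < 0:
            raise ValueError("offsets must be non-negative")
        bits |= 1 << offset  # setting the same bit twice is harmless
    return bits


def offsets_from_bits(bits: int) -> List[int]:
    """Hypothetical inverse helper: list the positions of the set bits."""
    offsets = []
    position = 0
    while bits:
        if bits & 1:
            offsets.append(position)
        bits >>= 1
        position += 1
    return offsets
```

For example, offsets `[0, 2]` (active today and two days ago) would map to the pattern `0b101`, and `offsets_from_bits` recovers the sorted, de-duplicated offsets from such a pattern.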
Daniel Thorn 3e7fd0c1ff
Use union instead of join to make query more readable (#1507) 2020-11-02 13:02:24 -08:00
Sunah Suh bf6069a4c2
Bug 1673404: Add new search scalars to main summary en route to clients daily, etc (#1504)
* Add new search scalars to main summary en route to clients daily, etc

* fixup! Merge branch 'master' into search-probes-main-summary
2020-11-02 12:41:02 -06:00
Daniel Thorn 498d84cb57
Fix join logic for mozilla_vpn_external.users_v1 (#1503) 2020-10-30 11:58:11 -07:00
Rhys 1ace0fe2b7
Ran YAMLlint on all yaml files and resolved linting issues (fixes #1297) (#1481)
* "Ran YAMLlint on all yaml files"

* "Moved product info metadata table to README file"

* "Reformatted yaml lists"

* "Updated line breaks so script runs"

* "Updated line breaks so script runs"

* "Undid line breaks"

* "Created custom config file"

* "Removed base document id"

* "Undid line breaks"

* "Reformatted code"

* "Trimmed whitespace"

* "Undid line break"

* "Introduced newline"

* "Trimmed whitespace"

* "Added yamillint to config file"

* "Added yamllint to config file"

* "Moved up yamllint test"

* "Trimmed whitespace"

* "Trimmed whitespace"

* "Trimmed whitespace"

* "Trimmed whitespace"

* "Removing hyphen to fix CI error"

* "Indentation to remove CI error"

* "Included yamllint install in build run"

* "Added yamllint in requirements.txt and .in file"

* "Moved install yamllint step to its own stage"

* "Updated yamllint test"

* "Updated circleci step"

* "Reformatted code"

* "Added yamllint to circleci steps"

* "Added checkout block to yamllint step"

* "Trimmed whitespace"

* "Undid yamllint step"

* "Specified directory name for yamllint test"

* "Fixed yamlint errors"

* "Fixed yamllint errors"

* "Fixed yamllint errors"

* "Fixed yamllint errors"

* "Ignore pathway in linting"

* "Added ignore venv pathway during linting"

* "Updated ignore block"

* "Updated ignore block"

* "Removed ignore block"

* "Updated ignore block"

* "Indented base as a list"

* "Indented base item"

* Update tests/sql/moz-fx-data-shared-prod/search_derived/mobile_search_clients_last_seen_v1/test_day_bit_shifting/expect.yaml

Co-authored-by: Anthony Miyaguchi <acmiyaguchi@gmail.com>

* "Resolved linting errors"

* "Referenced tables put back on same line"

* "Fixed linting error"

* Update sql/moz-fx-data-shared-prod/account_ecosystem_derived/fxa_logging_users_daily_v1/metadata.yaml

Co-authored-by: Anthony Miyaguchi <acmiyaguchi@gmail.com>

* "Fixed linting error"

Co-authored-by: Anthony Miyaguchi <acmiyaguchi@gmail.com>
2020-10-29 17:24:55 -07:00
Daniel Thorn 415ee2fb62
Add ETL for Mozilla VPN acquisition funnel (#1465) 2020-10-29 15:01:17 -07:00
Ben Wu 1529510c6e
Add derived tables for blog.m.o google analytics (#1492) 2020-10-28 17:40:45 -04:00
William Lachance 1ca71429ea
Bug 1673903 - Add urlbar-searchmode to allowed SAP sources (#1494) 2020-10-28 14:40:22 -04:00
jailang 88b0e2066e Listify review_bugs in tests and metadata files 2020-10-28 09:01:43 -07:00
Daniel Thorn d6b2551854
Use resource id, not event id, when updating stripe resource tables (#1491) 2020-10-27 13:03:25 -07:00
Anna Scholtz 71ac2bc686 Exception handling for column sizes 2020-10-27 12:11:11 -07:00
Anna Scholtz 752cd03531 Script for determining column storage sizes 2020-10-27 12:11:11 -07:00
Anthony Miyaguchi b7695049c6
Fix #1457 - Generate and run Fenix ETL for GLAM in glam-fenix-dev (#1458)
* Resolve generated sql to glam-fenix-dev and change output in sql/ dir

* Add new script for testing glam-fenix queries

* Add generated sql for version control

* Use variables correctly in bash

* Remove latest versions from UDF

* Update test to generate minimum set of tables for nightly

* Commit generated queries for testing

* Cast only if not glob

* Ignore dryrun and publish view for glam-fenix-dev

* Fix linting error

* Update comments

* Use DST_PROJECT consistently in scripts

* Update comments

* Update script/glam/test/test_glean_org_mozilla_fenix_glam_nightly

Co-authored-by: Ben Wu <benjaminwu124@gmail.com>

* Update script/glam/generate_and_run_desktop_sql

Co-authored-by: Ben Wu <benjaminwu124@gmail.com>

Co-authored-by: Ben Wu <benjaminwu124@gmail.com>
2020-10-22 11:40:52 -07:00
Daniel Thorn 1b7ad0cfef
Setup more imports from Mozilla VPN CloudSQL (#1469) 2020-10-22 07:12:41 -07:00
Jeff Klukas 78ea72c5a5
Allow field addition for one day of clients_daily and friends (#1468)
* Allow field addition for one day of clients_daily and friends

Allows Airflow to apply https://github.com/mozilla/bigquery-etl/pull/1455

* Commit DAG changes
2020-10-21 13:30:54 -04:00
Sunah Suh c8d0136694
Bug 1671517: Add event counts to clients daily (#1455) 2020-10-20 13:29:21 -05:00
Jeff Klukas a9c531e3c6
Exempt a few files from dry run due to new table-level ACLs (#1462)
* Exempt a few files from dry run due to new table-level ACLs

The dry run service can no longer perform queries with wildcard table
specifications or access raw AET data. See https://github.com/mozilla-services/cloudops-infra/pull/2599

* Verbose referenced_tables for AET logging clients daily
2020-10-20 10:34:02 -05:00
Jeff Klukas eae0d6d3d2
Revert field order in smoot nondesktop (#1454)
Fixes https://github.com/mozilla/bigquery-etl/issues/1453
2020-10-19 14:23:19 -07:00
Anthony Miyaguchi b66f948bdd
Add CREATE OR REPLACE VIEW clause to geckoview_versions and move it to org_mozilla_fenix (#1452)
* Add CREATE OR REPLACE VIEW clause to geckoview_versions

* Move geckoview version to org_mozilla_fenix
2020-10-19 12:36:45 -07:00
Jeff Klukas 4538e7c749
Formalize product names in nondesktop_clients_last_seen (#1380)
The `nondesktop_clients_last_seen_v1` view was developed mostly as an
internal implementation detail for downstream tables, but it has become
useful in its own right. This PR formalizes the view by providing an alias
without a version modifier and it adds a `product` field with application
names that are short but more meaningful than the `app_name` field.

See discussion in https://jira.mozilla.com/browse/DO-330 about confusion that
has resulted from the name "Fennec iOS" used in dashboards, etc. This is a
step toward reducing that kind of confusion.

This PR also adds `contributes_to_2019_kpi` and `contributes_to_2020_kpi` fields
as source of truth for how we count KPI metrics. That logic is currently
copied and pasted in several places, which could lead to errors.

This will need a fair amount of review from data users before moving forward.
It will also require backfilling several downstream tables and communicating
the change.
2020-10-19 15:16:21 -04:00
Anthony Miyaguchi 49c9bbf340
Add view for geckoview versions with one row per build hour (#1450) 2020-10-19 09:26:28 -07:00
Daniel Thorn 9d7a566e1c
Refactor stripe tables to reduce confusion (#1444) 2020-10-16 15:08:44 -07:00
Anthony Miyaguchi 2521f926e7
Add daily schedule for geckoview versions (#1447)
* Add daily schedule for geckoview versions

* Remove unnecessary parameters
2020-10-16 14:37:30 -07:00
Anthony Miyaguchi 349dff3ca2
Add table to determine Fenix nightly mapping of builds to geckoview versions (#1419)
* Add initial incremental query for geckoview build dates

* Add initial tests for incremental query (WIP)

* Add files for initial tests

* Rework query so it doesn't fail during tests

* Fix schema so queries run

* Add passing test for init

* Add test for query aggregation

* Add metadata file for scheduling the query

* Move scripts from fenix_nightly to fenix

* Remove scheduling

* Add document strings.

* Change dataset reference and indent comments correctly

* Remove init and address feedback

* remove init file
* make query idempotent by appending window to each submission_date
* rename n_builds to n_pings
* reduce window size from 30 days to 14 days
* avoid use of subqueries

* Update tests for query

* Fix tests

* Add failing test for 100

* Fix query so it works across fx100 boundary

* Add linting fixes
2020-10-16 11:57:23 -07:00
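The commit above mentions a failing test "for 100" and a fix for the "fx100 boundary", but the log does not say what went wrong. One common cause of such failures is comparing version strings as text, where "100" sorts before "99". The sketch below illustrates that pitfall in Python under that assumption; the version strings and the helper `major_version` are hypothetical and are not taken from the repository.

```python
def major_version(version: str) -> int:
    """Extract the leading major version number, e.g. '100.0a1' -> 100."""
    return int(version.split(".")[0])


# Hypothetical version strings on either side of the 100 boundary.
versions = ["99.0a1", "100.0a1", "98.0a1"]

# Text ordering compares character by character, so '1' < '9'
# puts version 100 first.
sorted_as_text = sorted(versions)

# Ordering by the parsed major version gives the intended result.
sorted_numerically = sorted(versions, key=major_version)
```

Here `sorted_as_text` places `100.0a1` before `98.0a1`, while `sorted_numerically` places it last. Casting the major version to an integer before comparing is one way to stay correct when the number of digits changes.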
Jeff Klukas b9ed5f1242
Bug 1654078 Limit geo to country level in regrets-reporter view (#1440)
* Bug 1654078 Limit geo to country level in regrets-reporter view

See https://bugzilla.mozilla.org/show_bug.cgi?id=1654078#c45
2020-10-15 15:18:26 -04:00
Anthony Miyaguchi a7271f0189
Replace udf.fenix_build_to_datetime with mozfun reference (#1441) 2020-10-15 11:54:22 -07:00
Jeff Klukas d809e6fa2d
Fix field name error causing Shredder to miss FxA account deletions (#1428)
While looking at Shredder logs, I noticed that all entries for FxA-related
derived tables show 0 deletions, like:

> 712215424909 bytes and deleted 0 rows from moz-fx-data-shared-prod.firefox_accounts_derived.fxa_users_daily_v1

It appears that `account.deleted` events populate a field named `uid` rather
than `user_id`. I was able to verify this by choosing a recent `uid` value
from a deletion event and counting events where that same value appears as
`user_id` in FxA logs. There were matching messages.
2020-10-13 14:44:14 -04:00
Daniel Thorn 6abddb0f0d
Fix backticks in stripe derived views (#1425) 2020-10-12 12:43:37 -07:00
Anna Scholtz cec9412abd
Fix backticks in stripe views (#1423)
* Fix backticks in stripe views

* Fix backticks in mozilla vpn view

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
2020-10-12 11:39:27 -07:00
Frank Bertsch cfe5973f80 Allow for cross-project references in routines
- Scrape all projects for routine defns when generating tests
- Create UDFs as non-temp for stored procedure tests
- Make assert functions default non-temp (to support above)
2020-10-09 15:17:56 -04:00
Daniel Thorn 228fc8e041
Fully qualify stripe and mozilla_vpn views (#1414) 2020-10-09 11:06:28 -07:00
Daniel Thorn 6ca03e713e
Export mozilla vpn waitlist from cloud SQL (#1397) 2020-10-09 10:18:30 -07:00
Anna Scholtz 0d51459bd1 Move dependencies to udf_js_lib 2020-10-08 10:30:22 -07:00
Anna Scholtz 5a8311e2af Update UDF parsing 2020-10-08 10:30:22 -07:00
Jeff Klukas 3e0f2e4511
Bug 1669516 Use `app_display_version` for Fenix AMO stats (#1394)
* Bug 1669516 Use `app_display_version` for Fenix AMO stats

* Use geckoview_version for fenix nightly

* Remove deprecated installs_v1

* Remove dev installs_v1
2020-10-08 08:34:32 -04:00
Daniel Thorn 203db55d16
Remove incorrect destination partition from stripe tasks (#1398) 2020-10-06 15:33:55 -07:00
Jeff Klukas 1bab64d7ff
Bug 1635918 Enable deletion of data from AET tables on user request (#1396)
* Bug 1635918 Enable deletion of data from AET tables on user request

* Skip dryrun

* Add to SOURCES

* Add timestamp to deletions
2020-10-06 16:30:01 -04:00
Anna Scholtz 518718d140
Move event_analysis back to mozfun (#1386) 2020-10-06 08:24:23 -04:00
Daniel Thorn 88ed89bd2c
Add stripe ETL to support Mozilla VPN dashboard (#1349) 2020-10-05 15:10:01 -07:00
Anna Scholtz 93bc51ba5e Move queries to right directories 2020-10-05 12:59:58 -07:00
Anna Scholtz bf33837ef7 Resolve rebase conflicts 2020-10-05 12:59:58 -07:00
Anna Scholtz d1c67dab53 Move projects into high-level sql/ folder 2020-10-05 12:59:58 -07:00