Commit Graph

1258 Commits

Author SHA1 Message Date
Anna Scholtz 22d2540e2f Fix test formatting 2020-07-02 13:37:38 -07:00
Anna Scholtz d802ae3810 Add doc generation tests 2020-07-02 13:37:38 -07:00
Anna Scholtz b6a7e84339 Generate docs for a single project 2020-07-02 13:37:38 -07:00
Anna Scholtz 88ecf499cd Generate docs 2020-07-02 13:37:38 -07:00
Jeff Klukas 9fb4c4c90d
Bug 1643683 Add Fenix DAU query for AMO stats (#1056)
* Bug 1643683 Add Fenix DAU query for AMO stats

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2020-07-02 15:18:48 -04:00
Jeff Klukas e8efd15766
Reduce complexity of getting histogram sum in main_summary (#1120)
* Reduce complexity of getting histogram sum in main_summary

Fixes #1119
2020-07-02 15:11:09 -04:00
Jeff Klukas 790fec1c52 mozfun.histogram.extract cleanup
Follow-up to #1000 now that the function is published and I've had some
chance to use it.
2020-07-02 12:31:31 -04:00
Jared Hirsch f7d1aeb8a7 Bug 1649871 - Add new event to FxA Amplitude export
* Add the new `fxa_activity - oauth_access_token_created` event to the
list of events used to generate the daily rollup of users and devices
2020-07-02 08:46:32 -04:00
Jeff Klukas ba18fca63e Replace null keys with 'Unknown' in amo_stats_dau
See https://github.com/mozilla/addons-server/issues/14790
2020-07-01 13:43:52 -04:00
Jeff Klukas ff2e30da32 Revert "Reference shared-prod views when republishing to other projects (#1105)"
This reverts commit 4654bbda31.
2020-07-01 13:38:21 -04:00
Jeff Klukas 6b354ae182 Fix mozfun udf reference 2020-07-01 12:51:05 -04:00
Jeff Klukas 10ff38610e
Support extraction of histogram in compact encoding (#1000)
Per the [Compact String Encoding for Histograms Proposal](https://docs.google.com/document/d/1k_ji_1DB6htgtXnPpMpa7gX0klm-DGV5NMY7KkvVB00/edit#)
2020-07-01 11:57:58 -04:00
Jeff Klukas 2433edf538 Remove ORDER BY in global_outages query
BQ reports an error "Result of ORDER BY queries cannot be clustered".
2020-07-01 10:55:33 -04:00
Jeff Klukas baeff74751 Bug 1649754 Remove reference to derived-datasets dry run url
We no longer have any destination tables in derived-datasets.
2020-07-01 09:52:12 -04:00
Anna Scholtz b1d2ae8604
Generate internet_outages DAG (#1111) 2020-07-01 09:19:31 -04:00
Alessio Placitelli 76bac7a98e
Create an ETL job for the Internet Outages (#1058)
* Add aggregation by country

* Copy the initial Italy focus query

This initial commit provides a baseline for the
next commits to ease review, since this initial
code was already reviewed.

* Cleanup the country list and replace FULL OUTER with LEFT joins

* Aggregate by city for cities with more than 15k inhabitants

The actual 15k limit is enforced at ingestion time.
This further limits the resulting cities to ones with at
least 1000 active daily users.

* Produce hourly aggregates

* Move the query to the `internet_outage` dataset

* Provide automatic daily scheduling through AirFlow

* Tweak the SQL addressing review comments

This additionally changes the `CAST` to
`SAFE_CAST` to account for weirdnesses in
the data.

* Add ssl_error_prop

* Add missing_dns_success

* Add missing_dns_failure

* Lower the minimum reported bucket size to 50

This allows us to match the EDA by Saptarshi and
to have a better comparable baseline.

* Document the oddities around `submission_timestamp_min`
2020-07-01 06:44:40 +02:00
Jeff Klukas 4654bbda31
Reference shared-prod views when republishing to other projects (#1105)
Fixes https://github.com/mozilla/bigquery-etl/issues/1075
2020-06-30 12:46:23 -04:00
Frank Bertsch dce09ef690
Set country to NULL for unknown search engines (#1104)
* Set country to NULL for unknown search engines

Previously the behavior was to error on unknown search engines
in the revenue join. However this causes failures when we
add new normalized search engines but don't have revenue data
available for them.

Instead we will set the country to NULL, and let this data
fall out during the join with revenue data, which won't
have that country.

* Cast null to string
2020-06-30 11:28:26 -04:00
Frank Bertsch a693d88f37
Add submission date to ltv revenue join (#1103) 2020-06-30 09:18:58 -04:00
Jeff Klukas 6c59168b0d
Better regex for Amplitude email export (#1100) 2020-06-26 15:28:28 -04:00
Frank Bertsch 5fb305781c
Fix onboarding events user properties (#1099)
The experiments in user properties were previously created as a
string in and of themselves; this led to a stringified array:
    "[\"exp-1\", \"exp-2\"]"
when in fact what we want is:
    ["exp-1", "exp-2"]

This change fixes that.
2020-06-25 17:09:56 -04:00
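The difference between the two payloads described in the commit above can be illustrated with a minimal Python sketch. The actual fix lives in the repository's SQL, which is not shown in this log; the `experiments` list and the use of the `json` module here are illustrative assumptions only.

```python
import json

# Hypothetical experiment-branch values, taken from the example in the commit message.
experiments = ["exp-1", "exp-2"]

# Bug pattern: the list is serialized to a string first and then embedded,
# so the outer serialization escapes it into a stringified array.
stringified = json.dumps({"experiments": json.dumps(experiments)})

# Intended pattern: the list is embedded directly and serialized only once.
proper = json.dumps({"experiments": experiments})

print(stringified)
print(proper)
```

Decoding the first payload yields a string for `experiments`, while decoding the second yields a real list, which is what the commit message says is wanted.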
Frank Bertsch fe44ae8043
Fully qualify LTV UDFs (#1096)
* Fully qualify LTV UDFs

* Reformat sql
2020-06-25 15:06:22 -04:00
Jeff Klukas 9422909cfd
Retry once on "invalid snapshot time" when publishing views (#1095)
Fixes #1001
2020-06-25 13:44:54 -04:00
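A retry-once policy like the one named in the commit above could be sketched as follows. This is not the repository's implementation: the function name `publish_with_one_retry`, the `flaky_publish` stand-in, and the choice to detect the error by matching its message text are all assumptions made for illustration.

```python
def publish_with_one_retry(publish):
    """Call publish(); retry exactly once if the error mentions an invalid snapshot time."""
    try:
        return publish()
    except Exception as exc:
        # Assumption: the transient error is recognizable by its message text.
        if "invalid snapshot time" not in str(exc).lower():
            raise
        # Second and final attempt; any error raised here propagates to the caller.
        return publish()


# Hypothetical publisher that fails on the first call and succeeds on the second.
attempts = []

def flaky_publish():
    attempts.append(1)
    if len(attempts) == 1:
        raise RuntimeError("Invalid snapshot time for table example_view")
    return "published"

result = publish_with_one_retry(flaky_publish)
print(result, len(attempts))
```

Limiting the retry to a single attempt keeps a persistent failure visible instead of masking it behind a loop.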
Jeff Klukas 371f47e55b
Fix view regressions (#1097)
* Fix to events view

Fixes a regression introduced in https://github.com/mozilla/bigquery-etl/pull/1092

* Additional regression fixes
2020-06-25 13:01:59 -04:00
Jeff Klukas 6969b83f49
Remove all references to derived-datasets in views (#1092)
These references are all to views that existed in both projects or to
static tables that have been copied into shared-prod.
2020-06-25 10:42:44 -04:00
Frank Bertsch 25cd0cbfcf
Update onboarding events user properties (#1093)
* Update onboarding events user properties

- Make experiments a list of experiment-branch strings
  rather than one property per-experiment
- Update platform to just be the os name, and not the version

* Reformat sql file

* Remove array_concat
2020-06-24 16:52:12 -04:00
Anna Scholtz 9c6ab95b46 bqetl_clients_daily DAG 2020-06-24 12:18:04 -07:00
Anna Scholtz 49ee9b34b1 Rename bqetl_clients to bqetl_clients_daily 2020-06-24 12:18:04 -07:00
Anna Scholtz 38e6acbee6 DAGs for client queries 2020-06-24 12:18:04 -07:00
Daniel Thorn 19a00ebce3
Add shredder script to forward deletion requests to amplitude (#1082) 2020-06-24 11:20:17 -07:00
Frank Bertsch 21766bc700
Version search contribution and add task (#1091) 2020-06-24 13:00:44 -04:00
Frank Bertsch 33ead2d75b
Add LTV and revenue join query (#1033)
* Add LTV and revenue join query

* Add docstrings to UDFs

* Add normalization step to ltv query

* Fix formatting

* Don't dryrun revenue query

* Fix UDFs and trailing comma

* Format one last time

* Update udf/parquet_array_sum.sql description

* Update udf/parquet_array_sum.sql
2020-06-24 12:57:02 -04:00
Anna Scholtz ecdc77a537 Add comment to addon_names metadata
Co-authored-by: Jeff Klukas <jeff@klukas.net>
2020-06-24 08:54:22 -07:00
Anna Scholtz 48430fcfe1 Move addons queries to bqetl_addons DAG 2020-06-24 08:54:22 -07:00
Ding Ding 2cba4ac96a Fix udf test 2020-06-23 21:08:30 -04:00
Ding Ding 0a2876e133 Add search_metric_contribution 2020-06-23 21:08:30 -04:00
Ding Ding 99919ea706 Remove query to add UDF first 2020-06-23 21:08:30 -04:00
Ding Ding 87351b6572 Update search_metric_contribution and add udf function
- add quantile_search_metric_contribution.sql in udf folder
- some fixes on search_metric_contribution
- update search_metric_contribution metadata
2020-06-23 21:08:30 -04:00
Ding Ding d5b7a71bfd update submission_date time period 2020-06-23 21:08:30 -04:00
Ding Ding af777bd8e8 add search_metric_contribution 2020-06-23 21:08:30 -04:00
Jeff Klukas 7a1441cbba
Add event type to email export (#1088) 2020-06-23 17:09:31 -04:00
Frank Bertsch dbca0c24cf Move fxa os to top-level user property
Previously, we were adding it as a user property in the user_props
json blob. However Amplitude did not correctly interpret this field
as an Amplitude top-level user property. This change is paralleled
with a new JSON import config that adds os to the import.
2020-06-23 11:21:23 -04:00
Jeff Klukas a8ca6f464d
Remove queries that write to derived-datasets.telemetry (#1084)
* Remove queries that write to derived-datasets.telemetry

The kpi_dashboard query is out of date; the 2020 dashboard is implemented in
Databricks and performs a modified version of the query logic.
We remove this table completely and we will send out a fx-data-dev email
to that effect.

Otherwise, the desktop exact mau table was the last scheduled query writing
to the telemetry dataset in the derived-datasets project.
2020-06-23 10:55:22 -04:00
Jeff Klukas c5bb15ee60 Eliminate literal '{%' in comment 2020-06-23 09:22:36 -04:00
xrao2 50012d8a0b
update aggregate_search_counts (#1081) 2020-06-22 13:29:16 -07:00
Ben Wu d295db373d
Remove search aggregate union with v6 (#1079) 2020-06-17 10:38:06 -04:00
Frank Bertsch c7abda1f49 Bug 1640226 - Fix FxA Amplitude properties 2020-06-17 09:31:43 -04:00
Jeff Klukas 42fe8c7e42 Avoid syntax that conflicts with Jinja 2020-06-16 16:10:35 -04:00
Jeff Klukas 09d631cc4e Use snapshot_date as submission_timestamp 2020-06-16 09:39:20 -04:00
Jeff Klukas c7c63112fa Consolidate regex 2020-06-16 09:39:20 -04:00