* Add aggregation by country
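A minimal sketch of such a rollup, with illustrative table and
column names (not the actual query):
```sql
-- Illustrative only: one row of metrics per country.
SELECT
  country,
  COUNT(DISTINCT client_id) AS active_users
FROM
  telemetry_data
GROUP BY
  country
```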
* Copy the initial Italy focus query
This commit provides a baseline for the
following commits to ease review, since the
copied code was already reviewed.
* Clean up the country list and replace FULL OUTER with LEFT joins
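Roughly, with illustrative table names; a FULL OUTER JOIN would
also emit rows with no match in the cleaned country list, while a
LEFT JOIN keeps the country list authoritative:
```sql
-- Illustrative only: every country in the reference list survives,
-- with measurement columns NULL where no data exists.
SELECT
  countries.country_code,
  measurements.outage_count
FROM
  countries
LEFT JOIN
  measurements
USING
  (country_code)
```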
* Aggregate by city for cities with more than 15k inhabitants
The 15k population threshold is enforced at ingestion time.
The query further restricts the output to cities with at
least 1000 daily active users, as sketched below.
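A sketch of the activity floor, with illustrative names; since the
population cut happens upstream, only the user floor shows up in
the query:
```sql
SELECT
  country,
  city,
  COUNT(DISTINCT client_id) AS active_users
FROM
  telemetry_data
GROUP BY
  country,
  city
HAVING
  -- Keep only cities with at least 1000 daily active users.
  COUNT(DISTINCT client_id) >= 1000
```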
* Produce hourly aggregates
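A sketch of the bucketing, assuming an illustrative
`submission_timestamp` column:
```sql
SELECT
  -- Truncate each timestamp to the start of its hour.
  TIMESTAMP_TRUNC(submission_timestamp, HOUR) AS hour,
  COUNT(*) AS n
FROM
  telemetry_data
GROUP BY
  hour
```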
* Move the query to the `internet_outage` dataset
* Provide automatic daily scheduling through Airflow
* Tweak the SQL to address review comments
This additionally changes the `CAST` to
`SAFE_CAST` to tolerate irregularities in
the data.
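The difference in one line; `SAFE_CAST` yields NULL where `CAST`
would abort the whole query:
```sql
SELECT
  CAST('42' AS INT64) AS ok,           -- 42
  SAFE_CAST('oops' AS INT64) AS fixed  -- NULL; plain CAST would error
```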
* Add ssl_error_prop
* Add missing_dns_success
* Add missing_dns_failure
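These three columns are proportions; a hedged sketch of how they
might be derived, with illustrative boolean source fields:
```sql
SELECT
  -- SAFE_DIVIDE returns NULL rather than erroring on empty groups.
  SAFE_DIVIDE(COUNTIF(ssl_error), COUNT(*)) AS ssl_error_prop,
  SAFE_DIVIDE(COUNTIF(missing_dns_success), COUNT(*)) AS missing_dns_success,
  SAFE_DIVIDE(COUNTIF(missing_dns_failure), COUNT(*)) AS missing_dns_failure
FROM
  telemetry_data
```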
* Lower the minimum reported bucket size to 50
This allows us to match the EDA by Saptarshi and
to establish a more comparable baseline.
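One way such a floor can be expressed is a HAVING clause over each
reported bucket (table and grouping columns illustrative, not the
actual query):
```sql
SELECT
  country,
  city,
  COUNT(*) AS bucket_size
FROM
  telemetry_data
GROUP BY
  country,
  city
HAVING
  -- Suppress buckets below the new reporting floor.
  COUNT(*) >= 50
```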
* Document the oddities around `submission_timestamp_min`
* Set country to NULL for unknown search engines
Previously, the behavior was to error on unknown search engines
in the revenue join. However, this caused failures whenever we
added a new normalized search engine without revenue data
available for it.
Instead, we now set the country to NULL and let this data fall
out during the join with revenue data, which won't have that
country.
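A hedged sketch of the change; the engine and column names are
illustrative, and the point is only that the fallback branch no
longer raises:
```sql
SELECT
  normalized_engine,
  CASE
    WHEN normalized_engine IN ('engine_a', 'engine_b') THEN country
    -- Previously something like: ELSE ERROR('unknown search engine')
    ELSE CAST(NULL AS STRING)
  END AS country
FROM
  searches
```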
* Cast null to string
The experiments field in user properties was previously serialized
as a string in its own right; this produced a stringified array:
"[\"exp-1\", \"exp-2\"]"
when in fact what we want is:
["exp-1", "exp-2"]
This change fixes that, as sketched below.
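The double encoding happens when the array is serialized on its
own and the resulting string is embedded in the blob; a sketch
with illustrative values:
```sql
-- Buggy: the array is stringified first, then embedded as a string.
SELECT TO_JSON_STRING(STRUCT(
  TO_JSON_STRING(['exp-1', 'exp-2']) AS experiments));
-- => {"experiments":"[\"exp-1\",\"exp-2\"]"}

-- Fixed: embed the array itself so it serializes as a JSON array.
SELECT TO_JSON_STRING(STRUCT(
  ['exp-1', 'exp-2'] AS experiments));
-- => {"experiments":["exp-1","exp-2"]}
```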
* Update onboarding events user properties
- Make experiments a list of experiment-branch strings
  rather than one property per experiment
- Update platform to just be the os name, and not the version
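A sketch of both property changes, assuming experiments arrive as
key/value pairs of experiment name and branch (all names
illustrative):
```sql
SELECT
  -- One "experiment-branch" string per enrollment.
  ARRAY(
    SELECT CONCAT(e.key, '-', e.value)
    FROM UNNEST(experiments) AS e
  ) AS experiments,
  -- Platform is just the OS name, without the version.
  os_name AS platform
FROM
  events
```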
* Reformat sql file
* Remove array_concat
Previously, os was added as a user property inside the user_props
JSON blob. However, Amplitude did not correctly interpret this
field as a top-level user property. This change is paralleled by
a new JSON import config that adds os to the import.
* Remove queries that write to derived-datasets.telemetry
The kpi_dashboard query is out of date; the 2020 dashboard is
implemented in Databricks and performs a modified version of the
query logic. We remove this table completely and will send out an
fx-data-dev email to that effect.
Separately, the desktop exact MAU table was the last scheduled
query writing to the telemetry dataset in the derived-datasets
project.