* Add aggregation by country
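A minimal sketch of such a rollup, with illustrative table and
column names (not the actual query):
```sql
-- Illustrative only: one row of metrics per country.
SELECT
  country,
  COUNT(DISTINCT client_id) AS active_users
FROM
  telemetry_data
GROUP BY
  country
```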
* Copy the initial Italy focus query
This commit provides a baseline for the
following commits to ease review, since the
copied code was already reviewed.
* Clean up the country list and replace FULL OUTER with LEFT joins
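Roughly, with illustrative table names; a FULL OUTER JOIN would
also emit rows with no match in the cleaned country list, while a
LEFT JOIN keeps the country list authoritative:
```sql
-- Illustrative only: every country in the reference list survives,
-- with measurement columns NULL where no data exists.
SELECT
  countries.country_code,
  measurements.outage_count
FROM
  countries
LEFT JOIN
  measurements
USING
  (country_code)
```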
* Aggregate by city for cities with more than 15k inhabitants
The 15k population threshold is enforced at ingestion time.
The query further restricts the output to cities with at
least 1000 daily active users, as sketched below.
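A sketch of the activity floor, with illustrative names; since the
population cut happens upstream, only the user floor shows up in
the query:
```sql
SELECT
  country,
  city,
  COUNT(DISTINCT client_id) AS active_users
FROM
  telemetry_data
GROUP BY
  country,
  city
HAVING
  -- Keep only cities with at least 1000 daily active users.
  COUNT(DISTINCT client_id) >= 1000
```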
* Produce hourly aggregates
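A sketch of the bucketing, assuming an illustrative
`submission_timestamp` column:
```sql
SELECT
  -- Truncate each timestamp to the start of its hour.
  TIMESTAMP_TRUNC(submission_timestamp, HOUR) AS hour,
  COUNT(*) AS n
FROM
  telemetry_data
GROUP BY
  hour
```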
* Move the query to the `internet_outage` dataset
* Provide automatic daily scheduling through Airflow
* Tweak the SQL to address review comments
This additionally changes the `CAST` to
`SAFE_CAST` to tolerate irregularities in
the data.
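The difference in one line; `SAFE_CAST` yields NULL where `CAST`
would abort the whole query:
```sql
SELECT
  CAST('42' AS INT64) AS ok,           -- 42
  SAFE_CAST('oops' AS INT64) AS fixed  -- NULL; plain CAST would error
```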
* Add ssl_error_prop
* Add missing_dns_success
* Add missing_dns_failure
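These three columns are proportions; a hedged sketch of how they
might be derived, with illustrative boolean source fields:
```sql
SELECT
  -- SAFE_DIVIDE returns NULL rather than erroring on empty groups.
  SAFE_DIVIDE(COUNTIF(ssl_error), COUNT(*)) AS ssl_error_prop,
  SAFE_DIVIDE(COUNTIF(missing_dns_success), COUNT(*)) AS missing_dns_success,
  SAFE_DIVIDE(COUNTIF(missing_dns_failure), COUNT(*)) AS missing_dns_failure
FROM
  telemetry_data
```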
* Lower the minimum reported bucket size to 50
This allows us to match the EDA by Saptarshi and
to establish a more comparable baseline.
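One way such a floor can be expressed is a HAVING clause over each
reported bucket (table and grouping columns illustrative, not the
actual query):
```sql
SELECT
  country,
  city,
  COUNT(*) AS bucket_size
FROM
  telemetry_data
GROUP BY
  country,
  city
HAVING
  -- Suppress buckets below the new reporting floor.
  COUNT(*) >= 50
```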
* Document the oddities around `submission_timestamp_min`
* Set country to NULL for unknown search engines
Previously, the behavior was to error on unknown search engines
in the revenue join. However, this caused failures whenever we
added a new normalized search engine without revenue data
available for it.
Instead, we now set the country to NULL and let this data fall
out during the join with revenue data, which won't have that
country.
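A hedged sketch of the change; the engine and column names are
illustrative, and the point is only that the fallback branch no
longer raises:
```sql
SELECT
  normalized_engine,
  CASE
    WHEN normalized_engine IN ('engine_a', 'engine_b') THEN country
    -- Previously something like: ELSE ERROR('unknown search engine')
    ELSE CAST(NULL AS STRING)
  END AS country
FROM
  searches
```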
* Cast null to string
The experiments field in user properties was previously serialized
as a string in its own right; this produced a stringified array:
"[\"exp-1\", \"exp-2\"]"
when in fact what we want is:
["exp-1", "exp-2"]
This change fixes that, as sketched below.
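The double encoding happens when the array is serialized on its
own and the resulting string is embedded in the blob; a sketch
with illustrative values:
```sql
-- Buggy: the array is stringified first, then embedded as a string.
SELECT TO_JSON_STRING(STRUCT(
  TO_JSON_STRING(['exp-1', 'exp-2']) AS experiments));
-- => {"experiments":"[\"exp-1\",\"exp-2\"]"}

-- Fixed: embed the array itself so it serializes as a JSON array.
SELECT TO_JSON_STRING(STRUCT(
  ['exp-1', 'exp-2'] AS experiments));
-- => {"experiments":["exp-1","exp-2"]}
```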
* Update onboarding events user properties
- Make experiments a list of experiment-branch strings
  rather than one property per experiment
- Update platform to just be the os name, and not the version
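A sketch of both property changes, assuming experiments arrive as
key/value pairs of experiment name and branch (all names
illustrative):
```sql
SELECT
  -- One "experiment-branch" string per enrollment.
  ARRAY(
    SELECT CONCAT(e.key, '-', e.value)
    FROM UNNEST(experiments) AS e
  ) AS experiments,
  -- Platform is just the OS name, without the version.
  os_name AS platform
FROM
  events
```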
* Reformat sql file
* Remove array_concat
Previously, os was added as a user property inside the user_props
JSON blob. However, Amplitude did not correctly interpret this
field as a top-level user property. This change is paralleled by
a new JSON import config that adds os to the import.
* Remove queries that write to derived-datasets.telemetry
The kpi_dashboard query is out of date; the 2020 dashboard is
implemented in Databricks and performs a modified version of the
query logic. We remove this table completely and will send out an
fx-data-dev email to that effect.
Separately, the desktop exact MAU table was the last scheduled
query writing to the telemetry dataset in the derived-datasets
project.