* Add first_seen_date to core_clients_daily and core_clients_last_seen
Supports KPI work for iOS and Focus apps.
See https://docs.google.com/document/d/1-sifTuu3lWd5umvaUmncFrdBIK6eKVTPzmGDLv6GDak/edit?ts=6078667e#
* Update tests
* Add new_profiles to mobile_usage
* Make sure is_new_profile reflects only current day
* Remove is_new_profile from core_clients_last_seen
This field could be confusing.
If we do `COUNTIF(is_new_profile)`,
we'll overcount since a client that appears on a single day will continue
to appear in clients_last_seen with is_new_profile=True carried over from the
original day of observation.
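To illustrate the overcount, here is a minimal Python sketch. The client id,
the dates, the 28-day window, and the `carried_is_new_profile` column are all
hypothetical and only model how a last-seen style table carries state forward;
this is not the actual query.

```python
from datetime import date, timedelta

# Hypothetical input: one client observed on a single day only.
first_seen = {"client-a": date(2021, 4, 1)}


def last_seen_rows(as_of, window_days=28):
    """Rows a last-seen style table would hold on `as_of` (assumed 28-day window)."""
    rows = []
    for client_id, seen in first_seen.items():
        if timedelta(0) <= as_of - seen < timedelta(days=window_days):
            rows.append(
                {
                    "client_id": client_id,
                    "submission_date": as_of,
                    # Naive carry-over: the flag is copied from the original day.
                    "carried_is_new_profile": True,
                    # Current-day definition: true only on the first seen date.
                    "is_new_profile": seen == as_of,
                }
            )
    return rows


days = [date(2021, 4, 1) + timedelta(days=i) for i in range(3)]
carried = sum(r["carried_is_new_profile"] for d in days for r in last_seen_rows(d))
current = sum(r["is_new_profile"] for d in days for r in last_seen_rows(d))
print(carried, current)  # 3 1
```

Counting the carried flag over three days reports three new profiles for one
client, while the current-day flag reports one.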
* Remove is_new_profile from core_clients_last_seen query
* Bug fixes
* DAG change
* Update contributes_to_2021_kpi values
We are tracking a more limited set of mobile apps for 2021:
- Firefox for Android
- Firefox for iOS
- Firefox Focus for Android
- Firefox Focus for iOS
* Mark Lockwise for iOS as false
* Add migration script for joining against first seen table
* Update logic for is_new_profile
* Update templates to use DDL with partitioning/clustering
* Fix output of migrate tables to backfill-8
* Add instructions for backfilling
* Fix linting errors
* Add first_seen_date and related test fixtures
* Use is_new_profile instead of baseline_first_seen
* Update view for baseline_clients_first_seen
* Fix yamllint issues
* Set is_new_profile when submission matches first seen
* Include AS in table alias
* Nit: capitalize AS
* Update bigquery_etl/glean_usage/templates/baseline_clients_daily_v1.sql
Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
* Update bigquery_etl/glean_usage/templates/baseline_clients_daily_v1.sql
Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
* Update clustering specification
Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
* Add ACL metadata for poucave access to live events view
* Remove table-level mozilla-confidential events live access for now
Semantically this makes sense, but while dataset-level access is already provided
for this workgroup and automation is in place, this may become stale.
* Bug 1703362 Purge security.unexpectedload events
See https://bugzilla.mozilla.org/show_bug.cgi?id=1703362
The fix that reduces the quantity of events is scheduled to be included
in Firefox 88, released in late April. From 88 on, the rate of events
is expected to be reasonable, and the filter here will allow them again.
I will run a backfill for this table, starting from 2020-12-15.
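A rough Python sketch of the filter described above, assuming the event name
splits into a `category` of `security` and a `method` of `unexpectedload`, and
that each event carries an `app_version` string. These field names are
hypothetical and the real filter lives in SQL.

```python
def keep_event(event):
    """Return False for security.unexpectedload events from versions before 88."""
    is_unexpectedload = (
        event["category"] == "security" and event["method"] == "unexpectedload"
    )
    if not is_unexpectedload:
        # All other events pass through untouched.
        return True
    # Assumed version format: "<major>.<minor>[.<patch>]".
    major_version = int(event["app_version"].split(".")[0])
    return major_version >= 88


events = [
    {"category": "security", "method": "unexpectedload", "app_version": "87.0"},
    {"category": "security", "method": "unexpectedload", "app_version": "88.0.1"},
    {"category": "navigation", "method": "search", "app_version": "87.0"},
]
kept = [e for e in events if keep_event(e)]
print(len(kept))  # 2
```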
* Refactor with CTE
* Move argument parser into shared function
* Move shared main entrypoint into common
* Update example script to include other usage queries
* Commit generated queries for example usage queries
* Parallelize generation of example queries
* Add docstring
* Remove iOS example queries for daily and last seen
* Fix pydocstyle linting
* Add update_example_glean_usage to CI
* Add initial boilerplate for clients_first_seen
* Remove submission_timestamp as a field
* [wip] Join data against legacy fennec id if applicable
* Remove user facing view
* Revert "Remove user facing view"
This reverts commit a728a7882170eadad5413c7a7046c0f38297bb87.
* Add flag for fennec_id
* Update logic to limit rows in partitions to submission_date
* Add all sql in glean_usage to format ignores
* Separate init and query
* Add default encoders for testing sql
* Add test for initialization of baseline clients first seen in fenix
* Update query to update over previous history
* Add test for aggregation
* Add generated sql and tests for simple baseline clients first seen
* Add dry-run exceptions for clients first seen tables
* Add clients first seen to generated sql
* Update bigquery_etl/glean_usage/templates/baseline_clients_first_seen.metadata.yaml
Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
* Update bigquery_etl/glean_usage/templates/baseline_clients_first_seen.metadata.yaml
Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
* Group by sample id instead of min
* Add submission_date as baseline first seen date
Co-authored-by: Jeff Klukas <jklukas@mozilla.com>
.vscode's settings.json can get cluttered with installation-specific
settings, which can lead to confusing pull requests. Instead, let's use
the approach outlined here:
https://stackoverflow.com/a/48387809
This provides a set of defaults that people can use, which we can
extend over time.
This is a concrete step to allow querying events data over longer
time ranges. See https://bugzilla.mozilla.org/show_bug.cgi?id=1701712
Currently, Redash rejects longitudinal queries over `events` as being
too expensive, but that cost check cannot take into account the benefits of the clustering
applied on the underlying tables. This 1% table sidesteps the issue for cases
where a 1% sample is sufficient.
We can consider expanding the time range beyond six months if we see clear use
cases for it.
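As a sketch of the idea, the Python below keeps roughly 1% of clients over a
six-month window. The CRC32-based bucket, the 180-day cutoff, and the row
fields are assumptions for illustration, not the table's actual definition.

```python
import zlib
from datetime import date, timedelta


def sample_id(client_id):
    """Map a client id to a stable bucket in [0, 100) (assumed hashing scheme)."""
    return zlib.crc32(client_id.encode("utf-8")) % 100


def in_one_percent_sample(row, today, days=180):
    """Keep rows from the last `days` days whose client falls in bucket 0."""
    recent = today - row["submission_date"] <= timedelta(days=days)
    return recent and sample_id(row["client_id"]) == 0


today = date(2021, 4, 20)
# Find a synthetic id that lands in bucket 0 so the example has a positive case.
sampled_id = next(
    f"client-{i}" for i in range(100_000) if sample_id(f"client-{i}") == 0
)
recent_row = {"client_id": sampled_id, "submission_date": today - timedelta(days=30)}
old_row = {"client_id": sampled_id, "submission_date": today - timedelta(days=365)}
print(in_one_percent_sample(recent_row, today))  # True
print(in_one_percent_sample(old_row, today))  # False
```

Because the bucket depends only on the client id, the same clients stay in the
sample every day, which is what makes longitudinal queries meaningful.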
* DAG scheduling and docs cleanup
This adjusts some DAG schedules in an attempt to minimize the number of
BQSensor rescheduling emails we receive under normal circumstances.
Of note, the `copy_deduplicate` DAG often takes a little longer than an
hour to complete, meaning that DAGs starting at 02:00 are likely to hit
reschedules. We do not address that problem here, in hopes that performance
improvements to copy_deduplicate can bring its runtime back under 1 hour.
We also make some documentation fixups, including consistently using `|`
rather than `>` or `>-` as our multi-line string indicator. This preserves
whitespace that is relevant for markdown processing.
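To show why the indicator matters for markdown, here is a simplified Python
model of the two styles. It is not a YAML parser: it only covers plain,
unindented, non-blank lines, and the helper names and sample text are
hypothetical.

```python
def literal_block(lines):
    """Approximate YAML `|`: keep every line break as written."""
    return "\n".join(lines) + "\n"


def folded_block(lines):
    """Approximate YAML `>` for simple text: single line breaks become spaces."""
    return " ".join(lines) + "\n"


description = ["Owners:", "- alice", "- bob"]

print(repr(literal_block(description)))  # 'Owners:\n- alice\n- bob\n'
print(repr(folded_block(description)))  # 'Owners: - alice - bob\n'
```

In the folded result the markdown list collapses into a single line, so it no
longer renders as a list.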
* Update stripe schedule