* Create tables that have state values per day
* Change Airflow DAG
* Move markov states to cols rather than array
* Move bot/bad client filter to materialized table
* Add install_source and consecutive_days_seen features
* Add field to CTE
* Use jinja vars instead of sql variables
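bqetl renders queries with Jinja, so parameters arrive as template variables substituted before the query runs, rather than as SQL `DECLARE` variables evaluated inside it. As a rough, hypothetical illustration of that substitution step (the renderer below is a stand-in, not bqetl's actual implementation):

```python
import re

def render(query: str, **params: str) -> str:
    """Substitute {{ name }} placeholders, loosely mimicking how a
    Jinja-templated query gets its parameters injected before execution."""
    def sub(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing template parameter: {name}")
        return params[name]
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, query)

templated = "SELECT * FROM states WHERE submission_date = '{{ submission_date }}'"
print(render(templated, submission_date="2023-11-01"))
```

The practical upside is that the rendered SQL is plain and self-contained: it can be dry-run, diffed, and scheduled without any session-level variable setup.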
* Use correct UDF incantation
* added firefox_ios_derived.clients_activation_v1 and corresponding view
* fixing a missing separator in firefox_ios_derived.clients_activation_v1 checks
* adding firefox_ios_derived.clients_activation_v1 to shredder configuration
* removed is_suspicious_device_client as it should not be there, thanks bani for pointing this out
* fixed black formatting error inside shredder/config.py
* applied bqetl formatting
* minor styling tweak as suggested by bani in PR#4631
* deleting fenix_derived/firefox_android_clients_v2, v1 will remain the active model
* removed fenix_derived.firefox_android_clients_v2 from shredder config
* Add support for assigning Airflow tasks to task groups
* Generate separate Airflow tasks for glean_usage
* Remove Airflow dependencies from old glean_usage tasks
* add code for cohort_daily_statistics using clients_first_seen_v2, including its new columns
* take out extra sample_id
* Update sql/moz-fx-data-shared-prod/telemetry_derived/cohort_daily_stats_clients_frst_seen_v2/query.sql
switching column names: the originals were swapped
Co-authored-by: Alexander <anicholson@mozilla.com>
* update column names: change cohort_date to first_seen_date to be more descriptive; take out client_id and sample_id in the final table; take out extraneous columns that are not used in the final table
* fix group by - days_seen_bits not days_interacted_bits
* take out second_seen_date, irrelevant
* change date_activity to submission_date
* replace submission_date_activity with client_activity
* add new line at end of schema.yaml file
* refactor code to use clients_first_seen_v2; originally committed cohorts_daily_statistics_v1 code in the v2 file
* add cohort_daily_statistics_v2 job to DAG
* add cohort_daily_statistics_v2 job to DAG, take out submission_date and add activity_date to query.sql
* delete now needless dags folder
* correct alias of table
* change submission_date to activity_date
* fix column name apple_model to apple_model_id
* add days_seen_dau_bits and other calculations based on this
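Metrics derived from a days-seen bit pattern typically follow the convention that the least significant bit is the most recent day. A sketch of the kind of calculations such a column supports (function names and the 28-day window are illustrative, not the repo's UDF signatures):

```python
def days_since_seen(bits):
    """Days since last activity: index of the lowest set bit
    (0 = active on the most recent day). None if never seen."""
    if bits == 0:
        return None
    # bits & -bits isolates the lowest set bit; bit_length - 1 is its index
    return (bits & -bits).bit_length() - 1

def active_days_in_window(bits, window=28):
    """Count of active days within the trailing window."""
    return bin(bits & ((1 << window) - 1)).count("1")

bits = 0b101  # active today and two days ago
print(days_since_seen(bits))        # 0
print(active_days_in_window(bits))  # 2
```

Keeping the raw bits in the table means retention, frequency, and recency metrics can all be recomputed downstream without re-scanning daily activity data.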
* add attribution_dlsource to table
* take out underscore from column name, attribution_dlsource
* revise comment - 196 days not 180 days
* add all the other columns from clients_first_seen_v2, update schema.yaml file with new columns
* take out sample_id, fix schema
* take out document_id, dl_token, app_build_id columns, rename activity_date to submission_date, rename cohort_date to first_seen_date to match clients_first_seen_28_days_later
* move files from cohort_daily_statistics_v2 to desktop_cohort_daily_retention_v1 to reflect the name change; take out extraneous columns such as xpcom_abi, attribution_dlsource, and the engine_data columns
---------
Co-authored-by: Alexander <anicholson@mozilla.com>
* Add desktop_acquisition_funnel view
* Update reference
* Update view.sql
Took out some of the TODO comments around naming to stay consistent with the table it reads from, and to reduce the effort of changing the spoke-default view that is currently set up with test data.
---------
Co-authored-by: gkabbz <gkabbz@gmail.com>
* Added a filter to only include playstore data
To keep the bottom of the funnel consistent with the upper funnel, we must include only Play Store installs in the bottom-of-funnel metrics
* for the fenix_derived.funnel_retention_clients_week_* tables, making sure we only include Play Store users
* updating the changes as requested by soGaussian: expose the install_source field to users to enable filtering
---------
Co-authored-by: richard baffour <baffour345@gmail.com>
* Glam - fix legacy windows & release probes' sample count going fwd
* Glam FOG accounts for sampling when calculating total_sample for windows & release probes
* fog - fix client count and sample count
* Add channel filtering for fog
Previously, rows with NULL values in the join keys never matched, resulting
in duplicate rows. This change coalesces those keys to empty strings for
the join and converts them back to NULL (via NULLIF) in the view.
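The fix relies on SQL's three-valued logic: `NULL = NULL` is not true, so NULL join keys never match. A small Python simulation of that behavior and of the COALESCE workaround (the table shapes are invented for illustration):

```python
def join(left, right, key):
    """Inner join mimicking SQL semantics: a NULL (None) key never matches."""
    return [
        (l, r) for l in left for r in right
        if l[key] is not None and r[key] is not None and l[key] == r[key]
    ]

def coalesce_key(rows, key):
    """COALESCE(key, '') so NULL keys become joinable empty strings."""
    return [{**row, key: row[key] if row[key] is not None else ""} for row in rows]

left = [{"channel": None, "clients": 10}]
right = [{"channel": None, "sample": 100}]

print(join(left, right, "channel"))  # [] - NULL keys never match
print(join(coalesce_key(left, "channel"),
           coalesce_key(right, "channel"), "channel"))
```

The NULLIF step in the view then turns the sentinel empty strings back into NULLs, so consumers see the original values.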
* Add ga_clients_v1 table & view
- Query from ga_sessions
- Fix tests
* Use correct scheduling parameters
Co-authored-by: Alexander <anicholson@mozilla.com>
* Move HAVING clause to WHERE
Co-authored-by: Alexander <anicholson@mozilla.com>
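Moving a predicate from HAVING to WHERE is safe when it references only non-aggregated columns, and it lets the engine discard rows before grouping instead of after. A toy equivalence check (data invented):

```python
from collections import defaultdict

rows = [
    {"country": "DE", "clients": 5},
    {"country": "US", "clients": 7},
    {"country": "US", "clients": 3},
]

def group_sum(rows, key, val):
    """GROUP BY key, SUM(val)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[val]
    return dict(totals)

# HAVING-style: aggregate everything, then drop groups.
having = {k: v for k, v in group_sum(rows, "country", "clients").items() if k == "US"}
# WHERE-style: drop rows first, then aggregate only what's needed.
where = group_sum([r for r in rows if r["country"] == "US"], "country", "clients")

print(having == where)  # True: the predicate uses no aggregates, so results match
```

If the predicate referenced an aggregate (e.g. `SUM(clients) > 5`), it would have to stay in HAVING; that is the one case where this refactor changes meaning.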
* Change CTE name
Co-authored-by: Alexander <anicholson@mozilla.com>
---------
Co-authored-by: Alexander <anicholson@mozilla.com>
* migrates old pingcentre onboarding artifacts to new firefox_desktop view
* generate event rollup dag
* generate review checker dag
* update messaging system dag
* incl project in table names
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Add gclid_conversions table & view
This table will support the desktop conversion events.
Each valid GCLID will have any associated conversion events.
See the decision brief:
https://docs.google.com/document/d/1T8ArA9r8HDMTj1ES9NHfJFv2gUWo7w0MjG07iXtuUOI
* Use correct table name
* Use new stub attribution dataset; clarify activity_date
* Use correct date_partition_parameter
Co-authored-by: Alexander <anicholson@mozilla.com>
* Include activity_date as parameter
* Use INNER instead of LEFT joins
* Update doc strings to clarify GCLID vs GA Session
---------
Co-authored-by: Alexander <anicholson@mozilla.com>
* Add derived stub attribution logs
This table keeps triplets from the stub attribution logs.
The triplet of (dl_token, ga_client_id, stub_session_id)
will only ever appear once here.
See the associated decision brief:
https://docs.google.com/document/d/1L4vOR0nCGawwSRPA9xiR8Hmu_8ozCGUecXAtBWmGGA0/edit
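The "will only ever appear once" guarantee amounts to deduplicating rows on the full triplet. A minimal sketch of that invariant with invented rows (the real table is of course built in SQL):

```python
def dedupe_triplets(rows):
    """Keep the first occurrence of each (dl_token, ga_client_id, stub_session_id)."""
    seen = set()
    out = []
    for row in rows:
        triplet = (row["dl_token"], row["ga_client_id"], row["stub_session_id"])
        if triplet not in seen:
            seen.add(triplet)
            out.append(row)
    return out

logs = [
    {"dl_token": "t1", "ga_client_id": "c1", "stub_session_id": "s1"},
    {"dl_token": "t1", "ga_client_id": "c1", "stub_session_id": "s1"},  # duplicate
    {"dl_token": "t1", "ga_client_id": "c1", "stub_session_id": "s2"},
]
print(len(dedupe_triplets(logs)))  # 2
```

Note the same dl_token can still appear with different session ids; uniqueness holds only for the complete triplet.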
* Move stub attribution table to new dataset
In order to ensure limited access to the stub attribution service
data without significantly decreasing developer velocity, we
move these tables to a new dataset. That dataset has the defaults
we want for all stub attribution log data:
- Defaults to just read access to data-science/DUET workgroup
- No read/write access for DE
We will backfill via the bqetl_backfill DAG.
* Rename view
* Use correct dataset name in view
* Skip dryrun; no access
* Fix checks to filter on partitions
* Don't print "missing checks file" on success
Previously, the message that checks.sql files were missing was
printed on every execution of the for statement, because an "else"
clause on a "for" loop runs whenever the loop completes without a
"break" - including when the loop body did execute.
Instead, we want to print only when there are no files.
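The Python pitfall here is easy to reproduce. A minimal sketch of the bug and the fix (function and message names invented for illustration):

```python
def buggy(files):
    messages = []
    for f in files:
        messages.append(f"checking {f}")
    else:
        # BUG: for/else runs whenever the loop finishes without break,
        # so this fires even when files were found
        messages.append("missing checks.sql file")
    return messages

def fixed(files):
    messages = []
    for f in files:
        messages.append(f"checking {f}")
    if not files:
        messages.append("missing checks.sql file")
    return messages

print(buggy(["checks.sql"]))  # the spurious "missing" message appears
print(fixed(["checks.sql"]))  # only the check message
```

for/else is really a "no break occurred" clause; guarding on the emptiness of the iterable is the idiomatic way to express "nothing was found".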