Commit graph

186 Commits

Author SHA1 Message Date
Eduardo Filho 9fbfac3ac0
GLAM historical aggregates tables (#4795) 2024-01-10 16:29:35 -05:00
Katie Windau 1c4102abed
Update bqetl_google_analytics_derived_ga4 yaml configs (#4780)
* Update bqetl_google_analytics_derived_ga4 yaml configs

* Switch to using countifs for more readability

* reformat and switch to countifs for readability
2024-01-04 15:44:28 -06:00
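The switch to `COUNTIF` in #4780 is described as a readability change. The sketch below models why that is safe: BigQuery's `COUNTIF(cond)` counts only rows where `cond` is TRUE, and `SUM(IF(cond, 1, 0))` adds 0 whenever `cond` is FALSE or NULL, so the two agree. The event rows and the `is_download` predicate are hypothetical, not taken from the actual query.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def countif(rows: Iterable[Row], predicate: Callable[[Row], Optional[bool]]) -> int:
    """Like BigQuery COUNTIF: count rows whose condition is TRUE.

    Rows where the condition is FALSE or NULL (None here) are not counted.
    """
    return sum(1 for row in rows if predicate(row) is True)


def sum_if_one_zero(rows: Iterable[Row], predicate: Callable[[Row], Optional[bool]]) -> int:
    """Like SUM(IF(cond, 1, 0)): IF yields 0 when cond is FALSE or NULL."""
    return sum(1 if predicate(row) is True else 0 for row in rows)


def is_download(row: Row) -> Optional[bool]:
    # In SQL, comparing NULL to a string yields NULL; model that as None.
    if row["event_name"] is None:
        return None
    return row["event_name"] == "file_download"


# Hypothetical GA4-style event rows, for illustration only.
events: List[Row] = [
    {"event_name": "page_view"},
    {"event_name": "file_download"},
    {"event_name": None},
    {"event_name": "file_download"},
]

print(countif(events, is_download))  # 2
print(sum_if_one_zero(events, is_download))  # 2
```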
Katie Windau 6220025220
Add new table www_site_metrics_summary_v2 using new GA4 data (#4776)
* Initial draft

* adding new GA4 dag

* fixing source for browser

* work in progress

* change group by to be explicit

* adding non_fx_sessions

* adding non-fx-sessions

* adding campaign to query

* adding new column to group by

* added ad_content column

* Adding downloads column

* Adding non fx downloads

* reformat the SQL

* Add owner to new DAG

* Update data types in the schema file

* Add missing comma

* Update query.sql

* Update start date and reduce from tier1 -> tier2

* Update sql/moz-fx-data-marketing-prod/ga_derived/www_site_metrics_summary_v2/metadata.yaml

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>

* Update sql/moz-fx-data-marketing-prod/ga_derived/www_site_metrics_summary_v2/metadata.yaml

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>

* Update sql/moz-fx-data-marketing-prod/ga_derived/www_site_metrics_summary_v2/query.sql

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>

* Update sql/moz-fx-data-marketing-prod/ga_derived/www_site_metrics_summary_v2/query.sql

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>

* Update sql/moz-fx-data-marketing-prod/ga_derived/www_site_metrics_summary_v2/query.sql

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>

* update source and medium columns to use last touch attribution instead of first touch attribution

* update group by

---------

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
2024-01-04 15:01:47 -06:00
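One bullet in #4776 moves the source and medium columns from first-touch to last-touch attribution. The sketch below is a minimal illustration of the difference, assuming a session is a list of hits that each carry a timestamp, source, and medium. The `Hit` type and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class Hit:
    """One hypothetical traffic-source observation within a session."""
    timestamp: datetime
    source: str
    medium: str


def first_touch(hits: List[Hit]) -> Tuple[str, str]:
    """Credit the session to the source/medium of its earliest hit."""
    hit = min(hits, key=lambda h: h.timestamp)
    return hit.source, hit.medium


def last_touch(hits: List[Hit]) -> Tuple[str, str]:
    """Credit the session to the source/medium of its latest hit."""
    hit = max(hits, key=lambda h: h.timestamp)
    return hit.source, hit.medium


# Deliberately unordered, to show the result does not depend on list order.
session = [
    Hit(datetime(2024, 1, 3, 10, 5), "newsletter", "email"),
    Hit(datetime(2024, 1, 3, 10, 0), "google", "organic"),
]
print(first_touch(session))  # ('google', 'organic')
print(last_touch(session))  # ('newsletter', 'email')
```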
Katie Windau 30a4e9ef30
Fixing DAG start time (#4762) 2023-12-29 11:17:45 -06:00
Katie Windau cb31189be4
DENG-2262 - update DAG run time from 8am UTC daily to 6:30pm UTC daily (#4756) 2023-12-28 14:35:02 -06:00
Katie Windau ebe5b4cfb4
DENG-2262 - Creating new desktop_installs_v1 table (#4754)
* DENG-2262 - add new DAG bqetl_desktop_installs_v1

* Initial commit for DENG-2262

* DENG-2262 - reformatted query.sql

* DENG-2262 add empty new line to end of view.sql
2023-12-28 13:15:04 -06:00
Leli c05aec0f9b
DENG-1728 adding glean_app metrics to bigquery-etl (#4720)
* adding telemetry_dev_cycle_derived to bigquery_etl

* Update dags.yaml

Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>

* changes after code review

* move to _external dataset

* rename table

* fix defaults

* schema from file

---------

Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
2023-12-20 20:14:11 +01:00
Alexander d96618dce2
Bug 1815242 - Update bqetl_pocket triage notes (#4733) 2023-12-20 12:52:54 -05:00
rzhao b1c6c6d7c8
feat(): Added new data to mobile feature usage tables (logins data for iOS and Fenix) (#4648)
* update

* This is the modified code following suggestions in https://github.com/mozilla/bigquery-etl/pull/4467. This pull request aims to update the code that creates four mobile feature usage tables

* updated codes following comments from kik in 72d6d71910

* update feature usage table codes following suggestions

* added new data to mobile feature usage table (logins data for iOS) and modified to add unnest for nested values

* update codes based on suggestion from https://github.com/mozilla/bigquery-etl/pull/4648/files

* update metadata.yaml and dags.yaml according to suggestions

* update metadata.yaml and dags.yaml according to suggestions

* remove clustering and references in all the metadata files, add LEFT JOIN to all the UNNEST, modify SQL files to follow the incremental submission date format, and rename dau as events(metrics)_ping_client_count

* updated schema to reflect the name change for dau

* update according to comments on dec 15

* update the distinct client count name

* fix the yaml errors

* fix no new line error for dags.yaml

---------

Co-authored-by: Ruoxi Zhao <rzhao@rzhao-37509.local>
Co-authored-by: Ruoxi Zhao <rzhao@rzhao-37509.lan>
2023-12-15 15:08:58 -08:00
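#4648 adds `LEFT JOIN` to every `UNNEST`. The reason, sketched below in plain Python: a comma or `CROSS JOIN UNNEST(...)` emits nothing for a parent row whose array is empty or NULL, so those clients silently drop out of counts, whereas `LEFT JOIN UNNEST(...)` keeps the parent row with a NULL element. The ping rows and the `values` field name are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
Pair = Tuple[object, Optional[object]]


def cross_join_unnest(rows: List[Row], field: str) -> List[Pair]:
    """Like `FROM t, UNNEST(t.field)`: rows with an empty or NULL array vanish."""
    out: List[Pair] = []
    for row in rows:
        for item in row[field] or []:
            out.append((row["client_id"], item))
    return out


def left_join_unnest(rows: List[Row], field: str) -> List[Pair]:
    """Like `FROM t LEFT JOIN UNNEST(t.field)`: keep the parent row with NULL."""
    out: List[Pair] = []
    for row in rows:
        items = row[field] or []
        if not items:
            out.append((row["client_id"], None))
        for item in items:
            out.append((row["client_id"], item))
    return out


# Hypothetical pings: "b" sent an empty array, "c" sent NULL.
pings: List[Row] = [
    {"client_id": "a", "values": ["saved", "autofilled"]},
    {"client_id": "b", "values": []},
    {"client_id": "c", "values": None},
]
print(cross_join_unnest(pings, "values"))
# [('a', 'saved'), ('a', 'autofilled')]
print(left_join_unnest(pings, "values"))
# [('a', 'saved'), ('a', 'autofilled'), ('b', None), ('c', None)]
```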
Katie Windau d5ffd4fe35
Fb 1866834 new use counter data (#4713)
* 1866834 - add new DAG bqetl_use_counter_analysis

* 1866834 - adding fenix & firefox use_counters_v1 tables and telemetry use_counters view

* 1866834 - fixing yaml file formatting

* 1866834 - fix dags.yaml format - remove trailing space

* 1866834 - add cast to numeric for rate to match schema for rate

* 1866834 - Remove unnecessary alias

* 1866834 - remove unnecessary alias

* Update and rename view.sql to view.sql

* 1866834 - test removing view

* 1866834 - update table names

* 1866834 - fixing fenix_and_firefox_use_counters view name and source table references
2023-12-15 13:13:36 -06:00
Daniel Thorn ffe8e304e2
Remove dthorn as dag owner (#4695) 2023-12-15 10:19:42 -08:00
Daniel Thorn b0bfc65052
DENG-965 - symbolicate and signaturize crash pings (#4642) 2023-12-12 08:57:52 -08:00
Anna Scholtz e8f3f759d5
Public GLAM datasets (#4606)
* Public GLAM datasets

* Remove Fenix GLAM datasets
2023-12-07 14:56:55 -08:00
kik-kik 377685cac9
fixing broken test for firefox_ios_derived.baseline_clients_yearly_v1 (#4645) 2023-12-04 10:48:56 -05:00
Anna Scholtz 7087dbff30
Separate Airflow tasks for glean_usage (#4588)
* Add support for assigning Airflow tasks to task groups

* Generate separate Airflow tasks for glean_usage

* Remove Airflow dependencies from old glean_usage tasks
2023-11-30 09:48:17 -08:00
Lucia 3db53758d2
Correct DAG description as DAG is currently active. (#4596)
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-11-22 16:06:45 +01:00
Frank Bertsch cbb843e455
Add ga_sessions_v1 table & view (#4554)
* Add ga_sessions_v1 table & view

This table aggregates session-level data from GA.

* Rename nullify string func

* Apply suggestions from code review

Co-authored-by: Alexander <anicholson@mozilla.com>

* Add upstream backfill deps

* Move depends_on to correct section

---------

Co-authored-by: Alexander <anicholson@mozilla.com>
2023-11-16 15:58:33 -05:00
Lucia b3abfc42ce
Update scheduler of aggregates to run after upstreams. (#4503)
* Update scheduler of aggregates to run after upstreams.

* Update dags for new scheduler of analytics_aggregates

* Update dag bqetl_search

* Remove DAG.

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-11-06 17:55:12 +01:00
Rebecca BurWei 73a5535b67
Urlbar events: nested (long) instead of wide (#4373)
* feat: urlbar events final release

* feat: new result types

* feat: add interaction and group

* fix: date

* fix: use BQ builtin for UUIDs

* Add the view_v2

* Add new table to the DAG

* fix CI error

fix ci error

* remove teon brooks

* Incorporate feedback by Curtis

Incorporate feedback from Curtis

---------

Co-authored-by: Alekhya Kommasani <akommasani@mozilla.com>
Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com>
2023-10-25 12:55:00 -04:00
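#4373 moves urlbar events from a wide layout to a nested (long) one. The sketch below is a minimal illustration of that reshaping, assuming a wide row has one count column per result type. The column names, the `results` field, and both helpers are hypothetical. The design benefit is that a new result type becomes another array element rather than a schema change.

```python
from typing import Dict, List

# Hypothetical "wide" row: one column per result type.
wide_row: Dict[str, object] = {
    "submission_date": "2023-10-25",
    "bookmark": 3,
    "history": 5,
    "search_engine": 2,
}


def wide_to_nested(row: Dict[str, object], key_cols: List[str]) -> Dict[str, object]:
    """Fold every non-key column into an array of {result_type, count} records."""
    nested: Dict[str, object] = {col: row[col] for col in key_cols}
    nested["results"] = [
        {"result_type": col, "count": value}
        for col, value in row.items()
        if col not in key_cols
    ]
    return nested


def nested_to_long(row: Dict[str, object], key_cols: List[str]) -> List[Dict[str, object]]:
    """Unnest the array: one output row per (key, result_type)."""
    keys = {col: row[col] for col in key_cols}
    return [{**keys, **item} for item in row["results"]]


nested = wide_to_nested(wide_row, ["submission_date"])
for long_row in nested_to_long(nested, ["submission_date"]):
    print(long_row)
```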
akkomar 66729aa702
FXA-6721 Setup import of accounts table from FxA production CloudSQL (#4423) 2023-10-25 09:50:25 +02:00
Sergio E. Betancourt 2b2697e8f3
[RS-826] New job to calculate newtab visits -> activity stream sessions (#4387)
* New job to calculate newtab visits -> activity stream sessions

* Removing newline chars at end of file

* Removing newline chars at end of file

* Removing newline chars at end of file

* Addressing comment suggestions

* Format

* Add bqetl_ads DAG

* Add ACL to nt_visits_to_sessions_conversion_factors_daily_v1

* Add metadata files

* Add view to dry_run skip list

* Oops, fix the view

---------

Co-authored-by: Curtis Morales <cmorales@mozilla.com>
2023-10-24 12:51:14 -04:00
Alekhya 6f3d34ba67
DS3244 - Add derived datasets for review checker data (#4447)
* Add review checker derived datasets

* Add bqetl_review_checker dag

Fix

* Fix CI validate dag step

* Incorporate feedback from Alex

* Fix CI

* change client last seen to clients first seen

change client last seen to clients first seen

* fix dag
2023-10-18 16:57:15 -04:00
Anna Scholtz 35ae323487
Funnel generators POC (#4390)
* Add funnel generation logic

* Example funnel config

* Fix funnel columns

* funnel generation dimensions

* Optimize segmenting generated funnels

* Add funnel generation docs

* Schedule generated funnels

* Skip DAGs with no tasks

* Add background info funnel generator

* Add funnel generation tests

* Fix join_previous_step_on

* Add funnel example config
2023-10-12 14:05:08 -07:00
Frank Bertsch 79fa5487c3
Bug 1852517 - Add execution delta to metadata & dag (#4239)
* Add execution delta to metadata & dag

* Spelling fix

* Update execution hour

---------

Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>
2023-10-12 18:18:24 +02:00
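#4239 adds an execution delta to the metadata and DAG. In Airflow, `ExternalTaskSensor` accepts an `execution_delta` and waits for the upstream run whose logical date is the current logical date minus that delta. The arithmetic is sketched below under the assumption of two daily schedules that differ only by hour. The hours and helper names are hypothetical, and the generated bqetl DAG may wire this differently.

```python
from datetime import datetime, timedelta


def upstream_logical_date(downstream: datetime, execution_delta: timedelta) -> datetime:
    """Logical date of the upstream run that an ExternalTaskSensor-style wait targets."""
    return downstream - execution_delta


def delta_between_schedules(downstream_hour: int, upstream_hour: int) -> timedelta:
    """Delta for two daily schedules that differ only by hour of day (hypothetical)."""
    return timedelta(hours=downstream_hour - upstream_hour)


# Hypothetical: downstream DAG runs daily at 04:00 UTC, upstream at 01:00 UTC.
delta = delta_between_schedules(4, 1)
print(delta)  # 3:00:00
print(upstream_logical_date(datetime(2023, 10, 12, 4, 0), delta))  # 2023-10-12 01:00:00
```

If the delta is wrong, the sensor looks for an upstream run that never exists and waits until it times out, which is why the execution hour has to be kept in sync.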
Alekhya 880f386fbf
Revert "DENG1546 -Add a derived dataset for serp events (#4325)" (#4406)
This reverts commit f0b6089b86.
2023-10-10 21:06:12 -04:00
Alekhya f0b6089b86
DENG1546 -Add a derived dataset for serp events (#4325)
* Add a derived dataset for serp events

* Fix CI issues
2023-10-10 14:40:44 -04:00
Daniel Thorn de71eac14d
DENG-476 - Create derived table for HCM clients dashboards (#4372)
so that they no longer need to depend on main summary
2023-10-02 14:41:12 -07:00
akkomar e3208aeecc
FXA-6721 Setup import of accounts table from FxA stage CloudSQL (#4327)
* FXA-6721 Setup import of accounts table from FxA stage CloudSQL

* Fix typo
2023-09-22 15:27:07 +02:00
Curtis Morales d2052cff47
RS-587 Add macroeconomic_indices table (#4323)
* Add macroeconomic_indices table

* Fix schema

* Remove empty clustering entry
2023-09-20 11:55:09 -04:00
kik-kik d4a8bf927e
updated bqetl_kpis_shredder DAG description to indicate why it is paused (#4280) 2023-09-12 15:15:23 +02:00
Leli c68587a7fc
DENG-797 change airflow DAG schedule (#4267) 2023-09-08 17:53:07 +02:00
Lucia e3013e8d5c
DS-2947. Update DAG bqetl_default description to remove unavailable option. (#4220)
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
2023-08-31 12:15:15 +02:00
Lucia 27262acdfd
Default DAG for bqetl queries (#4143)
* DENG-1314 Implement changes to bqetl and create default DAG.

* DENG-1314. Update Documentation.

* DENG-1314. Dummy query to enable generating DAG and run tests.

* DENG-1314. Update tests.

* Update bigquery_etl/cli/query.py

Raise exception when scheduling information is missing.

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>

* DENG-1314. Update tests.

* DS-3054. Update query creation to set bqetl_default as default value for --dag. Update tests.

* Default task and tests update.

* Default task and tests update.

* 3650 - Remove default DAG option, update DAG template comment & tests.

* 3650 - Condition for DAG warning.

* 3650 - Update docs.

* Clarification on sql/moz-fx-data-shared-prod/analysis/bqetl_default_task_v1/metadata.yaml

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* Update docs/cookbooks/creating_a_derived_dataset.md

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-08-29 14:32:52 +02:00
kik-kik bcb2a66ad8
chore(): updating DAG owner for kignasiak to kik to keep it consistent (#4178)
* updating DAG owner for kignasiak to kik to keep it consistent

* regenerated dags
2023-08-14 16:34:12 +02:00
Lucia d0d67d5592
Shredder prototype (#3977)
* Setup query and metadata templates to generate the queries for active_users_aggregates_deletion_request tables.

* Separate the active_users KPIs from the calculation based on deletion requests.

* Move mobile view outside of the browsers loop.

* Update DAG, move partition_date to the end of the query.

* Update DAG, move partition_date to the end of the query, change from parameters in the metadata to filter dates in the query.

* Revert change to fenix_derived.

* Remove 7-day window update and update tests. Fix indentation.

* Set query parameters using Jinja template.

* Set query parameters using Jinja template.

* Using parameters in the query.

* Formatting.

* Update sql_generators/active_users_deletion_requests/templates/mobile_deletion_request_query.sql

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>

* Formatting.

* Search table's date filter.

* Query only searches for clients with deletion request. Add required filter on submission_timestamp for table deletion_request.

* Format

---------

Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
2023-07-31 19:47:15 +02:00
kik-kik e9b75e4734
bug(1844385): bqetl_acoustic_contact_export contacts_raw_v1 failing due to invalid schema update error. (#4142)
* ensuring double_opt_in field is of type INTEGER and reduced the number of retries for this job

* rerun dag creation

* change retry count in metadata and rerun dag creation

* changed the number of retries for acoustic DAGs to 1

---------

Co-authored-by: Leli Schiestl <lschiestl@mozilla.com>
2023-07-31 18:12:09 +02:00
Marlene Hirose 1592e337cf
take rbaffourawuah@mozilla.com off of email list for DAG (#4034)
* take rbaffourawuah@mozilla.com off of email list for DAG

* remove rbaffourawuah@mozilla.com from dags.yaml

* fix formatting and remove another instance of rbaffourawuah@mozilla.com from DAG

* fix order of emails

* modify name to be more explicit

* rename view and folder

* update view name

* update view location

* update view name
2023-07-07 22:25:44 -07:00
Marlene Hirose d8ce0307a5
Ds 2944 create external adjust table redux (#3907)
* add metadata, schema yamls and query.py

* created adjust_derived namespace

* add query.py, metadata, schema, dataset for testing

* delete extraneous file, update DAG name

* modify bqetl_adjust DAG redux

* update DAG name, take out '_derived'

* update table name in view

* standardize table names across files

* regenerate DAG

* update schema in both locations

* add query.py, metadata, schema yaml files

* take out extraneous print statements, update datasets to be 'adjust' or 'adjust_derived'

* add submission date to date_partition_parameter

* update table name to be just one table

* add DAG for adjust_derived

* add bq_etl adjust_derived DAG to yaml file

* add note about API token

* revert changes to bqetl.adjust.py

* use proper task_id

* fix start dates

* add python command and docker image

* add python command and docker image

* delete extraneous code

* comment out docker part in old adjust dag

* add whitespace, delete extraneous code

* Update sql/moz-fx-data-shared-prod/adjust/adjust_derived/view.sql

Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>

* Update sql/moz-fx-data-shared-prod/adjust_derived/adjust_derived_v1/query.py

Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>

* Update sql/moz-fx-data-shared-prod/adjust_derived/adjust_derived_v1/query.py

Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>

* updated logic to check if response dictionary is not empty, moved view out of nested folder, added token ownership statement to metadata file, turned off email retry in dags.yaml, separated out clean up of json to its own function

* take out extraneous if statement and move else statement

* reorder where comment is to make more sense

* more description as to why we're using mhirose's API token

* take out periods

* Update sql/moz-fx-data-shared-prod/adjust_derived/adjust_derived_v1/metadata.yaml

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* combine adjust DAGs

* change logic for query_export check loop continuance, adapt metadata.yamls

* add blank parameters test

* Update sql/moz-fx-data-shared-prod/adjust_derived/adjust_derived_v1/metadata.yaml

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

* add arguments to metadata.yaml

* remove external table reference

* refactor to add date parameter

* refactor based on Circle CI's advice

* Update sql/moz-fx-data-shared-prod/adjust_derived/adjust_derived_v1/query.py

Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>

* Update sql/moz-fx-data-shared-prod/adjust_derived/adjust_derived_v1/query.py

Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>

* take out TODO comment

---------

Co-authored-by: kik-kik <kignasiak@mozilla.com>
Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-07-07 15:32:03 -07:00
Frank Bertsch 8d08cec820
Copy App Store tables from Fivetran (#4030)
* Copy App Store tables from Fivetran

* Move dryrun to config

* Generate DAG

* Dont dryrun views

* Add schemas
2023-07-07 14:24:03 -04:00
kik-kik 97d34d46be
reassigning DAG ownership to kik as per "adopt a DAG" doc (#3862)
* reassigning DAG owners as per "adopt a DAG" doc

* removed jeff from bqetl_addons queries
2023-05-31 11:49:11 +02:00
Anna Scholtz b6bcc5143f
Reassign DAG and query ownerships to ascholtz and anicholson (#3867) 2023-05-30 10:05:40 -07:00
kik-kik 46daa24670
feat(DENG-789): making apple ads data accessible (#3847)
* added apple_ads_derived for copying over apple_ad data from the fivetran dataset, and apple_ads views now read from it

* added bqetl_fivetran_apple_ads.py DAG responsible for copying apple_ads data from the fivetran project over to moz-fx-data-shared-prod

* now dryrun skips apple_ads_derived instead of apple_ads as the query now accesses restricted dataset

* added schema files for apple_ads_derived datasets

* added descriptions to schema.yaml files for apple_ads_derived namespace

* added dataset_metadata for apple_ads_derived to include a link to the dbt transformations

* fixed apple_ads view definitions

* removed application label and referenced_tables section inside metadata.yaml for apple_ads as requested by srose in PR#3847

* corrected source project for apple_ads views

* renamed apple_ads_derived to apple_ads_external

* added * to apple_ads_external namespace name to skip in the dryrun due to integration test deployment

* made tweaks to apple_ads and apple_ads_external datasets/namespaces as requested by whd

* updated apple_ads_external skip rule to the way it is meant to be defined, this will work once a fix is rolled out for dryrun

* fixed dag bqetl_fivetran_apple_ads description and updated the schedule to run once a day
2023-05-26 17:32:19 +02:00
Frank Bertsch 6146d9cdcf
DENG-871 - Add installs_by_country for Fenix (#3767)
* Add installs_by_country for Fenix

* Dont dryrun the query

* Dont dryrun view

* Add channel to query
2023-05-08 22:24:10 -04:00
richard baffour e5aede7947
Firefox mobile installs from Adjust (#3735) 2023-05-03 21:07:16 -07:00
Glenda Leonard a31072d408
DENG-775 downloads_with_attribution_v2 (#3716)
* DENG-775 Added session_id to JOIN between GA data and stub_attr.stdout.  Also expanded date range on GA session data to [download_date - 2 days, download_date + 1 day]

* Updated query to handle missing GA download_session_id.  It effectively applies V1 logic to the MISSING_GA_CLIENT dl_tokens.
2023-04-26 14:47:34 -04:00
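#3716 widens the GA session lookup to [download_date - 2 days, download_date + 1 day]. Reading the brackets as inclusive bounds (an assumption), the window works like the sketch below, which is roughly what a SQL `BETWEEN DATE_SUB(...) AND DATE_ADD(...)` filter expresses. The helper names are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple


def ga_session_window(download_date: date) -> Tuple[date, date]:
    """Inclusive window [download_date - 2 days, download_date + 1 day]."""
    return download_date - timedelta(days=2), download_date + timedelta(days=1)


def session_in_window(session_date: date, download_date: date) -> bool:
    """True if a GA session date may be joined to a download on download_date."""
    start, end = ga_session_window(download_date)
    return start <= session_date <= end


print(ga_session_window(date(2023, 4, 26)))
# (datetime.date(2023, 4, 24), datetime.date(2023, 4, 27))
```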
wil stuckey 034e7d8426
Add support for `table_partition_template` in dag task generation (#3710)
* Update domain metadata dag.

* Remove from triage with tags
* Remove telemetry-alerts email
* Add date formatting for monthly partition id

* Add support for `table_partition_format` in dag generation

* Don't add partition format if there's already a destination table

* use the correct name

* Add partition templates for all time partitioning types

* lint fixes

* more docs

* update all dags to include `table_partition_template` parameter.

* don't set if we have a partition offset

* don't add the parameter for the default 'day' partitioning scheme
2023-04-12 10:28:19 -05:00
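#3710 adds partition templates for all time-partitioning types and skips the parameter for the default day scheme. BigQuery partition decorators (`table$PARTITION_ID`) use `YYYYMMDDHH` for hourly, `YYYYMMDD` for daily, `YYYYMM` for monthly, and `YYYY` for yearly partitioning. The sketch below maps those to `strftime` patterns. The table name and `partition_format_for_dag` are hypothetical, and the real DAG generator renders its templates differently.

```python
from datetime import datetime
from typing import Optional

# strftime patterns for BigQuery partition decorators (table$PARTITION_ID).
PARTITION_ID_FORMATS = {
    "hour": "%Y%m%d%H",
    "day": "%Y%m%d",
    "month": "%Y%m",
    "year": "%Y",
}


def partition_id(ts: datetime, partitioning_type: str) -> str:
    """Partition ID for the partition that contains `ts`."""
    return ts.strftime(PARTITION_ID_FORMATS[partitioning_type])


def partition_format_for_dag(partitioning_type: str) -> Optional[str]:
    """Hypothetical helper: emit no explicit format for the default 'day' scheme."""
    if partitioning_type == "day":
        return None
    return PARTITION_ID_FORMATS[partitioning_type]


run = datetime(2023, 4, 12, 10)
print(f"my_table_v1${partition_id(run, 'month')}")  # my_table_v1$202304
print(partition_format_for_dag("day"))  # None
```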
Anna Scholtz 4e98a40b1e
[Bug 1826618] bqetl status checks (#3706)
* Add check for running SQL query

* Add check for monitoring python script runs
2023-04-05 15:24:09 -07:00
Claas Augner 9964679e48
fix: replace MDN contact email address (#3678)
`mdn@mozilla.com` does not actually exist.
2023-03-27 15:02:25 -04:00
Sean Rose 77eadac067
Retry `bqetl_pocket` DAG tasks for 10 hours rather than 1 hour. (#3635)
Because the files from Pocket may not always be available on time (e.g. bug 1818043).
2023-03-03 15:00:19 -08:00
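#3635 stretches the `bqetl_pocket` retry window from 1 hour to 10 hours. Airflow operators expose `retries` and `retry_delay`, so a window like this is roughly `retries × retry_delay`. The sketch below computes the retry count for a given window, assuming a fixed 30-minute delay. That delay is hypothetical and not taken from the DAG.

```python
import math
from datetime import timedelta


def retries_for_window(window: timedelta, retry_delay: timedelta) -> int:
    """Smallest retry count whose cumulative delays cover `window`.

    Ignores task runtime and exponential backoff, so it is an approximation.
    """
    return math.ceil(window / retry_delay)


# Hypothetical retry_delay of 30 minutes.
delay = timedelta(minutes=30)
print(retries_for_window(timedelta(hours=1), delay))  # 2
print(retries_for_window(timedelta(hours=10), delay))  # 20
```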
kik-kik 636cf6bf25
feat(MP-267): Move popularities export to Airflow (#3598)
* added mdn_yari_derived namespace along with mdn_popularities_v1 query to support mdn_popularities DAG inside telemetry-airflow

* added query.py as an alternative to exporting the data

* removed query.sql for mdn_yari.mdn_popularities_v1

* Updated query.py for mdn_popularities_v1 to move the blob to target location and clean up after

* made changes as requested in PR#3598 by akkomar
2023-02-23 10:38:12 +01:00