* take rbaffourawuah@mozilla.com off of email list for DAG
* remove rbaffourawuah@mozilla.com from dags.yaml
* fix formatting and remove another instance of rbaffourawuah@mozilla.com from DAG
* fix order of emails
* modify name to be more explicit
* rename view and folder
* update view name
* update view location
* update view name
* add metadata, schema yamls and query.py
* created adjust_derived namespace
* add query.py, metadata, schema, dataset for testing
* delete extraneous file, update DAG name
* modify bqetl_adjust DAG redux
* update DAG name, take out '_derived'
* update table name in view
* standardize table names across files
* regenerate DAG
* update schema in both locations
* add query.py, metadata, schema yaml files
* take out extraneous print statements, update datasets to be 'adjust' or 'adjust_derived'
* add submission date to date_partition_parameter
* update table name to be just one table
* add DAG for adjust_derived
* add bq_etl adjust_derived DAG to yaml file
* add note about API token
* revert changes to bqetl.adjust.py
* use proper task_id
* fix start dates
* add python command and docker image
* add python command and docker image
* delete extraneous code
* comment out docker part in old adjust dag
* add whitespace, delete extraneous code
* Update sql/moz-fx-data-shared-prod/adjust/adjust_derived/view.sql
Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>
* Update sql/moz-fx-data-shared-prod/adjust_derived/adjust_derived_v1/query.py
Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>
* Update sql/moz-fx-data-shared-prod/adjust_derived/adjust_derived_v1/query.py
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
* updated logic to check if response dictionary is not empty, moved view out of nested folder, added token ownership statement to metadata file, turned off email retry in dags.yaml, separated out clean up of json to its own function
* take out extraneous if statement and move else statement
* reorder where comment is to make more sense
* more description as to why we're using mhirose's API token
* take out periods
* Update sql/moz-fx-data-shared-prod/adjust_derived/adjust_derived_v1/metadata.yaml
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* combine adjust DAGs
* change logic for query_export check loop continuance, adapt metadata.yamls
* add blank parameters test
* Update sql/moz-fx-data-shared-prod/adjust_derived/adjust_derived_v1/metadata.yaml
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* add arguments to metadata.yaml
* remove external table reference
* refactor to add date parameter
* refactor based on Circle CI's advice
* Update sql/moz-fx-data-shared-prod/adjust_derived/adjust_derived_v1/query.py
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
* Update sql/moz-fx-data-shared-prod/adjust_derived/adjust_derived_v1/query.py
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
* take out TODO comment
---------
Co-authored-by: kik-kik <kignasiak@mozilla.com>
Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* added apple_ads_derived for copying over apple_ad data from the fivetran dataset, and apple_ads views now read from it
* added bqetl_fivetran_apple_ads.py DAG responsible for copying apple_ads data from the fivetran project over to moz-fx-data-shared-prod
* now dryrun skips apple_ads_derived instead of apple_ads as the query now accesses restricted dataset
* added schema files for apple_ads_derived datasets
* added descriptions to schema.yaml files for apple_ads_derived namespace
* added dataset_metadata for apple_ads_derived to include a link to the dbt transformations
* fixed apple_ads view definitions
* removed application label and referenced_tables section inside metadata.yaml for apple_ads as requested by srose in PR#3847
* corrected source project for apple_ads views
* renamed apple_ads_derived to apple_ads_external
* added * to apple_ads_external namespace name to skip in the dryrun due to integration test deployment
* made tweaks to apple_ads and apple_ads_external datasets/namespaces as requested by whd
* updated apple_ads_external skip rule to the way it is meant to be defined, this will work once a fix is rolled out for dryrun
* fixed dag bqetl_fivetran_apple_ads description and updated the schedule to run once a day
* DENG-775 Added session_id to JOIN between GA data and stub_attr.stdout. Also expanded date range on GA session data to [download_date - 2 days, download_date + 1 day]
* Updated query to handle missing GA download_session_id. It effectively applies V1 logic to the MISSING_GA_CLIENT dl_tokens.
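For illustration, the widened GA session window from DENG-775 can be sketched in Python as below. This is a minimal sketch, not the query in the repository; the function name `ga_session_window` is hypothetical, and the window is assumed to be inclusive on both ends.

```python
from datetime import date, timedelta


def ga_session_window(download_date: date) -> tuple[date, date]:
    """Return the assumed inclusive (start, end) range of GA session dates.

    Mirrors [download_date - 2 days, download_date + 1 day].
    """
    start = download_date - timedelta(days=2)
    end = download_date + timedelta(days=1)
    return start, end
```

A GA session would then be considered for the join only when its date falls between `start` and `end`.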
* Update domain metadata dag.
* Remove from triage with tags
* Remove telemetry-alerts email
* Add date formatting for monthly partition id
* Add support for `table_partition_format` in dag generation
* Don't add partition format if there's already a destination table
* use the correct name
* Add partition templates for all time partitioning types
* lint fixes
* more docs
* update all dags to include `table_partition_template` parameter.
* don't set if we have a partition offset
* don't add the parameter for the default 'day' partitioning scheme
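As a rough illustration of the partition templates above, the sketch below maps each BigQuery time partitioning type to the layout of its partition ID. It is an assumption-laden example, not the DAG generation code: `PARTITION_FORMATS`, `partitioned_table`, and the table name in the usage note are hypothetical.

```python
from datetime import datetime

# Partition ID layouts BigQuery uses for time-partitioned tables.
PARTITION_FORMATS = {
    "hour": "%Y%m%d%H",
    "day": "%Y%m%d",
    "month": "%Y%m",
    "year": "%Y",
}


def partitioned_table(table: str, when: datetime, partition_type: str = "day") -> str:
    """Return `table$partition_id` for the partition that contains `when`."""
    try:
        fmt = PARTITION_FORMATS[partition_type]
    except KeyError:
        raise ValueError(f"unsupported partition type: {partition_type!r}") from None
    return f"{table}${when.strftime(fmt)}"
```

For example, `partitioned_table("example_v1", datetime(2023, 5, 17), "month")` yields `example_v1$202305`.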
* added mdn_yari_derived namespace along with mdn_popularities_v1 query to support mdn_popularities DAG inside telemetry-airflow
* added query.py as an alternative to exporting the data
* removed query.sql for mdn_yari.mdn_popularities_v1
* Updated query.py for mdn_popularities_v1 to move the blob to target location and clean up after
* made changes as requested in PR#3598 by akkomar
* Add and incrementally populate a table for google ads campaign cost metrics
* Register dag in dags.yaml
* make two strings match that apparently have to match
* Consistentify another thing in the dag
* Reformat a sql file
* Update sql/moz-fx-data-shared-prod/fenix_derived/google_ads_campaign_cost_breakdowns_v1/metadata.yaml
Add update dependency on upstream table
Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>
* Small adjustments apropos of review
* Attempt to fix verify-dags-up-to-date
* Evidently there are too many blank lines in the metadata file
* Stop dry running an access denied table
Also raise the line length limit on the linter, which otherwise prevented the needed change
* represent micros more accurately
* Update dags.yaml to include Frank as maintainer
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* rename ad_clicks
* Update sql/moz-fx-data-shared-prod/fenix_derived/google_ads_campaign_cost_breakdowns_v1/query.sql
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* Add conversions to columns of output table
* Group stats table by campaign id and date
* Add revenue generating ad clicks and disambiguate that number from marketing ad clicks
* SO IT TURNS OUT, campaigns with the same ID can have different names over time. This associates the appropriate names with each id so we can sum up metrics by campaign rather than by NAME of campaign.
* ./bqetl format /Users/chelseatroy/mozilla/bigquery-etl/sql/moz-fx-data-shared-prod/fenix_derived/google_ads_campaign_cost_breakdowns_v1/query.sql
* Document the fenix campaign identifier
* ./bqetl format
* ./bqetl dag generate bqetl_campaign_cost_breakdowns
* ./bqetl dag generate bqetl_org_mozilla_firefox_derived
Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
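The campaign name issue noted above (one campaign ID, several names over time) can be sketched as follows. This is only an illustration in Python, not the SQL in query.sql: the function `spend_by_campaign` and the row layout are hypothetical, and picking the most recent name for each ID is an assumption about what "appropriate" means.

```python
from collections import defaultdict
from datetime import date


def spend_by_campaign(rows):
    """Sum spend per campaign ID and label each ID with its most recent name.

    `rows` is an iterable of (day, campaign_id, campaign_name, spend) tuples.
    Returns {campaign_id: (latest_name, total_spend)}.
    """
    totals = defaultdict(int)
    latest = {}  # campaign_id -> (day, name) of the newest row seen so far
    for day, campaign_id, name, spend in rows:
        totals[campaign_id] += spend
        if campaign_id not in latest or day >= latest[campaign_id][0]:
            latest[campaign_id] = (day, name)
    return {cid: (latest[cid][1], total) for cid, total in totals.items()}
```

Grouping by ID first and attaching the name afterwards keeps a renamed campaign from being split into two rows.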
* Create table, view, DAG for Firefox Android Clients.
* Normalization of adjust_network and install_source in the view, typo. Move dataset from telemetry_derived to fenix_derived.
* DENG-178_ Update logic to use fenix.first_session instead of org_mozilla_firefox, consider pings received after client's first seen date. Use first_run_date as datetime for comparison with metrics datetimes. Remove condition of metrics ping without adjust data to differentiate between a ping that is not received and one that is empty. Health check date types. Update clustering as adjust data is more likely to be null. Find deterministic values for first_session ping data. Remove uniqueness validation of clients in different metrics pings. Create UDF for the logic of finding the first adjust (value, datetime) pair.
* DENG-178_ Refactor query for readability and consistency with existing datasets.
* Update clustering and description in metadata. Collect only `first seen` clients & metrics ping data for channel release. Replace LEAST with COALESCE to avoid returning NULL. Collect core dimensions from `baseline_clients_first_seen` for data completeness when first_session ping is not reported.
* Delete schema file.
* Reducing duplicated logic, readability. Adding sample_id to the table and the clustering.
* Compare to find first value also for first_seen_date, submission_date, first_run_date, first_reported_country, first_reported_isp and channel.
* DAG partition parameter NULL. Full outer join in init to collect ping's data for not yet first_seen clients.
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
* Create clients_yearly table for Fenix clients
This table contains a year of history for Fenix.
It utilizes similar logic to search.search_clients_last_seen,
but the naming specifies the year-long byte field.
Currently only looks at Fenix data, which migrated
~2020-09-01; so full data is only available beginning 2021-09-01.
* Add schema for baseline_clients_daily
* Move to referenced tables
* Remove duplicates from baseline_clients_daily
* Add dependency on baseline_clients_daily
* sponsored_tiles_clients_daily
Client-level code for desktop + mobile. Issue with UNION ALL for desktop and mobile tables as the experiments structure is different between the devices
* Delete .pre-commit-config.yaml
* fix: experiments
* initial commit for sponsored tiles clients daily
* Add submission_date
* Clean up dag imports
* Revert import change, add dag to main summary
* Nan's edits
Co-authored-by: Rebecca BurWei <rburwei@mozilla.com>
Co-authored-by: Perry McManis <pmcmanis@mozilla.com>
Co-authored-by: Wil Stuckey <wstuckey@mozilla.com>
Co-authored-by: Curtis Morales <cmorales@mozilla.com>
* New dataset for collecting domain metadata and new top_domains table
* f lint
* Add dag, schema and update to monthly schedule
* lint
* Add required tag and partition column
* format fix
* Explicit column names in the final query
* Fix schema issues
* added sql logic for monitoring_airflow
* bqetl_monitoring_airflow added to dags.yaml
* added .probe_cache/ to gitignore
* generated bqetl_monitoring_airflow dag
* added monitoring_airflow_derived to dryrun ignore as it fails to access fivetran data referenced in this dataset
* moved airflow views and sql files to monitoring dataset
* manually triggering fivetran load of airflow metadata as suggested by @scholtzan in PR#3204
* added schemas as requested by @scholtzan in PR#3204
* added descriptions to airflow_monitoring datasets
* fixed airflow_dag view folder name
* corrected dryrun ignore for monitoring_derived/airflow*
* Set owners for Contextual Services tasks
I made some guesses here about who would make the most sense to own individual
pieces; treat this as a starting point and we can discuss in PR comments to
get this to final state.
* Generate DAGs after rebase
* Respond to review comments
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
Co-authored-by: whd <whd@users.noreply.github.com>
* decision support metrics for fog initial commit
* reverting overwrite
* init.sql to initialize table
* changing to main_1pct for the time being
* init sql correction
* missing as keyword
* updating dag order
* trying main 1pct
* removed init.sql, added schema.yaml
* update dag to reflect different table wait
* re-generated DAG
* fix tag
* fix tags in dag
* no clustering
* Bug 1768419 - Add note to subplat dag docs for airflow triage
* enriched the bqetl_subplat DAG description with a bit more detail regarding expected failures
Co-authored-by: kik-kik <kignasiak01@gmail.com>
* Create derived dataset and DAG for the aggregation of active users.
* Replace APPROX_COUNT_DISTINCT with COUNT(DISTINCT)
* Remove join with country lookup to avoid dependencies. Added search measures. Reorder fields for clarity.
* Improve descriptions
* Add clustering by channel
* DAG update
* Clustering based on users most common filtering
* Add notification for analytics DAG
* agg_active_users to query from telemetry_derived.
* Add query to create agg_active_users
* Update sql/moz-fx-data-shared-prod/telemetry_derived/agg_active_users_v1/init.sql
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
* Add date to init.sql
* Update sql/moz-fx-data-shared-prod/telemetry_derived/agg_active_users_v1/init.sql
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
* Update sql/mozfun/bytes/zero_right/metadata.yaml
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
* Modify metadata.yaml for agg_active_users
* Update sql/moz-fx-data-shared-prod/telemetry_derived/agg_active_users_v1/query.sql
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
* Modify metadata.yaml for agg_active_users
* Remove CAST from init.sql for agg_active_users
* Update sql/moz-fx-data-shared-prod/telemetry_derived/agg_active_users_v1/init.sql
* Update project in init.sql to `moz-fx-data-shared-prod`.
* Update name of aggregation and DAGS for consistency
* Add active_users_aggregates_v1 to align with current naming convention
* Update to active_users_aggregates
* Update query for active_users_aggregates
* Add uri_count and active_hours query for active_users_aggregates
* Format query, update DAG to remove agg_active_users_v1
* Generate DAG to correct CI error.
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
* Bug 1753489 Remove ETL for Firefox Reality
Since this product is no longer maintained by Mozilla.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1753489
This is just the first cleanup step. We can remove these datasets and all
content once this PR is merged. But the live/stable tables will require
a separate effort.