Commit graph

148 commits

Author SHA1 Message Date
Anna Scholtz e7b2b56c01 KPI dashboard generated Airflow DAG 2020-05-28 14:12:24 -07:00
Anna Scholtz d5822b952d Generate error aggregates DAG 2020-05-28 14:12:24 -07:00
Anna Scholtz 368020f37d Generate Airflow code for tasks 2020-05-28 14:12:24 -07:00
Anna Scholtz cd3eda596c timedelta formatting 2020-05-28 14:12:24 -07:00
Anna Scholtz 7775558e4c Custom formatters for Jinja templates 2020-05-28 14:12:24 -07:00
Anna Scholtz 49c1dd981e DAG validation 2020-05-28 14:12:24 -07:00
Anna Scholtz 2a56ff5060 Improve DAG parsing 2020-05-28 14:12:24 -07:00
Anna Scholtz ac6cb9ddaa Create tasks with validation 2020-05-28 14:12:24 -07:00
Anna Scholtz 22f1c39b5b Error aggregates as scheduled query 2020-05-28 14:12:24 -07:00
Anna Scholtz 1116be16b6 Pull in telemetry-airflow 2020-05-28 14:12:24 -07:00
Anna Scholtz c4e65e362c Simplify template generation 2020-05-28 14:12:24 -07:00
Anna Scholtz 499810f1a2 Basic Airflow DAG generation 2020-05-28 14:12:24 -07:00
Anthony Miyaguchi 1f7c218729 [glam-etl] Use transpose logic for fenix extracts (#1011) 2020-05-28 11:57:55 -07:00
Anthony Miyaguchi 2e68006911
[glam-etl] Use static combinations and avoid double counting (#1002)
* Use compact notation for static combinations

* Avoid double counting by moving cubing into bucket counts

* Modify udf_merged_user_data for reuse outside of clients aggregates

* Duplicate rows from all combos by client id
2020-05-28 11:53:03 -07:00
Daniel Thorn 636a8fefae Delete sync data in shredder (#1012) 2020-05-27 13:42:40 -07:00
Anthony Miyaguchi afe7732fc8
Bug 1639345 - Fix ping type using static name from table id (#999)
* Add literal for ping type and remove channel filter for fenix

* Add static ping-type and remove channel filter

* Add docstring for utility function
2020-05-21 13:12:52 -07:00
Anthony Miyaguchi a46197d268
Change naming convention of sql/glam_etl (#912)
* Rename all queries to use org_mozilla_fenix__

* Change all references within jobs to use new prefix

* Update run_glam_sql for new naming convention

* Rename probe counts and histogram counts

* Rename generate_fenix_sql to generate_sql

* Fix lint error and add new views to publish_view ignore

* Add updated clients daily queries

* Add consistent usage of header for better BigQuery job logs

* Add parameters for product to generate SQL for.

* Generate schemas in parallel

* Rename generate_sql to generate_glean_sql
2020-05-21 11:54:36 -07:00
Daniel Thorn a08fa06b69
update dependencies (#994)
also fix unregistered marker warnings
also fix new lint errors
2020-05-20 15:55:17 -07:00
Daniel Thorn 2faaf4f960 Automatically detect glean tables at runtime in shredder_delete (#990) 2020-05-20 13:39:48 -07:00
Daniel Thorn f2761a8c31 Remove unnecessary distinct from shredder delete query (#989) 2020-05-19 14:44:12 -07:00
Daniel Thorn d34bf1b63f Add activity stream tables to shredder (#985) 2020-05-19 10:56:09 -07:00
Anna Scholtz 3259a127ea public-project-id argument in publish_json script 2020-05-13 11:53:58 -07:00
Anna Scholtz 33183abff2 files_uri field in public metadata 2020-05-12 13:09:08 -07:00
Anna Scholtz 6a9a0ee601 Write files metadata to /files 2020-05-12 13:09:08 -07:00
Anna Scholtz 2b81967ecd Show warning when 50 or more task dependencies 2020-05-11 14:12:32 -07:00
Anna Scholtz 1d914ca1e5 Update task tests 2020-05-11 14:12:32 -07:00
Anna Scholtz 9893c58cd0 Add integration test for determining multiple task dependencies 2020-05-11 14:12:32 -07:00
Anna Scholtz e5882d8058 Get local task dependencies 2020-05-11 14:12:32 -07:00
Frank Bertsch bd7be1606c
Bug 1632635 - FxA Amplitude export for active events (#941)
* WIP: Initial implementation of FxA Amplitude export

* Use submission_date parameter

* Add hmac-sha256 SQL implementation

* Escape language column name

Co-Authored-By: Jeff Klukas <jeff@klukas.net>

* Use hmac_sha256; update for review feedback

* Reformat sql files

* Add docs for HMAC implementation

* Validate hmac_sha256 against NIST test vectors

* Add filepath as from_text arg

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>

* Explicitly use named argument

Co-authored-by: Daniel Thorn <dthorn@mozilla.com>

* Add docs for hmac validation

* WIP: Derive os_used_week/month as incremental query

* Retrieve hmac_key from encrypted keys table

Co-authored-by: Jeff Klukas <jeff@klukas.net>

* Remove fxa_hmac param

* Reformat SQL files

* Use bytes_seen for os_used

* Rename udfs

* Format UDF sql

* Don't include NULL os values

* Don't include NULL user properties

* Update comment for UDF

* Use fully-named datasets, not fxa*

* Cast key_id to bytes

* Fix failing tests

* Fix test failures

* Use new dataset for view query

* Add access denied exception for secret access

* Remove flake8 changes

* Update description of fxa_amplitude_export

Co-authored-by: Jeff Klukas <jeff@klukas.net>

* Remove version suffix from view

Co-authored-by: Jeff Klukas <jeff@klukas.net>
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
2020-05-05 22:37:26 -04:00
Daniel Thorn bba858ad31
Skip NotFound tables in shredder (#951)
Also remove deleted tables from config

Also add --partition-limit option for faster --dry-run
2020-05-04 12:20:44 -07:00
Anna Scholtz 6b3aa3969a Fix gzip content-encoding 2020-05-04 08:57:37 -07:00
Anna Scholtz 2e7b92e797 Update tests and remove gz ending 2020-05-01 10:48:01 -07:00
Anna Scholtz daebf6d1e7 Remove .gz from exported JSON files 2020-05-01 10:48:01 -07:00
Anna Scholtz c63b185bb9 Set content-type for public data to json and encoding to gzip 2020-05-01 10:48:01 -07:00
Daniel Thorn 17310d06f0 Add script for reporting shredder cost (#936) 2020-04-30 10:23:07 -07:00
Anna Scholtz 74b119a826 Metadata validation refactoring 2020-04-29 14:18:53 -07:00
Anna Scholtz e7bd2fb50c Move all metadata related scripts to a metadata directory 2020-04-29 14:18:53 -07:00
Anna Scholtz 37266e08b2 Add metadata validation tests 2020-04-29 14:18:53 -07:00
Anna Scholtz f49b7920d2 Script for validating metadata files 2020-04-29 14:18:53 -07:00
Anna Scholtz 1d5f4d4b91 Add tests for parsing scheduling configs 2020-04-22 13:48:04 -07:00
Anna Scholtz 25ad61d703 Rename Dags to DagCollection 2020-04-22 13:48:04 -07:00
Anna Scholtz c46b8c800d Add docs for scheduled queries parsing 2020-04-22 13:48:04 -07:00
Anna Scholtz 23c173d6ef Move Dags to separate file 2020-04-22 13:48:04 -07:00
Anna Scholtz 082a0202ab Test DAG config parsing 2020-04-22 13:48:04 -07:00
Anna Scholtz 69b98e3df0 Script for generating Airflow DAGs 2020-04-22 13:48:04 -07:00
Anna Scholtz e7c9162888 Set up DAGs and tasks configs 2020-04-22 13:48:04 -07:00
Anna Scholtz c700e5f8e3 Fix code comment 2020-04-17 09:59:54 -07:00
Anna Scholtz 95b3f71eee last_updated timestamp as JSON 2020-04-17 09:54:16 -07:00
Anna Scholtz 257f95e6b4 Set content type for last_updated files 2020-04-17 09:54:16 -07:00
Anna Scholtz e17a089672 Public endpoint URL 2020-04-16 17:08:55 -07:00