Commit graph

39 commits

Author SHA1 Message Date
Anthony Miyaguchi a1d3623542 Merge pull request #2 from mozilla/dependabot/pip/bleach-3.3.0 2021-02-02 16:27:36 -08:00
    Bump bleach from 3.2.3 to 3.3.0
dependabot[bot] a073283be8 Bump bleach from 3.2.3 to 3.3.0 2021-02-02 23:17:31 +00:00
    Bumps [bleach](https://github.com/mozilla/bleach) from 3.2.3 to 3.3.0.
    - [Release notes](https://github.com/mozilla/bleach/releases)
    - [Changelog](https://github.com/mozilla/bleach/blob/master/CHANGES)
    - [Commits](https://github.com/mozilla/bleach/compare/v3.2.3...v3.3.0)

    Signed-off-by: dependabot[bot] <support@github.com>
Anthony Miyaguchi 3e414c6d6f Merge pull request #1 from mozilla/docker 2021-02-01 10:05:54 -08:00
    Add dockerfile for project
Anthony Miyaguchi ef028e7e63 Export POSTGRES variables instead of running from within container 2021-01-28 17:16:10 -08:00
Anthony Miyaguchi 487e3c2896 Update README 2021-01-28 17:10:32 -08:00
Anthony Miyaguchi c509e2a5f1 Use usr/bin/env and add py extension back for spark-submit 2021-01-28 17:10:01 -08:00
Anthony Miyaguchi 06d721dd44 Merge scripts into bin 2021-01-28 16:58:54 -08:00
Anthony Miyaguchi 8c731a6288 Upgrade to pyspark 3 2021-01-28 16:48:24 -08:00
Anthony Miyaguchi 4884bcb46f Fix issues with python3 for spark, POSTGRES_HOST location 2021-01-28 16:47:48 -08:00
Anthony Miyaguchi 4dd3769abf Force start_ds and end_ds to be configured by the job 2021-01-28 16:09:08 -08:00
Anthony Miyaguchi f0b31ffa6c Add initial dockerfile and docker-compose 2021-01-28 16:08:50 -08:00
Anthony Miyaguchi ae0ca52ed7 Add script to export environment variables for aws access 2021-01-28 16:08:15 -08:00
Anthony Miyaguchi 716e3790e4 Add .env file with variables 2021-01-28 15:13:12 -08:00
Anthony Miyaguchi 4347ead942 Split dev and normal requirements 2021-01-28 15:12:51 -08:00
Anthony Miyaguchi b03a99815b Add updated dependencies using pip-tools 2021-01-28 15:07:38 -08:00
Anthony Miyaguchi be82395e18 Add separate script for loading aggregates into BigQuery 2020-03-16 16:25:19 -07:00
Anthony Miyaguchi ef3d082a9b Update README 2020-03-13 15:52:27 -07:00
Anthony Miyaguchi fda5aea0bf Use rm -rf in pg_dump_by_day 2020-03-12 18:27:00 -07:00
Anthony Miyaguchi 013824acab Skip missing days 2020-03-12 17:30:58 -07:00
Anthony Miyaguchi 73d883c790 Wrap rm with if 2020-03-12 17:25:41 -07:00
Anthony Miyaguchi aec03d9772 Remove unused lines 2020-03-12 17:07:44 -07:00
Anthony Miyaguchi a848b78ed9 Make start and end dates configurable 2020-03-12 17:04:20 -07:00
Anthony Miyaguchi d66bb6886f Skip BQ load by default 2020-03-11 12:44:31 -07:00
Anthony Miyaguchi e6ccf29c9f Add project to table qualifier 2020-03-11 12:43:51 -07:00
Anthony Miyaguchi 4a0c6e6d47 Add backfill process for a single week 2020-02-27 17:17:15 -08:00
Anthony Miyaguchi f9f653bec4 Add ingest_date to output parquet since aggregates change over time 2020-02-27 15:59:45 -08:00
Anthony Miyaguchi 46530e7138 Remove files if dump already exists locally 2020-02-27 15:29:08 -08:00
Anthony Miyaguchi ccb77ecd46 Add simple script to show data 2020-02-27 15:19:52 -08:00
Anthony Miyaguchi 1f5ab907c4 Fix build_ids 2020-02-27 15:19:35 -08:00
Anthony Miyaguchi f1de856eb7 Update notebook for converting into parquet 2020-02-27 15:14:30 -08:00
Anthony Miyaguchi 81c5892bf1 Update parquet processing for string aggregate 2020-02-27 14:52:42 -08:00
Anthony Miyaguchi 7863789eb9 Add functional backfill script 2020-02-27 13:48:03 -08:00
Anthony Miyaguchi 9045b6bfb0 Create a backfill script 2020-02-27 13:12:53 -08:00
Anthony Miyaguchi bd5108b737 Add section for processing pg_dump into parquet 2019-12-17 16:15:58 -08:00
Anthony Miyaguchi a3091cd8dd Add a script to process pg_dump into parquet 2019-12-17 16:11:57 -08:00
Anthony Miyaguchi 849a211bd1 Update .gitignore 2019-12-17 16:11:10 -08:00
Anthony Miyaguchi 352aae7165 Add notebook for exploration into parsing aggregates 2019-12-17 14:43:06 -08:00
Anthony Miyaguchi cb341be9ce Add notebook for converting a day of dumps into parquet 2019-12-16 17:25:42 -08:00
Anthony Miyaguchi d6d0b1d055 Add initial scripts for dumping mozaggregator 2019-12-16 15:52:08 -08:00