Commit Graph

19 Commits

Author SHA1 Message Date
Victor Ng e41df7e1f1 pile-o logging and increased the sample size by 100x 2018-02-06 09:41:03 -05:00
Victor Ng 9dbcecb462 added extra JSON serialization safeties 2018-02-06 09:27:51 -05:00
Victor Ng 4648c6cead Added extra check to filter out empty string client_id values.
Drop k/v pairs in the JSON blob where keys have empty values.
2018-02-02 00:37:42 -05:00
Victor Ng 4eac02a6ca added a Makefile so I don't have to remember how to upload to PyPI 2018-02-01 21:19:04 -05:00
Victor Ng c5a8842901 removed the call to push boto3 to spark workers 2018-02-01 21:16:46 -05:00
Victor Ng 477846f9c9 added more metadata to support PyPI 2018-02-01 21:14:13 -05:00
Victor Ng 82215f1b64 added readme 2018-02-01 20:57:27 -05:00
Victor Ng c9fee0acf7 dropped dead code module 2018-02-01 20:55:58 -05:00
Victor Ng ab5a82c427 Added lots of docstrings to make it clear what is going on. 2018-02-01 20:53:52 -05:00
Victor Ng 4669884b51 Added the dynamo_reducer to the last stage of processing of the RDD. 2018-02-01 20:53:38 -05:00
Victor Ng 7a9948923b Removed the unnecessary `load_parquet` closure function and just inlined the relevant template code into the etl function. 2018-02-01 20:52:04 -05:00
Victor Ng 07a9772d8a Dropped reducer function in taar_dynamo as it's been moved into the filters submodule. 2018-02-01 20:40:18 -05:00
Victor Ng 9967f9a96f Added a force_write optional argument to the dynamo_reducer.
Pulled out some in-function import statements to the module top level.
2018-02-01 20:37:23 -05:00
Victor Ng 88fb87d6d1 updated airflow_job.py from atmo testing 2018-02-01 20:36:56 -05:00
Victor Ng 0b644246c0 more minor patches 2018-02-01 17:06:45 -05:00
Victor Ng 4f79867117 More refactoring of the dynamo loaders. 2018-02-01 11:39:23 -05:00
Victor Ng e0d6e329fb Refactored code so that loading package into spark nodes is possible. 2018-01-30 14:38:33 -05:00
Victor Ng 0d4d34cd6a added wheel dependencies for argparse and boto3 so that we can safely
load this code on spark nodes
2018-01-30 14:38:19 -05:00
Victor Ng 35dbeacf6e init commit.
PySpark has some edge cases where sparkContext.addPyFile() doesn't
properly push code to the spark nodes.  This package has been created so
that the entire codebase can be pushed to all nodes.
2018-01-30 12:38:39 -05:00