Commit Graph

478 Commits

Author SHA1 Message Date
Anthony Miyaguchi 2dab9925b9 Add output of nbconvert to python on churn 2017-05-01 11:55:33 -07:00
Anthony Miyaguchi d369a7b1b8 Mock SparkSession.stop to prevent test failures 2017-05-01 11:55:33 -07:00
Anthony Miyaguchi b70534108f Use relative import for schemas 2017-05-01 10:24:01 -07:00
Anthony Miyaguchi 7075fdf5e3 Bug 1357875 - Remove report_start and rename `v4` to `topline` 2017-05-01 10:24:01 -07:00
Anthony Miyaguchi 6bfdf687e2 Bug 1357875 - Add test case for existing s3 key 2017-05-01 10:24:01 -07:00
Anthony Miyaguchi 4a11cc8d99 Mock SparkSession.stop to prevent test failures 2017-05-01 10:24:01 -07:00
Anthony Miyaguchi 93bf1087fa Stop the spark session after completion 2017-05-01 10:24:01 -07:00
Anthony Miyaguchi 0622791ba4 Bug 1357875 - Implement reformat and pass tests 2017-05-01 10:24:01 -07:00
Anthony Miyaguchi ef153bf858 Bug 1357875 - Add `topline_dashboard` to python_etl
This is the initial commit of the topline dashboard that outlines the
general approach to moving the data around. The data will simply be a
union of historical data and the slightly reformatted topline data. This
uses moto for mocking boto3.
2017-05-01 10:24:01 -07:00
Anthony Miyaguchi 4c8658cf77 Move schemas into topline module namespace 2017-05-01 10:24:01 -07:00
Ryan Harter 78063a4296 Add 'pageRequestCount' to containers ETL job 2017-05-01 11:09:58 -04:00
Anthony Miyaguchi 4f94942139 Bug 1359193 - Address changes requested for review
This adds comments for testing setup and renames a test. This also makes
function names more indicative of their function and changes the
partition column `report_start` to be read as a string.
2017-04-26 15:25:23 -07:00
Anthony Miyaguchi 21b9972e0a Bug 1359193 - Add comments to implementation 2017-04-26 15:25:23 -07:00
Anthony Miyaguchi cfefed7d10 Bug 1359193 - Add output schema and backfill implementation 2017-04-26 15:25:23 -07:00
Anthony Miyaguchi 3544d29f8b Bug 1359193 - Add minor fixes to tests and rebase against mozetl 2017-04-26 15:25:23 -07:00
Anthony Miyaguchi 5ac0e505fa Bug 1359193 - Add skeleton and tests for topline historical backfill 2017-04-26 15:25:23 -07:00
Ryan Harter 4c9aee60d6 Stop SparkContext after finishing job 2017-04-25 18:00:41 +00:00
Jannis Leidel bad1e83ca9 Rename python_etl to mozetl.
- move Spark test fixture into pytest conftest.py module
- rename basic_etl to just basic
2017-04-25 12:51:27 -04:00
Jannis Leidel 3ab6a9890b Various code smell and small speed improvements.
Uses operator.itemgetter, iteritems, new style classes.
Moves some functions outside of their user functions to not have to create the inner function on every call.
2017-04-25 12:51:27 -04:00
Ryan Harter a0aec9d3fa Add simple wrapper to pulse etl job
This allows us to schedule the job from Airflow. Later today I plan on
moving this code into the package and generalizing with a click CLI
wrapper.
2017-04-24 14:52:24 -04:00
Ryan Harter d440476c07 Add testpilot ETL jobs and basic_etl library
These jobs are currently running on ATMO. We'd like to get these jobs
reviewed and running on Airflow.

basic_etl was previously in its own repository
[here](https://github.com/harterrt/betl) but will now be maintained in
this repository.
2017-04-20 10:20:38 -04:00
Ryan Harter c3da6440f2 Add .tox/ and .coverage to gitignore 2017-04-18 15:24:47 -07:00
Ryan Harter 4fdad9e499 Merge pull request #6 from acmiyaguchi/travis
Add .travis.yml and tox.ini for automated testing
2017-04-13 16:50:33 -04:00
Anthony Miyaguchi 394d32b2ef Add .travis.yml and tox.ini for automated testing
This uses tox for setting up the environment and test-runner. The
dependencies for this are added to setup.py if you want to run pytest
manually with your own virtual environment, while tox is used for
automatically setting up the environment.

This will run tests with dependencies on spark on travis, and then send
the results to codecov.io.
2017-04-12 11:07:51 -07:00
Ryan Harter df950b5fd7 Merge pull request #2 from acmiyaguchi/fix-test
Use SparkSession instead of SparkContext
2017-04-10 18:36:01 -04:00
Anthony Miyaguchi 3308694663 Use SparkSession instead of SparkContext 2017-04-06 17:49:40 -07:00
Ryan Harter bca1326243 Add gitignore and update README 2017-03-15 15:18:38 -04:00
Ryan Harter d6261d4c98 Build boilerplate from cookiecutter-python-etl 2017-03-15 15:14:05 -04:00