Testing on ATMO
===============

By default, Python files executed on the cluster run with a Jupyter driver, so
the following environment variables need to be set (or unset) to run standalone:

  export PYSPARK_DRIVER_PYTHON=/mnt/anaconda2/bin/python
  unset PYSPARK_DRIVER_PYTHON_OPTS

Secure copy ('scp') the Python file to the cluster's host machine, as shown in
the example below.
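
For example (the key path, user, and hostname below are placeholders; substitute
the values for your own ATMO cluster):

  scp -i ~/.ssh/your-key.pem aggregate-and-import.py hadoop@<cluster-hostname>:~/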

Next, submit the Python file to Spark with the following arguments, which more
closely match how Airflow will execute jobs:

  spark-submit --executor-cores 8 --master yarn --deploy-mode client "./aggregate-and-import.py"