adjusting notebook folder names per comments, removing gpu references, adding pyspark troubleshooting, cleaning up benchmarking

Scott Graham 2018-12-05 08:23:45 -05:00
Parent fad27af27a
Commit e4f90fa9d3
8 changed files with 15 additions and 9 deletions

View file

@@ -1,8 +1,8 @@
 # Recommenders
 This repository provides examples and best practices for building recommendation systems, delivered as Jupyter notebooks. The examples detail our learnings on four key tasks:
-1. [Data Prep](notebooks/01_data/README.md): Preparing and loading data for each recommender algorithm
-2. [Model](notebooks/02_modeling/README.md): Building models using various recommender algorithms such as Smart Adaptive Recommendation (SAR), Alternating Least Square (ALS), etc.
+1. [Data Prep](notebooks/01_prepare_data/README.md): Preparing and loading data for each recommender algorithm
+2. [Model](notebooks/02_model/README.md): Building models using various recommender algorithms such as Smart Adaptive Recommendations (SAR), Alternating Least Squares (ALS), etc.
 3. [Evaluate](notebooks/03_evaluate/README.md): Evaluating algorithms with offline metrics
 4. [Operationalize](notebooks/04_operationalize/README.md): Operationalizing models in a production environment on Azure
@@ -35,11 +35,11 @@ To setup on your local machine:
 | [als_pyspark_movielens](notebooks/00_quick_start/als_pyspark_movielens.ipynb) | Utilizing the ALS algorithm to predict movie ratings in a PySpark environment (a minimal sketch follows this diff). |
 | [sar_python_cpu_movielens](notebooks/00_quick_start/sar_single_node_movielens.ipynb) | Utilizing the Smart Adaptive Recommendations (SAR) algorithm to predict movie ratings in a Python+CPU environment. |
-[Data Notebooks](notebooks/01_data) detail how to prepare and split data properly for recommendation systems
+[Data Notebooks](notebooks/01_prepare_data) detail how to prepare and split data properly for recommendation systems
 | Notebook | Description |
 | --- | --- |
-| [data_split](notebooks/01_data/data_split.ipynb) | Details on splitting data (randomly, chronologically, etc). |
+| [data_split](notebooks/01_prepare_data/data_split.ipynb) | Details on splitting data (randomly, chronologically, etc). |
 The [Modeling Notebooks](notebooks/02_modeling) deep dive into implementations of different recommender algorithms
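
For orientation, here is a minimal sketch of the workflow the ALS quick start covers, assuming Spark >= 2.2 is available; the toy data and column names are illustrative, not the notebook's actual code:
```
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("als-sketch").getOrCreate()

# Toy ratings standing in for MovieLens data (illustrative only)
ratings = spark.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 5.0)],
    ["userId", "movieId", "rating"],
)

als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          rank=10, maxIter=5, coldStartStrategy="drop")
model = als.fit(ratings)

# Top-3 recommendations per user
model.recommendForAllUsers(3).show(truncate=False)
```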

View file

@@ -98,6 +98,11 @@ We can register our created conda environment to appear as a kernel in the Jupyter notebooks
 ### Troubleshooting for the DSVM
 * Problems can arise if the Spark version on the machine differs from the pyspark version in the conda file; in that case, adapt the conda file to match your machine (a quick version check is sketched after this diff).
+* When running Spark on a single local node, it is possible to run out of disk space because temporary files are written to the user's home directory. To avoid this, we attached an additional disk to the DSVM and modified the Spark configuration by adding the following lines to `/dsvm/tools/spark/current/conf/spark-env.sh`:
+```
+SPARK_LOCAL_DIRS=/mnt/.spark/scratch
+SPARK_MASTER_OPTS="-Dspark.worker.cleanup.enabled=true"
+```
 ## Setup guide for Azure Databricks
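
As referenced in the troubleshooting bullet above, one quick way to compare the machine's Spark version with the environment's pyspark version (a hedged helper, not code from the repo):
```
import subprocess
import pyspark

# pyspark version installed in the active conda environment
print("pyspark:", pyspark.__version__)

# Version of the Spark installation on the machine; spark-submit
# prints its version banner to stderr.
result = subprocess.run(
    ["spark-submit", "--version"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True,
)
print(result.stderr)
```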

View file

@@ -1,5 +1,6 @@
-# Data Preperation
+# Data Preparation
 In this directory, notebooks are provided to illustrate [utility functions](../../reco_utils) for
 data operations such as data import / export, data transformation, and data splitting, which are common
 data preparation tasks in recommendation system development.
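
To make the splitting task concrete, here is a hedged sketch of a chronological split along the lines the data_split notebook discusses; it uses plain pandas with made-up column names rather than the actual reco_utils helpers:
```
import pandas as pd

def chrono_split(df, ratio=0.75, col_timestamp="timestamp"):
    """Put the earliest `ratio` of events in train, the rest in test."""
    df = df.sort_values(col_timestamp)
    cutoff = int(len(df) * ratio)
    return df.iloc[:cutoff], df.iloc[cutoff:]

ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "movieId": [10, 11, 10, 12],
    "rating": [4.0, 5.0, 3.0, 2.0],
    "timestamp": [100, 200, 150, 300],
})
train, test = chrono_split(ratings)
```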

View file

@@ -1,4 +1,4 @@
-# Modeling
+# Model
 In this directory, notebooks are provided to deep dive into the theoretical and implementation
 details of recommendation algorithms such as ALS and SAR, which can be found in [utility functions](../../reco_utils).
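
One of these deep dives covers SVD via the Surprise library (surprise_svd_deep_dive, referenced in the test paths below); as a rough, hedged illustration of that API rather than the notebook's content:
```
from surprise import SVD, Dataset

# Built-in MovieLens 100k set; Surprise offers to download it on first use
data = Dataset.load_builtin("ml-100k")
trainset = data.build_full_trainset()

algo = SVD(n_factors=50, random_state=42)
algo.fit(trainset)

# Raw MovieLens ids are strings in Surprise
print(algo.predict(uid="196", iid="302"))
```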

View file

@@ -140,12 +140,12 @@ def notebooks():
         "als_pyspark": os.path.join(
             folder_notebooks, "00_quick_start", "als_pyspark_movielens.ipynb"
         ),
-        "data_split": os.path.join(folder_notebooks, "01_data", "data_split.ipynb"),
+        "data_split": os.path.join(folder_notebooks, "01_prepare_data", "data_split.ipynb"),
         "als_deep_dive": os.path.join(
-            folder_notebooks, "02_modeling", "als_deep_dive.ipynb"
+            folder_notebooks, "02_model", "als_deep_dive.ipynb"
         ),
         "surprise_svd_deep_dive": os.path.join(
-            folder_notebooks, "02_modeling", "surprise_svd_deep_dive.ipynb"
+            folder_notebooks, "02_model", "surprise_svd_deep_dive.ipynb"
         ),
         "evaluation": os.path.join(folder_notebooks, "03_evaluate", "evaluation.ipynb"),
     }
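
The paths above feed a pytest fixture; a hedged sketch of how a test might consume it with papermill (the repo's actual tests may differ):
```
import papermill as pm

def test_data_split_runs(notebooks):
    # Execute the notebook top to bottom; papermill raises if any cell errors
    pm.execute_notebook(
        notebooks["data_split"], "output.ipynb", kernel_name="python3"
    )
```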