adjusting notebook folder names per comments, removing gpu references, adding pyspark troubleshooting, cleaning up benchmarking

Scott Graham 2018-12-05 08:23:45 -05:00
Parent fad27af27a
Commit e4f90fa9d3
8 changed files with 15 additions and 9 deletions

View file

@@ -1,8 +1,8 @@
 # Recommenders
 This repository provides examples and best practices for building recommendation systems, delivered as Jupyter notebooks. The examples detail our learnings on four key tasks:
-1. [Data Prep](notebooks/01_data/README.md): Preparing and loading data for each recommender algorithm
-2. [Model](notebooks/02_modeling/README.md): Building models using various recommender algorithms such as Smart Adaptive Recommendation (SAR), Alternating Least Square (ALS), etc.
+1. [Data Prep](notebooks/01_prepare_data/README.md): Preparing and loading data for each recommender algorithm
+2. [Model](notebooks/02_model/README.md): Building models using various recommender algorithms such as Smart Adaptive Recommendations (SAR), Alternating Least Squares (ALS), etc.
 3. [Evaluate](notebooks/03_evaluate/README.md): Evaluating algorithms with offline metrics
 4. [Operationalize](notebooks/04_operationalize/README.md): Operationalizing models in a production environment on Azure
@@ -35,11 +35,11 @@ To setup on your local machine:
 | [als_pyspark_movielens](notebooks/00_quick_start/als_pyspark_movielens.ipynb) | Utilizing the ALS algorithm to predict movie ratings in a PySpark environment (a minimal sketch follows this diff). |
 | [sar_python_cpu_movielens](notebooks/00_quick_start/sar_single_node_movielens.ipynb) | Utilizing the Smart Adaptive Recommendations (SAR) algorithm to predict movie ratings in a Python+CPU environment. |
-[Data Notebooks](notebooks/01_data) detail how to prepare and split data properly for recommendation systems
+[Data Notebooks](notebooks/01_prepare_data) detail how to prepare and split data properly for recommendation systems
 | Notebook | Description |
 | --- | --- |
-| [data_split](notebooks/01_data/data_split.ipynb) | Details on splitting data (randomly, chronologically, etc). |
+| [data_split](notebooks/01_prepare_data/data_split.ipynb) | Details on splitting data (randomly, chronologically, etc). |
 The [Modeling Notebooks](notebooks/02_modeling) deep dive into implementations of different recommender algorithms
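
For orientation, here is a minimal sketch of the workflow the ALS quick start covers, assuming Spark >= 2.2 is available; the toy data and column names are illustrative, not the notebook's actual code:
```
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("als-sketch").getOrCreate()

# Toy ratings standing in for MovieLens data (illustrative only)
ratings = spark.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 5.0)],
    ["userId", "movieId", "rating"],
)

als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          rank=10, maxIter=5, coldStartStrategy="drop")
model = als.fit(ratings)

# Top-3 recommendations per user
model.recommendForAllUsers(3).show(truncate=False)
```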

View file

@@ -98,6 +98,11 @@ We can register our created conda environment to appear as a kernel in the Jupyter notebooks
 ### Troubleshooting for the DSVM
 * Problems can arise if the Spark version on the machine differs from the pyspark version in the conda file; in that case, adapt the conda file to match your machine (a quick version check is sketched after this diff).
+* When running Spark on a single local node, it is possible to run out of disk space because temporary files are written to the user's home directory. To avoid this, we attached an additional disk to the DSVM and modified the Spark configuration by adding the following lines to `/dsvm/tools/spark/current/conf/spark-env.sh`:
+```
+SPARK_LOCAL_DIRS=/mnt/.spark/scratch
+SPARK_MASTER_OPTS="-Dspark.worker.cleanup.enabled=true"
+```
 ## Setup guide for Azure Databricks
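
As referenced in the troubleshooting bullet above, one quick way to compare the machine's Spark version with the environment's pyspark version (a hedged helper, not code from the repo):
```
import subprocess
import pyspark

# pyspark version installed in the active conda environment
print("pyspark:", pyspark.__version__)

# Version of the Spark installation on the machine; spark-submit
# prints its version banner to stderr.
result = subprocess.run(
    ["spark-submit", "--version"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True,
)
print(result.stderr)
```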

View file

@@ -1,5 +1,6 @@
-# Data Preperation
+# Data Preparation
 In this directory, notebooks are provided to illustrate [utility functions](../../reco_utils) for
 data operations such as data import / export, data transformation, and data splitting, which are common
 data preparation tasks in recommendation system development.
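
To make the splitting task concrete, here is a hedged sketch of a chronological split along the lines the data_split notebook discusses; it uses plain pandas with made-up column names rather than the actual reco_utils helpers:
```
import pandas as pd

def chrono_split(df, ratio=0.75, col_timestamp="timestamp"):
    """Put the earliest `ratio` of events in train, the rest in test."""
    df = df.sort_values(col_timestamp)
    cutoff = int(len(df) * ratio)
    return df.iloc[:cutoff], df.iloc[cutoff:]

ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "movieId": [10, 11, 10, 12],
    "rating": [4.0, 5.0, 3.0, 2.0],
    "timestamp": [100, 200, 150, 300],
})
train, test = chrono_split(ratings)
```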

View file

@@ -1,4 +1,4 @@
-# Modeling
+# Model
 In this directory, notebooks are provided to deep dive into the theoretical and implementation
 details of recommendation algorithms such as ALS and SAR, which can be found in [utility functions](../../reco_utils).
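
One of these deep dives covers SVD via the Surprise library (surprise_svd_deep_dive, referenced in the test paths below); as a rough, hedged illustration of that API rather than the notebook's content:
```
from surprise import SVD, Dataset

# Built-in MovieLens 100k set; Surprise offers to download it on first use
data = Dataset.load_builtin("ml-100k")
trainset = data.build_full_trainset()

algo = SVD(n_factors=50, random_state=42)
algo.fit(trainset)

# Raw MovieLens ids are strings in Surprise
print(algo.predict(uid="196", iid="302"))
```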

View file

@@ -140,12 +140,12 @@ def notebooks():
         "als_pyspark": os.path.join(
             folder_notebooks, "00_quick_start", "als_pyspark_movielens.ipynb"
         ),
-        "data_split": os.path.join(folder_notebooks, "01_data", "data_split.ipynb"),
+        "data_split": os.path.join(folder_notebooks, "01_prepare_data", "data_split.ipynb"),
         "als_deep_dive": os.path.join(
-            folder_notebooks, "02_modeling", "als_deep_dive.ipynb"
+            folder_notebooks, "02_model", "als_deep_dive.ipynb"
         ),
         "surprise_svd_deep_dive": os.path.join(
-            folder_notebooks, "02_modeling", "surprise_svd_deep_dive.ipynb"
+            folder_notebooks, "02_model", "surprise_svd_deep_dive.ipynb"
         ),
         "evaluation": os.path.join(folder_notebooks, "03_evaluate", "evaluation.ipynb"),
     }
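
The paths above feed a pytest fixture; a hedged sketch of how a test might consume it with papermill (the repo's actual tests may differ):
```
import papermill as pm

def test_data_split_runs(notebooks):
    # Execute the notebook top to bottom; papermill raises if any cell errors
    pm.execute_notebook(
        notebooks["data_split"], "output.ipynb", kernel_name="python3"
    )
```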