diff --git a/SETUP.md b/SETUP.md
index 8d8dd183..cf8124d8 100644
--- a/SETUP.md
+++ b/SETUP.md
@@ -145,7 +145,7 @@ SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true, -Dspark.worker.cleanup.a
 * Python 3
 
 ### Repository installation
-You can setup the repository as a library on Databricks either manually or by running an [installation script](scripts/databricks_install.sh). Both options assume you have access to a provisioned Databricks workspace and cluster and that you have appropriate permissions to install libraries.
+You can set up the repository as a library on Databricks either manually or by running an [installation script](scripts/databricks_install.py). Both options assume you have access to a provisioned Databricks workspace and cluster and that you have appropriate permissions to install libraries.
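+
+If you use the installation script, it talks to the workspace through the Databricks CLI, so a one-time CLI setup is needed first. A minimal sketch, assuming the CLI is installed from PyPI as `databricks-cli` (`databricks configure --token` is a standard Databricks CLI command that prompts for your workspace URL and a personal access token):
+
+```{shell}
+# One-time setup: install the Databricks CLI and point it at your workspace
+pip install databricks-cli
+databricks configure --token
+```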
 Quick install
@@ -169,19 +169,23 @@ This option utilizes an installation script to do the setup, and it requires add
 > ```{shell}
 > databricks clusters start --cluster-id <CLUSTER_ID>
 > ```
-> * The script also requires the `zip` command line utility, which may not be installed. You can install it with:
-> ```{shell}
-> sudo apt-get update
-> sudo apt-get install zip
-> ```
+
 Once you have confirmed the databricks cluster is *RUNNING*, install the modules within this repository with the following commands:
 
 ```{shell}
 cd Recommenders
-./scripts/databricks_install.sh
+./scripts/databricks_install.py
 ```
+**Note:** If you plan to run through the sample code for operationalization [here](notebooks/05_operationalize/als_movie_o16n.ipynb), you need to prepare the cluster for operationalization. You can do so by adding an additional option to the script run:
+
+```{shell}
+./scripts/databricks_install.py --prepare-o16n
+```
+
+See below for details.
+
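+For illustration, you can also verify the cluster state from the command line before running the installation script (`<CLUSTER_ID>` is a placeholder for your cluster id; `databricks clusters get` is a standard Databricks CLI command that prints cluster metadata, including its state):
+
+```{shell}
+# Inspect the cluster; the output should report "state": "RUNNING"
+databricks clusters get --cluster-id <CLUSTER_ID>
+```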
@@ -220,24 +224,16 @@ import reco_utils
 
 ## Prepare Azure Databricks for Operationalization
 
-This repository includes an end-to-end example notebook that uses Azure Datbaricks to estimate a recommendation model using Alternating Least Squares, writes pre-computed recommendations to Azure Cosmos DB, and then creates a real-time scoring service that retrieves the recommendations from Cosmos DB. In order to execute that [notebook](notebooks//05_operationalize/als_movie_o16n.ipynb), you must install the Recommenders repository as a library (as described above), **AND** you must also install some additional dependencies. Similar to above, you can do so either manually or via an installation [script](scripts/prepare_databricks_for_o16n.sh).
+This repository includes an end-to-end example notebook that uses Azure Databricks to estimate a recommendation model with Alternating Least Squares, writes pre-computed recommendations to Azure Cosmos DB, and then creates a real-time scoring service that retrieves the recommendations from Cosmos DB. In order to execute that [notebook](notebooks/05_operationalize/als_movie_o16n.ipynb), you must install the Recommenders repository as a library (as described above), **AND** you must also install some additional dependencies. With the *Quick install* method, you just need to pass an additional option to the `scripts/databricks_install.py` script.
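+
+If you are unsure which options the script accepts, a reasonable first step is to check its built-in help. This is a hedged sketch: it assumes the script exposes a standard `--help` flag, as argparse-based Python scripts typically do:
+
+```{shell}
+# Print the script's supported options, e.g. --prepare-o16n and --overwrite
+./scripts/databricks_install.py --help
+```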
 Quick install
 
-This option utilizes an installation script to do the setup, and it requires the same dependencies as the databricks installation script (see above).
-
-Once you have:
-
-* Installed and configured the databricks CLI
-* Confirmed that the appropriate cluster is *RUNNING*
-* Installed the Recommenders egg as described above
-* Confirmed you are in the root directory of the Recommenders repository
-
-you can install additional dependencies for operationalization with:
+This option utilizes the installation script to do the setup; simply run it with an additional option. If you have already run the script once to upload and install the `Recommenders.egg` library, you can also add the `--overwrite` option:
 
 ```{shell}
-scripts/prepare_databricks_for_o16n.sh
+scripts/databricks_install.py --overwrite --prepare-o16n
 ```
 
 This script does all of the steps described in the *Manual setup* section below.
@@ -249,9 +245,9 @@ This script does all of the steps described in the *Manual setup* section below.
 
 You must install three packages as libraries from PyPI:
 
-* `azure-cli`
-* `azureml-sdk[databricks]`
-* `pydocumentdb`
+* `azure-cli==2.0.56`
+* `azureml-sdk[databricks]==1.0.8`
+* `pydocumentdb==2.3.3`
 
 You can follow instructions [here](https://docs.azuredatabricks.net/user-guide/libraries.html#install-a-library-on-a-cluster) for details on how to install packages from PyPI.
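+
+If you prefer the command line over the library UI, the same pinned packages can be attached to the cluster with the Databricks CLI. This is a sketch, assuming the CLI is configured as described above; `<CLUSTER_ID>` is a placeholder for your cluster id, and `databricks libraries install` is a standard Databricks CLI command:
+
+```{shell}
+# Attach each pinned PyPI package to the cluster
+databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package azure-cli==2.0.56
+databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package "azureml-sdk[databricks]==1.0.8"
+databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package pydocumentdb==2.3.3
+```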