This commit is contained in:
Jeremy Reynolds 2019-02-27 20:45:47 -07:00
Parent 993d3d54b0
Commit 248f91bb5e
1 changed file: 18 additions and 22 deletions


@@ -145,7 +145,7 @@ SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true, -Dspark.worker.cleanup.a
* Python 3
### Repository installation
-You can set up the repository as a library on Databricks either manually or by running an [installation script](scripts/databricks_install.sh). Both options assume you have access to a provisioned Databricks workspace and cluster and that you have appropriate permissions to install libraries.
+You can set up the repository as a library on Databricks either manually or by running an [installation script](scripts/databricks_install.py). Both options assume you have access to a provisioned Databricks workspace and cluster and that you have appropriate permissions to install libraries.
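For reference, here is a minimal sketch of installing and configuring the Databricks CLI that the installation script depends on; the workspace host and access token are placeholders you must supply:

```{shell}
# Minimal sketch, assuming the databricks-cli package from PyPI
pip install databricks-cli
# Prompts for the workspace host (e.g. https://<region>.azuredatabricks.net) and a personal access token
databricks configure --token
```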
<details>
<summary><strong><em>Quick install</em></strong></summary>
@@ -169,19 +169,23 @@ This option utilizes an installation script to do the setup, and it requires add
> ```{shell}
> databricks clusters start --cluster-id <CLUSTER_ID>
> ```
> * The script also requires the `zip` command line utility, which may not be installed. You can install it with:
> ```{shell}
> sudo apt-get update
> sudo apt-get install zip
> ```
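Before running the script, a quick sanity check of both requirements can help. This is an illustrative snippet (not part of the repository), assuming the Databricks CLI is already configured:

```{shell}
# Illustrative pre-flight check
databricks clusters get --cluster-id <CLUSTER_ID> | grep '"state"'   # expect "RUNNING"
command -v zip || echo "zip is not installed"
```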
Once you have confirmed the Databricks cluster is *RUNNING*, install the modules within this repository with the following commands:
```{shell}
cd Recommenders
-./scripts/databricks_install.sh <CLUSTER_ID>
+./scripts/databricks_install.py <CLUSTER_ID>
```
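Once the script finishes, you can optionally confirm that the library was attached to the cluster; a hedged example using the Databricks CLI:

```{shell}
# Optional check: list the status of libraries installed on the cluster
databricks libraries cluster-status --cluster-id <CLUSTER_ID>
```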
**Note**: If you plan to run the sample code for operationalization [here](notebooks//05_operationalize/als_movie_o16n.ipynb), you need to prepare the cluster for operationalization. You can do so by passing an additional option when you run the script:
```{shell}
./scripts/databricks_install.py --prepare-o16n <CLUSTER_ID>
```
See below for details.
</details>
<details>
@@ -220,24 +224,16 @@ import reco_utils
## Prepare Azure Databricks for Operationalization
-This repository includes an end-to-end example notebook that uses Azure Databricks to estimate a recommendation model using Alternating Least Squares, writes pre-computed recommendations to Azure Cosmos DB, and then creates a real-time scoring service that retrieves the recommendations from Cosmos DB. In order to execute that [notebook](notebooks//05_operationalize/als_movie_o16n.ipynb), you must install the Recommenders repository as a library (as described above), **AND** you must also install some additional dependencies. Similar to above, you can do so either manually or via an installation [script](scripts/prepare_databricks_for_o16n.sh).
+This repository includes an end-to-end example notebook that uses Azure Databricks to estimate a recommendation model using Alternating Least Squares, writes pre-computed recommendations to Azure Cosmos DB, and then creates a real-time scoring service that retrieves the recommendations from Cosmos DB. In order to execute that [notebook](notebooks//05_operationalize/als_movie_o16n.ipynb), you must install the Recommenders repository as a library (as described above), **AND** you must also install some additional dependencies. With the *Quick install* method, you just need to pass an additional option to the `scripts/databricks_install.py` script.
<details>
<summary><strong><em>Quick install</em></strong></summary>
-This option utilizes an installation script to do the setup, and it requires the same dependencies as the Databricks installation script (see above).
-Once you have:
-* Installed and configured the Databricks CLI
-* Confirmed that the appropriate cluster is *RUNNING*
-* Installed the Recommenders egg as described above
-* Confirmed you are in the root directory of the Recommenders repository
-you can install additional dependencies for operationalization with:
+This option utilizes the installation script to do the setup. Just run the installation script
+with an additional option. If you have already run the script once to upload and install the `Recommenders.egg` library, you can also add an `--overwrite` option:
```{shell}
-scripts/prepare_databricks_for_o16n.sh <CLUSTER_ID>
+scripts/databricks_install.py --overwrite --prepare-o16n <CLUSTER_ID>
```
This script does all of the steps described in the *Manual setup* section below.
@@ -249,9 +245,9 @@ This script does all of the steps described in the *Manual setup* section below.
You must install three packages as libraries from PyPI:
-* `azure-cli`
-* `azureml-sdk[databricks]`
-* `pydocumentdb`
+* `azure-cli==2.0.56`
+* `azureml-sdk[databricks]==1.0.8`
+* `pydocumentdb==2.3.3`
You can follow the instructions [here](https://docs.azuredatabricks.net/user-guide/libraries.html#install-a-library-on-a-cluster) for details on how to install packages from PyPI.
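If you prefer the command line to the workspace UI, the same packages can also be installed with the Databricks CLI; a sketch, assuming the CLI is configured for your workspace and cluster:

```{shell}
# Sketch: install the pinned PyPI packages onto the cluster
databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package "azure-cli==2.0.56"
databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package "azureml-sdk[databricks]==1.0.8"
databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package "pydocumentdb==2.3.3"
```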