updated documentation
This commit is contained in:
Parent:
993d3d54b0
Commit:
248f91bb5e
SETUP.md: 40 changes

@@ -145,7 +145,7 @@ SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true, -Dspark.worker.cleanup.a
* Python 3

### Repository installation

-You can setup the repository as a library on Databricks either manually or by running an [installation script](scripts/databricks_install.sh). Both options assume you have access to a provisioned Databricks workspace and cluster and that you have appropriate permissions to install libraries.
+You can set up the repository as a library on Databricks either manually or by running an [installation script](scripts/databricks_install.py). Both options assume you have access to a provisioned Databricks workspace and cluster and that you have appropriate permissions to install libraries.

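The script-based option drives the cluster through the Databricks CLI. If the CLI is not set up yet, a minimal sketch of a typical configuration (assuming the legacy `databricks-cli` package and a personal access token; `<HOST>` and `<TOKEN>` are placeholders) looks like:

```{shell}
pip install databricks-cli
# interactively prompts for the workspace URL (https://<HOST>) and the personal access token <TOKEN>
databricks configure --token
```
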
<details>
<summary><strong><em>Quick install</em></strong></summary>

@@ -169,19 +169,23 @@ This option utilizes an installation script to do the setup, and it requires add
> ```{shell}
> databricks clusters start --cluster-id <CLUSTER_ID>
> ```
> * The script also requires the `zip` command line utility, which may not be installed. You can install it with:
> ```{shell}
> sudo apt-get update
> sudo apt-get install zip
> ```

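Starting a cluster can take a few minutes. One way to check that it has reached the *RUNNING* state (a sketch assuming the legacy `databricks-cli` is configured) is:

```{shell}
# prints cluster metadata as JSON; look for "state": "RUNNING"
databricks clusters get --cluster-id <CLUSTER_ID> | grep '"state"'
```
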
Once you have confirmed the Databricks cluster is *RUNNING*, install the modules within this repository with the following commands:

```{shell}
cd Recommenders
-./scripts/databricks_install.sh <CLUSTER_ID>
+./scripts/databricks_install.py <CLUSTER_ID>
```

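To verify that the library was attached to the cluster, you can list its libraries and their install statuses (again a sketch assuming the legacy `databricks-cli`):

```{shell}
# shows each library attached to the cluster and its install status
databricks libraries list --cluster-id <CLUSTER_ID>
```
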
**Note**: If you are planning on running through the sample code for operationalization [here](notebooks/05_operationalize/als_movie_o16n.ipynb), you need to prepare the cluster for operationalization. You can do so by adding an additional option to the script run:

```{shell}
./scripts/databricks_install.py --prepare-o16n <CLUSTER_ID>
```
See below for details.
</details>
<details>

@@ -220,24 +224,16 @@ import reco_utils

## Prepare Azure Databricks for Operationalization

-This repository includes an end-to-end example notebook that uses Azure Datbaricks to estimate a recommendation model using Alternating Least Squares, writes pre-computed recommendations to Azure Cosmos DB, and then creates a real-time scoring service that retrieves the recommendations from Cosmos DB. In order to execute that [notebook](notebooks//05_operationalize/als_movie_o16n.ipynb), you must install the Recommenders repository as a library (as described above), **AND** you must also install some additional dependencies. Similar to above, you can do so either manually or via an installation [script](scripts/prepare_databricks_for_o16n.sh).
+This repository includes an end-to-end example notebook that uses Azure Databricks to estimate a recommendation model using Alternating Least Squares, writes pre-computed recommendations to Azure Cosmos DB, and then creates a real-time scoring service that retrieves the recommendations from Cosmos DB. In order to execute that [notebook](notebooks/05_operationalize/als_movie_o16n.ipynb), you must install the Recommenders repository as a library (as described above), **AND** you must also install some additional dependencies. With the *Quick install* method, you just need to pass an additional option to the `scripts/databricks_install.py` script.

<details>
<summary><strong><em>Quick install</em></strong></summary>

-This option utilizes an installation script to do the setup, and it requires the same dependencies as the databricks installation script (see above).
-Once you have:
-* Installed and configured the databricks CLI
-* Confirmed that the appropriate cluster is *RUNNING*
-* Installed the Recommenders egg as described above
-* Confirmed you are in the root directory of the Recommenders repository
-you can install additional dependencies for operationalization with:
+This option utilizes the installation script to do the setup. Just run the installation script with an additional option. If you have already run the script once to upload and install the `Recommenders.egg` library, you can also add an `--overwrite` option:

```{shell}
-scripts/prepare_databricks_for_o16n.sh <CLUSTER_ID>
+scripts/databricks_install.py --overwrite --prepare-o16n <CLUSTER_ID>
```
This script does all of the steps described in the *Manual setup* section below.

@@ -249,9 +245,9 @@ This script does all of the steps described in the *Manual setup* section below.

You must install three packages as libraries from PyPI:

-* `azure-cli`
-* `azureml-sdk[databricks]`
-* `pydocumentdb`
+* `azure-cli==2.0.56`
+* `azureml-sdk[databricks]==1.0.8`
+* `pydocumentdb==2.3.3`

You can follow instructions [here](https://docs.azuredatabricks.net/user-guide/libraries.html#install-a-library-on-a-cluster) for details on how to install packages from PyPI.
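
The linked instructions cover the workspace UI; as an alternative sketch (assuming the legacy `databricks-cli` is configured), the same pinned packages can be attached from the command line:

```{shell}
# attach each pinned PyPI package to the cluster
databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package azure-cli==2.0.56
databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package "azureml-sdk[databricks]==1.0.8"
databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package pydocumentdb==2.3.3
```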