updated documentation

This commit is contained in:
Parent: 993d3d54b0
Commit: 248f91bb5e

Changed files: SETUP.md (40 changed lines)
@@ -145,7 +145,7 @@ SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true, -Dspark.worker.cleanup.a
 * Python 3
 
 ### Repository installation
 
-You can setup the repository as a library on Databricks either manually or by running an [installation script](scripts/databricks_install.sh). Both options assume you have access to a provisioned Databricks workspace and cluster and that you have appropriate permissions to install libraries.
+You can setup the repository as a library on Databricks either manually or by running an [installation script](scripts/databricks_install.py). Both options assume you have access to a provisioned Databricks workspace and cluster and that you have appropriate permissions to install libraries.
 
 <details>
 <summary><strong><em>Quick install</em></strong></summary>
@@ -169,19 +169,23 @@ This option utilizes an installation script to do the setup, and it requires add
 > ```{shell}
 > databricks clusters start --cluster-id <CLUSTER_ID>
 > ```
+> * The script also requires the `zip` command line utility, which may not be installed. You can install it with:
+> ```{shell}
+> sudo apt-get update
+> sudo apt-get install zip
+> ```
 
 Once you have confirmed the databricks cluster is *RUNNING*, install the modules within this repository with the following commands:
 
 ```{shell}
 cd Recommenders
-./scripts/databricks_install.sh <CLUSTER_ID>
+./scripts/databricks_install.py <CLUSTER_ID>
 ```
 
+**Note** If you are planning on running through the sample code for operationalization [here](notebooks/05_operationalize/als_movie_o16n.ipynb), you need to prepare the cluster for operationalization. You can do so by adding an additional option to the script run:
+
+```{shell}
+./scripts/databricks_install.py --prepare-o16n <CLUSTER_ID>
+```
+
+See below for details.
 
 </details>
 
 <details>
@@ -220,24 +224,16 @@ import reco_utils
 
 ## Prepare Azure Databricks for Operationalization
 
-This repository includes an end-to-end example notebook that uses Azure Datbaricks to estimate a recommendation model using Alternating Least Squares, writes pre-computed recommendations to Azure Cosmos DB, and then creates a real-time scoring service that retrieves the recommendations from Cosmos DB. In order to execute that [notebook](notebooks//05_operationalize/als_movie_o16n.ipynb), you must install the Recommenders repository as a library (as described above), **AND** you must also install some additional dependencies. Similar to above, you can do so either manually or via an installation [script](scripts/prepare_databricks_for_o16n.sh).
+This repository includes an end-to-end example notebook that uses Azure Databricks to estimate a recommendation model using Alternating Least Squares, writes pre-computed recommendations to Azure Cosmos DB, and then creates a real-time scoring service that retrieves the recommendations from Cosmos DB. In order to execute that [notebook](notebooks/05_operationalize/als_movie_o16n.ipynb), you must install the Recommenders repository as a library (as described above), **AND** you must also install some additional dependencies. With the *Quick install* method, you just need to pass an additional option to the `scripts/databricks_install.py` script.
 
 <details>
 <summary><strong><em>Quick install</em></strong></summary>
 
-This option utilizes an installation script to do the setup, and it requires the same dependencies as the databricks installation script (see above).
-
-Once you have:
-
-* Installed and configured the databricks CLI
-* Confirmed that the appropriate cluster is *RUNNING*
-* Installed the Recommenders egg as described above
-* Confirmed you are in the root directory of the Recommenders repository
-
-you can install additional dependencies for operationalization with:
+This option utilizes the installation script to do the setup. Just run the installation script
+with an additional option. If you have already run the script once to upload and install the `Recommenders.egg` library, you can also add an `--overwrite` option:
 
 ```{shell}
-scripts/prepare_databricks_for_o16n.sh <CLUSTER_ID>
+scripts/databricks_install.py --overwrite --prepare-o16n <CLUSTER_ID>
 ```
 
 This script does all of the steps described in the *Manual setup* section below.
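Review note: both the old and the new instructions in this hunk assume the databricks CLI has already been installed and configured (one of the prerequisites the old bullet list spelled out). For reference, running `databricks configure --token` writes a `~/.databrickscfg` file of roughly this shape; the host and token values below are placeholders, not real credentials:

```
[DEFAULT]
host = https://<LOCATION>.azuredatabricks.net
token = <PERSONAL_ACCESS_TOKEN>
```

The CLI reads this profile for commands such as `databricks clusters start --cluster-id <CLUSTER_ID>` used earlier in the file.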
@@ -249,9 +245,9 @@ This script does all of the steps described in the *Manual setup* section below.
 
 You must install three packages as libraries from PyPI:
 
-* `azure-cli`
-* `azureml-sdk[databricks]`
-* `pydocumentdb`
+* `azure-cli==2.0.56`
+* `azureml-sdk[databricks]==1.0.8`
+* `pydocumentdb==2.3.3`
 
 You can follow instructions [here](https://docs.azuredatabricks.net/user-guide/libraries.html#install-a-library-on-a-cluster) for details on how to install packages from PyPI.
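Review note: the version pins introduced in this hunk can equivalently be captured as a pip requirements-style fragment for reproducibility; this simply restates the three pins from the diff:

```
azure-cli==2.0.56
azureml-sdk[databricks]==1.0.8
pydocumentdb==2.3.3
```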