Jeremy Reynolds 2019-02-27 20:45:47 -07:00
Parent 993d3d54b0
Commit 248f91bb5e
1 changed file: 18 additions and 22 deletions


@@ -145,7 +145,7 @@ SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true, -Dspark.worker.cleanup.a
* Python 3

### Repository installation

-You can set up the repository as a library on Databricks either manually or by running an [installation script](scripts/databricks_install.sh). Both options assume you have access to a provisioned Databricks workspace and cluster and that you have appropriate permissions to install libraries.
+You can set up the repository as a library on Databricks either manually or by running an [installation script](scripts/databricks_install.py). Both options assume you have access to a provisioned Databricks workspace and cluster and that you have appropriate permissions to install libraries.

<details>
<summary><strong><em>Quick install</em></strong></summary>
@@ -169,19 +169,23 @@ This option utilizes an installation script to do the setup, and it requires add
> ```{shell}
> databricks clusters start --cluster-id <CLUSTER_ID>
> ```
+> * The script also requires the `zip` command line utility, which may not be installed. You can install it with:
+> ```{shell}
+> sudo apt-get update
+> sudo apt-get install zip
+> ```
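Before running the script, it can help to confirm that the CLI is configured and that the cluster is actually up. Here is a minimal sanity check, assuming the legacy `databricks-cli` (the exact JSON fields in the output may vary by CLI version):

```{shell}
# One-time CLI setup: prompts for your workspace URL and a personal access token
databricks configure --token

# Inspect the cluster; the returned JSON includes a "state" field
# (e.g. "RUNNING" or "TERMINATED") you can check before installing
databricks clusters get --cluster-id <CLUSTER_ID>
```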
Once you have confirmed the Databricks cluster is *RUNNING*, install the modules within this repository with the following commands:

```{shell}
cd Recommenders
-./scripts/databricks_install.sh <CLUSTER_ID>
+./scripts/databricks_install.py <CLUSTER_ID>
```
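To verify that the library was attached successfully, you can query the cluster's library status. This is a sketch assuming the legacy `databricks-cli` is configured as above:

```{shell}
# Lists each library attached to the cluster along with its install status
databricks libraries cluster-status --cluster-id <CLUSTER_ID>
```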
+**Note:** If you plan to run through the sample code for operationalization [here](notebooks/05_operationalize/als_movie_o16n.ipynb), you need to prepare the cluster for operationalization. You can do so by adding an additional option to the script run:
+```{shell}
+./scripts/databricks_install.py --prepare-o16n <CLUSTER_ID>
+```
+See below for details.
</details>

<details>
@@ -220,24 +224,16 @@ import reco_utils
## Prepare Azure Databricks for Operationalization

-This repository includes an end-to-end example notebook that uses Azure Databricks to estimate a recommendation model using Alternating Least Squares, writes pre-computed recommendations to Azure Cosmos DB, and then creates a real-time scoring service that retrieves the recommendations from Cosmos DB. In order to execute that [notebook](notebooks/05_operationalize/als_movie_o16n.ipynb), you must install the Recommenders repository as a library (as described above), **AND** you must also install some additional dependencies. Similar to above, you can do so either manually or via an installation [script](scripts/prepare_databricks_for_o16n.sh).
+This repository includes an end-to-end example notebook that uses Azure Databricks to estimate a recommendation model using Alternating Least Squares, writes pre-computed recommendations to Azure Cosmos DB, and then creates a real-time scoring service that retrieves the recommendations from Cosmos DB. In order to execute that [notebook](notebooks/05_operationalize/als_movie_o16n.ipynb), you must install the Recommenders repository as a library (as described above), **AND** you must also install some additional dependencies. With the *Quick install* method, you just need to pass an additional option to the `scripts/databricks_install.py` script.

<details>
<summary><strong><em>Quick install</em></strong></summary>

-This option utilizes an installation script to do the setup, and it requires the same dependencies as the Databricks installation script (see above).
+This option utilizes the installation script to do the setup. Just run the installation script with an additional option. If you have already run the script once to upload and install the `Recommenders.egg` library, you can also add an `--overwrite` option:
-Once you have:
-* Installed and configured the databricks CLI
-* Confirmed that the appropriate cluster is *RUNNING*
-* Installed the Recommenders egg as described above
-* Confirmed you are in the root directory of the Recommenders repository
-you can install additional dependencies for operationalization with:
```{shell}
-scripts/prepare_databricks_for_o16n.sh <CLUSTER_ID>
+scripts/databricks_install.py --overwrite --prepare-o16n <CLUSTER_ID>
```

This script does all of the steps described in the *Manual setup* section below.
@@ -249,9 +245,9 @@ This script does all of the steps described in the *Manual setup* section below.
You must install three packages as libraries from PyPI:

-* `azure-cli`
-* `azureml-sdk[databricks]`
-* `pydocumentdb`
+* `azure-cli==2.0.56`
+* `azureml-sdk[databricks]==1.0.8`
+* `pydocumentdb==2.3.3`

You can follow instructions [here](https://docs.azuredatabricks.net/user-guide/libraries.html#install-a-library-on-a-cluster) for details on how to install packages from PyPI.
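If you prefer the command line to the workspace UI, the same pinned packages can also be attached with the legacy `databricks-cli`. This is a sketch, assuming the CLI is configured and `<CLUSTER_ID>` is your cluster:

```{shell}
# Attach each pinned dependency to the cluster as a PyPI library
databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package "azure-cli==2.0.56"
databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package "azureml-sdk[databricks]==1.0.8"
databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package "pydocumentdb==2.3.3"
```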