Databricks installation script

Add installation shell script
Update SETUP.md accordingly
This commit is contained in:
Jun Ki Min 2019-01-28 13:16:13 -05:00
Parent b3b10a12b9
Commit 0b70706649
2 changed files: 74 additions and 17 deletions


@@ -12,9 +12,8 @@ In this guide we show how to setup all the dependencies to run the notebooks of
* [Troubleshooting for the DSVM](#troubleshooting-for-the-dsvm)
* [Setup guide for Azure Databricks](#setup-guide-for-azure-databricks)
* [Requirements of Azure Databricks](#requirements-of-azure-databricks)
* [Repository upload](#repository-upload)
* [Repository installation](#repository-installation)
* [Troubleshooting for Azure Databricks](#troubleshooting-for-azure-databricks)
</details>
## Compute environments
@@ -134,24 +133,57 @@ SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true, -Dspark.worker.cleanup.a
* Runtime version 4.1 (Apache Spark 2.3.0, Scala 2.11)
* Python 3
### Repository upload
To use the repository in Databricks, we need to zip it and upload it as a library. The steps are as follows:
* Clone the Microsoft Recommenders repository to your local computer.
* Zip the contents of the Recommenders folder (Azure Databricks requires the compressed file to have the `.egg` suffix, so we don't use the standard `.zip`):
```
cd Recommenders
zip -r Recommenders.egg .
```
* Once your cluster has started, go to the Databricks home workspace, navigate to your user folder and press `Import`.
* In the next menu there is an option to import a library; it says: `To import a library, such as a jar or egg, click here`. Press `click here`.
* In the first drop-down menu, select the option `Upload Python egg or PyPI`.
* Press `Drop library egg here to upload` and select the file `Recommenders.egg` you just created.
* Press `Create library`. This will upload the egg and make it available in your workspace.
* Finally, in the next menu, attach the library to your cluster.
### Repository installation
You can set up the repository as a library on Databricks either manually or by simply running an installation script.
<details>
<summary><strong><em>Quick install</em></strong></summary>
> This method only works for **Azure** Databricks.
Prerequisite:
* Install [Azure Databricks CLI (command-line interface)](https://docs.azuredatabricks.net/user-guide/dev-tools/databricks-cli.html#install-the-cli)
and setup CLI [authentication](https://docs.azuredatabricks.net/user-guide/dev-tools/databricks-cli.html#set-up-authentication).
1. Start a target cluster and copy its cluster id, which can be listed with the following command:
```
databricks clusters list
<CLUSTER_ID> <CLUSTER_NAME> <STATUS>
...
```
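If you script this step, a small helper can pull the id out of the listing. A minimal sketch, assuming the `<id> <name> <status>` column layout shown above; the cluster names and ids are illustrative, and `find_cluster_id` is our own name, not part of the repository:

```python
# Hypothetical sample of `databricks clusters list` output:
# one "<id> <name> <status>" line per cluster.
sample = """0123-456789-abcd123  my-cluster  RUNNING
9876-543210-wxyz987  other-cluster  TERMINATED"""

def find_cluster_id(listing, name):
    """Return the id of the first cluster whose name matches, else None."""
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == name:
            return fields[0]
    return None

print(find_cluster_id(sample, "my-cluster"))  # prints 0123-456789-abcd123
```

In practice you would feed the helper the real CLI output, e.g. `find_cluster_id(subprocess.check_output(["databricks", "clusters", "list"], text=True), "my-cluster")`.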
2. Run the following commands, replacing CLUSTER_ID with the id you copied in the previous step:
```
cd Recommenders
./scripts/databricks_install.sh CLUSTER_ID
```
</details>
<details>
<summary><strong><em>Manual setup</em></strong></summary>
To install the repository manually onto Databricks, follow these steps:
1. Clone the Microsoft Recommenders repository to your local computer.
2. Zip the contents of the Recommenders folder (Azure Databricks requires the compressed file to have the `.egg` suffix, so we don't use the standard `.zip`):
```
cd Recommenders
zip -r Recommenders.egg .
```
3. Once your cluster has started, go to the Databricks home workspace, navigate to your user folder and press `Import`.
4. In the next menu there is an option to import a library; it says: `To import a library, such as a jar or egg, click here`. Press `click here`.
5. In the first drop-down menu, select the option `Upload Python egg or PyPI`.
6. Press `Drop library egg here to upload` and select the file `Recommenders.egg` you just created.
7. Press `Create library`. This will upload the egg and make it available in your workspace.
8. Finally, in the next menu, attach the library to your cluster.
</details>
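The zip step above can also be done programmatically. A sketch using Python's standard `zipfile` module; the `build_egg` helper name is ours, and unlike the plain `zip -r` command it keeps only `.py` files:

```python
import os
import zipfile

def build_egg(src_dir, egg_path):
    """Package all .py files under src_dir into a zip with an .egg suffix.

    An .egg is an ordinary zip archive, so zipfile can produce it directly.
    """
    with zipfile.ZipFile(egg_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _, files in os.walk(src_dir):
            for name in files:
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    # Store paths relative to src_dir so imports resolve
                    zf.write(full, os.path.relpath(full, src_dir))
```

For example, `build_egg("Recommenders", "Recommenders.egg")` would mirror the manual step from inside the repository's parent directory.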
To make sure the installation works, you can now create a new notebook in Databricks and import the utilities:
```
import reco_utils
...
```
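If the import fails, the egg may not yet be attached to the cluster (or the cluster may need a restart). A small, hypothetical availability check that can run before the import; `module_available` is our own name, not part of the repository:

```python
import importlib.util

def module_available(name):
    """True if the named module can be imported in this environment.

    find_spec searches without actually importing, so this is a cheap
    pre-flight check; reco_utils only resolves once the egg is attached.
    """
    return importlib.util.find_spec(name) is not None
```

For example, `module_available("reco_utils")` should return `True` on a cluster where the library was attached successfully.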
### Troubleshooting for Azure Databricks

scripts/databricks_install.sh (new executable file, 25 lines)

@@ -0,0 +1,25 @@
#!/bin/bash
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# ---------------------------------------------------------
# This script installs Recommenders into Azure Databricks
echo "Preparing Recommenders library file (egg)..."
zip -r -q Recommenders.egg . -i *.py -x tests/\* scripts/\*
echo "Uploading to databricks..."
dbfs cp --overwrite Recommenders.egg dbfs:/FileStore/Recommenders.egg
# Cluster id should be passed by the first argument
# Cluster id can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
CLUSTER_ID=$1
echo "Installing the library onto databricks cluster $CLUSTER_ID..."
databricks libraries install --cluster-id "$CLUSTER_ID" --egg dbfs:/FileStore/Recommenders.egg
databricks libraries cluster-status --cluster-id "$CLUSTER_ID"
# Restart the cluster to activate the library (needed when upgrading the library)
echo "Restarting the cluster... this will take a few seconds. Please check the result from the Databricks workspace"
databricks clusters restart --cluster-id "$CLUSTER_ID"
rm Recommenders.egg