Specify databricks version limit

Jun Ki Min 2020-09-10 14:43:22 -04:00
Parent d67b528552
Commit e5b0982400
1 changed file with 9 additions and 9 deletions


@ -51,13 +51,13 @@ conda update anaconda # use 'conda install anaconda' if the package is no
We provide a script, [generate_conda_file.py](tools/generate_conda_file.py), to generate a conda environment yaml file
which you can use to create the target environment with Python 3.6 and all the correct dependencies.
**NOTE** the `xlearn` package has a dependency on `cmake`. If you use the `xlearn`-related notebooks or scripts, make sure `cmake` is installed on the system. The easiest way to install it on Linux is with apt-get: `sudo apt-get install -y build-essential cmake`. Detailed instructions for installing `cmake` from source can be found [here](https://cmake.org/install/).
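As a quick sanity check (a minimal sketch, not part of the original instructions), you can verify whether `cmake` is already available before creating the environment:

```{shell}
# Print the cmake version if it is installed; otherwise install it (Ubuntu/Debian example)
cmake --version || sudo apt-get install -y build-essential cmake
```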
Assuming the repo is cloned as `Recommenders` in the local system, to install **a default (Python CPU) environment**:
cd Recommenders
python tools/generate_conda_file.py
conda env create -f reco_base.yaml
You can specify the environment name as well with the flag `-n`.
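For example (an illustrative sketch; `my_reco_env` is a placeholder name, and here conda's own `-n` option is used to override the name defined in the yaml file):

```{shell}
# Create the environment under a custom name instead of the default
conda env create -n my_reco_env -f reco_base.yaml
```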
@ -70,7 +70,7 @@ Assuming that you have a GPU machine, to install the Python GPU environment:
cd Recommenders
python tools/generate_conda_file.py --gpu
conda env create -f reco_gpu.yaml
</details>
@ -128,7 +128,7 @@ First, get the path of the environment `reco_pyspark` is installed:
for /f "delims=" %A in ('conda env list ^| grep reco_pyspark ^| awk "{print $NF}"') do set "RECO_ENV=%A"
Then, create the file `%RECO_ENV%\etc\conda\activate.d\env_vars.bat` and add:
@echo off
for /f "delims=" %%A in ('conda env list ^| grep reco_pyspark ^| awk "{print $NF}"') do set "RECO_ENV=%%A"
set PYSPARK_PYTHON=%RECO_ENV%\python.exe
@ -149,7 +149,7 @@ create the file `%RECO_ENV%\etc\conda\deactivate.d\env_vars.bat` and add:
set SPARK_HOME_BACKUP=
set PYTHONPATH=%PYTHONPATH_BACKUP%
set PYTHONPATH_BACKUP=
</details>
@ -176,7 +176,7 @@ We can register our created conda environment to appear as a kernel in the Jupyt
conda activate my_env_name
python -m ipykernel install --user --name my_env_name --display-name "Python (my_env_name)"
If you are using the DSVM, you can [connect to JupyterHub](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro#jupyterhub-and-jupyterlab) by browsing to `https://your-vm-ip:8000`.
### Troubleshooting for the DSVM
@ -204,7 +204,7 @@ sudo update-alternatives --config java
### Requirements of Azure Databricks
- * Databricks Runtime version 4.3 (Apache Spark 2.3.1, Scala 2.11) or greater
+ * Databricks Runtime version >= 4.3 (Apache Spark 2.3.1, Scala 2.11) and <= 5.5 (Apache Spark 2.4.3, Scala 2.11)
* Python 3
An example of how to create an Azure Databricks workspace and an Apache Spark cluster within the workspace can be found [here](https://docs.microsoft.com/en-us/azure/azure-databricks/quickstart-create-databricks-workspace-portal). To utilize deep learning models and GPUs, you may set up a GPU-enabled cluster. For more details about this topic, please see the [Azure Databricks deep learning guide](https://docs.azuredatabricks.net/applications/deep-learning/index.html).
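As an illustrative sketch only (assuming the Databricks CLI is installed and configured; the cluster name, node type, and worker count below are placeholders), a cluster pinned to a supported runtime such as 5.5 could be created from the command line like this:

```{shell}
# Hypothetical cluster spec pinned to Databricks Runtime 5.5 (Apache Spark 2.4.3, Scala 2.11)
cat > cluster.json <<'EOF'
{
  "cluster_name": "reco-cluster",
  "spark_version": "5.5.x-scala2.11",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2
}
EOF
# Create the cluster through the Databricks CLI (requires `databricks configure` beforehand)
databricks clusters create --json-file cluster.json
```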
@ -242,7 +242,7 @@ The installation script has a number of options that can also deal with differen
python tools/databricks_install.py -h
```
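Before running the installation script, one way to check that the target cluster is up (a sketch assuming the Databricks CLI is configured; `<cluster-id>` is a placeholder for your cluster's id):

```{shell}
# Show the current state of the target cluster; look for "state": "RUNNING" in the output
databricks clusters get --cluster-id <cluster-id>
```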
Once you have confirmed the Databricks cluster is *RUNNING*, install the modules within this repository with the following commands.
```{shell}
cd Recommenders
@ -339,7 +339,7 @@ Additionally, you must install the [spark-cosmosdb connector](https://docs.datab
## Install the utilities via PIP
A [setup.py](setup.py) file is provided in order to simplify the installation of the utilities in this repo from the main directory.
This still requires the conda environment to be installed as described above. Once the necessary dependencies are installed, you can use the following command to install `reco_utils` as a Python package.
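For reference, this would typically be an editable install from the repository root (a sketch assuming the standard setuptools workflow described above):

```{shell}
# From the root of the Recommenders repo, install reco_utils in editable (development) mode
pip install -e .
```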