Specify databricks version limit
This commit is contained in:
Parent: d67b528552
Commit: e5b0982400
SETUP.md: 18 lines changed

@@ -51,13 +51,13 @@ conda update anaconda # use 'conda install anaconda' if the package is no
We provide a script, [generate_conda_file.py](tools/generate_conda_file.py), to generate a conda-environment yaml file
which you can use to create the target environment using Python version 3.6 with all the correct dependencies.

**NOTE** the `xlearn` package has a dependency on `cmake`. If one uses the `xlearn`-related notebooks or scripts, make sure `cmake` is installed in the system. The easiest way to install it on Linux is with apt-get: `sudo apt-get install -y build-essential cmake`. Detailed instructions for installing `cmake` from source can be found [here](https://cmake.org/install/).

Assuming the repo is cloned as `Recommenders` in the local system, to install **a default (Python CPU) environment**:

    cd Recommenders
    python tools/generate_conda_file.py
    conda env create -f reco_base.yaml

You can specify the environment name as well with the flag `-n`.
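For example, a minimal sketch with an explicit environment name (`my_reco_env` is purely illustrative):

```{shell}
# Create the environment under a custom name and activate it
conda env create -f reco_base.yaml -n my_reco_env
conda activate my_reco_env
```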

@@ -70,7 +70,7 @@ Assuming that you have a GPU machine, to install the Python GPU environment:
    cd Recommenders
    python tools/generate_conda_file.py --gpu
    conda env create -f reco_gpu.yaml
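After creating the GPU environment, a quick sanity check along these lines can confirm the GPU is visible (this assumes the environment is named `reco_gpu` after the yaml file and includes TensorFlow; both are assumptions that may differ in your setup):

```{shell}
conda activate reco_gpu
# Confirm the NVIDIA driver and GPU are visible on the machine
nvidia-smi
# Confirm TensorFlow inside the environment can see the GPU (TF 1.x API)
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
```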
</details>

@@ -128,7 +128,7 @@ First, get the path of the environment `reco_pyspark` is installed:
    for /f "delims=" %A in ('conda env list ^| grep reco_pyspark ^| awk "{print $NF}"') do set "RECO_ENV=%A"

Then, create the file `%RECO_ENV%\etc\conda\activate.d\env_vars.bat` and add:

    @echo off
    for /f "delims=" %%A in ('conda env list ^| grep reco_pyspark ^| awk "{print $NF}"') do set "RECO_ENV=%%A"
    set PYSPARK_PYTHON=%RECO_ENV%\python.exe
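As a quick check that the activation script takes effect, you can reactivate the environment and inspect the variable from the Windows command prompt (a sketch, assuming the environment is named `reco_pyspark` as above):

```
conda activate reco_pyspark
echo %PYSPARK_PYTHON%
```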

@@ -149,7 +149,7 @@ create the file `%RECO_ENV%\etc\conda\deactivate.d\env_vars.bat` and add:
    set SPARK_HOME_BACKUP=
    set PYTHONPATH=%PYTHONPATH_BACKUP%
    set PYTHONPATH_BACKUP=
</details>
</details>

@@ -176,7 +176,7 @@ We can register our created conda environment to appear as a kernel in the Jupyt
    conda activate my_env_name
    python -m ipykernel install --user --name my_env_name --display-name "Python (my_env_name)"
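To confirm the kernel was registered, the standard Jupyter command below lists the kernels it can see (not specific to this repo):

```{shell}
# The newly installed kernel should appear under its --name
jupyter kernelspec list
```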
If you are using the DSVM, you can [connect to JupyterHub](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro#jupyterhub-and-jupyterlab) by browsing to `https://your-vm-ip:8000`.

### Troubleshooting for the DSVM

@@ -204,7 +204,7 @@ sudo update-alternatives --config java
### Requirements of Azure Databricks
- * Databricks Runtime version 4.3 (Apache Spark 2.3.1, Scala 2.11) or greater
+ * Databricks Runtime version >= 4.3 (Apache Spark 2.3.1, Scala 2.11) and <= 5.5 (Apache Spark 2.4.3, Scala 2.11)
* Python 3

An example of how to create an Azure Databricks workspace and an Apache Spark cluster within the workspace can be found [here](https://docs.microsoft.com/en-us/azure/azure-databricks/quickstart-create-databricks-workspace-portal). To utilize deep learning models and GPUs, you may set up a GPU-enabled cluster. For more details on this topic, please see the [Azure Databricks deep learning guide](https://docs.azuredatabricks.net/applications/deep-learning/index.html).
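If you prefer to create the cluster from the command line, a sketch along these lines pins the runtime to a version inside the supported range; it assumes the Databricks CLI is installed and configured for your workspace, and the cluster name, node type, and worker count are purely illustrative:

```{shell}
# Cluster spec pinned to Databricks Runtime 5.5 (Apache Spark 2.4.3, Scala 2.11)
cat > cluster.json <<EOF
{
  "cluster_name": "reco-cluster",
  "spark_version": "5.5.x-scala2.11",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2
}
EOF
databricks clusters create --json-file cluster.json
```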

@@ -242,7 +242,7 @@ The installation script has a number of options that can also deal with differen
python tools/databricks_install.py -h
```
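One way to confirm the cluster state from the command line is via the Databricks CLI, assuming it is installed and configured for your workspace (a sketch, not part of the repo's tooling):

```{shell}
# Lists clusters with their ids, names, and current state (e.g. RUNNING)
databricks clusters list
```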
Once you have confirmed the Databricks cluster is *RUNNING*, install the modules within this repository with the following commands.

```{shell}
cd Recommenders

@@ -339,7 +339,7 @@ Additionally, you must install the [spark-cosmosdb connector](https://docs.datab
## Install the utilities via PIP
A [setup.py](setup.py) file is provided to simplify the installation of the utilities in this repo from the main directory.

This still requires the conda environment to be installed as described above. Once the necessary dependencies are installed, you can use the following command to install `reco_utils` as a Python package.
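A common way to install a package that ships a `setup.py` is an editable install from the repo root; the sketch below is an assumption about that command, not necessarily the exact invocation the repo documents:

```{shell}
cd Recommenders
pip install -e .
```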