Update spark SETUP

Make the Spark environment variable note more visible; back up and restore SPARK_HOME.

Parent: e184a11ef3
Commit: 42daeee158

SETUP.md (44 changed lines)
@@ -86,30 +86,12 @@ Additionally, if you want to test a particular version of spark, you may pass the
     python scripts/generate_conda_file.py --pyspark-version 2.4.0

-**NOTE** - for a PySpark environment, we need to set the environment variables `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` to point to the conda python executable.
-
-To set these variables every time the environment is activated, we can follow the steps of this [guide](https://conda.io/docs/user-guide/tasks/manage-environments.html#macos-and-linux). Assuming that we have installed the environment in `/anaconda/envs/reco_pyspark`, we create the file `/anaconda/envs/reco_pyspark/etc/conda/activate.d/env_vars.sh` and add:
-
-```bash
-#!/bin/sh
-export PYSPARK_PYTHON=/anaconda/envs/reco_pyspark/bin/python
-export PYSPARK_DRIVER_PYTHON=/anaconda/envs/reco_pyspark/bin/python
-unset SPARK_HOME
-```
-
-This will export the variables every time we do `conda activate reco_pyspark`. To unset these variables when we deactivate the environment, we create the file `/anaconda/envs/reco_pyspark/etc/conda/deactivate.d/env_vars.sh` and add:
-
-```bash
-#!/bin/sh
-unset PYSPARK_PYTHON
-unset PYSPARK_DRIVER_PYTHON
-```
-
 </details>

 <details>
 <summary><strong><em>All environments</em></strong></summary>

-To install all three environments:
+To install the PySpark GPU environment:

     cd Recommenders
     python scripts/generate_conda_file.py --gpu --pyspark
@@ -117,6 +99,30 @@ To install all three environments:
 </details>


+> **NOTE** - for PySpark environments (`reco_pyspark` and `reco_full`), we need to set the environment variables `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` to point to the conda python executable.
+>
+> To set these variables every time the environment is activated, we can follow the steps of this [guide](https://conda.io/docs/user-guide/tasks/manage-environments.html#macos-and-linux). Assuming that we have installed the environment in `/anaconda/envs/reco_pyspark`, we create the file `/anaconda/envs/reco_pyspark/etc/conda/activate.d/env_vars.sh` and add:
+>
+> ```bash
+> #!/bin/sh
+> export PYSPARK_PYTHON=/anaconda/envs/reco_pyspark/bin/python
+> export PYSPARK_DRIVER_PYTHON=/anaconda/envs/reco_pyspark/bin/python
+> export SPARK_HOME_BACKUP=$SPARK_HOME
+> unset SPARK_HOME
+> ```
+>
+> This will export the variables every time we do `conda activate reco_pyspark`. To unset these variables when we deactivate the environment, we create the file `/anaconda/envs/reco_pyspark/etc/conda/deactivate.d/env_vars.sh` and add:
+>
+> ```bash
+> #!/bin/sh
+> unset PYSPARK_PYTHON
+> unset PYSPARK_DRIVER_PYTHON
+> export SPARK_HOME=$SPARK_HOME_BACKUP
+> unset SPARK_HOME_BACKUP
+> ```


 ### Register the conda environment as a kernel in Jupyter

 We can register our created conda environment to appear as a kernel in the Jupyter notebooks.
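The activate/deactivate hook pair above can be exercised without conda: a minimal sketch that writes the two `env_vars.sh` scripts into a temporary directory (a stand-in for the real env prefix, e.g. `/anaconda/envs/reco_pyspark`) and sources them the way conda sources `activate.d`/`deactivate.d` scripts, to check that `SPARK_HOME` is backed up on activate and restored on deactivate. The `/opt/spark` path is only an assumed example value.

```shell
#!/bin/sh
# Stand-in for the conda env prefix (assumption: any writable directory works
# for this simulation; conda uses the real env prefix).
PREFIX="$(mktemp -d)"
mkdir -p "$PREFIX/etc/conda/activate.d" "$PREFIX/etc/conda/deactivate.d"

# Activate hook: point PySpark at the env's python, back up and unset SPARK_HOME.
cat > "$PREFIX/etc/conda/activate.d/env_vars.sh" <<EOF
export PYSPARK_PYTHON=$PREFIX/bin/python
export PYSPARK_DRIVER_PYTHON=$PREFIX/bin/python
export SPARK_HOME_BACKUP=\$SPARK_HOME
unset SPARK_HOME
EOF

# Deactivate hook: undo the PySpark variables and restore SPARK_HOME.
cat > "$PREFIX/etc/conda/deactivate.d/env_vars.sh" <<EOF
unset PYSPARK_PYTHON
unset PYSPARK_DRIVER_PYTHON
export SPARK_HOME=\$SPARK_HOME_BACKUP
unset SPARK_HOME_BACKUP
EOF

# Pretend a system-wide Spark installation is configured (example value).
export SPARK_HOME=/opt/spark

# conda sources every script in activate.d on `conda activate` ...
. "$PREFIX/etc/conda/activate.d/env_vars.sh"
[ -z "${SPARK_HOME:-}" ] && [ "$SPARK_HOME_BACKUP" = /opt/spark ] && echo "activate ok"

# ... and every script in deactivate.d on `conda deactivate`.
. "$PREFIX/etc/conda/deactivate.d/env_vars.sh"
[ "$SPARK_HOME" = /opt/spark ] && [ -z "${SPARK_HOME_BACKUP:-}" ] && echo "deactivate ok"

rm -rf "$PREFIX"
```

Note the ordering in the deactivate hook: `SPARK_HOME_BACKUP` is expanded on the `export SPARK_HOME=` line before it is unset, so the restore works.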