updated documentation
Parent a71779d474
Commit 2409fcfd30

INSTALL.md (116 lines changed)
@@ -1,6 +1,63 @@
# Installation and setup

Here we present the instructions for setting up the project on an [Ubuntu Azure VM](https://azure.microsoft.com/en-us/services/virtual-machines/). The VM we used for the CPU experiments is a Standard DS15 v2 with 20 cores and 140 GB of memory. For the GPU experiments we used an NV24 with 4 NVIDIA M60 GPUs. On both machines the OS was Ubuntu 16.04.

Here we present the instructions for setting up the project on an [Ubuntu Azure VM](https://azure.microsoft.com/en-us/services/virtual-machines/). The VM we used for the CPU experiments is a Standard DS15 v2 with 20 cores and 140 GB of memory. For the GPU experiments we used an NV24 with 4 NVIDIA M60 GPUs. On both machines the OS was Ubuntu 16.04. We recommend using the [Azure Data Science VM](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.standard-data-science-vm), which comes with many machine learning tools already installed.

## Setting up the environment

Clone this repo to your desired location:
```bash
git clone https://github.com/Azure/fast_retraining.git
```

Create a conda environment if you haven't already done so. The command below creates a Python 3 environment called strata.
```bash
conda create --name strata python=3 anaconda
```

Edit [activate_env_vars.sh](environment/activate_env_vars.sh) and [deactivate_env_vars.sh](environment/deactivate_env_vars.sh) so that they contain the correct information.

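As a rough illustration of what "the correct information" might look like, the excerpts of the two scripts in this commit suggest a save-and-restore pattern: the activation script remembers the old `PATH` and exports project variables, and the deactivation script puts things back. The sketch below wraps that pattern in two functions so the round trip can be tried in a single shell. The function names and the argument holding the clone location are assumptions made for illustration, and the sketch uses `unset` where the original deactivation script exports empty values.

```bash
# Hypothetical stand-ins for activate_env_vars.sh and deactivate_env_vars.sh,
# written as functions so both halves can be exercised in one shell session.
activate_env_vars() {
    export PROJECTPATH="$1"             # assumed: path to your clone of the repo
    export OLD_PATH="$PATH"             # remember PATH so it can be restored later
    export PATH="$PATH:$PROJECTPATH"
    export MOUNT_POINT=/fileshare       # the mounting location for the data
    export CACHE_DIR="$MOUNT_POINT/cache"
}

deactivate_env_vars() {
    export PATH="$OLD_PATH"             # restore the PATH saved on activation
    unset OLD_PATH PROJECTPATH MOUNT_POINT CACHE_DIR
}
```

Calling `activate_env_vars /path/to/fast_retraining` followed by `deactivate_env_vars` should leave `PATH` exactly as it was before.
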
Install the command-line JSON parser jq:
```bash
apt-get install jq
```

Activate the conda environment and install the requirements.
```bash
source activate strata
pip install -r requirements.txt
```

Get the prefix of the currently activated environment and assign it to env_path.
The pipeline below prints the conda info as JSON, extracts the default_prefix element, and removes the surrounding quotes.
```bash
env_path=$(conda info --json | jq '.default_prefix' | tr -d '"')
```

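A slightly more defensive variant of this step could look like the sketch below. It assumes that conda exports `CONDA_PREFIX` while an environment is active, and it falls back to the `conda info --json` pipeline otherwise, using `jq -r` to print the raw string so that no `tr` step is needed. The function name `resolve_env_path` is hypothetical.

```bash
# Hypothetical helper: print the prefix of the active conda environment.
resolve_env_path() {
    if [ -n "${CONDA_PREFIX:-}" ]; then
        # Assumption: conda sets CONDA_PREFIX when an environment is activated.
        printf '%s\n' "$CONDA_PREFIX"
    elif command -v conda >/dev/null 2>&1 && command -v jq >/dev/null 2>&1; then
        # Same lookup as above; -r makes jq print the string without quotes.
        conda info --json | jq -r '.default_prefix'
    else
        echo "no active conda environment found" >&2
        return 1
    fi
}

# Only assign env_path when the lookup succeeds.
if env_path=$(resolve_env_path 2>/dev/null); then
    echo "env_path=$env_path"
fi
```
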
Make sure you are in the environment folder of the project and run the following:
```bash
activate_script_path=$(readlink -f activate_env_vars.sh)
deactivate_script_path=$(readlink -f deactivate_env_vars.sh)
```

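Because `readlink -f` may print a path even when the final component does not exist, running the two commands from the wrong folder can silently produce paths to missing files. A minimal sketch of a guard, assuming GNU `readlink` and a hypothetical helper named `resolve_scripts` that takes the folder as its argument:

```bash
# Hypothetical helper: resolve both script paths, failing early if either is missing.
resolve_scripts() {
    # $1: directory expected to hold the two scripts (the project's environment folder)
    for name in activate_env_vars.sh deactivate_env_vars.sh; do
        if [ ! -f "$1/$name" ]; then
            echo "missing $1/$name; run this from the environment folder" >&2
            return 1
        fi
    done
    activate_script_path=$(readlink -f "$1/activate_env_vars.sh")
    deactivate_script_path=$(readlink -f "$1/deactivate_env_vars.sh")
}
```

From inside the environment folder this would be called as `resolve_scripts .`.
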
Then create the conda activation and deactivation hooks and make them source the modified activation and deactivation scripts in our environment folder:
```bash
mkdir -p $env_path/etc/conda/activate.d
mkdir -p $env_path/etc/conda/deactivate.d
echo 'source '$activate_script_path >> $env_path/etc/conda/activate.d/env_vars.sh
echo 'source '$deactivate_script_path >> $env_path/etc/conda/deactivate.d/env_vars.sh
```

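Note that `>>` appends, so repeating this step adds a duplicate `source` line to each hook on every run. One possible way to make the step repeatable is sketched below: it overwrites the hook file instead of appending and quotes the paths so that directories containing spaces still work. The helper name `install_hook` is hypothetical.

```bash
# Hypothetical helper: write a conda hook that sources one of the project scripts.
install_hook() {
    # $1: environment prefix, $2: "activate" or "deactivate", $3: script to source
    hook_dir="$1/etc/conda/$2.d"
    mkdir -p "$hook_dir"
    # Overwrite rather than append so that rerunning leaves exactly one line.
    printf 'source "%s"\n' "$3" > "$hook_dir/env_vars.sh"
}
```

With the variables from the previous steps, this would be used as `install_hook "$env_path" activate "$activate_script_path"` and `install_hook "$env_path" deactivate "$deactivate_script_path"`.
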
Exit the environment:
```bash
source deactivate
```

Enter the environment again:
```bash
source activate strata
```

## Installation of boosted tree libraries

@@ -52,60 +109,3 @@ Finally, to check that the libraries are correctly installed, try to load them f

python -c "import xgboost; import lightgbm"

Clone this repo to your desired location:
```bash
git clone https://github.com/Azure/fast_retraining.git
```

Create a conda environment if you haven't already done so. The command below creates a Python 3 environment called strata.
```bash
conda create --name strata python=3 anaconda
```

Edit [activate_env_vars.sh](environment/activate_env_vars.sh) and [deactivate_env_vars.sh](environment/deactivate_env_vars.sh) so that they contain the correct information.

Install the command-line JSON parser jq:
```bash
apt-get install jq
```

Activate the conda environment:
```bash
source activate strata
```

Get the prefix of the currently activated environment and assign it to env_path.
The pipeline below prints the conda info as JSON, extracts the default_prefix element, and removes the surrounding quotes.
```bash
env_path=$(conda info --json | jq '.default_prefix' | tr -d '"')
```

Make sure you are in the environment folder of the project and run the following:
```bash
activate_script_path=$(readlink -f activate_env_vars.sh)
deactivate_script_path=$(readlink -f deactivate_env_vars.sh)
```

Then create the conda activation and deactivation hooks and make them source the modified activation and deactivation scripts in our environment folder:
```bash
mkdir -p $env_path/etc/conda/activate.d
mkdir -p $env_path/etc/conda/deactivate.d
echo 'source '$activate_script_path >> $env_path/etc/conda/activate.d/env_vars.sh
echo 'source '$deactivate_script_path >> $env_path/etc/conda/deactivate.d/env_vars.sh
```

Exit the environment:
```bash
source deactivate
```

Enter the environment again:
```bash
source activate strata
```

README.md (75 lines changed)
@@ -2,9 +2,23 @@

This repo shows how to perform fast retraining with LightGBM in different business cases.

In this repo we compare two of the fastest boosted decision tree libraries: [XGBoost](https://github.com/dmlc/xgboost) and [LightGBM](https://github.com/microsoft/LightGBM). We evaluate them across datasets from several domains and of different sizes.

## Installation and Setup

The installation instructions can be found [here](./INSTALL.md).

## Project

In the folder [experiments](./experiments) you can find the different experiments of the project.
In the folder [experiments](./experiments) you can find the different experiments of the project. We developed 5 experiments with the CPU versions of the libraries and 2 experiments with the GPU versions.

* Airline
* BCI
* Football
* Planet Amazon
* Fraud Detection
* Airline (GPU version)
* HIGGS (GPU version)

In the folder [experiment/libs](./experiment/libs) there is the common code for the project.

@@ -12,62 +26,3 @@ In the folder [experiment/libs](./experiment/libs) there is the common code for

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

### Setup
Instructions for setting up on a Linux Microsoft DSVM

Clone this repo to your desired location:
```bash
git clone https://github.com/Azure/fast_retraining.git
```

Create a conda environment if you haven't already done so. The command below creates a Python 3 environment called strata.
```bash
conda create --name strata python=3 anaconda
```

Edit [activate_env_vars.sh](environment/activate_env_vars.sh) and [deactivate_env_vars.sh](environment/deactivate_env_vars.sh) so that they contain the correct information.

Install the command-line JSON parser jq:
```bash
apt-get install jq
```

Activate the conda environment:
```bash
source activate strata
```

Get the prefix of the currently activated environment and assign it to env_path.
The pipeline below prints the conda info as JSON, extracts the default_prefix element, and removes the surrounding quotes.
```bash
env_path=$(conda info --json | jq '.default_prefix' | tr -d '"')
```

Make sure you are in the environment folder of the project and run the following:
```bash
activate_script_path=$(readlink -f activate_env_vars.sh)
deactivate_script_path=$(readlink -f deactivate_env_vars.sh)
```

Then create the conda activation and deactivation hooks and make them source the modified activation and deactivation scripts in our environment folder:
```bash
mkdir -p $env_path/etc/conda/activate.d
mkdir -p $env_path/etc/conda/deactivate.d
echo 'source '$activate_script_path >> $env_path/etc/conda/activate.d/env_vars.sh
echo 'source '$deactivate_script_path >> $env_path/etc/conda/deactivate.d/env_vars.sh
```

Exit the environment:
```bash
source deactivate
```

Enter the environment again:
```bash
source activate strata
```

@@ -13,8 +13,6 @@ export PYTHONPATH=$PYTHONPATH:$PROJECTPATH # Adds the repository to the python p
export OLD_PATH=$PATH
export PATH=$PATH:$PROJECTPATH

export MOUNT_POINT=/fileshare # The mounting location for the data
export CACHE_DIR=$MOUNT_POINT'/cache'
export STORAGE_URI=//migonzadldsvm2storage.file.core.windows.net/strata-fileshare

# The mounting location for the data
export MOUNT_POINT=/fileshare
echo Me Gusta!

@@ -4,6 +4,4 @@ export PYTHONPATH=$OLD_PYTHON_PATH
export PATH=$OLD_PATH
export MOUNT_POINT=
export CACHE_DIR=
export STORAGE_URI=
export PROJECTPATH=
echo Noooooooooooooooo