Miguel Fierro 2017-06-20 15:05:55 +01:00
Parent a71779d474
Commit 2409fcfd30
4 changed files with 75 additions and 124 deletions

@ -1,6 +1,63 @@
# Installation and setup
Here we present the instructions for setting up the project on an [Ubuntu Azure VM](https://azure.microsoft.com/en-us/services/virtual-machines/). The VM we used for the CPU experiments is a Standard DS15 v2 with 20 cores and 140 GB of memory. For the GPU experiments we used an NV24 with 4 NVIDIA M60 GPUs. On both machines the OS was Ubuntu 16.04.
Here we present the instructions for setting up the project on an [Ubuntu Azure VM](https://azure.microsoft.com/en-us/services/virtual-machines/). The VM we used for the CPU experiments is a Standard DS15 v2 with 20 cores and 140 GB of memory. For the GPU experiments we used an NV24 with 4 NVIDIA M60 GPUs. On both machines the OS was Ubuntu 16.04. We recommend using the [Azure Data Science VM](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.standard-data-science-vm), which comes with many machine learning tools already installed.
## Setting up the environment
Clone this repo to your desired location
```bash
git clone https://github.com/Azure/fast_retraining.git
```
Create a conda environment if you haven't already done so. The command below creates a Python 3 environment called strata.
```bash
conda create --name strata python=3 anaconda
```
Edit [activate_env_vars.sh](environment/activate_env_vars.sh) and [deactivate_env_vars.sh](environment/deactivate_env_vars.sh)
so that the values they export (for example `PROJECTPATH`, `MOUNT_POINT` and `STORAGE_URI`) match your setup.
Install `jq`, a command-line JSON parser:
```bash
apt-get install jq
```
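On Ubuntu, `apt-get install` usually needs root privileges, and there is no need to reinstall jq if it is already present. A minimal sketch, assuming an Ubuntu machine where `sudo` is available; `have_commands` and `ensure_jq` are hypothetical helper names, not part of the repository:

```bash
# Hypothetical helper: returns 0 if every named command is on the PATH,
# otherwise prints the first missing one to stderr and returns 1.
have_commands() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      return 1
    fi
  done
  return 0
}

# Hypothetical helper: install jq with apt-get only when it is missing.
# Assumes Ubuntu with sudo rights.
ensure_jq() {
  if ! have_commands jq; then
    sudo apt-get install -y jq
  fi
}
```

Calling `ensure_jq` once before the next steps installs jq only when `command -v jq` cannot find it.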
Activate the conda environment and install the requirements.
```bash
source activate strata
pip install -r requirements.txt
```
Get the path of the currently activated environment and store it in `env_path`.
The pipeline prints the environment info as JSON, extracts the `default_prefix` field with `jq`, and strips the surrounding quotes with `tr`:
```bash
env_path=$(conda info --json | jq '.default_prefix' | tr -d '"')
```
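If you prefer to skip the `tr` step, `jq -r` prints the raw string without quotes. The sketch below also assumes that your conda version exports `CONDA_PREFIX` when an environment is activated (recent versions do, but check yours) and only falls back to the `jq` pipeline when it is unset; `active_env_prefix` is a hypothetical helper name:

```bash
# Hypothetical helper: print the prefix of the active conda environment.
# Assumption: conda sets CONDA_PREFIX on activation; otherwise ask conda
# directly and let jq -r print the value without surrounding quotes.
active_env_prefix() {
  if [ -n "${CONDA_PREFIX:-}" ]; then
    printf '%s\n' "$CONDA_PREFIX"
  else
    conda info --json | jq -r '.default_prefix'
  fi
}
```

With the environment active, `env_path=$(active_env_prefix)` should give the same value as the command above.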
Make sure you are in the `environment` folder of the project, then resolve the absolute paths of the two scripts:
```bash
activate_script_path=$(readlink -f activate_env_vars.sh)
deactivate_script_path=$(readlink -f deactivate_env_vars.sh)
```
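GNU `readlink -f` prints a canonical path even when the last path component does not exist, so running these commands from the wrong folder goes unnoticed until activation fails. A minimal sketch that checks the file first; `resolve_script` is a hypothetical helper name:

```bash
# Hypothetical helper: print the absolute path of a script, or fail with a
# message if it does not exist. The -f test is needed because GNU
# readlink -f only requires the directories leading to the file to exist.
resolve_script() {
  local path="$1"
  if [ ! -f "$path" ]; then
    echo "not found: $path (are you in the environment folder?)" >&2
    return 1
  fi
  readlink -f "$path"
}
```

Usage: `activate_script_path=$(resolve_script activate_env_vars.sh)`, and likewise for `deactivate_env_vars.sh`.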
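Then create conda's activation and deactivation hooks inside the environment and point them at the scripts you edited in the `environment` folder.
When the environment is activated, conda runs the scripts in `etc/conda/activate.d`; when it is deactivated, it runs those in `etc/conda/deactivate.d`: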
```bash
mkdir -p $env_path/etc/conda/activate.d
mkdir -p $env_path/etc/conda/deactivate.d
echo 'source '$activate_script_path >> $env_path/etc/conda/activate.d/env_vars.sh
echo 'source '$deactivate_script_path >> $env_path/etc/conda/deactivate.d/env_vars.sh
```
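Because the `echo ... >>` commands append, running them twice leaves two `source` lines in each hook and the scripts then run twice on every activation. A sketch of an idempotent variant, assuming the `env_vars.sh` hook files are used only for this project (they are overwritten, not appended to); `install_conda_hooks` is a hypothetical helper name:

```bash
# Hypothetical helper: write conda's activate/deactivate hooks.
# Arguments: env prefix, absolute path of the activate script,
# absolute path of the deactivate script.
# Using > instead of >> keeps exactly one "source" line per hook,
# so re-running the setup is harmless.
install_conda_hooks() {
  local env_path="$1" activate_script="$2" deactivate_script="$3"
  mkdir -p "$env_path/etc/conda/activate.d" "$env_path/etc/conda/deactivate.d"
  # Quote the sourced path so directories containing spaces still work.
  printf 'source "%s"\n' "$activate_script" > "$env_path/etc/conda/activate.d/env_vars.sh"
  printf 'source "%s"\n' "$deactivate_script" > "$env_path/etc/conda/deactivate.d/env_vars.sh"
}
```

Usage: `install_conda_hooks "$env_path" "$activate_script_path" "$deactivate_script_path"`.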
Exit the environment
```bash
source deactivate
```
Activate the environment again so that the new hooks run:
```bash
source activate strata
```
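To confirm that the hooks ran, you can check that the variables exported by `activate_env_vars.sh` are now set. A minimal sketch; `check_env_vars` is a hypothetical helper name, and the variable list in the usage line assumes you kept the names used in the repository's script:

```bash
# Hypothetical helper: report every named variable that is not exported
# or is empty. printenv only sees exported variables, which is what
# child processes such as Python will inherit.
check_env_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "not set or empty: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_env_vars PROJECTPATH MOUNT_POINT CACHE_DIR STORAGE_URI && echo "hooks OK"`.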
## Installation of boosted tree libraries
@ -52,60 +109,3 @@ Finally, to check that the libraries are correctly installed, try to load them f
python -c "import xgboost; import lightgbm"
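The one-line import check stops at the first failing import. A sketch that tests each module separately and reports which ones fail; `check_imports` is a hypothetical helper name, and the interpreter is passed explicitly so you can point it at the conda environment's `python`:

```bash
# Hypothetical helper: try to import each module in its own interpreter
# call. Arguments: the Python interpreter, then the module names.
# Returns 1 if any import fails.
check_imports() {
  local python_bin="$1" module failed=0
  shift
  for module in "$@"; do
    if "$python_bin" -c "import $module" >/dev/null 2>&1; then
      echo "ok: $module"
    else
      echo "FAILED: $module" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `check_imports python xgboost lightgbm`.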
Clone this repo to your desired location
```bash
git clone https://github.com/Azure/fast_retraining.git
```
Create a conda environment if you haven't already done so. The command below creates a Python 3 environment called strata.
```bash
conda create --name strata python=3 anaconda
```
Edit [activate_env_vars.sh](environment/activate_env_vars.sh) and [deactivate_env_vars.sh](environment/deactivate_env_vars.sh)
so that the values they export (for example `PROJECTPATH`, `MOUNT_POINT` and `STORAGE_URI`) match your setup.
Install `jq`, a command-line JSON parser:
```bash
apt-get install jq
```
Activate the conda environment
```bash
source activate strata
```
Get the path of the currently activated environment and store it in `env_path`.
The pipeline prints the environment info as JSON, extracts the `default_prefix` field with `jq`, and strips the surrounding quotes with `tr`:
```bash
env_path=$(conda info --json | jq '.default_prefix' | tr -d '"')
```
Make sure you are in the `environment` folder of the project, then resolve the absolute paths of the two scripts:
```bash
activate_script_path=$(readlink -f activate_env_vars.sh)
deactivate_script_path=$(readlink -f deactivate_env_vars.sh)
```
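Then create conda's activation and deactivation hooks inside the environment and point them at the scripts you edited in the `environment` folder.
When the environment is activated, conda runs the scripts in `etc/conda/activate.d`; when it is deactivated, it runs those in `etc/conda/deactivate.d`: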
```bash
mkdir -p $env_path/etc/conda/activate.d
mkdir -p $env_path/etc/conda/deactivate.d
echo 'source '$activate_script_path >> $env_path/etc/conda/activate.d/env_vars.sh
echo 'source '$deactivate_script_path >> $env_path/etc/conda/deactivate.d/env_vars.sh
```
Exit the environment
```bash
source deactivate
```
Activate the environment again so that the new hooks run:
```bash
source activate strata
```

@ -2,9 +2,23 @@
This repo shows how to perform fast retraining with LightGBM in different business cases.
In this repo we compare two of the fastest boosted decision tree libraries: [XGBoost](https://github.com/dmlc/xgboost) and [LightGBM](https://github.com/microsoft/LightGBM). We evaluate them on datasets from several domains and of different sizes.
## Installation and Setup
The installation instructions can be found [here](./INSTALL.md).
## Project
In the folder [experiments](./experiments) you can find the different experiments of the project.
In the folder [experiments](./experiments) you can find the different experiments of the project. We developed five experiments with the CPU versions of the libraries and two with the GPU versions:
* Airline
* BCI
* Football
* Planet Amazon
* Fraud Detection
* Airline (GPU version)
* HIGGS (GPU version)
The folder [experiment/libs](./experiment/libs) contains the common code for the project.
@ -12,62 +26,3 @@ In the folder [experiment/libs](./experiment/libs) there is the common code for
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
### Setup
Instructions for setting up the project on a Linux Microsoft Data Science Virtual Machine (DSVM)
Clone this repo to your desired location
```bash
git clone https://github.com/Azure/fast_retraining.git
```
Create a conda environment if you haven't already done so. The command below creates a Python 3 environment called strata.
```bash
conda create --name strata python=3 anaconda
```
Edit [activate_env_vars.sh](environment/activate_env_vars.sh) and [deactivate_env_vars.sh](environment/deactivate_env_vars.sh)
so that the values they export (for example `PROJECTPATH`, `MOUNT_POINT` and `STORAGE_URI`) match your setup.
Install `jq`, a command-line JSON parser:
```bash
apt-get install jq
```
Activate the conda environment
```bash
source activate strata
```
Get the path of the currently activated environment and store it in `env_path`.
The pipeline prints the environment info as JSON, extracts the `default_prefix` field with `jq`, and strips the surrounding quotes with `tr`:
```bash
env_path=$(conda info --json | jq '.default_prefix' | tr -d '"')
```
Make sure you are in the `environment` folder of the project, then resolve the absolute paths of the two scripts:
```bash
activate_script_path=$(readlink -f activate_env_vars.sh)
deactivate_script_path=$(readlink -f deactivate_env_vars.sh)
```
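Then create conda's activation and deactivation hooks inside the environment and point them at the scripts you edited in the `environment` folder.
When the environment is activated, conda runs the scripts in `etc/conda/activate.d`; when it is deactivated, it runs those in `etc/conda/deactivate.d`: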
```bash
mkdir -p $env_path/etc/conda/activate.d
mkdir -p $env_path/etc/conda/deactivate.d
echo 'source '$activate_script_path >> $env_path/etc/conda/activate.d/env_vars.sh
echo 'source '$deactivate_script_path >> $env_path/etc/conda/deactivate.d/env_vars.sh
```
Exit the environment
```bash
source deactivate
```
Activate the environment again so that the new hooks run:
```bash
source activate strata
```

@ -13,8 +13,6 @@ export PYTHONPATH=$PYTHONPATH:$PROJECTPATH # Adds the repository to the python p
export OLD_PATH=$PATH
export PATH=$PATH:$PROJECTPATH
export MOUNT_POINT=/fileshare # The mounting location for the data
export CACHE_DIR=$MOUNT_POINT'/cache'
export STORAGE_URI=//migonzadldsvm2storage.file.core.windows.net/strata-fileshare
# The mounting location for the data
export MOUNT_POINT=/fileshare
echo Me Gusta!

@ -4,6 +4,4 @@ export PYTHONPATH=$OLD_PYTHON_PATH
export PATH=$OLD_PATH
export MOUNT_POINT=
export CACHE_DIR=
export STORAGE_URI=
export PROJECTPATH=
echo Noooooooooooooooo
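The two scripts above follow a save/restore pattern: activation stores the old `PATH` in `OLD_PATH` before extending it, and deactivation restores it (likewise `PYTHONPATH` via `OLD_PYTHON_PATH`). A minimal sketch of that pattern as two shell functions, assuming the project root is passed as an argument; `project_activate` and `project_deactivate` are hypothetical names, not the repository's scripts. Note the `$` in `$PROJECTPATH`: without it, the literal string `PROJECTPATH` is appended to `PATH`.

```bash
# Hypothetical sketch: remember the old values, then extend them.
# Argument: absolute path of the project root.
project_activate() {
  export PROJECTPATH="$1"
  export OLD_PATH="$PATH"
  export OLD_PYTHON_PATH="${PYTHONPATH:-}"
  export PATH="$PATH:$PROJECTPATH"
  # Only add the ":" separator when PYTHONPATH already has a value.
  export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$PROJECTPATH"
}

# Hypothetical sketch: restore the saved values and clean up.
project_deactivate() {
  export PATH="$OLD_PATH"
  export PYTHONPATH="$OLD_PYTHON_PATH"
  unset OLD_PATH OLD_PYTHON_PATH PROJECTPATH
}
```

Building `PYTHONPATH` with `${PYTHONPATH:+$PYTHONPATH:}` avoids a stray leading `:` when `PYTHONPATH` was previously unset.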