* Base Reorg

* Readme

* Reorgs

* Reorg stuff

* Reorg stuff

* Revert some fast changes

* Update 00_emi_lstm_example.ipynb

* Update 01_emi_fastgrnn_example.ipynb

* Update 02_emi_lstm_initialization_and_restoring.ipynb

* Dependenceies

* Dependencies

* Requirments

* Paper

* NIPS

* Rname c++ to cpp

* Update README.md

* Update README.md
Harsha Vardhan Simhadri 2018-09-05 14:41:13 +05:30 committed by GitHub
Parent b18d9da92a
Commit efc4ad2be3
1694 changed files with 172 additions and 120 deletions

121
README.md

@@ -2,114 +2,31 @@
This repository provides code for machine learning algorithms for edge devices developed at [Microsoft Research India](https://www.microsoft.com/en-us/research/project/resource-efficient-ml-for-the-edge-and-endpoint-iot-devices/).
Machine learning models for edge devices need to have a small footprint in terms of storage, prediction latency and energy. One example of a ubiquitous real-world application where such models are desirable is resource-scarce devices and sensors in the Internet of Things (IoT) setting. Making real-time predictions locally on IoT devices without connecting to the cloud requires models that fit in a few kilobytes.
Machine learning models for edge devices need to have a small footprint in terms of storage, prediction latency, and energy. One example of a ubiquitous real-world application where such models are desirable is resource-scarce devices and sensors in the Internet of Things (IoT) setting. Making real-time predictions locally on IoT devices without connecting to the cloud requires models that fit in a few kilobytes.
This repository contains two such algorithms **Bonsai** and **ProtoNN** that shine in this setting. These algorithms can train models for classical supervised learning problems with memory requirements that are orders of magnitude lower than other modern ML algorithms. The trained models can be loaded onto edge devices such as IoT devices/sensors, and used to make fast and accurate predictions completely offline.
For details, please see our [wiki page](https://github.com/Microsoft/EdgeML/wiki/) and our ICML'17 publications on [Bonsai](publications/Bonsai.pdf) and [ProtoNN](publications/ProtoNN.pdf) algorithms.
This repository contains algorithms that shine in this setting in terms of both model size and compute, namely:
- **Bonsai**: Strong and shallow non-linear tree based classifier.
- **ProtoNN**: **Proto**type based k-nearest neighbors (k**NN**) classifier.
- **EMI-RNN**: Training routine to recover the critical signature from time series data for faster and accurate RNN predictions.
- **Fast(G)RNN**: **F**ast, **A**ccurate, **S**table and **T**iny (**G**ated) RNN cells.
Initial Code Contributors: [Chirag Gupta](https://aigen.github.io/), [Aditya Kusupati](https://adityakusupati.github.io/), [Ashish Kumar](https://ashishkumar1993.github.io/), and [Harsha Simhadri](http://harsha-simhadri.org).
These algorithms can train models for classical supervised learning problems with memory requirements that are orders of magnitude lower than other modern ML algorithms. The trained models can be loaded onto edge devices such as IoT devices/sensors, and used to make fast and accurate predictions completely offline.
We welcome contributions, comments and criticism. For questions, please [email Harsha](mailto:harshasi@microsoft.com).
The tf directory contains code, examples and scripts for all these algorithms in TensorFlow. The cpp directory has training and inference code for the Bonsai and ProtoNN algorithms in C++. Please see the install/run instructions in the Readme pages within these directories.
For details, please see our [wiki page](https://github.com/Microsoft/EdgeML/wiki/), our ICML'17 publications on the [Bonsai](docs/publications/Bonsai.pdf) and [ProtoNN](docs/publications/ProtoNN.pdf) algorithms, and our NIPS'18 publications on [EMI-RNN](docs/publications/EMI-RNN.pdf) and [Fast(G)RNN](docs/publications/FastGRNN.pdf).
Core Contributors:
- [Aditya Kusupati](https://adityakusupati.github.io/)
- [Ashish Kumar](https://ashishkumar1993.github.io/)
- [Chirag Gupta](https://aigen.github.io/)
- [Don Dennis](https://dkdennis.xyz)
- [Harsha Vardhan Simhadri](http://harsha-simhadri.org)
We welcome contributions, comments, and criticism. For questions, please [email Harsha](mailto:harshasi@microsoft.com).
[People](https://github.com/Microsoft/EdgeML/wiki/People/) who have contributed to this [project](https://www.microsoft.com/en-us/research/project/resource-efficient-ml-for-the-edge-and-endpoint-iot-devices/).
### Requirements
* Linux:
    * gcc version 5.4. Other gcc versions above 5.0 could also work.
    * We developed the code on Ubuntu 16.04LTS. Other Linux versions could also work.
    * You can either use the Makefile in the root, or cmake via the build directory (see below).
* Windows 10:
    * Visual Studio 2015. Use cmake (see below).
    * For Anniversary Update or later, one can use the Windows Subsystem for Linux, and the instructions for Linux build.
* On both Linux and Windows 10, you need an implementation of BLAS, sparseBLAS and vector math calls.
  We link with the implementation provided by the [Intel(R) Math Kernel Library](https://software.intel.com/en-us/mkl).
  Please download later versions (2017v3+) of MKL as far as possible.
  The code can be made to work with other math libraries with a few modifications.
### Building using Makefile
After cloning this repository, set compiler and flags appropriately in `config.mk`. Then execute the following in bash:
```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<MKL_PATH>:<EDGEML_ROOT>
make -Bj
```
Typically, MKL_PATH = /opt/intel/mkl/lib/intel64_lin/, and EDGEML_ROOT is '.'.
This will build four executables _BonsaiTrain_, _BonsaiPredict_, _ProtoNNTrain_ and _ProtoNNPredict_ in <EDGEML_ROOT>.
Sample data to try these executables is not included in this repository, but instructions to download it are given below.
### Building using CMake
For Linux, in the <EDGEML_ROOT> directory:
```bash
mkdir build
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<MKL_PATH>
cd build
cmake ..
make -Bj
```
For Windows 10, in the <EDGEML_ROOT> directory, edit `CMakeLists.txt` to set <MKL_ROOT> by changing the line
```set(MKL_ROOT "<MKL_ROOT>")```
Then, generate a Visual Studio 2015 solution using:
```
mkdir build
cd build
cmake -G "Visual Studio 14 2015 Win64" -DCMAKE_BUILD_TYPE=Release ..
```
Finally, open `EdgeML.sln` in VS2015, build and run.
For both Linux and Windows 10, cmake builds will generate four executables _BonsaiTrain_, _BonsaiPredict_, _ProtoNNTrain_ and _ProtoNNPredict_ in <EDGEML_ROOT>.
### Download a sample dataset
Follow the bash commands given below to download a sample dataset, USPS10, to the root of the repository. Bonsai and ProtoNN come with sample scripts to run on the usps10 dataset. EDGEML_ROOT is defined in the previous section.
```bash
cd <EDGEML_ROOT>
mkdir usps10
cd usps10
wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.bz2
wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.t.bz2
bzip2 -d usps.bz2
bzip2 -d usps.t.bz2
mv usps train.txt
mv usps.t test.txt
mkdir ProtoNNResults
cd <EDGEML_ROOT>
```
This will create a sample train and test dataset, on which
you can train and test Bonsai and ProtoNN algorithms. As specified, we create an output folder for ProtoNN. Bonsai on the other hand creates its own output folder.
For instructions to actually run the algorithms, see [Bonsai Readme](docs/README_BONSAI_OSS.md) and [ProtoNN Readme](docs/README_PROTONN_OSS.ipynb).
### Makefile flags
You could change the behavior of the code by setting these flags in `config.mk` and rebuilding with `make -Bj` when building with the default Makefile in <EDGEML_ROOT>. When building with CMake, change these flags in `CMakeLists.txt` in <EDGEML_ROOT>. All these flags can be set for both ProtoNN and Bonsai.
The following are supported currently by both ProtoNN and Bonsai.
SINGLE/DOUBLE: Single/Double precision floating-point. Single is most often sufficient. Double might help with reproducibility.
ZERO_BASED_IO: Read datasets with 0-based labels and indices instead of the default 1-based.
TIMER: Timer logs. Print running time of various calls.
CONCISE: To be used with TIMER to limit the information printed to those deltas above a threshold.
The following currently only change the behavior of ProtoNN, but one can write corresponding code for Bonsai.
LOGGER: Debugging logs. Currently prints min, max and norm of matrices.
LIGHT_LOGGER: Less verbose version of LOGGER. Can be used to track call flow.
XML: Enable training with large sparse datasets with many labels. This is in beta.
VERBOSE: Print additional informative output to stdout.
DUMP: Dump models after each optimization iteration instead of just in the end.
VERIFY: Legacy verification code for comparison with Matlab version.
Additionally, one of the following two flags has to be set in the Makefile:
MKL_PAR_LDFLAGS: Linking with parallel version of MKL.
MKL_SEQ_LDFLAGS: Linking with sequential version of MKL.
### Microsoft Open Source Code of Conduct
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

107
cpp/README.md Normal file

@@ -0,0 +1,107 @@
## Edge Machine Learning - C++ Library
This library consists of two machine learning algorithms, **Bonsai** and **ProtoNN**, implemented in C++ for speed and scalability.
### Requirements
* Linux:
    * gcc version 5.4. Other gcc versions above 5.0 could also work.
    * We developed the code on Ubuntu 16.04LTS. Other Linux versions could also work.
    * You can either use the Makefile in the root, or cmake via the build directory (see below).
* Windows 10:
    * Visual Studio 2015. Use cmake (see below).
    * For Anniversary Update or later, one can use the Windows Subsystem for Linux, and the instructions for Linux build.
* On both Linux and Windows 10, you need an implementation of BLAS, sparseBLAS and vector math calls.
  We link with the implementation provided by the [Intel(R) Math Kernel Library](https://software.intel.com/en-us/mkl).
  Please download later versions (2017v3+) of MKL as far as possible.
  The code can be made to work with other math libraries with a few modifications.
### Building using Makefile
After cloning this repository, set compiler and flags appropriately in `config.mk`. Then execute the following in bash:
```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<MKL_PATH>:<EDGEML_ROOT>
make -Bj
```
Typically, MKL_PATH = /opt/intel/mkl/lib/intel64_lin/, and EDGEML_ROOT is '.'.
This will build four executables _BonsaiTrain_, _BonsaiPredict_, _ProtoNNTrain_ and _ProtoNNPredict_ in <EDGEML_ROOT>.
Sample data to try these executables is not included in this repository, but instructions to download it are given below.
### Building using CMake
For Linux, in the <EDGEML_ROOT> directory:
```bash
mkdir build
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<MKL_PATH>
cd build
cmake ..
make -Bj
```
For Windows 10, in the <EDGEML_ROOT> directory, edit `CMakeLists.txt` to set <MKL_ROOT> by changing the line
```set(MKL_ROOT "<MKL_ROOT>")```
Then, generate a Visual Studio 2015 solution using:
```
mkdir build
cd build
cmake -G "Visual Studio 14 2015 Win64" -DCMAKE_BUILD_TYPE=Release ..
```
Finally, open `EdgeML.sln` in VS2015, build and run.
For both Linux and Windows 10, cmake builds will generate four executables _BonsaiTrain_, _BonsaiPredict_, _ProtoNNTrain_ and _ProtoNNPredict_ in <EDGEML_ROOT>.
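After either build on Linux (or under the Windows Subsystem for Linux), it can be useful to confirm that all four executables named above were actually produced. The following is a minimal sketch rather than part of the repository: the function name `check_edgeml_binaries` is hypothetical, and it assumes the executables sit directly in the directory passed as its argument.

```bash
# Hypothetical helper (not part of EdgeML): report which of the four
# executables named in this README are present in a directory.
# Returns 0 if all four are found and executable, 1 otherwise.
check_edgeml_binaries() {
    local dir="${1:-.}" missing=0 exe
    for exe in BonsaiTrain BonsaiPredict ProtoNNTrain ProtoNNPredict; do
        if [ -x "$dir/$exe" ]; then
            echo "found:   $exe"
        else
            echo "missing: $exe"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_edgeml_binaries . || echo "build incomplete"` run from <EDGEML_ROOT> lists each executable and flags an incomplete build.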
### Download a sample dataset
Follow the bash commands given below to download a sample dataset, USPS10, to the root of the repository. Bonsai and ProtoNN come with sample scripts to run on the usps10 dataset. EDGEML_ROOT is defined in the previous section.
```bash
cd <EDGEML_ROOT>
mkdir usps10
cd usps10
wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.bz2
wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.t.bz2
bzip2 -d usps.bz2
bzip2 -d usps.t.bz2
mv usps train.txt
mv usps.t test.txt
mkdir ProtoNNResults
cd <EDGEML_ROOT>
```
This will create a sample train and test dataset, on which
you can train and test Bonsai and ProtoNN algorithms. As specified, we create an output folder for ProtoNN. Bonsai on the other hand creates its own output folder.
For instructions to actually run the algorithms, see [Bonsai Readme](docs/README_BONSAI_OSS.md) and [ProtoNN Readme](docs/README_PROTONN_OSS.ipynb).
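The USPS files come from the LIBSVM dataset collection, where each line holds a label followed by `index:value` pairs. Before training, a quick sanity check on the downloaded files can catch a truncated or failed download. This is a minimal sketch under that format assumption; `summarize_libsvm` is a hypothetical helper, not something shipped with the repository.

```bash
# Hypothetical helper (not part of EdgeML): summarize a LIBSVM-format text file.
# Prints "<number of non-empty rows> <number of distinct labels>".
summarize_libsvm() {
    awk 'NF { rows++; labels[$1] = 1 }
         END { n = 0; for (l in labels) n++; print rows + 0, n }' "$1"
}
```

Running `summarize_libsvm usps10/train.txt` and `summarize_libsvm usps10/test.txt` from <EDGEML_ROOT> prints the row and label counts for each file, which you can compare against the dataset description on the download page.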
### Makefile flags
You could change the behavior of the code by setting these flags in `config.mk` and rebuilding with `make -Bj` when building with the default Makefile in <EDGEML_ROOT>. When building with CMake, change these flags in `CMakeLists.txt` in <EDGEML_ROOT>. All these flags can be set for both ProtoNN and Bonsai.
The following are supported currently by both ProtoNN and Bonsai.
SINGLE/DOUBLE: Single/Double precision floating-point. Single is most often sufficient. Double might help with reproducibility.
ZERO_BASED_IO: Read datasets with 0-based labels and indices instead of the default 1-based.
TIMER: Timer logs. Print running time of various calls.
CONCISE: To be used with TIMER to limit the information printed to those deltas above a threshold.
The following currently only change the behavior of ProtoNN, but one can write corresponding code for Bonsai.
LOGGER: Debugging logs. Currently prints min, max and norm of matrices.
LIGHT_LOGGER: Less verbose version of LOGGER. Can be used to track call flow.
XML: Enable training with large sparse datasets with many labels. This is in beta.
VERBOSE: Print additional informative output to stdout.
DUMP: Dump models after each optimization iteration instead of just in the end.
VERIFY: Legacy verification code for comparison with Matlab version.
Additionally, one of the following two flags has to be set in the Makefile:
MKL_PAR_LDFLAGS: Linking with parallel version of MKL.
MKL_SEQ_LDFLAGS: Linking with sequential version of MKL.
### Microsoft Open Source Code of Conduct
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT license.

@@ -1,6 +1,6 @@
# Bonsai
[Bonsai](publications/Bonsai.pdf) is a novel tree based algorithm for efficient prediction on IoT devices – such as those based on the Arduino Uno board having an 8 bit ATmega328P microcontroller operating at 16 MHz with no native floating point support, 2 KB RAM and 32 KB read-only flash.
[Bonsai](../../docs/publications/Bonsai.pdf) is a novel tree based algorithm for efficient prediction on IoT devices – such as those based on the Arduino Uno board having an 8 bit ATmega328P microcontroller operating at 16 MHz with no native floating point support, 2 KB RAM and 32 KB read-only flash.
Bonsai maintains prediction accuracy while minimizing model size and prediction costs by:

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# ProtoNN: Compressed and accurate KNN for resource-constrained devices ([paper](publications/ProtoNN.pdf))\n",
"# ProtoNN: Compressed and accurate KNN for resource-constrained devices ([paper](../../docs/publications/ProtoNN.pdf))\n",
"ProtoNN is an algorithm developed for binary, multiclass and multilabel supervised learning. ProtoNN models are time and memory efficient and are thus ideal for resource-constrained scenarios like Internet of Things (IoT). \n",
"\n",
"## Overview of algorithm\n",
@@ -123,7 +123,7 @@
"- **-O**: Folder to store output (see output section below). \n",
"- **-F**: Input format for data (same as described in training section).\n",
"- **-e**: Number of test points.\n",
"- **-b**: [Optional] If unspecified, testing happens for each data-point separately (to simulate a real-world scenario). For faster prediction when prototyping, use the parameter to specify a batch on which prediction happens in one go. \n",
"- **-b**: [Optional] If unspecified, testing happens for each data-point separately (to simulate a real-world scenario). For faster prediction when prototyping, use the parameter to specify a batch on which prediction happens in one go. \n",
"\t\n",
"\n",
"\n",
@@ -135,7 +135,6 @@
"\n",
"## Interpreting the output\n",
"##### Output of ProtoNNTrainer:\n",
"- The following information is printed to **std::cout**: \n",
" - The chosen value of $\\gamma$.\n",
" - **Training, testing accuracy, and training objective value**, thrice for each iteration, once after optimizing each parameter. For multilabel problems, **prec@1** is output instead. \n",
@@ -158,7 +157,6 @@
"\n",
"##### Output of ProtoNNPredictor:\n",
"On execution, the test accuracy, or precision@1,3,5 will be output to stdout. Additionally, a folder will be created in the output directory that will indicate to the user the list of parameters with which the model to be tested was trained. In this folder, there will be one file detailedPrediction. This file contains for each test point the true labels of that point as well as the scores of the top 5 predicted labels. \n",
"\n",
"## Choosing hyperparameters\n",
"##### Model size as a function of hyperparameters\n",
@@ -176,7 +174,7 @@
"Suppose each value is a single-precision floating point (4 bytes), then the total space required by ProtoNN is $4\\cdot(S_W + S_B + S_Z)$. This value is computed and output to screen on running ProtoNN. \n",
"\n",
"##### Pointers on choosing hyperparameters\n",
"Choosing the right hyperparameters may seem to be a daunting task in the beginning but becomes much easier with a little bit of thought. To get an idea of default parameters on some sample datasets, see the ([paper](publications/ProtoNN.pdf)). Few rules of thumb:\n",
"Choosing the right hyperparameters may seem to be a daunting task in the beginning but becomes much easier with a little bit of thought. To get an idea of default parameters on some sample datasets, see the ([paper](../../docs/publications/ProtoNN.pdf)). Few rules of thumb:\n",
"- $S_B$ is typically small, and hence $\\lambda_B \\approx 1.0$. \n",
"- One can set $m$ to $min(10\\cdot L, 0.01\\cdot numTrainingPoints)$, and $d$ to $15$ for an initial experiment. Typically, you want to cross-validate for $m$ and $d$. \n",
"- Depending on $L$ and $D$, $S_W$ or $S_Z$ is the biggest contributor to model size. $\\lambda_W$ and $\\lambda_Z$ can be adjusted accordingly or cross-validated for. \n",
@@ -218,7 +216,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
"version": "3.5.5"
}
},
"nbformat": 4,
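
The notebook text in the hunks above gives a size formula, $4\cdot(S_W + S_B + S_Z)$ bytes for single-precision values, and a starting point for the number of prototypes, $m = min(10\cdot L, 0.01\cdot numTrainingPoints)$. The arithmetic can be sketched as follows. The function names are hypothetical and not part of ProtoNN, the arguments are assumed to be non-negative integers, and $0.01\cdot numTrainingPoints$ is rounded down because the shell only does integer arithmetic.

```bash
# Sketch only: hypothetical helpers, not part of ProtoNN.

# Model size in bytes for single-precision values: 4 * (S_W + S_B + S_Z).
# Arguments: S_W S_B S_Z (counts of stored values, as defined in the notebook).
protonn_model_bytes() {
    echo $(( 4 * ($1 + $2 + $3) ))
}

# Initial guess for m: min(10 * L, 0.01 * numTrainingPoints).
# Arguments: L numTrainingPoints. 0.01 * N is computed as N / 100, rounded down.
protonn_initial_m() {
    local by_labels=$(( 10 * $1 ))
    local by_points=$(( $2 / 100 ))
    if [ "$by_labels" -lt "$by_points" ]; then
        echo "$by_labels"
    else
        echo "$by_points"
    fi
}
```

For instance, `protonn_model_bytes 100 20 30` prints `600`, and `protonn_initial_m 2 100000` prints `20`. As the notebook says, these are starting points only, and $m$ and $d$ should typically be cross-validated.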

Some files were not shown because too many files have changed in this diff.