|
@ -1,5 +1,10 @@
|
|||
# Multiverso Torch/Lua Binding
|
||||
|
||||
## Introduction
|
||||
Multiverso is a parameter server framework for distributed machine learning.
|
||||
This package can leverage multiple machines and GPUs to speed up Torch
|
||||
programs.
|
||||
|
||||
## Requirements
|
||||
Build Multiverso by following the build instructions in the [README](../../README.md#build).
|
||||
|
||||
|
|
|
@ -30,13 +30,11 @@ Perform CIFAR-10 classification with torch resnet implementation.
|
|||
|
||||
## Results
|
||||
|
||||
| Code Name | #Process | #GPU per Process | Use multiverso | Seconds per epoch | Best Model |
|
||||
| :-------: | :------: | :--------------: | :------------: | :---------------: | :-------------------: |
|
||||
| 1P1G0M | 1 | 1 | 0 | 20.366 | top1:7.288 top5:0.218 |
|
||||
| 1P4G0M | 1 | 4 | 0 | 10.045 | top1:7.704 top5:0.289 |
|
||||
| 4P1G1M | 4 | 1 | 1 | 6.303 | top1:8.610 top5:0.273 |
|
||||
| Code Name | #Process(es) | #GPU(s) per Process | Use multiverso | Seconds per epoch | Best Model |
|
||||
| :-------: | :----------: | :-----------------: | :------------: | :---------------: | :--------: |
|
||||
| 1P1G0M | 1 | 1 | 0 | 20.366 | 92.712 % |
|
||||
| 1P4G0M | 1 | 4 | 0 | 10.045 | 92.296 % |
|
||||
| 4P1G1M | 4 | 1 | 1 | 6.303 | 91.390 % |
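The two revisions of the table report the same measurements in different forms: the earlier one lists the top-1 error, the later one lists accuracy. A minimal sketch of that conversion, together with the speedup implied by the seconds-per-epoch column, is shown below. The variable names (`results`, `accuracy`, `speedup`) are illustrative only and are not part of Multiverso.

```python
# Figures copied from the results tables above (Torch/Lua binding, CIFAR-10).
# name: (seconds per epoch, best top-1 error in %)
results = {
    "1P1G0M": (20.366, 7.288),
    "1P4G0M": (10.045, 7.704),
    "4P1G1M": (6.303, 8.610),
}

# The single-process, single-GPU run without Multiverso is the baseline.
baseline_seconds = results["1P1G0M"][0]

# Accuracy is the complement of the top-1 error.
accuracy = {name: 100.0 - err for name, (_, err) in results.items()}

# Speedup is the baseline time divided by the configuration's time.
speedup = {name: baseline_seconds / secs for name, (secs, _) in results.items()}

for name in results:
    print(f"{name}: accuracy {accuracy[name]:.3f} %, speedup {speedup[name]:.2f}x")
```

Read this way, the 4-process Multiverso run is roughly three times faster per epoch than the baseline, at a cost of about 1.3 percentage points of accuracy.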
|
||||
|
||||
![top1error_vs_epoch](./imgs/top1error_vs_epoch.png)
|
||||
![top5error_vs_epoch](./imgs/top5error_vs_epoch.png)
|
||||
![top1error_vs_runningtime](./imgs/top1error_vs_runningtime.png)
|
||||
![top5error_vs_runningtime](./imgs/top5error_vs_runningtime.png)
|
||||
|
|
Binary data
binding/lua/docs/imgs/top1error_vs_epoch.png
Before Width: | Height: | Size: 39 KiB After Width: | Height: | Size: 92 KiB |
Binary data
binding/lua/docs/imgs/top1error_vs_runningtime.png
Before Width: | Height: | Size: 31 KiB After Width: | Height: | Size: 75 KiB |
|
@ -1,19 +1,26 @@
|
|||
# Installation
|
||||
|
||||
## On Linux
|
||||
Follow the [README](../../README.md#build) to build and install Multiverso.
|
||||
|
||||
## On Windows
|
||||
You need MSBuild.exe installed and reachable through your system $PATH. Then run [build_dll.bat](../../src/build_dll.bat) to build the .dll file. Multiverso does not have an automatic installer for Windows yet, so you have to copy the .dll either to a directory on the system $PATH or to the multiverso package folder.
|
||||
# Multiverso Torch/Lua Binding
|
||||
|
||||
|
||||
# Run Unit Tests
|
||||
## Introduction
|
||||
Multiverso is a parameter server framework for distributed machine learning. This package can leverage multiple machines and GPUs to speed up Python programs.
|
||||
|
||||
|
||||
## Installation
|
||||
|
||||
1. (For GPU support only) Install CUDA and cuDNN according to this [guide](https://github.com/Microsoft/fb.resnet.torch/blob/multiverso/INSTALL.md). You only need to complete the steps before [Install Torch](https://github.com/Microsoft/fb.resnet.torch/blob/multiverso/INSTALL.md#install-torch).
|
||||
1. Install Multiverso
|
||||
* On Linux: Follow the [README](../../README.md#build) to build and install Multiverso.
|
||||
* On Windows: You need MSBuild.exe installed and reachable through your system $PATH. Then run [build_dll.bat](../../src/build_dll.bat) to build the .dll file. There is no automatic installer for Windows yet, so you have to copy the .dll either to a directory on the system $PATH or to the multiverso package folder.
|
||||
1. Install the Python binding with the command `sudo python setup.py install`.
|
||||
|
||||
|
||||
## Run Unit Tests
|
||||
```
|
||||
nosetests
|
||||
```
|
||||
|
||||
|
||||
# Documentation
|
||||
* [Experiments](./docs/EXPERIMENTS.md)
|
||||
## Documentation
|
||||
* [Tutorial](./docs/TUTORIAL.md)
|
||||
* API documentation is written as docstrings in the Python source code.
|
||||
* [Benchmark](./docs/BENCHMARK.md)
|
||||
|
|
|
@ -1,11 +1,11 @@
|
|||
# Experiments
|
||||
|
||||
## Codebase
|
||||
[Deep_Residual_Learning_CIFAR-10](../examples/theano/lasagne/Deep_Residual_Learning_CIFAR-10.py)
|
||||
# Multiverso Python Binding Benchmark
|
||||
|
||||
## Task Description
|
||||
Perform CIFAR-10 classification with a residual network implementation based on Lasagne.
|
||||
|
||||
## Codebase
|
||||
[Deep_Residual_Learning_CIFAR-10](../examples/theano/lasagne/Deep_Residual_Learning_CIFAR-10.py)
|
||||
|
||||
## Hardware
|
||||
|||
|
||||
| -------- |:--------:|
|
||||
|
@ -49,13 +49,12 @@ Clarification
|
|||
# The results
|
||||
The results of four experiments with different configurations are shown below.
|
||||
|
||||
|Short Name | With multiverso | Number of Process(es) | Number of GPU(s) | Sync every X minibatches | Best model validation accuracy | Time per epoch / s |
|
||||
|Short Name | # Process(es) | #GPU(s) per Process | Use multiverso | Sync every X minibatches | Seconds per epoch | Best model validation accuracy |
|
||||
| :---- | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
|
||||
| 0M-1G | 0 | 1 | 1 | --| 92.61 % | 100.02|
|
||||
|1M-1G-1S | 1 | 1 | 1 | 1 | 92.61 % | 109.78|
|
||||
|1M-4G-1S | 1 | 4 | 4 | 1 | 92.15 % | 29.38|
|
||||
|1M-4G-3S | 1 | 4 | 4 | 3 | 89.61 % | 27.46|
|
||||
| 1P1G0M | 1 | 1 | 0 | -- | 100.02 | 92.61 % |
|
||||
|1P1G1M1S | 1 | 1 | 1 | 1 | 109.78 | 93.03 % |
|
||||
|4P1G1M1S | 4 | 1 | 1 | 1 | 29.38 | 92.15 % |
|
||||
|4P1G1M3S | 4 | 1 | 1 | 3 | 27.46 | 89.61 % |
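To help read the seconds-per-epoch column, the sketch below derives the speedup over the `1P1G0M` baseline, the parallel efficiency (speedup divided by the number of processes), and the relative overhead of enabling Multiverso on a single process. The helper `scaling` and the other names are hypothetical and used only for illustration; the numbers are copied from the table above.

```python
# Seconds per epoch copied from the results table above (Python binding).
# name: (number of processes, seconds per epoch)
timings = {
    "1P1G0M": (1, 100.02),
    "1P1G1M1S": (1, 109.78),
    "4P1G1M1S": (4, 29.38),
    "4P1G1M3S": (4, 27.46),
}

# The run without Multiverso is the baseline.
baseline = timings["1P1G0M"][1]


def scaling(name):
    """Return (speedup over baseline, parallel efficiency) for a configuration."""
    processes, seconds = timings[name]
    speedup = baseline / seconds
    return speedup, speedup / processes


# Extra time per epoch from enabling Multiverso with a single process.
overhead = timings["1P1G1M1S"][1] / baseline - 1.0

for name in timings:
    s, e = scaling(name)
    print(f"{name}: speedup {s:.2f}x, efficiency {e:.0%}")
print(f"single-process overhead: {overhead:.1%}")
```

Syncing every three minibatches instead of every one improves the efficiency of the 4-process run, but the table shows that it also lowers the best validation accuracy from 92.15 % to 89.61 %.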
|
||||
|
||||
![accuracy_epoch](./imgs/accuracy_epoch.png)
|
||||
![accuracy_time](./imgs/accuracy_time.png)
|
||||
![time_epoch](./imgs/time_epoch.png)
|
Binary data
binding/python/docs/imgs/accuracy_epoch.png
Before Width: | Height: | Size: 66 KiB After Width: | Height: | Size: 75 KiB |
Binary data
binding/python/docs/imgs/accuracy_time.png
Before Width: | Height: | Size: 66 KiB After Width: | Height: | Size: 64 KiB |
Binary data
binding/python/docs/imgs/time_epoch.png
Before Width: | Height: | Size: 51 KiB |