Back-merge documentation and fixes

format installation docs, add links
Add hdf5 requirements to 10.9 notes, drop cmake (not linked)
fix im2col height/width bound check bug (issue #284 identified by @kmatzen)
strip confusing comment about shuffling files
add /etc/rc.local hint for boot configuration of gpus
Include k40 images per day benchmark
drop caffe presentation in favor of dropbox link
make build_docs.sh script work from anywhere
proofread, fix dead link, standardize NVIDIA capitalization
Added link in index.md to performance_hardware.md
Added Performance and Hardware Tips
imagenet fix: ilvsrc -> ilsvrc
This commit is contained in:
Evan Shelhamer 2014-04-07 22:13:13 -07:00
Parent 9f638cbb66 b3cd950dd4
Commit efacd98413
9 changed files with 94 additions and 36 deletions

View file

@@ -12,8 +12,8 @@ parameters in the code. Python and Matlab wrappers are provided.
At the same time, Caffe fits industry needs, with blazing fast C++/Cuda code for
GPU computation. Caffe is currently the fastest GPU CNN implementation publicly
available, and is able to process more than **20 million images per day** on a
single Tesla K20 machine \*.
available, and is able to process more than **40 million images per day** on a
single NVIDIA K40 GPU (or 20 million per day on a K20)\*.
Caffe also provides **seamless switching between CPU and GPU**, which allows one
to train models with fast GPUs and then deploy them on non-GPU clusters with one

Binary data
docs/caffe-presentation.pdf

Binary file not shown.

View file

@@ -16,10 +16,10 @@ Caffe aims to provide computer vision scientists and practitioners with a **clea
For example, network structure is easily specified in separate config files, with no mess of hard-coded parameters in the code.
At the same time, Caffe fits industry needs, with blazing fast C++/CUDA code for GPU computation.
Caffe is currently the fastest GPU CNN implementation publicly available, and is able to process more than **20 million images per day** on a single Tesla K20 machine \*.
Caffe is currently the fastest GPU CNN implementation publicly available, and is able to process more than **40 million images per day** with a single NVIDIA K40 or Titan GPU (or 20 million images per day on a K20 GPU)\*. That's 192 images per second during training and 500 images per second during test.
Caffe also provides **seamless switching between CPU and GPU**, which allows one to train models with fast GPUs and then deploy them on non-GPU clusters with one line of code: `Caffe::set_mode(Caffe::CPU)`.
Even in CPU mode, computing predictions on an image takes only 20 ms when images are processed in batch mode.
Even in CPU mode, computing predictions on an image takes only 20 ms when images are processed in batch mode. In GPU mode, computing predictions on an image takes only 2 ms when images are processed in batch mode.
## Documentation
@@ -55,7 +55,7 @@ Please kindly cite Caffe in your publications if it helps your research:
### Acknowledgements
Yangqing would like to thank the NVidia Academic program for providing K20 GPUs, and [Oriol Vinyals](http://www1.icsi.berkeley.edu/~vinyals/) for various discussions along the journey.
Yangqing would like to thank the NVIDIA Academic program for providing K20 GPUs, and [Oriol Vinyals](http://www1.icsi.berkeley.edu/~vinyals/) for various discussions along the journey.
A core set of BVLC members have contributed lots of new functionality and fixes since the original release (alphabetical by first name):
@@ -74,4 +74,4 @@ If you'd like to contribute, read [this](development.html).
---
\*: When measured with the [SuperVision](http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf) model that won the ImageNet Large Scale Visual Recognition Challenge 2012.
\*: When measured with the [SuperVision](http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf) model that won the ImageNet Large Scale Visual Recognition Challenge 2012. See [performance and hardware configuration details](/performance_hardware.html).

View file

@@ -27,21 +27,21 @@ The following sections detail prerequisites and installation on Ubuntu. For OS X
## Prerequisites
* CUDA (5.0 or 5.5)
* Boost
* MKL (but see the [boost-eigen branch](https://github.com/BVLC/caffe/tree/boost-eigen) for a boost/Eigen3 port)
* OpenCV
* [CUDA](https://developer.nvidia.com/cuda-zone) 5.0 or 5.5
* [boost](http://www.boost.org/) (1.55 preferred)
* [BLAS](http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) by [MKL](http://software.intel.com/en-us/intel-mkl) (though the `dev` branch supports ATLAS as an alternative)
* [OpenCV](http://opencv.org/)
* glog, gflags, protobuf, leveldb, snappy, hdf5
* For the Python wrapper: python, numpy (>= 1.7 preferred), and boost_python
* For the Matlab wrapper: Matlab with mex
* For the python wrapper: python, numpy (>= 1.7 preferred), and boost_python
* For the MATLAB wrapper: MATLAB with mex
Caffe requires the CUDA NVCC compiler to compile its GPU code. To install CUDA, go to the [NVidia CUDA website](https://developer.nvidia.com/cuda-downloads) and follow installation instructions there. Caffe is verified to compile with both CUDA 5.0 and 5.5.
**CUDA**: Caffe requires the CUDA NVCC compiler to compile its GPU code. To install CUDA, go to the [NVIDIA CUDA website](https://developer.nvidia.com/cuda-downloads) and follow installation instructions there. Caffe compiles with both CUDA 5.0 and 5.5.
N.B. one can install the CUDA libraries without the CUDA driver in order to build and run Caffe in CPU-only mode.
Caffe also needs Intel MKL as the backend of its matrix computation and vectorized computations. We are in the process of removing MKL dependency, but for now you will need to have an MKL installation. You can obtain a [trial license](http://software.intel.com/en-us/intel-mkl) or an [academic license](http://software.intel.com/en-us/intel-education-offerings) (if you are a student).
**MKL**: Caffe needs Intel MKL as the backend of its matrix and vector computations. We are working on support for alternative BLAS libraries, but for now you need to have MKL. You can obtain a [trial license](http://software.intel.com/en-us/intel-mkl) or an [academic license](http://software.intel.com/en-us/intel-education-offerings) (if you are a student).
You will also need other packages, most of which can be installed via apt-get using:
**The Rest**: you will also need other packages, most of which can be installed via apt-get using:
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev
@@ -56,28 +56,27 @@ The only exception being the google logging library, which does not exist in the
./configure
make && make install
If you would like to compile the Python wrapper, you will need to install python, numpy and boost_python. You can either compile them from scratch or use a pre-packaged solution like [Anaconda](https://store.continuum.io/cshop/anaconda/) or [Enthought Canopy](https://www.enthought.com/products/canopy/). Note that if you use the Ubuntu default python, you will need to apt-install the `python-dev` package to have the python headers. You can install any remaining dependencies with
**Python**: If you would like to have the python wrapper, install python, numpy and boost_python. You can either compile them from scratch or use a pre-packaged solution like [Anaconda](https://store.continuum.io/cshop/anaconda/) or [Enthought Canopy](https://www.enthought.com/products/canopy/). Note that if you use the Ubuntu default python, you will need to apt-install the `python-dev` package to have the python headers. You can install any remaining dependencies with
pip install -r /path/to/caffe/python/requirements.txt
If you would like to compile the Matlab wrapper, you will need to install Matlab.
**MATLAB**: if you would like to have the MATLAB wrapper, install MATLAB with the mex compiler.
After setting all the prerequisites, you should modify the `Makefile.config` file and change the paths to those on your computer.
Now that you have the prerequisites, edit your `Makefile.config` to change the paths for your setup.
## Compilation
After installing the prerequisites, simply do `make all` to compile Caffe. If you would like to compile the Python and Matlab wrappers, do
With the prerequisites installed, do `make all` to compile Caffe.
make pycaffe
make matcaffe
To compile the python and MATLAB wrappers do `make pycaffe` and `make matcaffe` respectively.
For a faster build, compile in parallel by doing `make all -j8` where 8 is the number of parallel threads for compilation. A good choice for the number of threads is the number of cores in your machine.
*Distribution*: run `make distribute` to create a `distribute` directory with all the Caffe headers, compiled libraries, binaries, etc. needed for distribution to other machines.
Optionally, you can run `make distribute` to create a `distribute` directory that contains all the necessary files, including the headers, compiled shared libraries, and binary files that you can distribute over different machines.
*Speed*: for a faster build, compile in parallel by doing `make all -j8` where 8 is the number of parallel threads for compilation (a good choice for the number of threads is the number of cores in your machine).
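The core-count heuristic can also be scripted rather than hard-coded; a minimal sketch (assumes the GNU coreutils `nproc` utility is available; the `make` line is commented out since it must run from a Caffe checkout):

```shell
# Size the parallel build to the machine instead of hard-coding -j8.
JOBS=$(nproc)            # number of available cores (GNU coreutils)
echo "building with ${JOBS} parallel jobs"
# make all -j"${JOBS}"   # uncomment and run from the Caffe root
```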
To use Caffe with python, you will need to add `/path/to/caffe/python` or `/path/to/caffe/build/python` to your `PYTHONPATH`.
*Python Module*: for python support, you must add the compiled module to your `PYTHONPATH` (as `/path/to/caffe/python` or the like).
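A sketch of that `PYTHONPATH` setup (the checkout path is illustrative, matching the `/path/to/caffe` placeholder above):

```shell
# Prepend the compiled Caffe module to PYTHONPATH (path is illustrative).
export PYTHONPATH=/path/to/caffe/python:$PYTHONPATH
echo "${PYTHONPATH%%:*}"   # first entry is now the caffe python dir
```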
Now that you have compiled Caffe, check out the [MNIST demo](mnist.html) and the pretrained [ImageNet example](imagenet.html).
Now that you have installed Caffe, check out the [MNIST demo](mnist.html) and the pretrained [ImageNet example](imagenet.html).
## OS X Installation
@@ -94,7 +93,7 @@ Install [homebrew](http://brew.sh/) to install most of the prerequisites. Starti
brew install homebrew/science/hdf5
brew install homebrew/science/opencv
Building boost from source is needed to link against your local python.
Building boost from source is needed to link against your local python (exceptions might be raised during some OS X installs, but ignore these and continue).
If using homebrew python, python packages like `numpy` and `scipy` are best installed by doing `brew tap homebrew/python`, and then installing them with homebrew.
#### 10.9 additional notes
@@ -124,7 +123,7 @@ For each package that you install through homebrew do the following:
The prerequisite homebrew formulae are
cmake boost snappy leveldb protobuf gflags glog homebrew/science/opencv
boost snappy leveldb protobuf gflags glog szip homebrew/science/hdf5 homebrew/science/opencv
so follow steps 1-4 for each.
@@ -132,7 +131,7 @@ After this the rest of the installation is the same as under 10.8, as long as `c
### CUDA and MKL
CUDA and MKL are very straightforward to install; download from the NVIDIA and Intel websites.
CUDA and MKL are straightforward to install; download from the NVIDIA and Intel links under "Prerequisites."
### Compiling Caffe

View file

@@ -0,0 +1,57 @@
---
layout: default
title: Caffe
---
# Performance and Hardware Configuration
To measure performance on different NVIDIA GPUs we use the Caffe reference ImageNet model.
For training, each time point is 20 iterations/minibatches of 256 images for 5,120 images total. For testing, a 50,000 image validation set is classified.
**Acknowledgements**: BVLC members are very grateful to NVIDIA for providing several GPUs to conduct this research.
## NVIDIA K40
Performance is best with ECC off and boost clock enabled. While ECC makes a negligible difference in speed, disabling it frees ~1 GB of GPU memory.
Best settings with ECC off and maximum clock speed:
* Training is 26.5 secs / 20 iterations (5,120 images)
* Testing is 100 secs / validation set (50,000 images)
Other settings:
* ECC on, max speed: training 26.7 secs / 20 iterations, test 101 secs / validation set
* ECC on, default speed: training 31 secs / 20 iterations, test 117 secs / validation set
* ECC off, default speed: training 31 secs / 20 iterations, test 118 secs / validation set
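The per-second and per-day rates quoted in the docs follow from these timings; a quick sketch of the arithmetic (awk is used only for floating-point math; rounding accounts for small differences from figures quoted elsewhere):

```shell
# Derive throughput from the best K40 timings above:
# 20 iterations x 256 images = 5,120 images in 26.5 s (training);
# 50,000 validation images in 100 s (testing).
train_ips=$(awk 'BEGIN { printf "%.0f", 5120 / 26.5 }')   # training images/sec
test_ips=$(awk 'BEGIN { printf "%.0f", 50000 / 100 }')    # testing images/sec
day_m=$(awk 'BEGIN { printf "%.1f", (50000 / 100) * 86400 / 1e6 }')  # millions/day
echo "train ${train_ips} img/s, test ${test_ips} img/s, ~${day_m}M img/day"
```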
### K40 configuration tips
For maximum K40 performance, turn off ECC and boost the clock speed (at your own risk).
To turn off ECC, do
sudo nvidia-smi -i 0 --ecc-config=0 # repeat with -i x for each GPU ID
then reboot.
Set the "persistence" mode of the GPU settings by
sudo nvidia-smi -pm 1
and then set the clock speed with
sudo nvidia-smi -i 0 -ac 3004,875 # repeat with -i x for each GPU ID
but note that this configuration resets across driver reloading / rebooting. Include these commands in a boot script to initialize these settings. For a simple fix, add these commands to `/etc/rc.local` (on Ubuntu).
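A sketch of what that `/etc/rc.local` addition might look like (the two-GPU loop and GPU IDs are assumptions for illustration; the clock values are the K40 ones from above):

```shell
#!/bin/sh
# Hypothetical /etc/rc.local fragment: re-apply GPU settings at boot,
# since clock configuration resets across driver reloads and reboots.
nvidia-smi -pm 1                       # persistence mode
for id in 0 1; do                      # one entry per GPU ID on the machine
    nvidia-smi -i "$id" -ac 3004,875   # K40 memory,graphics clocks
done
exit 0                                 # rc.local must finish with exit 0
```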
## NVIDIA Titan
Training: 26.26 secs / 20 iterations (5,120 images).
Testing: 100 secs / validation set (50,000 images).
## NVIDIA K20
Training: 36.0 secs / 20 iterations (5,120 images).
Testing: 133 secs / validation set (50,000 images).

View file

@@ -5,7 +5,7 @@ layers {
top: "data"
top: "label"
data_param {
source: "ilvsrc12_train_leveldb"
source: "ilsvrc12_train_leveldb"
mean_file: "../../data/ilsvrc12/imagenet_mean.binaryproto"
batch_size: 256
crop_size: 227

View file

@@ -5,7 +5,7 @@ layers {
top: "data"
top: "label"
data_param {
source: "ilvsrc12_val_leveldb"
source: "ilsvrc12_val_leveldb"
mean_file: "../../data/ilsvrc12/imagenet_mean.binaryproto"
batch_size: 50
crop_size: 227

View file

@@ -1,8 +1,11 @@
#!/bin/bash
PORT=${1:-4000}
echo "usage: build_docs.sh [port]"
PORT=4000
if [ $# -gt 0 ]; then
PORT=$1
fi
jekyll serve -w -s docs/ -d docs/_site --port $PORT
# Find the docs dir, no matter where the script is called
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd $DIR/../docs
jekyll serve -w -s . -d _site --port=$PORT

View file

@@ -9,7 +9,6 @@
// ....
// if the last argument is 1, a random shuffle will be carried out before we
// process the file lines.
// You are responsible for shuffling the files yourself.
#include <glog/logging.h>
#include <leveldb/db.h>