### Vision Layers
* Header: `./include/caffe/vision_layers.hpp`

Vision layers usually take *images* as input and produce other *images* as output.
A typical "image" in the real world may have one color channel ($c = 1$), as in a grayscale image, or three color channels ($c = 3$), as in an RGB (red, green, blue) image.
But in this context, the distinguishing characteristic of an image is its spatial structure: usually an image has some non-trivial height $h > 1$ and width $w > 1$.
This 2D geometry naturally lends itself to certain decisions about how to process the input.
In particular, most of the vision layers work by applying a particular operation to some region of the input to produce a corresponding region of the output.
In contrast, other layers (with few exceptions) ignore the spatial structure of the input, treating it as "one big vector" with dimension $chw$.
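
This distinction can be made concrete with a small NumPy sketch (an illustration only, not Caffe code; the shape numbers are arbitrary example values): a batch of images is a 4D blob of shape $n \times c \times h \times w$, while a layer that ignores spatial structure sees each image as a flat vector of length $chw$.

    import numpy as np

    n, c, h, w = 10, 3, 227, 227      # a batch of 10 RGB images
    blob = np.zeros((n, c, h, w))     # the 4D blob a vision layer operates on

    # A non-vision layer instead treats each image as "one big vector":
    flat = blob.reshape(n, c * h * w)
    print(flat.shape)                 # (10, 154587) -- each row has c*h*w entries
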
#### Convolution
* LayerType: `CONVOLUTION`
* CPU implementation: `./src/caffe/layers/convolution_layer.cpp`
* CUDA GPU implementation: `./src/caffe/layers/convolution_layer.cu`
* Options (`ConvolutionParameter convolution_param`)
    - Required: `num_output` ($c_o$), the number of filters
    - Required: `kernel_size` or (`kernel_h`, `kernel_w`), specifies the height and width of each filter
    - Strongly recommended (default `type: 'constant' value: 0`): `weight_filler`
    - Optional (default `true`): `bias_term`, specifies whether to learn and apply a set of additive biases to the filter outputs
    - Optional (default 0): `pad` or (`pad_h`, `pad_w`), specifies the number of pixels to (implicitly) add to each side of the input
    - Optional (default 1): `stride` or (`stride_h`, `stride_w`), specifies the intervals at which to apply the filters to the input
    - Optional (default 1): `group` ($g$); if $g > 1$, restricts the connectivity of each filter to a subset of the input. In particular, the input to the $i^{th}$ group of $c_o / g$ filters is the $i^{th}$ group of $c_i / g$ input channels.
* Input
    - $n \times c_i \times h_i \times w_i$ (repeated $K \ge 1$ times)
* Output
    - $n \times c_o \times h_o \times w_o$ (repeated $K$ times), where $h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1$ and $w_o$ likewise; a worked example follows the sample below.
* Sample (as seen in `./examples/imagenet/imagenet_train_val.prototxt`)

        layers {
          name: "conv1"
          type: CONVOLUTION
          bottom: "data"
          top: "conv1"
          blobs_lr: 1        # learning rate multiplier for the filters
          blobs_lr: 2        # learning rate multiplier for the biases
          weight_decay: 1    # weight decay multiplier for the filters
          weight_decay: 0    # weight decay multiplier for the biases
          convolution_param {
            num_output: 96     # learn 96 filters
            kernel_size: 11    # each filter is 11x11
            stride: 4          # step 4 pixels between each filter application
            weight_filler {
              type: "gaussian" # initialize the filters from a Gaussian
              std: 0.01        # distribution with stdev 0.01 (default mean: 0)
            }
            bias_filler {
              type: "constant" # initialize the biases to zero (0)
              value: 0
            }
          }
        }
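
As a quick sanity check of the output size formula (a sketch; it assumes the $227 \times 227$ input crop used by the ImageNet example), the sample layer above maps each image to 96 feature maps of size $55 \times 55$:

    # h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, with pad defaulting to 0
    h_i, pad, kernel, stride = 227, 0, 11, 4
    h_o = (h_i + 2 * pad - kernel) // stride + 1
    print(h_o)  # 55 -> the output blob is n x 96 x 55 x 55
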
The `CONVOLUTION` layer convolves the input image with a set of learnable filters, each producing one feature map in the output image.
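
To make that statement concrete, here is a minimal NumPy sketch of the forward pass for a single image (a hypothetical `conv_forward` helper, not Caffe's actual implementation, which lowers the computation to an im2col transformation followed by matrix multiplication; like Caffe, the sketch computes a cross-correlation, i.e. it does not flip the filters):

    import numpy as np

    def conv_forward(x, weights, bias, stride=1, pad=0):
        """x: c_i x h_i x w_i input; weights: c_o x c_i x k x k filters."""
        c_o, c_i, k, _ = weights.shape
        x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))   # zero-pad h and w only
        h_o = (x.shape[1] - k) // stride + 1
        w_o = (x.shape[2] - k) // stride + 1
        out = np.empty((c_o, h_o, w_o))
        for f in range(c_o):              # each filter yields one feature map
            for i in range(h_o):
                for j in range(w_o):
                    patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
                    out[f, i, j] = np.sum(patch * weights[f]) + bias[f]
        return out

    x = np.random.randn(3, 227, 227)           # one RGB image
    w = np.random.randn(96, 3, 11, 11) * 0.01  # filters shaped as in the sample
    print(conv_forward(x, w, np.zeros(96), stride=4).shape)  # (96, 55, 55)
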
#### Pooling