Loss
In Caffe, as in most of machine learning, learning is driven by a loss function (also known as an error, cost, or objective function). A loss function specifies the goal of learning by mapping parameter settings (i.e., the current network weights) to a scalar value specifying the "badness" of these parameter settings. Hence, the goal of learning is to find a setting of the weights that minimizes the loss function.
The loss in Caffe is computed by the Forward pass of the network.
Each layer takes a set of input (`bottom`) blobs and produces a set of output (`top`) blobs.
Some of these layers' outputs may be used in the loss function.
A typical choice of loss function for multi-class classification tasks, where each input belongs to exactly one class, is the `SOFTMAX_LOSS` function, used in a network definition as follows:
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "pred"
  bottom: "label"
  top: "loss"
}
In a `SOFTMAX_LOSS` function, the `top` blob is a scalar (dimensions $1 \times 1 \times 1 \times 1$) holding the loss, computed from the predicted labels `pred` and the actual labels `label`, averaged over the entire mini-batch.
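As a rough illustration of what that scalar contains, the sketch below computes a softmax loss averaged over a mini-batch in plain Python. It is not Caffe's implementation: the function name `softmax_loss` and the list-of-lists layout for `pred` are assumptions made for the example, and the input values are made up.

```python
import math

def softmax_loss(pred, label):
    """Mean multinomial logistic loss over a mini-batch (illustrative sketch).

    pred:  list of per-example score lists, one score per class (assumed layout)
    label: list of integer class indices, one per example
    """
    total = 0.0
    for scores, target in zip(pred, label):
        # Subtract the largest score before exponentiating for numerical stability.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        # Negative log-probability that the softmax assigns to the true class.
        total += log_norm - scores[target]
    # Averaging over the mini-batch yields a single scalar.
    return total / len(pred)

# Hypothetical mini-batch of two examples with three classes each.
pred = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
label = [0, 1]
loss = softmax_loss(pred, label)
```

With uniform scores over three classes, every class has probability 1/3, so the loss for that example is `log(3)` regardless of the label.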
Loss weights
For nets with multiple layers producing a loss (e.g., a network that both classifies the input using a `SOFTMAX_LOSS` layer and reconstructs it using a `EUCLIDEAN_LOSS` layer), loss weights can be used to specify their relative importance.
By convention, Caffe layer types with the suffix `_LOSS` contribute to the loss function, but other layers are assumed to be purely used for intermediate computations.
However, any layer can be used as a loss by adding a field `loss_weight: <float>` to a layer definition for each `top` blob produced by the layer.
Layers with the suffix `_LOSS` have an implicit `loss_weight: 1` for the first `top` blob (and `loss_weight: 0` for any additional `top`s); other layers have an implicit `loss_weight: 0` for all `top`s.
So, the above `SOFTMAX_LOSS` layer could be equivalently written as:
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "pred"
  bottom: "label"
  top: "loss"
  loss_weight: 1
}
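The implicit defaults described above can be restated as a small Python sketch. The helper `default_loss_weights` is hypothetical and exists only for this illustration. It is not part of Caffe's API and takes the layer type as a plain string.

```python
def default_loss_weights(layer_type, num_tops):
    """Implicit loss_weight for each top blob, following the stated convention.

    layer_type: layer type name as a string, e.g. "SOFTMAX_LOSS" (assumed form)
    num_tops:   number of top blobs the layer produces
    """
    if num_tops == 0:
        return []
    if layer_type.endswith("_LOSS"):
        # The first top contributes to the loss; any additional tops do not.
        return [1.0] + [0.0] * (num_tops - 1)
    # Layers without the _LOSS suffix contribute nothing unless told otherwise.
    return [0.0] * num_tops
```

For instance, `default_loss_weights("SOFTMAX_LOSS", 1)` gives `[1.0]`, which matches the explicit `loss_weight: 1` in the definition above.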
However, any layer able to backpropagate may be given a non-zero `loss_weight`, allowing one to, for example, regularize the activations produced by some intermediate layer(s) of the network if desired.
For non-singleton outputs with an associated non-zero loss, the loss is computed simply by summing over all entries of the blob.
The final loss in Caffe, then, is computed by summing the total weighted loss over the network, as in the following pseudo-code:
loss := 0
for layer in layers:
  for top, loss_weight in layer.tops, layer.loss_weights:
    loss += loss_weight * sum(top)
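A runnable version of the pseudo-code might look like the following Python sketch. It assumes each layer is a dictionary with hypothetical `tops` and `loss_weights` keys and that each blob is flattened to a list of numbers. Neither assumption reflects how Caffe stores its data, and the values are invented for the example.

```python
def total_loss(layers):
    """Sum loss_weight * sum(top) over every top blob of every layer."""
    loss = 0.0
    for layer in layers:
        # Pair each top blob with its own loss weight.
        for top, loss_weight in zip(layer["tops"], layer["loss_weights"]):
            loss += loss_weight * sum(top)
    return loss

layers = [
    # Intermediate layer with a non-singleton output: its entries are summed,
    # then scaled by a non-zero loss weight (here acting as a regularizer).
    {"tops": [[0.5, 1.5, 2.0]], "loss_weights": [0.25]},
    # Loss layer with a scalar top and the implicit weight of 1.
    {"tops": [[0.75]], "loss_weights": [1.0]},
]
result = total_loss(layers)
```

Here the intermediate layer contributes 0.25 × 4.0 = 1.0 and the loss layer contributes 0.75, so `result` is 1.75.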