Loss

In Caffe, as in most of machine learning, learning is driven by a loss function (also known as an error, cost, or objective function). A loss function specifies the goal of learning by mapping parameter settings (i.e., the current network weights) to a scalar value specifying the "badness" of these parameter settings. Hence, the goal of learning is to find a setting of the weights that minimizes the loss function.

The loss in Caffe is computed by the Forward pass of the network. Each layer takes a set of input (bottom) blobs and produces a set of output (top) blobs. Some of these layers' outputs may be used in the loss function. A typical choice of loss function for multi-class classification tasks, where each input belongs to exactly one of several classes, is the SOFTMAX_LOSS function. It is used in a network definition as in this example:

layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "pred"
  bottom: "label"
  top: "loss"
}

For a SOFTMAX_LOSS layer, the top blob is a scalar (dimensions $1 \times 1 \times 1 \times 1$). It holds the loss computed from the predictions pred and the ground-truth labels label, averaged over the entire mini-batch.
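The following is a minimal pure-Python sketch of the quantity such a layer computes. It assumes that `pred` holds one row of unnormalized class scores per example and that `label` holds integer class indices. The function name `softmax_loss` is hypothetical, and this is not Caffe's implementation. It only mirrors the math: a softmax over each row, the negative log-probability of the true class, and an average over the mini-batch.

```python
import math
from typing import Sequence

def softmax_loss(pred: Sequence[Sequence[float]], label: Sequence[int]) -> float:
    """Hypothetical helper: mean softmax (multinomial logistic) loss over a mini-batch.

    pred:  N rows of K unnormalized class scores (assumed layout)
    label: N integer class indices in [0, K)
    """
    if len(pred) != len(label) or not pred:
        raise ValueError("pred and label must be non-empty and have the same length")
    total = 0.0
    for scores, y in zip(pred, label):
        # Subtract the max score before exponentiating, for numerical stability.
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        # -log softmax(scores)[y] = log(sum_k exp(s_k)) - s_y
        total += log_sum_exp - scores[y]
    # Average over the mini-batch so the result is a single scalar.
    return total / len(pred)

# Two examples with two classes and uniform scores: each example costs log(2).
print(softmax_loss([[0.0, 0.0], [0.0, 0.0]], [0, 1]))  # ≈ 0.6931
```

Because the per-example losses are averaged rather than summed, the scale of this value does not grow with the mini-batch size.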

Loss weights

For nets with multiple layers producing a loss (e.g., a network that both classifies the input using a SOFTMAX_LOSS layer and reconstructs it using a EUCLIDEAN_LOSS layer), loss weights can be used to specify their relative importance.
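The next sketch shows how weighted losses combine. It computes a Euclidean reconstruction loss using the formula documented for Caffe's Euclidean loss layer, $\frac{1}{2N}\sum_{n=1}^{N} \lVert \hat{y}_n - y_n \rVert_2^2$, and then adds it to a classification loss. The helpers `euclidean_loss` and `weighted_total` are hypothetical. The classification loss value, the data, and the weights 1.0 and 0.5 are made-up illustrative numbers.

```python
from typing import Sequence

def euclidean_loss(pred: Sequence[Sequence[float]],
                   target: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper: (1 / 2N) * sum over the batch of squared L2 distances."""
    n = len(pred)
    sq = sum((p - t) ** 2
             for row_p, row_t in zip(pred, target)
             for p, t in zip(row_p, row_t))
    return sq / (2 * n)

def weighted_total(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of loss_weight * loss over all loss-producing layers."""
    return sum(w * l for w, l in zip(weights, losses))

cls_loss = 0.5  # illustrative value, e.g. the output of a softmax loss layer
rec_loss = euclidean_loss([[1.0, 2.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
# Squared differences: 0 + 4 + 0 + 1 = 5, divided by 2 * N = 4, gives 1.25.
total = weighted_total([cls_loss, rec_loss], [1.0, 0.5])
print(rec_loss, total)  # 1.25 1.125
```

Lowering the second weight makes reconstruction matter less relative to classification. Setting it to 0 removes reconstruction from the objective entirely.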

By convention, Caffe layer types with the suffix _LOSS contribute to the loss function, whereas other layers are assumed to be used purely for intermediate computations. However, any layer can be used as a loss by adding a field loss_weight: <float> to a layer definition for each top blob produced by the layer. Layers with the suffix _LOSS have an implicit loss_weight: 1 for the first top blob (and loss_weight: 0 for any additional tops); other layers have an implicit loss_weight: 0 for all tops. So, the above SOFTMAX_LOSS layer could be equivalently written as:

layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "pred"
  bottom: "label"
  top: "loss"
  loss_weight: 1
}
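The default-weight rule described above can be restated as a small function. The function `effective_loss_weights` is hypothetical and only encodes the convention as stated. The sketch assumes that explicit loss_weight values, when present, are given once per top blob in order.

```python
from typing import List, Optional, Sequence

def effective_loss_weights(layer_type: str, num_tops: int,
                           explicit: Optional[Sequence[float]] = None) -> List[float]:
    """Hypothetical helper mirroring the loss_weight defaults described above."""
    if explicit is not None:
        # Assumption: explicit loss_weight values are listed one per top, in order.
        if len(explicit) != num_tops:
            raise ValueError("expected one loss_weight per top blob")
        return [float(w) for w in explicit]
    if num_tops <= 0:
        return []
    if layer_type.endswith("_LOSS"):
        # The first top counts with weight 1; any additional tops get weight 0.
        return [1.0] + [0.0] * (num_tops - 1)
    # All other layers contribute nothing to the loss by default.
    return [0.0] * num_tops

print(effective_loss_weights("SOFTMAX_LOSS", 1))            # [1.0]
print(effective_loss_weights("INNER_PRODUCT", 2))           # [0.0, 0.0]
print(effective_loss_weights("INNER_PRODUCT", 1, [0.01]))   # [0.01]
```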

However, any layer able to backpropagate may be given a non-zero loss_weight. This allows one, for example, to regularize the activations produced by some intermediate layer(s) of the network. For non-singleton outputs with an associated non-zero loss, the loss is computed simply by summing over all entries of the blob.
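Consider a loss weight $\lambda$ applied to a non-singleton blob with entries $x_i$. The blob contributes $\lambda \sum_i x_i$ to the loss, so the gradient with respect to every entry is simply $\lambda$. The sketch below checks this numerically. It assumes the blob is stored as a nested Python list (for instance N × C × H × W), and `flatten` and `blob_loss` are hypothetical names.

```python
from typing import Iterator, List, Tuple, Union

Blob = Union[float, List["Blob"]]  # arbitrarily nested list of numbers

def flatten(blob: Blob) -> Iterator[float]:
    """Yield every entry of a nested-list blob in order."""
    if isinstance(blob, list):
        for item in blob:
            yield from flatten(item)
    else:
        yield float(blob)

def blob_loss(blob: Blob, loss_weight: float) -> Tuple[float, List[float]]:
    """Return (loss_weight * sum of entries, gradient w.r.t. each entry)."""
    entries = list(flatten(blob))
    loss = loss_weight * sum(entries)
    # d(loss)/d(x_i) = loss_weight for every entry, independent of x.
    grad = [loss_weight] * len(entries)
    return loss, grad

activations = [[[[1.0, -2.0], [3.0, 0.5]]]]  # shape 1 x 1 x 2 x 2
loss, grad = blob_loss(activations, 0.25)
print(loss, grad)  # 0.625 [0.25, 0.25, 0.25, 0.25]
```

Because the gradient is constant, the effect of such a penalty depends on the layer that produces the weighted top.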

The final loss in Caffe, then, is computed by summing the total weighted loss over the network, as in the following pseudo-code:

loss := 0
for layer in layers:
  # pair each top blob with its own loss weight
  for top, loss_weight in zip(layer.tops, layer.loss_weights):
    loss += loss_weight * sum(top)
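The pseudo-code can be turned into a runnable Python sketch. The `Layer` dataclass below is a hypothetical stand-in rather than Caffe's Python interface. Each top blob is assumed to be already flattened into a list of numbers. An empty `loss_weights` list is treated as all-zero weights, which matches the default for layers without the _LOSS suffix.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Layer:
    """Hypothetical stand-in for a network layer: flattened top blobs plus weights."""
    name: str
    tops: List[List[float]]  # each top blob flattened to a list of entries
    loss_weights: List[float] = field(default_factory=list)

def total_loss(layers: List[Layer]) -> float:
    """Sum loss_weight * sum(top) over every top blob of every layer."""
    loss = 0.0
    for layer in layers:
        # zip pairs each top blob with its own loss weight.
        for top, loss_weight in zip(layer.tops, layer.loss_weights):
            loss += loss_weight * sum(top)
    return loss

net = [
    Layer("ip1", tops=[[1.0, 2.0, 3.0]], loss_weights=[0.0]),  # ordinary layer
    Layer("loss", tops=[[0.75]], loss_weights=[1.0]),           # scalar loss top
    Layer("reg", tops=[[1.0, 3.0]], loss_weights=[0.5]),        # weighted intermediate blob
]
print(total_loss(net))  # 2.75
```

Here the total is 0 from ip1, plus 0.75 from the loss layer, plus 0.5 × 4 = 2 from the weighted intermediate blob.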