---
title: Fine-tuning for style recognition
description: Fine-tune the ImageNet-trained CaffeNet on the "Flickr Style" dataset.
category: example
include_in_docs: true
priority: 5
---
# Fine-tuning CaffeNet for Style Recognition on "Flickr Style" Data
Fine-tuning takes an already learned model, adapts the architecture, and resumes training from the existing model weights.
Let's fine-tune the BVLC-distributed CaffeNet model on a different dataset, [Flickr Style](http://sergeykarayev.com/files/1311.3715v3.pdf), to predict image style instead of object category.
## Explanation
The Flickr-sourced images of the Style dataset are visually very similar to the ImageNet dataset, on which the `bvlc_reference_caffenet` was trained.
Since that model works well for object category classification, we'd like to use this architecture for our style classifier.
We also only have 80,000 images to train on, so we'd like to start with the parameters learned on the 1,000,000 ImageNet images, and fine-tune as needed.
If we provide the `weights` argument to the `caffe train` command, the pretrained weights will be loaded into our model, matching layers by name.
Because we are predicting 20 classes instead of 1,000, we do need to change the last layer in the model.
Therefore, we change the name of the last layer from `fc8` to `fc8_flickr` in our prototxt.
Since there is no layer by that name in `bvlc_reference_caffenet`, that layer will begin training with random weights.
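Concretely, the tail of the model prototxt might look something like the sketch below (initialization and other details omitted; the boosted `lr_mult` values on the new layer are discussed next):

    layer {
      name: "fc8_flickr"       # renamed from "fc8", so no pretrained weights are loaded here
      type: "InnerProduct"
      bottom: "fc7"
      top: "fc8_flickr"
      param { lr_mult: 10 }    # weights learn 10x faster than in the pretrained layers
      param { lr_mult: 20 }    # biases
      inner_product_param {
        num_output: 20         # 20 style classes instead of 1,000 object classes
      }
    }
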
We will also decrease the overall learning rate `base_lr` in the solver prototxt, but boost the `lr_mult` on the newly introduced layer.
The idea is to have the rest of the model change very slowly with new data, but let the new layer learn fast.
Additionally, we set `stepsize` in the solver to a lower value than if we were training from scratch, since we're effectively already far along in training and therefore want the learning rate to go down faster.
Note that we could also entirely prevent fine-tuning of all layers other than `fc8_flickr` by setting their `lr_mult` to 0.
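In the solver prototxt, those changes might look something like this sketch (the values here are illustrative; see `models/finetune_flickr_style/solver.prototxt` for the actual settings):

    net: "models/finetune_flickr_style/train_val.prototxt"
    # lr for fine-tuning should be lower than when starting from scratch
    base_lr: 0.001
    lr_policy: "step"
    gamma: 0.1
    # stepsize should also be lower, as we're virtually far along in training
    stepsize: 20000
    max_iter: 100000
    momentum: 0.9
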
## Procedure
All steps are to be done from the caffe root directory.
The dataset is distributed as a list of URLs with corresponding labels.
Using a script, we will download a small subset of the data and split it into train and val sets.

    caffe % ./examples/finetune_flickr_style/assemble_data.py -h
    usage: assemble_data.py [-h] [-s SEED] [-i IMAGES] [-w WORKERS]

    Download a subset of Flickr Style to a directory

    optional arguments:
      -h, --help            show this help message and exit
      -s SEED, --seed SEED  random seed
      -i IMAGES, --images IMAGES
                            number of images to use (-1 for all)
      -w WORKERS, --workers WORKERS
                            num workers used to download images. -x uses (all - x)
                            cores.

    caffe % python examples/finetune_flickr_style/assemble_data.py --workers=-1 --images=2000 --seed 831486
    Downloading 2000 images with 7 workers...
    Writing train/val for 1939 successfully downloaded images.

This script downloads images and writes train/val file lists into `data/flickr_style`.
The prototxts in this example assume this, and also assume the presence of the ImageNet mean file (run `get_ilsvrc_aux.sh` from `data/ilsvrc12` to obtain this if you haven't yet).
We'll also need the ImageNet-trained model, which you can obtain by running `./scripts/download_model_binary.py models/bvlc_reference_caffenet`.
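For instance, both prerequisites can be fetched from the caffe root like so (a sketch using the two scripts named above):

    caffe % cd data/ilsvrc12 && ./get_ilsvrc_aux.sh && cd ../..
    caffe % ./scripts/download_model_binary.py models/bvlc_reference_caffenet
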
Now we can train! The key to fine-tuning is the `-weights` argument in the
command below, which tells Caffe that we want to load weights from a pre-trained
Caffe model.
(You can fine-tune in CPU mode by leaving out the `-gpu` flag.)

    caffe % ./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel -gpu 0
    [...]
    I0828 22:10:04.025378 9718 solver.cpp:46] Solver scaffolding done.
    I0828 22:10:04.025388 9718 caffe.cpp:95] Use GPU with device ID 0
    I0828 22:10:04.192004 9718 caffe.cpp:107] Finetuning from models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
    [...]
    I0828 22:17:48.338963 11510 solver.cpp:165] Solving FlickrStyleCaffeNet
    I0828 22:17:48.339010 11510 solver.cpp:251] Iteration 0, Testing net (#0)
    I0828 22:18:14.313817 11510 solver.cpp:302] Test net output #0: accuracy = 0.0308
    I0828 22:18:14.476822 11510 solver.cpp:195] Iteration 0, loss = 3.78589
    I0828 22:18:14.476878 11510 solver.cpp:397] Iteration 0, lr = 0.001
    I0828 22:18:19.700408 11510 solver.cpp:195] Iteration 20, loss = 3.25728
    I0828 22:18:19.700461 11510 solver.cpp:397] Iteration 20, lr = 0.001
    I0828 22:18:24.924685 11510 solver.cpp:195] Iteration 40, loss = 2.18531
    I0828 22:18:24.924741 11510 solver.cpp:397] Iteration 40, lr = 0.001
    I0828 22:18:30.114858 11510 solver.cpp:195] Iteration 60, loss = 2.4915
    I0828 22:18:30.114910 11510 solver.cpp:397] Iteration 60, lr = 0.001
    I0828 22:18:35.328071 11510 solver.cpp:195] Iteration 80, loss = 2.04539
    I0828 22:18:35.328127 11510 solver.cpp:397] Iteration 80, lr = 0.001
    I0828 22:18:40.588317 11510 solver.cpp:195] Iteration 100, loss = 2.1924
    I0828 22:18:40.588373 11510 solver.cpp:397] Iteration 100, lr = 0.001
    I0828 22:18:46.171576 11510 solver.cpp:195] Iteration 120, loss = 2.25107
    I0828 22:18:46.171669 11510 solver.cpp:397] Iteration 120, lr = 0.001
    I0828 22:18:51.757809 11510 solver.cpp:195] Iteration 140, loss = 1.355
    I0828 22:18:51.757863 11510 solver.cpp:397] Iteration 140, lr = 0.001
    I0828 22:18:57.345080 11510 solver.cpp:195] Iteration 160, loss = 1.40815
    I0828 22:18:57.345135 11510 solver.cpp:397] Iteration 160, lr = 0.001
    I0828 22:19:02.928794 11510 solver.cpp:195] Iteration 180, loss = 1.6558
    I0828 22:19:02.928850 11510 solver.cpp:397] Iteration 180, lr = 0.001
    I0828 22:19:08.514497 11510 solver.cpp:195] Iteration 200, loss = 0.88126
    I0828 22:19:08.514552 11510 solver.cpp:397] Iteration 200, lr = 0.001
    [...]
    I0828 22:22:40.789010 11510 solver.cpp:195] Iteration 960, loss = 0.112586
    I0828 22:22:40.789175 11510 solver.cpp:397] Iteration 960, lr = 0.001
    I0828 22:22:46.376626 11510 solver.cpp:195] Iteration 980, loss = 0.0959077
    I0828 22:22:46.376682 11510 solver.cpp:397] Iteration 980, lr = 0.001
    I0828 22:22:51.687258 11510 solver.cpp:251] Iteration 1000, Testing net (#0)
    I0828 22:23:17.438894 11510 solver.cpp:302] Test net output #0: accuracy = 0.2356

Note how rapidly the loss went down. Although the 23.5% accuracy is only modest, it was achieved in only 1,000 iterations, and is evidence that the model is starting to learn quickly and well.
Once the model is fully fine-tuned on the whole training set over 100,000 iterations, the final validation accuracy is 39.16%.
This takes ~7 hours in Caffe on a K40 GPU.
For comparison, here is how the loss goes down when we do not start with a pre-trained model:
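(Presumably this run reuses the same solver, with the `-weights` argument simply omitted, along the lines of:)

    caffe % ./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -gpu 0
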
    I0828 22:24:18.624004 12919 solver.cpp:165] Solving FlickrStyleCaffeNet
    I0828 22:24:18.624099 12919 solver.cpp:251] Iteration 0, Testing net (#0)
    I0828 22:24:44.520992 12919 solver.cpp:302] Test net output #0: accuracy = 0.0366
    I0828 22:24:44.676905 12919 solver.cpp:195] Iteration 0, loss = 3.47942
    I0828 22:24:44.677120 12919 solver.cpp:397] Iteration 0, lr = 0.001
    I0828 22:24:50.152454 12919 solver.cpp:195] Iteration 20, loss = 2.99694
    I0828 22:24:50.152509 12919 solver.cpp:397] Iteration 20, lr = 0.001
    I0828 22:24:55.736256 12919 solver.cpp:195] Iteration 40, loss = 3.0498
    I0828 22:24:55.736311 12919 solver.cpp:397] Iteration 40, lr = 0.001
    I0828 22:25:01.316514 12919 solver.cpp:195] Iteration 60, loss = 2.99549
    I0828 22:25:01.316567 12919 solver.cpp:397] Iteration 60, lr = 0.001
    I0828 22:25:06.899554 12919 solver.cpp:195] Iteration 80, loss = 3.00573
    I0828 22:25:06.899610 12919 solver.cpp:397] Iteration 80, lr = 0.001
    I0828 22:25:12.484624 12919 solver.cpp:195] Iteration 100, loss = 2.99094
    I0828 22:25:12.484678 12919 solver.cpp:397] Iteration 100, lr = 0.001
    I0828 22:25:18.069056 12919 solver.cpp:195] Iteration 120, loss = 3.01616
    I0828 22:25:18.069149 12919 solver.cpp:397] Iteration 120, lr = 0.001
    I0828 22:25:23.650928 12919 solver.cpp:195] Iteration 140, loss = 2.98786
    I0828 22:25:23.650984 12919 solver.cpp:397] Iteration 140, lr = 0.001
    I0828 22:25:29.235535 12919 solver.cpp:195] Iteration 160, loss = 3.00724
    I0828 22:25:29.235589 12919 solver.cpp:397] Iteration 160, lr = 0.001
    I0828 22:25:34.816898 12919 solver.cpp:195] Iteration 180, loss = 3.00099
    I0828 22:25:34.816953 12919 solver.cpp:397] Iteration 180, lr = 0.001
    I0828 22:25:40.396656 12919 solver.cpp:195] Iteration 200, loss = 2.99848
    I0828 22:25:40.396711 12919 solver.cpp:397] Iteration 200, lr = 0.001
    [...]
    I0828 22:29:12.539094 12919 solver.cpp:195] Iteration 960, loss = 2.99203
    I0828 22:29:12.539258 12919 solver.cpp:397] Iteration 960, lr = 0.001
    I0828 22:29:18.123092 12919 solver.cpp:195] Iteration 980, loss = 2.99345
    I0828 22:29:18.123147 12919 solver.cpp:397] Iteration 980, lr = 0.001
    I0828 22:29:23.432059 12919 solver.cpp:251] Iteration 1000, Testing net (#0)
    I0828 22:29:49.409044 12919 solver.cpp:302] Test net output #0: accuracy = 0.0572

This model is only beginning to learn.
Fine-tuning can be feasible in cases where training from scratch would not be, for lack of time or data.
Even in CPU mode, each pass through the training set takes ~100 s. GPU fine-tuning is of course faster still, and can learn a useful model in minutes or hours instead of days or weeks.
Furthermore, note that the model has only trained on < 2,000 instances. Transfer learning a new task like style recognition from ImageNet pretraining can require much less data than training from scratch.
Now try fine-tuning to your own tasks and data!
## Trained model
We provide a model trained on all 80K images, with a final accuracy of 39%.
Simply do `./scripts/download_model_binary.py models/finetune_flickr_style` to obtain it.
## License
The Flickr Style dataset as distributed here contains only URLs to images.
Some of the images may be under copyright.
Training a category-recognition model for research/non-commercial use may constitute fair use of this data, but the result should not be used for commercial purposes.