Tuesday, 22 November 2016

Use Faster RCNN and ResNet code for object detection and image classification with your own training data

I recently uploaded two repositories to GitHub, both based on publicly available code for state-of-the-art (1) object detection and (2) image classification. Here are a few notes on using them.

(1) Faster RCNN for object detection (GitHub Link).

You can use your own PASCAL VOC formatted data to train an object detector. Check out how to alter the network parameters as shown in the example files located in:
In particular, you want to change the following settings in stage1_fast_rcnn_train.pt and stage2_fast_rcnn_train.pt:
num_class:2 # in our example, person detection only has two classes: person vs background
In cls_score -- num_output:2
In bbox_pred -- num_output:8 # this value is 4*num_class
Also in stage1_rpn_train.pt and stage2_rpn_train.pt:
Finally, in fast_rcnn_test.pt:
In cls_score -- num_output:2
In bbox_pred -- num_output:8 # this value is 4*num_class
Additionally, you need to modify lib/datasets/pascal_voc.py:
self._classes = ('__background__', # always index 0
And then recompile from the Python prompt:
You can then follow the instructions on this page to train your model.
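The class-count arithmetic in the settings above can be sketched in a few lines of Python. This is only an illustration, not code from the repository: the helper name detection_layer_sizes and the dictionary keys are hypothetical, and the only rules it encodes are the ones stated above (background counts as a class, cls_score gets num_class outputs, bbox_pred gets 4*num_class).

```python
# Hypothetical helper (not part of the repository): derive the values to
# write into the prototxt files from the list of foreground classes.
def detection_layer_sizes(foreground_classes):
    # '__background__' is always index 0, so it counts toward num_class.
    classes = ("__background__",) + tuple(foreground_classes)
    num_class = len(classes)
    return {
        "num_class": num_class,
        "cls_score_num_output": num_class,       # one score per class
        "bbox_pred_num_output": 4 * num_class,   # four box coordinates per class
    }


# Person detection: person vs background.
print(detection_layer_sizes(["person"]))
```

For the person-only example this reproduces the values 2, 2 and 8 used above; with a longer class list, the same three numbers have to be updated consistently in every file mentioned.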

(2) Fine tuning ResNet for image classification (GitHub Link).

This one is simple to use, and it is worth checking out before you attempt to fine-tune a ResNet model.

Example scripts can be found in: finetune-resnet-flower/caffe/examples/flower463/

Network parameters can be found in: finetune-resnet-flower/caffe/models/resnet_flower463/

Note that the parameters in solver50.prototxt may not be optimal (at least for my task at hand). For better accuracy, at the cost of slower training, you can try increasing stepsize as shown below:
test_iter: 2000
test_interval: 1000
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 500
max_iter: 1000000
momentum: 0.9
weight_decay: 0.0005
Also, set the batch size to fit the GPU memory available on your system.
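To see what a larger stepsize does, it helps to compute the learning rate that Caffe's "step" policy produces, which is base_lr * gamma ^ floor(iter / stepsize). The sketch below is a standalone illustration with a hypothetical function name (step_lr); the default values are copied from the settings above.

```python
# Illustrative re-implementation of Caffe's "step" learning-rate policy:
#   lr = base_lr * gamma ** floor(iteration / stepsize)
# step_lr is a hypothetical name; defaults mirror the solver settings above.
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=100000):
    return base_lr * gamma ** (iteration // stepsize)


# Print the rate at the start of training and after each of the first drops.
for it in (0, 99999, 100000, 200000, 300000):
    print(it, step_lr(it))
```

With stepsize at 100000 the rate stays at base_lr for the first 100000 iterations and is divided by 10 at each later multiple of stepsize, so a larger stepsize keeps the rate high for longer, while a smaller one decays it sooner.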


  1. This comment has been removed by the author.

  2. Thank you for sharing your ResNet training and testing code. Is the flower463 dataset the ILSVRC12 dataset? I tried to download the raw images from ImageNet, but the download is very large (~138 GB) and contains the raw JPEG images. Did you use the raw images, or just the files fetched into /caffe/data/ilsvrc12 by get_ilsvrc_aux.sh? Also, is it possible to use ResNet with PASCAL VOC2012?
    I think that data is easier to get.

  3. Flower463 is a dataset for flower (i.e., fine-grained) classification. It is a commercial dataset at this time, but I will see if I can make it publicly available later. Its format is similar to that of ILSVRC, but the images are different.

    Basically, you will need to put the images belonging to each class in a separate folder, preferably with all images resized to 256x256. Then run caffe/examples/flower463/create_flower.sh to prepare the lmdb data, then caffe/examples/flower463/make_flower_mean.sh to compute the image mean, and finally caffe/examples/flower463/train_finetune_resnet_50.sh to begin training.

    Further reading here:

    It is possible to use ResNet for PASCAL VOC 2012, but this is a rather general question and I'm not sure whether you want to do classification or detection.
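The one-folder-per-class layout described in the reply above can be turned into an image listing with a short Python sketch. It assumes, without having checked the script, that create_flower.sh follows Caffe's stock ImageNet example and passes Caffe's convert_imageset tool a text file with one "relative/path label" entry per line. The function name build_listing, the extension filter, and the alphabetical label assignment are illustrative choices, not taken from the repository.

```python
import os

# Assumption: only these extensions are treated as images.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def build_listing(root_dir):
    """Return (class_names, lines) for a root folder with one subfolder per class.

    Labels are assigned 0..N-1 in alphabetical order of the class folders.
    Each line has the form "class_folder/file_name label".
    """
    class_names = sorted(
        name for name in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, name))
    )
    lines = []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(root_dir, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            if file_name.lower().endswith(IMAGE_EXTENSIONS):
                # Forward slash on purpose: the listing is read on Linux by Caffe.
                lines.append("%s/%s %d" % (class_name, file_name, label))
    return class_names, lines


def write_listing(root_dir, out_path):
    """Write the listing for root_dir to out_path and return the class names."""
    class_names, lines = build_listing(root_dir)
    with open(out_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return class_names
```

Calling write_listing on the training folder and again on the validation folder would give the two listing files; the label order must be identical in both, which holds here as long as both folders contain the same class subfolders.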