Tuesday 22 November 2016

Use Faster RCNN and ResNet code for object detection and image classification with your own training data

I have recently uploaded two repositories to GitHub, both based on publicly available code for state-of-the-art (1) object detection and (2) image classification. Here are a few notes on using them.

(1) Faster RCNN for object detection (GitHub Link).

You can use your own PASCAL VOC formatted data to train an object detector. The example files in the following location show how to alter the network parameters:
person_detection_voc2012/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/*.pt
In particular, change the following settings in stage1_fast_rcnn_train.pt and stage2_fast_rcnn_train.pt:
num_classes: 2 # in our example, person detection has only two classes: person and background
In cls_score -- num_output: 2
In bbox_pred -- num_output: 8 # this value is 4 * num_classes
Also in stage1_rpn_train.pt and stage2_rpn_train.pt:
num_classes: 2
Finally, in fast_rcnn_test.pt:
In cls_score -- num_output: 2
In bbox_pred -- num_output: 8 # this value is 4 * num_classes
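The relationship between these values can be sketched in a few lines of Python. This is only an illustration and not part of py-faster-rcnn; the helper name head_sizes is hypothetical, and the class tuple is assumed to include the background class.

```python
def head_sizes(classes):
    """Return the values to write into the prototxt files.

    `classes` must include the background class.
    """
    num_classes = len(classes)
    return {
        "num_classes": num_classes,
        "cls_score": num_classes,       # one score per class
        "bbox_pred": 4 * num_classes,   # four box regression values per class
    }


CLASSES = ("__background__", "person")
sizes = head_sizes(CLASSES)
print(sizes)
```

For a detector with more classes, only the tuple changes; for example, the 20 PASCAL VOC classes plus background give 21 and 84.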
Additionally, you need to modify lib/datasets/pascal_voc.py:
self._classes = ('__background__', # always index 0
                 'person')
Then recompile it from a Python prompt:
import py_compile
py_compile.compile(r'pascal_voc.py')
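By default, py_compile.compile only prints a message to stderr when the file has a syntax error, which is easy to miss after editing the class tuple. The sketch below, written for Python 3, passes doraise=True so that a mistake raises an exception instead. The helper name recompile and the throwaway demo file are assumptions made for illustration; in practice you would pass the path to pascal_voc.py.

```python
import os
import py_compile
import tempfile


def recompile(path):
    """Byte-compile `path`; raise py_compile.PyCompileError on a syntax error."""
    # doraise=True turns the default stderr message into an exception.
    return py_compile.compile(path, doraise=True)


# Demonstration on a throwaway file standing in for pascal_voc.py.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo_classes.py")
    with open(demo, "w") as f:
        f.write("CLASSES = ('__background__', 'person')\n")
    compiled = recompile(demo)
    compiled_ok = compiled is not None and os.path.exists(compiled)

print("compiled without errors:", compiled_ok)
```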
You can then follow instructions from this page to train your model.

(2) Fine-tuning ResNet for image classification (GitHub Link).

This one is simple to use; you may want to look through it before attempting to fine-tune a ResNet model yourself.

Example scripts can be found in: finetune-resnet-flower/caffe/examples/flower463/

Network parameters can be found in: finetune-resnet-flower/caffe/models/resnet_flower463/

Note that the parameters in solver50.prototxt may not be optimal (at least for my task at hand). For better accuracy, at the cost of slower training, you can try increasing stepsize, as in the following settings:
test_iter: 2000
test_interval: 1000
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 500
max_iter: 1000000
momentum: 0.9
weight_decay: 0.0005
Also, set the batch size to fit the GPU memory available on your system.
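To see what the stepsize setting does, recall that Caffe's "step" policy multiplies the learning rate by gamma once every stepsize iterations. The sketch below reproduces that schedule with the solver values listed above; the function name step_lr is hypothetical and the code is an illustration, not something taken from the repository.

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=100000):
    """Learning rate under the "step" policy: base_lr * gamma ** floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


# Print the rate at a few points in training.
for it in (0, 99999, 100000, 250000, 999999):
    print(it, step_lr(it))
```

With these values the rate stays at 0.001 for the first 100,000 iterations and then drops tenfold at each further multiple of 100,000. A larger stepsize keeps the rate high for longer, which is why training takes longer to settle.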