Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks - Ren et al. - NIPS 2015

Info

Title: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Task: Object Detection
Author: Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
Date: June 2015
Arxiv: 1506.01497
Published: NIPS 2015

Highlights

Faster R-CNN is the mainstream two-stage detection method. Its Region Proposal Network (RPN) replaces the Selective Search algorithm, so that the detection task can be completed end-to-end by the neural network. Roughly speaking, Faster R-CNN = RPN + Fast R-CNN. Because the RPN shares its convolutional features with the Fast R-CNN detector, the extra computation it introduces is very small, which lets Faster R-CNN run at 5 fps on a single GPU. At the time of publication it also reached state-of-the-art accuracy.

Motivation & Design

Region Proposal Networks


The RPN models the proposal task as a binary classification problem: does a given anchor box contain an object or not?

The first step is to generate anchor boxes of different sizes and aspect ratios at each sliding-window position, choose the IoU thresholds, and label each anchor box as positive or negative according to its overlap with the ground truth. The samples passed to the RPN are therefore the anchor boxes, each with a label saying whether it contains an object. The RPN maps each sample to a probability value and four coordinate values. The probability reflects how likely the anchor box is to contain an object, and the four coordinates are used to regress the position of that object. Finally, the binary classification loss and the coordinate regression loss are combined into a single training objective for the RPN.
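The anchor generation and labeling steps can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the defaults (three scales, three aspect ratios, IoU thresholds of 0.7 and 0.3) follow the values reported in the paper. The paper's additional rule, which also marks the highest-IoU anchor for each ground-truth box as positive, is left out for brevity.

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return one anchor (x1, y1, x2, y2) per scale/ratio pair, centred on (cx, cy)."""
    anchors = []
    for s in scales:
        for r in ratios:
            # Keep the area at s*s while setting height / width = r.
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor: 1 = object, 0 = background, -1 = ignored in training."""
    labels = []
    for a in anchors:
        best = max((iou(a, g) for g in gt_boxes), default=0.0)
        if best > pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)  # overlap is ambiguous, so the anchor is skipped
    return labels

# One sliding-window position and one 128x128 ground-truth box at the same centre.
anchors = make_anchors(300, 300)
labels = label_anchors(anchors, [(236, 236, 364, 364)])
```

In this toy case only the square 128-pixel anchor matches the ground truth well enough to be positive, the two elongated 128-pixel anchors fall between the thresholds and are ignored, and the larger anchors become negatives.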

The RPN has a large number of hyperparameters: the sizes and aspect ratios of the anchor boxes, the IoU thresholds, and the ratio of positive to negative proposal samples in each image.
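To make the last hyperparameter concrete, here is a small sketch of how a per-image mini-batch of anchors might be drawn. The function name and the `seed` argument are hypothetical; the defaults (256 anchors per image, at most half of them positive, with negatives filling the remainder) follow the sampling scheme described in the paper. Labels are assumed to use 1 for positive, 0 for negative, and -1 for ignored anchors.

```python
import random

def sample_minibatch(labels, batch_size=256, pos_fraction=0.5, seed=None):
    """Pick anchor indices for one image's RPN mini-batch.

    At most batch_size * pos_fraction positives are drawn; negatives pad
    the batch when there are too few positives. Ignored anchors (-1) are
    never sampled.
    """
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    n_pos = min(len(pos), int(batch_size * pos_fraction))
    n_neg = min(len(neg), batch_size - n_pos)
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)
```

With only 10 positive anchors available, for example, the batch would contain those 10 positives and 246 negatives, which keeps the batch size fixed while limiting how strongly negatives dominate.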

Alternate Training


The RPN operates on the feature map, so the convolutional computation of the feature extractor can be fully shared with the Fast R-CNN detector. During training, the RPN and the detector are trained alternately: the parameters of one part are fixed while the other part is updated.
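The alternation can be pictured as a schedule of which parameter groups are trainable at each stage. The sketch below is schematic only: the group names (`backbone`, `rpn_head`, `det_head`) and the `frozen` flag are hypothetical, no real training happens, and the four stages are a paraphrase of the 4-step alternating scheme described in the paper.

```python
# Each stage lists the (hypothetical) parameter groups that are updated;
# every other group is held fixed during that stage.
STAGES = [
    ("train RPN", {"backbone", "rpn_head"}),
    # In the paper the detector uses its own copy of the backbone at this
    # stage, so the convolutional layers are not shared yet.
    ("train detector on RPN proposals", {"backbone", "det_head"}),
    ("fine-tune RPN, shared conv layers fixed", {"rpn_head"}),
    ("fine-tune detector, shared conv layers fixed", {"det_head"}),
]

def set_trainable(params, trainable):
    """Return a copy of params with each group's 'frozen' flag updated."""
    return {name: {**p, "frozen": name not in trainable}
            for name, p in params.items()}

params = {g: {"frozen": False} for g in ("backbone", "rpn_head", "det_head")}
for description, trainable in STAGES:
    params = set_trainable(params, trainable)
    frozen = sorted(name for name, p in params.items() if p["frozen"])
    print(f"{description}: frozen={frozen}")
```

In a real framework the `frozen` flag would correspond to disabling gradient updates for that group; the point here is only the order in which the groups are fixed and updated.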

Performance

