- Title: Training Region-based Object Detectors with Online Hard Example Mining
- Task: Object Detection
- Authors: A. Shrivastava, A. Gupta, and R. Girshick
- Date: Apr. 2016
- arXiv: 1604.03540
- Published: CVPR 2016
Highlights & Drawbacks
- Learning-based scheme for balancing ROI examples in two-stage detection networks
- Plug-and-play trick that is easy to integrate into existing detectors
- Adds extra parameters to the training setup
Motivation & Design
Region-based detectors such as Fast R-CNN and Faster R-CNN use a 1:3 sampling strategy: negative ROIs (backgrounds) are subsampled so that each mini-batch keeps roughly one positive for every three negatives. The ratio is empirical and hand-designed, so it takes extra effort to tune as a hyperparameter.
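As a rough illustration of the fixed-ratio heuristic, the sketch below draws a mini-batch of ROI indices with a capped share of positives. The function name `sample_rois_fixed_ratio`, the label encoding (0 for background, anything greater for an object class), and the defaults of 128 ROIs per batch with a 25% positive fraction are assumptions chosen for the example, not details taken from these notes.

```python
import random


def sample_rois_fixed_ratio(labels, batch_size=128, fg_fraction=0.25, seed=None):
    """Return ROI indices for one mini-batch with a capped positive share.

    labels: sequence of ints, 0 = background, >0 = object class
            (hypothetical encoding used only in this sketch).
    """
    rng = random.Random(seed)
    fg_idx = [i for i, y in enumerate(labels) if y > 0]
    bg_idx = [i for i, y in enumerate(labels) if y == 0]

    # Cap positives at fg_fraction of the batch (25% gives the 1:3 ratio).
    n_fg = min(round(batch_size * fg_fraction), len(fg_idx))
    # Fill the remainder with randomly chosen backgrounds.
    n_bg = min(batch_size - n_fg, len(bg_idx))

    return rng.sample(fg_idx, n_fg) + rng.sample(bg_idx, n_bg)
```

Note that the backgrounds are picked uniformly at random here, so easy and hard negatives are equally likely to end up in the batch; that is the limitation the rest of this section addresses.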
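The authors instead add a sub-network, a read-only copy of the ROI head that shares its weights, to score every candidate ROI with a forward pass, so that sampling follows the current state of the model instead of a fixed ratio. This forces the network to focus on the ROIs that are hard for it, such as backgrounds that contain part of an object.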
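The ‘hard’ examples are defined by the loss that the detection head assigns to each ROI, which means that the sampling network is exactly the detection network itself. The ROIs with the highest loss are kept for the backward pass, after non-maximum suppression removes near-duplicates. The [0.1, 0.5) range is the IoU interval that Fast R-CNN uses to label background ROIs, a cruder heuristic aimed at the same goal; OHEM makes the 0.1 lower bound unnecessary.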
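A minimal sketch of the selection step, assuming each candidate ROI already comes with a scalar hardness score (taken here to be its per-ROI loss from the detection head). The names `iou` and `select_hard_rois`, the `(x1, y1, x2, y2)` box format, and the default overlap threshold of 0.7 are assumptions made for this illustration; it is not the authors' implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_hard_rois(boxes, losses, num_keep, nms_thresh=0.7):
    """Pick up to num_keep ROI indices with the highest loss.

    A ROI is skipped when it overlaps an already kept, higher-loss ROI
    by more than nms_thresh, so near-duplicates do not fill the batch.
    """
    # Visit ROIs from hardest (highest loss) to easiest.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    kept = []
    for i in order:
        if len(kept) == num_keep:
            break
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (0, 0, 10, 9.5), (20, 20, 30, 30), (40, 40, 50, 50)]
losses = [2.0, 1.9, 0.5, 1.0]
hard = select_hard_rois(boxes, losses, num_keep=2)  # second box duplicates the first
```

Only the returned indices would then be passed through the backward pass, which is where the savings over training on every ROI come from.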
Performance & Ablation Study
OHEM still improves performance after other bells and whistles, such as multi-scale training and iterative bounding-box regression, have been added.