R-CNN minus R

Bibliographic Details
Main Authors: Lenc, K; Vedaldi, A
Format: Conference item
Language: English
Published: BMVA Press 2015
Description
Summary: Deep convolutional neural networks (CNNs) have had a major impact in most areas of image understanding. In object category detection, however, the best results have been obtained by techniques such as R(egion)-CNN that combine CNNs with cues from image segmentation, using methods such as selective search to propose possible object locations in images. However, the role of segmentation in CNN detectors remains controversial. On the one hand, segmentation may be a necessary modelling component, carrying essential geometric information not contained in the CNN; on the other hand, it may be merely a way of accelerating detection, by focusing the CNN classifier on promising image areas. In this paper, we answer this question by developing a detector that uses a trivial region generation scheme, constant for each image. While such region proposals approximate objects poorly, we show that a bounding box regressor using intermediate convolutional features can recover sufficiently accurate bounding boxes, demonstrating that the required geometric information is indeed contained in the CNN itself. Combined with convolutional feature pooling, this also yields an excellent and fast detector that does not require processing the image with any algorithm other than the CNN itself. We also streamline and simplify the training of CNN-based detectors by integrating several learning steps into a single algorithm, and by proposing a number of improvements that accelerate detection.
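
As a rough illustration of the two ideas in the summary, a region set that is constant for each image and a regressor that corrects those coarse boxes, the following Python sketch builds a fixed grid of boxes and applies R-CNN-style regression offsets to them. The grid layout, the scales and aspect ratios, and the function names `fixed_proposals`, `scale_to_image` and `apply_deltas` are assumptions made for illustration only; the summary does not specify how the constant regions are generated or how the regressor is parameterized, and the offsets here stand in for the output of a regressor that is not implemented.

```python
import numpy as np


def fixed_proposals(scales=(0.25, 0.5, 1.0), aspect_ratios=(0.5, 1.0, 2.0), steps=4):
    """Return an image-independent set of boxes in normalized [0, 1] coordinates.

    Each row is (x1, y1, x2, y2). The grid, scales and aspect ratios are
    illustrative assumptions, not values taken from the paper.
    """
    boxes = []
    for s in scales:
        for r in aspect_ratios:
            # Box width and height as fractions of the image, clipped to fit.
            w = min(1.0, s * np.sqrt(r))
            h = min(1.0, s / np.sqrt(r))
            # Centres lie on a regular grid chosen so each box stays inside the image.
            for cx in np.linspace(w / 2, 1 - w / 2, steps):
                for cy in np.linspace(h / 2, 1 - h / 2, steps):
                    boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    # Rounding then de-duplicating removes repeats produced by full-width boxes.
    return np.unique(np.round(boxes, 6), axis=0)


def scale_to_image(boxes, width, height):
    """Map normalized boxes to pixel coordinates for one image size."""
    return boxes * np.array([width, height, width, height], dtype=float)


def apply_deltas(boxes, deltas):
    """Refine boxes with (dx, dy, dw, dh) offsets.

    Assumed parameterization (the one commonly used with R-CNN): the centre is
    shifted by a fraction of the box size, and width/height are scaled by exp().
    """
    boxes = np.asarray(boxes, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    cx = boxes[:, 0] + 0.5 * w
    cy = boxes[:, 1] + 0.5 * h
    new_cx = cx + deltas[:, 0] * w
    new_cy = cy + deltas[:, 1] * h
    new_w = w * np.exp(deltas[:, 2])
    new_h = h * np.exp(deltas[:, 3])
    return np.stack(
        [new_cx - new_w / 2, new_cy - new_h / 2, new_cx + new_w / 2, new_cy + new_h / 2],
        axis=1,
    )


# The same normalized boxes are reused for every image; only the scaling differs.
proposals = scale_to_image(fixed_proposals(), width=640, height=480)
# Zero offsets stand in for regressor output and leave the boxes unchanged.
refined = apply_deltas(proposals, np.zeros_like(proposals))
```

In a full detector the offsets would be predicted per box from pooled convolutional features; here they are supplied by hand so that the geometry of the correction step can be inspected on its own.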