Robust Head Detection in Complex Videos Using Two-Stage Deep Convolution Framework

Bibliographic Details
Main Authors: Sultan Daud Khan, Yasir Ali, Basim Zafar, Abdulfattah Noorwali
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9096268/
Description
Summary: Pedestrian head detection plays an important role in identifying and localizing individuals in real-world visual data. Head detection is a nontrivial problem due to considerable variance in camera viewpoints, scales, human poses, and appearances in the scene. The translation invariance property of convolutional neural networks (CNNs) enables large-capacity CNNs to handle appearance and pose variations in the scene. However, scale invariance is still an open issue. To address this problem, this paper presents a two-stage head detection framework that uses a fully convolutional network (FCN) to generate scale-aware proposals, followed by a CNN that classifies each proposal into one of two classes, i.e., head or background. Experimental results show that using the scale-aware proposals obtained by the FCN improves the object recall rate and mean average precision (mAP). Additionally, we demonstrate that our framework achieves state-of-the-art results on four challenging benchmark datasets: HollywoodHeads, Casablanca, SHOCK, and WIDERFACE.
ISSN: 2169-3536
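
The two-stage structure described in the summary (proposal generation followed by per-proposal classification) can be sketched roughly as below. This is a minimal illustration of the control flow only, not the authors' implementation. The names `detect_heads`, `toy_propose`, and `toy_classify` are hypothetical, the threshold of 0.5 is an assumption, and the two toy functions are simple stand-ins for the FCN proposal stage and the CNN classifier.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box
    score: float


def detect_heads(
    image: Any,
    propose: Callable[[Any], List[Box]],
    classify: Callable[[Any, Box], float],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[Detection]:
    """Run stage 1 (proposals), then stage 2 (head vs. background score)."""
    detections = []
    for box in propose(image):          # stage 1: candidate regions
        score = classify(image, box)    # stage 2: probability of "head"
        if score >= threshold:
            detections.append(Detection(box, score))
    # Highest-scoring detections first.
    return sorted(detections, key=lambda d: d.score, reverse=True)


def toy_propose(image: Any) -> List[Box]:
    """Hypothetical stand-in for the FCN: fixed candidates at several scales."""
    return [(0, 0, 10, 10), (5, 5, 45, 45), (20, 20, 30, 30)]


def toy_classify(image: Any, box: Box) -> float:
    """Hypothetical stand-in for the CNN: favors boxes with a 10-pixel side."""
    x1, _, x2, _ = box
    return 0.9 if (x2 - x1) == 10 else 0.1


if __name__ == "__main__":
    for det in detect_heads(None, toy_propose, toy_classify):
        print(det)
```

In a real system, the two callables would wrap trained networks. Keeping them as plain function arguments makes it easy to swap the proposal stage and compare recall, which is the kind of comparison the summary reports.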