Summary: | Recent studies show that convolutional neural networks (CNNs) has made a series of breakthroughs in the two tasks of face detection and pose estimation, respectively. There are two CNN frameworks for solving these two integrated tasks simultaneously. One is to use face detection network to detect faces firstly, and then use pose estimation network to estimate each face’s pose; the other is to use region proposal algorithm to generate many candidate regions that may contain faces, and then use a single deep multi-task CNN to process these regions for simultaneous face detection and pose estimation. The former’s problem is pose estimation’s performance is affected by face detection network because two networks are separate. The latter generates lots of candidate regions, which will bring huge computation cost to CNN and can’t achieve real-time. To solve the above existing problems, we propose a multi-task CNN cascade framework that integrates these two tasks. We show that multi-task learning of face detection and head pose estimation helps to extract more representative features. We exploit CNN feature fusion strategy to further improve head pose estimation’s performance. We evaluate face detection on FDDB benchmark, and evaluate pose estimation on AFW benchmark. Our method achieves comparative result compared with state-of-the-art in these two tasks and can achieve real-time performance.
|