Summary: | Using human pose as a query modality offers a rich alternative for image and video retrieval. We present a novel approach to human pose retrieval and make the following contributions. First, we introduce 'deep poselets' for pose-sensitive detection of various body parts, built on convolutional neural network (CNN) features. These deep poselets significantly outperform previous instantiations of Berkeley poselets [2]. Second, using these detector responses, we construct a pose representation suitable for pose search, and show that its pose retrieval performance exceeds that of previous methods by a factor of two. The compared methods include bag of visual words [24], Berkeley poselets [2], and human pose estimation algorithms [28]. All methods are quantitatively evaluated on a large dataset of images built from a number of standard benchmarks together with frames from Hollywood movies.
|