Latent SVMs for human detection with a locally affine deformation field

Bibliographic Details
Main Authors: Ladický, L, Zisserman, A, Torr, PHS
Format: Conference item
Language: English
Published: British Machine Vision Association and Society for Pattern Recognition 2012
Description
Summary: Methods for human detection and localization typically use histograms of gradients (HOG) and work well for aligned data with low variance. For HOG-based methods, higher-resolution templates capture more detail, yet their use does not lead to better performance, because even a small variance in the data can cause the discriminative edges to fall into different neighbouring cells. To overcome these problems, Felzenszwalb et al. proposed a star-graph part-based deformable model with a fixed number of rigid parts, which can capture these variations in the data, leading to state-of-the-art results. Motivated by this work, we propose a latent deformable template model with a locally affine deformation field, which allows for more general and more natural deformations of the template while not over-fitting the data; we also provide a novel inference method for this kind of problem. This deformation model gives us a way to measure the distances between training samples, and we show how this can be used to cluster the problem into several modes, corresponding to different types of objects, viewpoints or poses. Our method leads to a significant improvement over the state-of-the-art with small computational overhead.
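
The summary's remark that a small shift can move a discriminative edge into a neighbouring cell can be illustrated with a toy sketch. This is not the authors' code and not a full HOG implementation: the helper names `cell_gradient_energy` and `edge_image` are hypothetical, and the descriptor is assumed to be plain gradient magnitude summed per cell, with no orientation bins and no block normalization.

```python
import numpy as np


def cell_gradient_energy(image, cell):
    """Sum gradient magnitude over non-overlapping cell x cell blocks.

    A crude stand-in for a single HOG channel (assumption: no
    orientation binning, no block normalization).
    """
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    h, w = mag.shape
    h, w = h - h % cell, w - w % cell  # crop to a whole number of cells
    blocks = mag[:h, :w].reshape(h // cell, cell, w // cell, cell)
    return blocks.sum(axis=(1, 3))


def edge_image(size, column):
    """Square image with a vertical step edge starting at `column`."""
    img = np.zeros((size, size))
    img[:, column:] = 1.0
    return img


# Two images that differ only by a one-pixel shift of the edge.
a = edge_image(16, 4)
b = edge_image(16, 5)

# Coarse 8x8 cells: the edge stays inside the same cell, so the
# descriptors agree. Fine 2x2 cells: the edge crosses a cell boundary,
# so the descriptors differ.
coarse_diff = np.abs(cell_gradient_energy(a, 8) - cell_gradient_energy(b, 8)).sum()
fine_diff = np.abs(cell_gradient_energy(a, 2) - cell_gradient_energy(b, 2)).sum()
print("coarse cells:", coarse_diff, "fine cells:", fine_diff)
```

In this toy setting the finer grid is the one disturbed by the one-pixel shift, which is the sensitivity that motivates letting the template deform instead of staying rigid.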