Summary: | We propose a method for human action recognition from still images that uses the silhouette and the upper body as a proxy for the pose of the person and to guide alignment between samples when computing registered feature descriptors. Our contributions include an efficient algorithm, formulated as an energy minimization, that uses the silhouette to align body parts between imaged human samples. The descriptors computed over the aligned body parts are combined, via a multiple kernel framework, with other standard features (such as a deformable part model (DPM) and dense SIFT) to learn a classifier for each action class. Experiments on the challenging PASCAL VOC 2012 dataset show that our method exceeds state-of-the-art performance on the majority of action classes.
|