Summary: | Action recognition from videos has many potential applications. However, many challenges remain unresolved, such as pose-invariant recognition and robustness to occlusion. In this paper, we propose a novel mutually reinforcing framework that combines the motion of body parts with pose hypothesis generation, validated against specific canonical poses observed, to achieve pose-invariant action recognition. To capture the temporal dynamics of an action, we introduce temporal stick features computed from the stick poses obtained. The combination of pose-invariant kinematic features from motion, pose hypotheses, and temporal stick features is used for action recognition, forming a mutually reinforcing framework that iterates until the action recognition result converges. The proposed framework handles changes in the person's posture and occlusion, and provides partial view-invariance. Experiments on several benchmark datasets demonstrate the performance of the proposed algorithm and its ability to handle pose variation and occlusion.
|