Fully convolutional networks for action recognition

Human action recognition is an important and challenging topic in computer vision. Recently, convolutional neural networks (CNNs) have established impressive results for many image recognition tasks. The CNNs usually contain million parameters which prone to overfit when training on small datasets....

Full description

Bibliographic Details
Main Authors: Sheng Yu, Yun Cheng, Li Xie, Shao‐Zi Li
Format: Article
Language:English
Published: Wiley 2017-12-01
Series:IET Computer Vision
Subjects:
Online Access:https://doi.org/10.1049/iet-cvi.2017.0005
Description
Summary:Human action recognition is an important and challenging topic in computer vision. Recently, convolutional neural networks (CNNs) have established impressive results for many image recognition tasks. The CNNs usually contain million parameters which prone to overfit when training on small datasets. Therefore, the CNNs do not produce superior performance over traditional methods for action recognition. In this study, the authors design a novel two‐stream fully convolutional networks architecture for action recognition which can significantly reduce parameters while keeping performance. To utilise the advantage of spatial‐temporal features, a linear weighted fusion method is used to fuse two‐stream networks’ feature maps and a video pooling method is adopted to construct the video‐level features. At the meantime, the authors also demonstrate that the improved dense trajectories has significant impact for action recognition. The authors’ method can achieve the state‐of‐the‐art performance on two challenging datasets UCF101 (93.0%) and HMDB51 (70.2%).
ISSN:1751-9632
1751-9640