Fully convolutional networks for action recognition
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2017-12-01 |
Series: | IET Computer Vision |
Subjects: | |
Online Access: | https://doi.org/10.1049/iet-cvi.2017.0005 |
Summary: | Human action recognition is an important and challenging topic in computer vision. Recently, convolutional neural networks (CNNs) have established impressive results for many image recognition tasks. CNNs usually contain millions of parameters, which are prone to overfitting when trained on small datasets. Therefore, CNNs do not produce superior performance over traditional methods for action recognition. In this study, the authors design a novel two‐stream fully convolutional network architecture for action recognition, which can significantly reduce parameters while maintaining performance. To exploit spatial‐temporal features, a linear weighted fusion method is used to fuse the two streams' feature maps, and a video pooling method is adopted to construct video‐level features. The authors also demonstrate that improved dense trajectories have a significant impact on action recognition. Their method achieves state‐of‐the‐art performance on two challenging datasets: UCF101 (93.0%) and HMDB51 (70.2%). |
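The summary describes fusing the two streams' feature maps by a linear weighted combination and then pooling over frames to obtain a video-level feature. A minimal sketch of that idea, assuming per-frame feature vectors, illustrative weights, and max pooling over time (the paper's exact shapes, weights, and pooling operator are not specified here):

```python
import numpy as np

def fuse_streams(spatial, temporal, w_s=0.5, w_t=0.5):
    """Linearly combine per-frame features from the two streams (weights are illustrative)."""
    return w_s * spatial + w_t * temporal

def video_pool(frame_features):
    """Pool fused per-frame features over the time axis to get one video-level feature."""
    return frame_features.max(axis=0)

# Toy example: 10 frames, 256-dim feature per frame for each stream.
rng = np.random.default_rng(0)
spatial = rng.normal(size=(10, 256))   # spatial-stream feature maps (hypothetical)
temporal = rng.normal(size=(10, 256))  # temporal-stream feature maps (hypothetical)

fused = fuse_streams(spatial, temporal, w_s=0.6, w_t=0.4)
video_feature = video_pool(fused)
print(video_feature.shape)  # (256,)
```

The fusion weights would in practice be chosen (or learned) to balance the appearance and motion streams; the pooled vector then serves as the video-level representation for classification.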
ISSN: | 1751-9632, 1751-9640 |