Fully convolutional networks for action recognition
Human action recognition is an important and challenging topic in computer vision. Recently, convolutional neural networks (CNNs) have achieved impressive results on many image recognition tasks. CNNs usually contain millions of parameters, which makes them prone to overfitting when trained on small datasets....
Main Authors: | Sheng Yu, Yun Cheng, Li Xie, Shao‐Zi Li |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2017-12-01 |
Series: | IET Computer Vision |
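Subjects: | human action recognition; computer vision; convolutional neural networks; CNN; image recognition tasks; two-stream fully convolutional networks architecture |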
Online Access: | https://doi.org/10.1049/iet-cvi.2017.0005 |
_version_ | 1797684402484936704 |
---|---|
author | Sheng Yu; Yun Cheng; Li Xie; Shao‐Zi Li |
author_facet | Sheng Yu; Yun Cheng; Li Xie; Shao‐Zi Li |
author_sort | Sheng Yu |
collection | DOAJ |
description | Human action recognition is an important and challenging topic in computer vision. Recently, convolutional neural networks (CNNs) have achieved impressive results on many image recognition tasks. CNNs usually contain millions of parameters, which makes them prone to overfitting when trained on small datasets. Therefore, CNNs do not outperform traditional methods for action recognition. In this study, the authors design a novel two‐stream fully convolutional network architecture for action recognition that significantly reduces the number of parameters while maintaining performance. To exploit spatial‐temporal features, a linear weighted fusion method is used to fuse the two‐stream networks’ feature maps, and a video pooling method is adopted to construct video‐level features. The authors also demonstrate that improved dense trajectories have a significant impact on action recognition. The authors’ method achieves state‐of‐the‐art performance on two challenging datasets, UCF101 (93.0%) and HMDB51 (70.2%). |
first_indexed | 2024-03-12T00:29:08Z |
format | Article |
id | doaj.art-56f38707e73b491393b5b0090dd58012 |
institution | Directory Open Access Journal |
issn | 1751-9632 1751-9640 |
language | English |
last_indexed | 2024-03-12T00:29:08Z |
publishDate | 2017-12-01 |
publisher | Wiley |
record_format | Article |
series | IET Computer Vision |
spelling | doaj.art-56f38707e73b491393b5b0090dd58012; 2023-09-15T10:26:00Z; eng; Wiley; IET Computer Vision; 1751-9632; 1751-9640; 2017-12-01; 11; 8; 744; 749; 10.1049/iet-cvi.2017.0005; Fully convolutional networks for action recognition; Sheng Yu (0); Yun Cheng (1); Li Xie (2); Shao‐Zi Li (3); (0) Cognitive Science Department, Xiamen University, Xiamen, Fujian, People's Republic of China; (1) School of Information, Hunan University of Humanities, Science and Technology, Loudi, Hunan, People's Republic of China; (2) School of Information, Hunan University of Humanities, Science and Technology, Loudi, Hunan, People's Republic of China; (3) Cognitive Science Department, Xiamen University, Xiamen, Fujian, People's Republic of China; Human action recognition is an important and challenging topic in computer vision. Recently, convolutional neural networks (CNNs) have achieved impressive results on many image recognition tasks. CNNs usually contain millions of parameters, which makes them prone to overfitting when trained on small datasets. Therefore, CNNs do not outperform traditional methods for action recognition. In this study, the authors design a novel two‐stream fully convolutional network architecture for action recognition that significantly reduces the number of parameters while maintaining performance. To exploit spatial‐temporal features, a linear weighted fusion method is used to fuse the two‐stream networks’ feature maps, and a video pooling method is adopted to construct video‐level features. The authors also demonstrate that improved dense trajectories have a significant impact on action recognition. The authors’ method achieves state‐of‐the‐art performance on two challenging datasets, UCF101 (93.0%) and HMDB51 (70.2%).; https://doi.org/10.1049/iet-cvi.2017.0005; human action recognition; computer vision; convolutional neural networks; CNN; image recognition tasks; two-stream fully convolutional networks architecture |
spellingShingle | Sheng Yu; Yun Cheng; Li Xie; Shao‐Zi Li; Fully convolutional networks for action recognition; IET Computer Vision; human action recognition; computer vision; convolutional neural networks; CNN; image recognition tasks; two-stream fully convolutional networks architecture |
title | Fully convolutional networks for action recognition |
title_full | Fully convolutional networks for action recognition |
title_fullStr | Fully convolutional networks for action recognition |
title_full_unstemmed | Fully convolutional networks for action recognition |
title_short | Fully convolutional networks for action recognition |
title_sort | fully convolutional networks for action recognition |
topic | human action recognition; computer vision; convolutional neural networks; CNN; image recognition tasks; two-stream fully convolutional networks architecture |
url | https://doi.org/10.1049/iet-cvi.2017.0005 |
work_keys_str_mv | AT shengyu fullyconvolutionalnetworksforactionrecognition AT yuncheng fullyconvolutionalnetworksforactionrecognition AT lixie fullyconvolutionalnetworksforactionrecognition AT shaozili fullyconvolutionalnetworksforactionrecognition |
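
The description above mentions a linear weighted fusion of the two streams' feature maps followed by video pooling to obtain a video‐level feature. The record does not give the weights, the pooling operator, or the feature shapes, so the following is only a minimal sketch under stated assumptions: per-frame features are flat lists of floats, the fusion is a plain weighted sum, and pooling is an element-wise max or mean over frames. The names `fuse_feature_maps` and `video_pool` and the weight values are hypothetical and are not taken from the article.

```python
from typing import List, Sequence


def fuse_feature_maps(
    spatial: Sequence[float],
    temporal: Sequence[float],
    w_spatial: float,
    w_temporal: float,
) -> List[float]:
    """Linear weighted fusion: w_spatial * spatial + w_temporal * temporal.

    Both inputs are assumed to be flattened feature maps of equal length.
    """
    if len(spatial) != len(temporal):
        raise ValueError("spatial and temporal features must have the same length")
    return [w_spatial * s + w_temporal * t for s, t in zip(spatial, temporal)]


def video_pool(frames: Sequence[Sequence[float]], mode: str = "max") -> List[float]:
    """Collapse per-frame features into a single video-level feature.

    The pooling operator is an assumption; the record only says "video pooling".
    """
    if not frames:
        raise ValueError("at least one frame is required")
    columns = list(zip(*frames))  # group values by feature dimension
    if mode == "max":
        return [max(column) for column in columns]
    if mode == "mean":
        return [sum(column) / len(column) for column in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


# Toy data: two frames, two feature dimensions per stream (illustrative values only).
spatial_frames = [[1.0, 0.0], [0.0, 2.0]]
temporal_frames = [[0.0, 4.0], [2.0, 0.0]]

# Fuse each frame's two streams with equal (hypothetical) weights, then pool over frames.
fused_frames = [
    fuse_feature_maps(s, t, w_spatial=0.5, w_temporal=0.5)
    for s, t in zip(spatial_frames, temporal_frames)
]
video_feature = video_pool(fused_frames, mode="max")
print(video_feature)
```

In practice the weights would presumably be tuned on validation data, and the pooled vector would be passed to a classifier; neither step is specified in this record.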