Fully convolutional networks for action recognition

Human action recognition is an important and challenging topic in computer vision. Recently, convolutional neural networks (CNNs) have achieved impressive results on many image recognition tasks. CNNs usually contain millions of parameters, which makes them prone to overfitting when trained on small datasets....


Bibliographic Details
Main Authors: Sheng Yu, Yun Cheng, Li Xie, Shao‐Zi Li
Format: Article
Language: English
Published: Wiley 2017-12-01
Series: IET Computer Vision
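Subjects: human action recognition; computer vision; convolutional neural networks; CNN; image recognition tasks; two-stream fully convolutional networks architecture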
Online Access: https://doi.org/10.1049/iet-cvi.2017.0005
collection DOAJ
description Human action recognition is an important and challenging topic in computer vision. Recently, convolutional neural networks (CNNs) have achieved impressive results on many image recognition tasks. CNNs usually contain millions of parameters, which makes them prone to overfitting when trained on small datasets. As a result, CNNs do not outperform traditional methods for action recognition. In this study, the authors design a novel two‐stream fully convolutional network architecture for action recognition that significantly reduces the number of parameters while maintaining performance. To exploit spatial‐temporal features, a linear weighted fusion method is used to fuse the feature maps of the two streams, and a video pooling method is adopted to construct video‐level features. The authors also demonstrate that improved dense trajectories have a significant impact on action recognition. The authors’ method achieves state‐of‐the‐art performance on two challenging datasets, UCF101 (93.0%) and HMDB51 (70.2%).
first_indexed 2024-03-12T00:29:08Z
id doaj.art-56f38707e73b491393b5b0090dd58012
institution Directory Open Access Journal
issn 1751-9632
1751-9640
spelling doaj.art-56f38707e73b491393b5b0090dd58012
2023-09-15T10:26:00Z
eng
Wiley
IET Computer Vision
1751-9632
1751-9640
2017-12-01
Volume 11, Issue 8, pp. 744-749
10.1049/iet-cvi.2017.0005
Fully convolutional networks for action recognition
Sheng Yu (Cognitive Science Department, Xiamen University, Xiamen, Fujian, People's Republic of China)
Yun Cheng (School of Information, Hunan University of Humanities, Science and Technology, Loudi, Hunan, People's Republic of China)
Li Xie (School of Information, Hunan University of Humanities, Science and Technology, Loudi, Hunan, People's Republic of China)
Shao‐Zi Li (Cognitive Science Department, Xiamen University, Xiamen, Fujian, People's Republic of China)
title Fully convolutional networks for action recognition
topic human action recognition
computer vision
convolutional neural networks
CNN
image recognition tasks
two-stream fully convolutional networks architecture