Multi-Scale Receptive Fields Convolutional Network for Action Recognition

Extracting good action representations from video frames is an intricate challenge due to the presence of moving objects of various sizes across current action recognition datasets. Most current action recognition methodologies have paid scant attention to this characteristic and have relied on deep learning models to solve it automatically. In this paper, we introduce a multi-scale receptive fields convolutional network (MSRFNet), which is fashioned after the pseudo-3D residual network architecture to mitigate the impact of scale variation in moving objects. The crux of MSRFNet is the integration of a multi-scale receptive fields block, which incorporates multiple dilated convolution layers that share identical convolutional parameters but have different receptive fields. MSRFNet leverages three scales of receptive fields to extract features from moving objects of diverse sizes, striving to produce scale-specific feature maps with uniform representational power. Through visualization of the attention of MSRFNet, we analyze how the model re-allocates its attention to moving objects after introducing the multi-scale receptive fields approach. Experimental results on the benchmark datasets demonstrate that MSRFNet achieves improvements of 3.2% on UCF101, 5.8% on HMDB51, and 7.7% on Kinetics-400 compared with the baseline. Compared with state-of-the-art techniques, MSRFNet achieves comparable or superior results, affirming the effectiveness of the proposed approach.
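The abstract describes the core component as several dilated convolution layers that share identical weights but use different dilation rates, so each branch covers a different receptive field. The following PyTorch sketch only illustrates that weight-sharing idea; the 2D formulation, the dilation rates (1, 2, 3), and the summation-based fusion are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch (not the authors' implementation) of a multi-scale receptive
# fields block: one weight tensor reused with several dilation rates, so the
# branches share parameters but see different receptive fields.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleReceptiveFieldsBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, dilations=(1, 2, 3)):
        super().__init__()
        self.dilations = dilations
        # A single weight tensor shared by every dilation branch.
        self.weight = nn.Parameter(
            torch.empty(out_channels, in_channels, kernel_size, kernel_size)
        )
        self.bias = nn.Parameter(torch.zeros(out_channels))
        nn.init.kaiming_normal_(self.weight, nonlinearity="relu")

    def forward(self, x):
        # Apply the same weights with different dilation rates; the padding
        # keeps the spatial size identical across branches.
        outputs = []
        for d in self.dilations:
            pad = d * (self.weight.shape[-1] // 2)
            outputs.append(F.conv2d(x, self.weight, self.bias, padding=pad, dilation=d))
        # Fuse the scale-specific feature maps (simple summation as a placeholder;
        # the paper's fusion strategy is not specified in the abstract).
        return torch.stack(outputs, dim=0).sum(dim=0)


if __name__ == "__main__":
    block = MultiScaleReceptiveFieldsBlock(in_channels=64, out_channels=64)
    feats = block(torch.randn(2, 64, 56, 56))
    print(feats.shape)  # torch.Size([2, 64, 56, 56])
```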

Bibliographic Details
Main Authors: Zhiang Dong, Miao Xie, Xiaoqiang Li
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: Applied Sciences
Subjects: action recognition; pseudo-3D residual networks; multi-scale receptive fields
Online Access: https://www.mdpi.com/2076-3417/13/6/3403
ISSN: 2076-3417
DOI: 10.3390/app13063403
Citation: Applied Sciences, vol. 13, no. 6, article 3403, 2023
Author Affiliations: Zhiang Dong (School of Software Technology, Zhejiang University, Ningbo 315048, China); Miao Xie, Xiaoqiang Li (School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China)
Collection: Directory of Open Access Journals (DOAJ), record doaj.art-891482b3d31f4473afd8793ec7afab2f