Multi-Scale Receptive Fields Convolutional Network for Action Recognition
Extracting good action representations from video frames is an intricate challenge due to the presence of moving objects of various sizes across current action recognition datasets. Most current action recognition methodologies have paid scant attention to this characteristic and have relied...
Main Authors: | Zhiang Dong, Miao Xie, Xiaoqiang Li |
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-03-01 |
Series: | Applied Sciences |
Subjects: | action recognition; pseudo-3D residual networks; multi-scale receptive fields |
Online Access: | https://www.mdpi.com/2076-3417/13/6/3403 |
_version_ | 1797613714240700416 |
author | Zhiang Dong; Miao Xie; Xiaoqiang Li
author_facet | Zhiang Dong; Miao Xie; Xiaoqiang Li
author_sort | Zhiang Dong |
collection | DOAJ |
description | Extracting good action representations from video frames is an intricate challenge due to the presence of moving objects of various sizes across current action recognition datasets. Most current action recognition methodologies have paid scant attention to this characteristic and have relied on deep learning models to solve it automatically. In this paper, we introduce a multi-scale receptive fields convolutional network (MSRFNet), fashioned after the pseudo-3D residual network architecture, to mitigate the impact of scale variation in moving objects. The crux of MSRFNet is the integration of a multi-scale receptive fields block, which incorporates multiple dilated convolution layers that share identical convolutional parameters but have different receptive fields. MSRFNet leverages three scales of receptive fields to extract features from moving objects of diverse sizes, striving to produce scale-specific feature maps with uniform representational power. By visualizing the attention of MSRFNet, we analyze how the model re-allocates its attention to moving objects after the multi-scale receptive fields approach is applied. Experimental results on benchmark datasets demonstrate that MSRFNet achieves improvements of 3.2% on UCF101, 5.8% on HMDB51, and 7.7% on Kinetics-400 compared with the baseline. Compared with state-of-the-art techniques, MSRFNet achieves comparable or superior results, affirming the effectiveness of the proposed approach. |
first_indexed | 2024-03-11T06:59:40Z |
format | Article |
id | doaj.art-891482b3d31f4473afd8793ec7afab2f |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T06:59:40Z |
publishDate | 2023-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-891482b3d31f4473afd8793ec7afab2f | 2023-11-17T09:21:12Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2023-03-01 | vol. 13, no. 6, art. 3403 | 10.3390/app13063403 | Multi-Scale Receptive Fields Convolutional Network for Action Recognition | Zhiang Dong (School of Software Technology, Zhejiang University, Ningbo 315048, China); Miao Xie (School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China); Xiaoqiang Li (School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China) | https://www.mdpi.com/2076-3417/13/6/3403 | action recognition; pseudo-3D residual networks; multi-scale receptive fields |
spellingShingle | Zhiang Dong; Miao Xie; Xiaoqiang Li; Multi-Scale Receptive Fields Convolutional Network for Action Recognition; Applied Sciences; action recognition; pseudo-3D residual networks; multi-scale receptive fields |
title | Multi-Scale Receptive Fields Convolutional Network for Action Recognition |
title_full | Multi-Scale Receptive Fields Convolutional Network for Action Recognition |
title_fullStr | Multi-Scale Receptive Fields Convolutional Network for Action Recognition |
title_full_unstemmed | Multi-Scale Receptive Fields Convolutional Network for Action Recognition |
title_short | Multi-Scale Receptive Fields Convolutional Network for Action Recognition |
title_sort | multi scale receptive fields convolutional network for action recognition |
topic | action recognition; pseudo-3D residual networks; multi-scale receptive fields |
url | https://www.mdpi.com/2076-3417/13/6/3403 |
work_keys_str_mv | AT zhiangdong multiscalereceptivefieldsconvolutionalnetworkforactionrecognition AT miaoxie multiscalereceptivefieldsconvolutionalnetworkforactionrecognition AT xiaoqiangli multiscalereceptivefieldsconvolutionalnetworkforactionrecognition |
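The description field above states that the multi-scale receptive fields block reuses identical convolutional parameters across multiple dilated convolution layers, so each branch covers a different receptive field at no extra parameter cost. The following PyTorch sketch is a hypothetical illustration of that shared-weight, multi-dilation idea only; the class name, dilation rates, and fusion step are assumptions, not the authors' released code, and a plain 2D convolution stands in for the pseudo-3D residual unit used in the paper.

```python
# Hypothetical sketch of a multi-scale receptive fields block: one shared set of
# convolutional weights applied at several dilation rates (PyTorch assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleRFBlock(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        self.dilations = dilations
        # A single shared 3x3 kernel; the paper builds on pseudo-3D residual
        # units, so a 2D convolution is used here purely for brevity.
        self.weight = nn.Parameter(torch.empty(channels, channels, 3, 3))
        nn.init.kaiming_normal_(self.weight)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        # Apply the identical weights with different dilation rates, so each
        # branch sees a different receptive field without adding parameters.
        # padding=d keeps the spatial size constant for a 3x3 kernel.
        branches = [
            F.conv2d(x, self.weight, padding=d, dilation=d)
            for d in self.dilations
        ]
        # Fuse the scale-specific feature maps (simple summation assumed here)
        # and add a residual connection.
        return F.relu(self.bn(sum(branches)) + x)


if __name__ == "__main__":
    block = MultiScaleRFBlock(channels=64)
    out = block(torch.randn(2, 64, 56, 56))
    print(out.shape)  # torch.Size([2, 64, 56, 56])
```

Summation is assumed as the fusion step in this sketch; the paper's actual fusion strategy and its pseudo-3D spatial/temporal factorization may differ.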