Action Recognition Network Based on Local Spatiotemporal Features and Global Temporal Excitation

Bibliographic Details
Main Authors: Shukai Li, Xiaofang Wang, Dongri Shan, Peng Zhang
Format: Article
Language: English
Published: MDPI AG 2023-06-01
Series: Applied Sciences
Subjects: local spatiotemporal features; channel time excitation; action recognition; feature enhancement
Online Access: https://www.mdpi.com/2076-3417/13/11/6811
author Shukai Li
Xiaofang Wang
Dongri Shan
Peng Zhang
collection DOAJ
description Temporal modeling is a key problem in action recognition, and accurately modeling the temporal information in videos remains difficult. In this paper, we present a local spatiotemporal extraction (LSTE) module and a channel time excitation (CTE) module, both designed to model temporal information in video sequences accurately. The LSTE module first obtains difference features by computing the pixel-wise differences between adjacent frames within each video segment, and then obtains local motion features by emphasizing the feature channels that are sensitive to difference information. The local motion features are merged with the spatial features to represent the local spatiotemporal features of each segment. The CTE module adaptively excites time-sensitive channels by modeling the temporal interdependencies among channels, which enhances the global temporal information. The two modules are embedded into existing 2D CNN baseline methods to build an action recognition network based on local spatiotemporal features and global temporal excitation (LSCT). We conduct experiments on the temporally dependent Something-Something V1 and V2 datasets and compare the recognition results with those of current methods; the comparison supports the effectiveness of our approach.
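The two modules described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the record gives no layer details, so the function names, the use of the mean absolute difference as a channel gate, the additive merge with the middle frame, and the fixed smoothing kernel are all assumptions standing in for learned layers.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def frame_differences(segment):
    """Pixel-wise differences between adjacent frames of one segment.

    segment: list of frames; each frame is a list of channels;
    each channel is a flat list of pixel values.
    """
    return [
        [[b - a for a, b in zip(ch_prev, ch_next)]
         for ch_prev, ch_next in zip(prev, nxt)]
        for prev, nxt in zip(segment, segment[1:])
    ]


def channel_gates(diffs):
    """One gate per channel (assumption: sigmoid of mean absolute difference).

    Channels whose pixels change more between frames receive a larger gate.
    """
    gates = []
    for c in range(len(diffs[0])):
        values = [abs(v) for d in diffs for v in d[c]]
        gates.append(sigmoid(sum(values) / len(values)))
    return gates


def local_spatiotemporal(segment):
    """LSTE-style sketch: gated motion features added to spatial features.

    Assumption: the middle frame supplies the spatial features and the
    merge is a plain element-wise addition.
    """
    diffs = frame_differences(segment)
    gates = channel_gates(diffs)
    key_frame = segment[len(segment) // 2]
    merged = []
    for c, channel in enumerate(key_frame):
        motion = [sum(d[c][p] for d in diffs) for p in range(len(channel))]
        merged.append([s + gates[c] * m for s, m in zip(channel, motion)])
    return merged


def channel_time_excitation(features, kernel=(0.25, 0.5, 0.25)):
    """CTE-style sketch: gate each channel at each time step.

    features: [T][C] per-segment channel descriptors (e.g. after spatial
    pooling). The gate is the sigmoid of a zero-padded 1D convolution along
    time; the fixed kernel is a hypothetical stand-in for learned weights.
    """
    num_steps, num_channels = len(features), len(features[0])
    half = len(kernel) // 2
    excited = []
    for t in range(num_steps):
        row = []
        for c in range(num_channels):
            response = sum(
                w * features[t + k - half][c]
                for k, w in enumerate(kernel)
                if 0 <= t + k - half < num_steps
            )
            row.append(features[t][c] * sigmoid(response))
        excited.append(row)
    return excited


# Toy segment: 2 frames, 2 channels, 2 pixels; only channel 1 changes.
segment = [[[1.0, 1.0], [0.0, 0.0]], [[1.0, 1.0], [2.0, 4.0]]]
merged = local_spatiotemporal(segment)
excited = channel_time_excitation([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
```

In the toy segment, the static channel passes through unchanged while the moving channel is amplified by its gated difference, which is the qualitative behavior the description attributes to the LSTE module.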
first_indexed 2024-03-11T03:10:21Z
format Article
id doaj.art-3532041ae8624072810f1ce5f43fa2e8
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-11T03:10:21Z
publishDate 2023-06-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-3532041ae8624072810f1ce5f43fa2e8
2023-11-18T07:37:03Z
eng
MDPI AG
Applied Sciences, vol. 13, no. 11, art. 6811 (2023-06-01)
ISSN 2076-3417
DOI 10.3390/app13116811
Action Recognition Network Based on Local Spatiotemporal Features and Global Temporal Excitation
Shukai Li, School of Mechanical Engineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250300, China
Xiaofang Wang, School of Information and Automation Engineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250300, China
Dongri Shan, School of Mechanical Engineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250300, China
Peng Zhang, School of Information and Automation Engineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250300, China
https://www.mdpi.com/2076-3417/13/11/6811
local spatiotemporal features; channel time excitation; action recognition; feature enhancement
title Action Recognition Network Based on Local Spatiotemporal Features and Global Temporal Excitation
topic local spatiotemporal features
channel time excitation
action recognition
feature enhancement
url https://www.mdpi.com/2076-3417/13/11/6811