STA-TSN: Spatial-Temporal Attention Temporal Segment Network for action recognition in video.
Most deep learning-based action recognition models focus only on short-term motion, so they often misjudge actions composed of multiple stages, such as the long jump or the high jump. Temporal Segment Networks (TSN) enable the network to capture long-term...
Main Authors: | Guoan Yang, Yong Yang, Zhengzhi Lu, Junjie Yang, Deyang Liu, Chuanbo Zhou, Zien Fan |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2022-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0265115 |
---|---|
author | Guoan Yang, Yong Yang, Zhengzhi Lu, Junjie Yang, Deyang Liu, Chuanbo Zhou, Zien Fan |
collection | DOAJ |
description | Most deep learning-based action recognition models focus only on short-term motion, so they often misjudge actions composed of multiple stages, such as the long jump or the high jump. Temporal Segment Networks (TSN) enable the network to capture long-term information in a video but ignore the fact that unrelated frames or regions can also strongly interfere with action recognition. To solve this problem, a soft attention mechanism is introduced into TSN, yielding a Spatial-Temporal Attention Temporal Segment Network (STA-TSN) that retains the ability to capture long-term information while adaptively focusing on key features in space and time. First, a multi-scale spatial focus feature enhancement strategy is proposed, which fuses the original convolutional features with multi-scale spatial focus features obtained through a soft attention mechanism with spatial pyramid pooling. Second, a deep learning-based key-frame exploration module is designed, which uses a soft attention mechanism based on Long Short-Term Memory (LSTM) to adaptively learn temporal attention weights. Third, a temporal-attention regularization is developed to guide STA-TSN toward better key-frame exploration. Finally, experiments show that STA-TSN outperforms TSN on four public datasets (UCF101, HMDB51, JHMDB, and THUMOS14) and achieves state-of-the-art results. (A hedged code sketch of this attention design appears after the record fields below.) |
format | Article |
id | doaj.art-1f771825323b46f28f2d6fcfc8625921 |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
publishDate | 2022-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
title | STA-TSN: Spatial-Temporal Attention Temporal Segment Network for action recognition in video. |
url | https://doi.org/10.1371/journal.pone.0265115 |
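The abstract describes three components: multi-scale spatial attention fused via spatial pyramid pooling, LSTM-driven temporal attention over TSN's segment features, and a temporal-attention regularizer. The following PyTorch sketch illustrates one plausible reading of that design; the module names, tensor shapes, pyramid scales, and the entropy-style regularizer are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of the attention design described in the abstract.
# Names, shapes, and the regularizer are assumptions, not the released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialFocus(nn.Module):
    """Multi-scale spatial attention: attend to the feature map, pool the
    attended features at several pyramid scales, and fuse them back in."""
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.att = nn.Conv2d(channels, 1, kernel_size=1)  # soft attention logits

    def forward(self, x):                       # x: (N, C, H, W) conv features
        a = torch.sigmoid(self.att(x))          # (N, 1, H, W) soft attention map
        fused = x                               # keep the original features
        for s in self.scales:
            pooled = F.adaptive_avg_pool2d(x * a, s)      # pyramid pooling
            fused = fused + F.interpolate(pooled, size=x.shape[-2:],
                                          mode="bilinear", align_corners=False)
        return fused

class TemporalFocus(nn.Module):
    """LSTM-based temporal attention over the K segment features of TSN."""
    def __init__(self, dim, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, feats):                   # feats: (N, K, D) segment features
        h, _ = self.lstm(feats)                 # (N, K, hidden) hidden states
        w = torch.softmax(self.score(h).squeeze(-1), dim=1)  # (N, K) weights
        video_feat = (w.unsqueeze(-1) * feats).sum(dim=1)    # weighted consensus
        return video_feat, w

def temporal_attention_regularizer(w, eps=1e-8):
    """Entropy penalty: adding this term to the loss pushes the temporal
    weights toward a few key frames (one plausible form; the paper defines
    its own regularization)."""
    return -(w * (w + eps).log()).sum(dim=1).mean()
```

In this reading, `SpatialFocus` is applied to each frame's CNN features, `TemporalFocus` replaces TSN's average consensus with attention-weighted pooling, and the regularizer is added to the classification loss so the temporal weights concentrate on informative segments.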