Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model

Violence occurs frequently in public areas, and video surveillance is of great significance for maintaining public safety. Compared with fixed cameras, unmanned aerial vehicles (UAVs) offer mobile surveillance. However, in aerial images, the rapid movement of UAVs as well as the change of posture and...

Bibliographic Details
Main Authors:	SHAO Yan-hua, LI Wen-feng, ZHANG Xiao-qiang, CHU Hong-yu, RAO Yun-bo, CHEN Lu
Format:	Article
Language:	zho
Published:	Editorial office of Computer Science 2022-06-01
Series:	Jisuanji kexue
Subjects:	violence recognition; human pose estimation; aerial photography; spatial-temporal graph convolutional; cascade network; attention mechanism
Online Access:	https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-6-254.pdf

_version_	1827965535680200704
author	SHAO Yan-hua, LI Wen-feng, ZHANG Xiao-qiang, CHU Hong-yu, RAO Yun-bo, CHEN Lu
collection	DOAJ
description	Violence occurs frequently in public areas, and video surveillance is of great significance for maintaining public safety. Compared with fixed cameras, unmanned aerial vehicles (UAVs) offer mobile surveillance. However, in aerial images, the rapid movement of UAVs and changes in posture and height cause motion blur and large variations in target scale. To address this problem, an attention spatial-temporal graph convolutional network (AST-GCN) that incorporates an attention mechanism is designed to identify violent behavior in aerial video. The proposed method has two steps: a key frame detection network performs the initial localization, and the AST-GCN network identifies the behavior from sequence features. First, for violence localization in video, a cascaded key frame detection network based on human pose estimation is designed to detect violence key frames and give a preliminary estimate of when the violence occurs. Second, skeleton information from multiple frames around the key frames is extracted from the video sequence, and the skeleton data are pre-processed (normalization, screening and completion) to improve robustness to different scenes and to partially missing key points. A spatial-temporal skeleton representation matrix is then constructed from the extracted skeleton information. Finally, the AST-GCN network, which integrates an attention module to improve feature expression ability, analyzes the human skeleton information across multiple frames and recognizes violent behavior. The method is validated on a self-built aerial violence data set, and experimental results show that AST-GCN can recognize violence in aerial scenes with a recognition accuracy of 86.6%. The proposed method has important engineering value and scientific significance for aerial video surveillance and human pose understanding applications.
first_indexed	2024-04-09T17:34:50Z
format	Article
id	doaj.art-50f7055af189488d984f6077c4eaddd8
institution	Directory of Open Access Journals
issn	1002-137X
language	zho
last_indexed	2024-04-09T17:34:50Z
publishDate	2022-06-01
publisher	Editorial office of Computer Science
record_format	Article
series	Jisuanji kexue
spelling	doaj.art-50f7055af189488d984f6077c4eaddd8 | 2023-04-18T02:32:00Z | zho | Editorial office of Computer Science | Jisuanji kexue | 1002-137X | 2022-06-01 | 49 | 6 | 254 | 261 | 10.11896/jsjkx.210400272 | Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model | SHAO Yan-hua, LI Wen-feng, ZHANG Xiao-qiang, CHU Hong-yu, RAO Yun-bo, CHEN Lu | 0 | 1 School of Information, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China; 2 School of Information and Software Engineering, University of Electronic Science & Technology, Chengdu 610054, China | https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-6-254.pdf | violence recognition; human pose estimation; aerial photography; spatial-temporal graph convolutional; cascade network; attention mechanism
title	Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model
topic	violence recognition; human pose estimation; aerial photography; spatial-temporal graph convolutional; cascade network; attention mechanism
url	https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-6-254.pdf

Similar Items