Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model

Violence occurs frequently in public areas, and video surveillance is of great significance for maintaining public safety. Compared with fixed cameras, unmanned aerial vehicles (UAVs) offer mobile surveillance. However, in aerial images, the rapid movement of UAVs as well as the change of posture and...

Bibliographic Details
Main Author: SHAO Yan-hua, LI Wen-feng, ZHANG Xiao-qiang, CHU Hong-yu, RAO Yun-bo, CHEN Lu
Format: Article
Language: zho
Published: Editorial office of Computer Science 2022-06-01
Series: Jisuanji kexue
Subjects:
Online Access: https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-6-254.pdf
collection DOAJ
description Violence occurs frequently in public areas, and video surveillance is of great significance for maintaining public safety. Compared with fixed cameras, unmanned aerial vehicles (UAVs) offer mobile surveillance. However, in aerial images, the rapid movement of UAVs and changes in their posture and height cause motion blur and large variations in target scale. To address this problem, an attention spatial-temporal graph convolutional network (AST-GCN) that incorporates an attention mechanism is designed to recognize violent behavior in aerial video. The proposed method has two steps: a key frame detection network performs the initial localization, and the AST-GCN network then recognizes the behavior from sequence features. First, for violence localization in video, a key frame cascade detection network is designed to detect violence key frames based on human pose estimation and to give a preliminary estimate of when the violence occurs. Second, the skeleton information of multiple frames around the key frames is extracted from the video sequence, and the skeleton data is pre-processed (normalization, screening and completion) to improve robustness across different scenes and to partially missing key nodes. A skeleton spatial-temporal representation matrix is then constructed from the extracted skeleton information. Finally, the AST-GCN network, which integrates an attention module to improve feature representation, analyzes multiple frames of human skeleton information and completes the recognition of violent behavior. The method is validated on a self-built aerial violence data set, and experimental results show that AST-GCN can recognize violence in aerial scenes with a recognition accuracy of 86.6%. The proposed method has important engineering value and scientific significance for aerial video surveillance and human pose understanding applications.
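The skeleton pre-processing named in the abstract (normalization, screening and completion) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the record does not give the actual procedure, so the bounding-box normalization, the joint-count screening rule, the linear interpolation used for completion, the order of the steps, and all names (`normalize`, `screen`, `complete`, `preprocess`, `min_joints`) are assumptions made here.

```python
from typing import List, Optional, Tuple

# Hypothetical data layout: a joint is an (x, y) pair, or None when the
# pose estimator failed to detect it; a frame is a list of joints.
Joint = Optional[Tuple[float, float]]
Frame = List[Joint]


def normalize(frame: Frame) -> Frame:
    """Normalization (assumed): shift to the bounding-box corner and
    divide by the longer bounding-box side, to reduce scale variation."""
    pts = [j for j in frame if j is not None]
    if not pts:
        return list(frame)
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    x0, y0 = min(xs), min(ys)
    scale = max(max(xs) - x0, max(ys) - y0) or 1.0  # avoid division by zero
    return [None if j is None else ((j[0] - x0) / scale, (j[1] - y0) / scale)
            for j in frame]


def screen(frames: List[Frame], min_joints: int) -> List[Frame]:
    """Screening (assumed): keep frames with at least `min_joints` detections."""
    return [f for f in frames if sum(j is not None for j in f) >= min_joints]


def complete(frames: List[Frame]) -> List[Frame]:
    """Completion (assumed): fill a missing joint by linear interpolation
    between its nearest detections in time, or copy the nearest one at the ends."""
    out = [list(f) for f in frames]
    if not frames:
        return out
    for k in range(len(frames[0])):
        known = [t for t, f in enumerate(frames) if f[k] is not None]
        if not known:
            continue  # joint never observed; leave it as None
        for t in range(len(frames)):
            if frames[t][k] is not None:
                continue
            prev = max((s for s in known if s < t), default=None)
            nxt = min((s for s in known if s > t), default=None)
            if prev is None:
                out[t][k] = frames[nxt][k]
            elif nxt is None:
                out[t][k] = frames[prev][k]
            else:
                w = (t - prev) / (nxt - prev)
                (xa, ya), (xb, yb) = frames[prev][k], frames[nxt][k]
                out[t][k] = (xa + w * (xb - xa), ya + w * (yb - ya))
    return out


def preprocess(frames: List[Frame], min_joints: int = 2) -> List[Frame]:
    """Apply the three steps in the order the abstract lists them."""
    return complete(screen([normalize(f) for f in frames], min_joints))
```

Calling `preprocess(frames, min_joints=1)` on a short sequence drops frames with no detections and fills the remaining gaps; the resulting per-frame joint lists could then be stacked into a frames-by-joints-by-coordinates array of the kind a spatial-temporal graph network consumes.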
first_indexed 2024-04-09T17:34:50Z
format Article
id doaj.art-50f7055af189488d984f6077c4eaddd8
institution Directory of Open Access Journals
issn 1002-137X
language zho
last_indexed 2024-04-09T17:34:50Z
publishDate 2022-06-01
publisher Editorial office of Computer Science
record_format Article
series Jisuanji kexue
spelling doaj.art-50f7055af189488d984f6077c4eaddd8 | 2023-04-18T02:32:00Z | zho | Editorial office of Computer Science | Jisuanji kexue | ISSN 1002-137X | 2022-06-01 | vol. 49, no. 6, pp. 254-261 | doi:10.11896/jsjkx.210400272
affiliations 1 School of Information, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China; 2 School of Information and Software Engineering, University of Electronic Science & Technology, Chengdu 610054, China
topic violence recognition|human pose estimation|aerial photography|spatial-temporal graph convolutional|cascade network|attention mechanism