Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model
Violence occurs frequently in public areas, and video surveillance is of great significance for maintaining public safety. Compared with fixed cameras, unmanned aerial vehicles (UAVs) offer mobile surveillance. However, in aerial images, the rapid movement of UAVs as well as the change of posture and...
Main Author: | |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial office of Computer Science, 2022-06-01 |
Series: | Jisuanji kexue |
Subjects: | |
Online Access: | https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-6-254.pdf |
_version_ | 1827965535680200704 |
---|---|
author | SHAO Yan-hua, LI Wen-feng, ZHANG Xiao-qiang, CHU Hong-yu, RAO Yun-bo, CHEN Lu |
author_facet | SHAO Yan-hua, LI Wen-feng, ZHANG Xiao-qiang, CHU Hong-yu, RAO Yun-bo, CHEN Lu |
author_sort | SHAO Yan-hua, LI Wen-feng, ZHANG Xiao-qiang, CHU Hong-yu, RAO Yun-bo, CHEN Lu |
collection | DOAJ |
description | Violence occurs frequently in public areas, and video surveillance is of great significance for maintaining public safety. Compared with fixed cameras, unmanned aerial vehicles (UAVs) offer mobile surveillance. However, in aerial images, the rapid movement of UAVs and changes in their posture and height cause motion blur and large variations in target scale. To address this problem, an attention spatial-temporal graph convolutional network (AST-GCN) that incorporates an attention mechanism is designed to recognize violent behavior in aerial video. The proposed method has two steps: a key frame detection network performs the initial localization, and the AST-GCN network recognizes the behavior from sequence features. First, for violence localization in video, a key frame cascade detection network is designed that detects violence key frames based on human pose estimation and gives a preliminary estimate of when the violence occurs. Second, skeleton information is extracted from multiple frames around the key frames in the video sequence, and the skeleton data is pre-processed (normalization, screening, and completion) to improve robustness to different scenes and to partially missing key nodes. A skeleton spatial-temporal representation matrix is then constructed from the extracted skeleton information. Finally, the AST-GCN network, which integrates an attention module to improve feature representation, analyzes the multi-frame human skeleton information and recognizes violent behavior. The method is validated on a self-built aerial violence data set, and the experimental results show that AST-GCN can recognize violence in aerial scenes with a recognition accuracy of 86.6%. The proposed method has important engineering value and scientific significance for aerial video surveillance and human pose understanding applications. |
first_indexed | 2024-04-09T17:34:50Z |
format | Article |
id | doaj.art-50f7055af189488d984f6077c4eaddd8 |
institution | Directory Open Access Journal |
issn | 1002-137X |
language | zho |
last_indexed | 2024-04-09T17:34:50Z |
publishDate | 2022-06-01 |
publisher | Editorial office of Computer Science |
record_format | Article |
series | Jisuanji kexue |
spelling | doaj.art-50f7055af189488d984f6077c4eaddd8; 2023-04-18T02:32:00Z; zho; Editorial office of Computer Science; Jisuanji kexue; 1002-137X; 2022-06-01; 49; 6; 254; 261; 10.11896/jsjkx.210400272; Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model; SHAO Yan-hua, LI Wen-feng, ZHANG Xiao-qiang, CHU Hong-yu, RAO Yun-bo, CHEN Lu; 0; 1 School of Information, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China; 2 School of Information and Software Engineering, University of Electronic Science & Technology, Chengdu 610054, China; https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-6-254.pdf; violence recognition|human pose estimation|aerial photography|spatial-temporal graph convolutional|cascade network|attention mechanism |
spellingShingle | SHAO Yan-hua, LI Wen-feng, ZHANG Xiao-qiang, CHU Hong-yu, RAO Yun-bo, CHEN Lu Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model Jisuanji kexue violence recognition|human pose estimation|aerial photography|spatial-temporal graph convolutional|cascade network|attention mechanism |
title | Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model |
title_full | Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model |
title_fullStr | Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model |
title_full_unstemmed | Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model |
title_short | Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model |
title_sort | aerial violence recognition based on spatial temporal graph convolutional networks and attention model |
topic | violence recognition|human pose estimation|aerial photography|spatial-temporal graph convolutional|cascade network|attention mechanism |
url | https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-6-254.pdf |
work_keys_str_mv | AT shaoyanhualiwenfengzhangxiaoqiangchuhongyuraoyunbochenlu aerialviolencerecognitionbasedonspatialtemporalgraphconvolutionalnetworksandattentionmodel |
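
The skeleton pre-processing named in the description (normalization, screening, and completion, followed by building a spatial-temporal representation) can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no algorithmic detail, so the function names, the 0.5 visibility threshold, the nearest-frame fill rule, the centroid and bounding-box normalization, the order of the steps, and the (channels, frames, joints) layout are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Frame = List[Optional[Point]]  # None marks a joint the pose estimator missed


def screen(frames: List[Frame], min_visible: float = 0.5) -> List[Frame]:
    """Screening (assumed rule): drop frames with too few detected joints."""
    return [f for f in frames
            if sum(p is not None for p in f) / len(f) >= min_visible]


def complete(frames: List[Frame]) -> List[List[Point]]:
    """Completion (assumed rule): copy a missing joint from the nearest
    frame in time that has it; fall back to the origin if none does."""
    n_frames = len(frames)
    n_joints = len(frames[0]) if frames else 0
    filled = [list(f) for f in frames]
    for j in range(n_joints):
        known = [t for t in range(n_frames) if frames[t][j] is not None]
        for t in range(n_frames):
            if frames[t][j] is None:
                if known:
                    nearest = min(known, key=lambda k: abs(k - t))
                    filled[t][j] = frames[nearest][j]
                else:
                    filled[t][j] = (0.0, 0.0)
    return filled


def normalize(frames: List[List[Point]]) -> List[List[Point]]:
    """Normalization (assumed rule): center each frame on its joint centroid
    and divide by its bounding-box size, which reduces the effect of the
    scale changes caused by varying UAV height."""
    out = []
    for f in frames:
        xs = [p[0] for p in f]
        ys = [p[1] for p in f]
        cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
        scale = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
        out.append([((x - cx) / scale, (y - cy) / scale) for x, y in f])
    return out


def to_tensor(frames: List[List[Point]]) -> List[List[List[float]]]:
    """Arrange coordinates as (channels, frames, joints), a layout commonly
    used as input to spatial-temporal graph convolutional models."""
    return [[[p[0] for p in f] for f in frames],
            [[p[1] for p in f] for f in frames]]


# Toy clip: 4 frames, 3 joints; frame 2 is empty and frame 1 misses joint 1.
clip: List[Frame] = [
    [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)],
    [(0.0, 0.0), None, (1.0, 2.0)],
    [None, None, None],
    [(1.0, 1.0), (3.0, 1.0), (2.0, 3.0)],
]
screened = screen(clip)
completed = complete(screened)
tensor = to_tensor(normalize(completed))
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # channels, frames, joints
```

In this toy clip the first and last frames show the same pose shifted by one unit, so after normalization their coordinates coincide. Screening runs before completion here so that empty frames are never used as a fill source; a real pipeline might order or define these steps differently.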