A Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder embedded pose for anomaly detection

Abstract Due to the robustness of skeleton data to human scale, illumination changes, dynamic camera views, and complex backgrounds, great progress has been made in skeleton‐based video anomaly detection in recent years. The spatio‐temporal graph convolutional network has been proven to be effective...


Bibliographic Details
Main Authors:	Honglei Zhu, Pengjuan Wei, Zhigang Xu
Format:	Article
Language:	English
Published:	Wiley 2024-04-01
Series:	IET Computer Vision
Subjects:	computer vision, convolutional neural nets, feature extraction, pose estimation, video surveillance
Online Access:	https://doi.org/10.1049/cvi2.12257

_version_	1797210094115487744
author	Honglei Zhu, Pengjuan Wei, Zhigang Xu
author_facet	Honglei Zhu, Pengjuan Wei, Zhigang Xu
author_sort	Honglei Zhu
collection	DOAJ
description	Abstract Due to the robustness of skeleton data to human scale, illumination changes, dynamic camera views, and complex backgrounds, great progress has been made in skeleton‐based video anomaly detection in recent years. The spatio‐temporal graph convolutional network has proven effective in modelling the spatio‐temporal dependencies of non‐Euclidean data such as human skeleton graphs, and the autoencoder built on this basic unit is widely used to model sequence features. However, owing to the limitations of the convolution kernel, the model cannot capture the correlation between non‐adjacent joints and struggles with long‐term sequences, resulting in an insufficient understanding of behaviour. To address this issue, this paper applies the Transformer to the human skeleton and proposes the Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder (STEGT‐AE) to improve modelling capability. In addition, a multi‐memory model with skip connections is employed to provide different levels of coding features, thereby enhancing the ability of the model to distinguish similar heterogeneous behaviours. Furthermore, the STEGT‐AE has a single encoder‐double decoder architecture, which can improve detection performance by combining the reconstruction and prediction errors. The experimental results show that the performance of STEGT‐AE is significantly better than that of other advanced algorithms on four baseline datasets.
first_indexed	2024-04-24T10:05:07Z
format	Article
id	doaj.art-41d360b7678b4b1285f23c36b7d97b4f
institution	Directory of Open Access Journals
issn	1751-9632 1751-9640
language	English
last_indexed	2024-04-24T10:05:07Z
publishDate	2024-04-01
publisher	Wiley
record_format	Article
series	IET Computer Vision
spelling	doaj.art-41d360b7678b4b1285f23c36b7d97b4f; 2024-04-13T04:15:00Z; eng; Wiley; IET Computer Vision; 1751-9632; 1751-9640; 2024-04-01; vol. 18, no. 3, pp. 405–419; 10.1049/cvi2.12257; A Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder embedded pose for anomaly detection; Honglei Zhu, Pengjuan Wei, Zhigang Xu (all: School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu, China); Abstract Due to the robustness of skeleton data to human scale, illumination changes, dynamic camera views, and complex backgrounds, great progress has been made in skeleton‐based video anomaly detection in recent years. The spatio‐temporal graph convolutional network has proven effective in modelling the spatio‐temporal dependencies of non‐Euclidean data such as human skeleton graphs, and the autoencoder built on this basic unit is widely used to model sequence features. However, owing to the limitations of the convolution kernel, the model cannot capture the correlation between non‐adjacent joints and struggles with long‐term sequences, resulting in an insufficient understanding of behaviour. To address this issue, this paper applies the Transformer to the human skeleton and proposes the Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder (STEGT‐AE) to improve modelling capability. In addition, a multi‐memory model with skip connections is employed to provide different levels of coding features, thereby enhancing the ability of the model to distinguish similar heterogeneous behaviours. Furthermore, the STEGT‐AE has a single encoder‐double decoder architecture, which can improve detection performance by combining the reconstruction and prediction errors. The experimental results show that the performance of STEGT‐AE is significantly better than that of other advanced algorithms on four baseline datasets.; https://doi.org/10.1049/cvi2.12257; computer vision; convolutional neural nets; feature extraction; pose estimation; video surveillance
spellingShingle	Honglei Zhu Pengjuan Wei Zhigang Xu A Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder embedded pose for anomaly detection IET Computer Vision computer vision convolutional neural nets feature extraction pose estimation video surveillance
title	A Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder embedded pose for anomaly detection
title_full	A Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder embedded pose for anomaly detection
title_fullStr	A Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder embedded pose for anomaly detection
title_full_unstemmed	A Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder embedded pose for anomaly detection
title_short	A Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder embedded pose for anomaly detection
title_sort	spatio temporal enhanced graph transformer autoencoder embedded pose for anomaly detection
topic	computer vision, convolutional neural nets, feature extraction, pose estimation, video surveillance
url	https://doi.org/10.1049/cvi2.12257
work_keys_str_mv	AT hongleizhu aspatiotemporalenhancedgraphtransformerautoencoderembeddedposeforanomalydetection AT pengjuanwei aspatiotemporalenhancedgraphtransformerautoencoderembeddedposeforanomalydetection AT zhigangxu aspatiotemporalenhancedgraphtransformerautoencoderembeddedposeforanomalydetection AT hongleizhu spatiotemporalenhancedgraphtransformerautoencoderembeddedposeforanomalydetection AT pengjuanwei spatiotemporalenhancedgraphtransformerautoencoderembeddedposeforanomalydetection AT zhigangxu spatiotemporalenhancedgraphtransformerautoencoderembeddedposeforanomalydetection


Similar Items