Deep anomaly detection through visual attention in surveillance videos
Abstract This paper describes a method for learning anomalous behavior in video by finding an attention region from spatiotemporal information, in contrast to full-frame learning. In our proposed method, robust background subtraction (BG) is employed to extract motion, indicating the location of attention regions. The resulting regions are then fed into a three-dimensional Convolutional Neural Network (3D CNN). Specifically, taking advantage of C3D (Convolution 3-Dimensional) to fully exploit spatiotemporal relations, a deep convolutional network is developed to distinguish normal and anomalous events. Our system is trained and tested on the large-scale UCF-Crime anomaly dataset to validate its effectiveness. This dataset contains 1900 long, untrimmed real-world surveillance videos, split into 950 anomaly events and 950 normal events. In total, approximately 13 million frames are learned during the training and testing phases. As shown in the experiments section, the proposed visual attention model obtains 99.25% accuracy. From an industrial application point of view, extracting this attention region can help security officers focus on the corresponding anomaly region instead of performing a wider, full-frame inspection.
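The pipeline the abstract describes, background subtraction to locate a moving attention region, which is then cropped and passed to a 3D CNN such as C3D, can be sketched minimally. The following is an illustrative approximation using simple frame differencing in NumPy, not the paper's actual BG model or network; the function name and threshold are assumptions for the example.

```python
# Minimal sketch of the attention-region idea: background subtraction
# (here, plain frame differencing) locates the moving region, whose
# bounding box would then be cropped and fed to a 3D CNN such as C3D.
# Illustrative only; the paper uses a more robust BG subtraction method.
import numpy as np

def attention_region(background, frame, thresh=30):
    """Return (x, y, w, h) bounding the pixels that differ from the background,
    or None if no motion exceeds the threshold."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    ys, xs = np.nonzero(diff > thresh)
    if xs.size == 0:
        return None
    x, y = int(xs.min()), int(ys.min())
    return (x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1)

# Toy example: a static background with a bright 8x8 "object" added.
bg = np.zeros((64, 64), dtype=np.uint8)
frame = bg.copy()
frame[10:18, 20:28] = 255
print(attention_region(bg, frame))  # (20, 10, 8, 8)
```

In the full system, the crop around this box (stacked over consecutive frames into a clip) replaces the full frame as the 3D CNN's input, which is what distinguishes the attention-region approach from full-frame learning.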
Main Authors: | Nasaruddin Nasaruddin, Kahlil Muchtar, Afdhal Afdhal, Alvin Prayuda Juniarta Dwiyantoro |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2020-10-01 |
Series: | Journal of Big Data |
Subjects: | Visual attention approach; Convolutional neural network (CNN); Integrated surveillance system; Anomaly classification |
Online Access: | http://link.springer.com/article/10.1186/s40537-020-00365-y |
_version_ | 1828495130435256320 |
---|---|
author | Nasaruddin Nasaruddin Kahlil Muchtar Afdhal Afdhal Alvin Prayuda Juniarta Dwiyantoro |
author_facet | Nasaruddin Nasaruddin Kahlil Muchtar Afdhal Afdhal Alvin Prayuda Juniarta Dwiyantoro |
author_sort | Nasaruddin Nasaruddin |
collection | DOAJ |
description | Abstract This paper describes a method for learning anomalous behavior in video by finding an attention region from spatiotemporal information, in contrast to full-frame learning. In our proposed method, robust background subtraction (BG) is employed to extract motion, indicating the location of attention regions. The resulting regions are then fed into a three-dimensional Convolutional Neural Network (3D CNN). Specifically, taking advantage of C3D (Convolution 3-Dimensional) to fully exploit spatiotemporal relations, a deep convolutional network is developed to distinguish normal and anomalous events. Our system is trained and tested on the large-scale UCF-Crime anomaly dataset to validate its effectiveness. This dataset contains 1900 long, untrimmed real-world surveillance videos, split into 950 anomaly events and 950 normal events. In total, approximately 13 million frames are learned during the training and testing phases. As shown in the experiments section, the proposed visual attention model obtains 99.25% accuracy. From an industrial application point of view, extracting this attention region can help security officers focus on the corresponding anomaly region instead of performing a wider, full-frame inspection. |
first_indexed | 2024-12-11T12:07:44Z |
format | Article |
id | doaj.art-3a7831f4d3204bd98a28df4a4e02aa20 |
institution | Directory Open Access Journal |
issn | 2196-1115 |
language | English |
last_indexed | 2024-12-11T12:07:44Z |
publishDate | 2020-10-01 |
publisher | SpringerOpen |
record_format | Article |
series | Journal of Big Data |
spelling | doaj.art-3a7831f4d3204bd98a28df4a4e02aa202022-12-22T01:07:53ZengSpringerOpenJournal of Big Data2196-11152020-10-017111710.1186/s40537-020-00365-yDeep anomaly detection through visual attention in surveillance videosNasaruddin Nasaruddin0Kahlil Muchtar1Afdhal Afdhal2Alvin Prayuda Juniarta Dwiyantoro3Department of Electrical and Computer Engineering, Syiah Kuala UniversityDepartment of Electrical and Computer Engineering, Syiah Kuala UniversityDepartment of Electrical and Computer Engineering, Syiah Kuala UniversityNodefluxAbstract This paper describes a method for learning anomalous behavior in video by finding an attention region from spatiotemporal information, in contrast to full-frame learning. In our proposed method, robust background subtraction (BG) is employed to extract motion, indicating the location of attention regions. The resulting regions are then fed into a three-dimensional Convolutional Neural Network (3D CNN). Specifically, taking advantage of C3D (Convolution 3-Dimensional) to fully exploit spatiotemporal relations, a deep convolutional network is developed to distinguish normal and anomalous events. Our system is trained and tested on the large-scale UCF-Crime anomaly dataset to validate its effectiveness. This dataset contains 1900 long, untrimmed real-world surveillance videos, split into 950 anomaly events and 950 normal events. In total, approximately 13 million frames are learned during the training and testing phases. As shown in the experiments section, the proposed visual attention model obtains 99.25% accuracy. From an industrial application point of view, extracting this attention region can help security officers focus on the corresponding anomaly region instead of performing a wider, full-frame inspection.http://link.springer.com/article/10.1186/s40537-020-00365-yVisual attention approachConvolutional neural network (CNN)Integrated surveillance systemAnomaly classification |
spellingShingle | Nasaruddin Nasaruddin Kahlil Muchtar Afdhal Afdhal Alvin Prayuda Juniarta Dwiyantoro Deep anomaly detection through visual attention in surveillance videos Journal of Big Data Visual attention approach Convolutional neural network (CNN) Integrated surveillance system Anomaly classification |
title | Deep anomaly detection through visual attention in surveillance videos |
title_full | Deep anomaly detection through visual attention in surveillance videos |
title_fullStr | Deep anomaly detection through visual attention in surveillance videos |
title_full_unstemmed | Deep anomaly detection through visual attention in surveillance videos |
title_short | Deep anomaly detection through visual attention in surveillance videos |
title_sort | deep anomaly detection through visual attention in surveillance videos |
topic | Visual attention approach Convolutional neural network (CNN) Integrated surveillance system Anomaly classification |
url | http://link.springer.com/article/10.1186/s40537-020-00365-y |
work_keys_str_mv | AT nasaruddinnasaruddin deepanomalydetectionthroughvisualattentioninsurveillancevideos AT kahlilmuchtar deepanomalydetectionthroughvisualattentioninsurveillancevideos AT afdhalafdhal deepanomalydetectionthroughvisualattentioninsurveillancevideos AT alvinprayudajuniartadwiyantoro deepanomalydetectionthroughvisualattentioninsurveillancevideos |