Enhancing Surveillance Systems: Integration of Object, Behavior, and Space Information in Captions for Advanced Risk Assessment

Bibliographic Details
Main Authors: Minseong Jeon, Jaepil Ko, Kyungjoo Cheoi
Format: Article
Language: English
Published: MDPI AG 2024-01-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/24/1/292
Description
Summary: This paper presents a novel approach to risk assessment that uses image captioning as a core component of a surveillance system. The proposed system generates descriptive captions that portray the relationships among objects, actions, and spatial elements in the observed scene, and then evaluates the risk level from the content of these captions. After defining the risk levels the surveillance system should detect, we constructed a dataset of [Image-Caption-Danger Score] records. The captions in our dataset follow a distinct sentence format that departs from conventional caption styles, which enables a comprehensive interpretation of surveillance scenes in terms of objects, actions, and spatial context. We fine-tuned the BLIP-2 model on our dataset to generate captions, and the captions were then interpreted with BERT to assign each scene a risk stage from 1 to 7. Multiple experiments provided empirical support for the effectiveness of the proposed system, with accuracy rates of 92.3%, 89.8%, and 94.3% for the three risk levels of safety, hazard, and danger, respectively.
ISSN: 1424-8220
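
The two-step flow described in the summary (caption a scene, then score the caption) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Sample` record layout, the function names, the toy captioner and scorer, and especially the cut points that group stages 1 to 7 into safety, hazard, and danger are all assumptions made for illustration. In the paper these roles are played by a fine-tuned BLIP-2 model and a BERT-based scorer, which are replaced here by plain callables so the sketch runs without any model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Sample:
    """One [Image-Caption-Danger Score] record (field names are hypothetical)."""
    image_path: str
    caption: str
    danger_score: int  # risk stage, 1..7


def risk_level(stage: int) -> str:
    """Group a 1..7 stage into one of the three coarse risk levels.

    ASSUMPTION: the cut points (1-2, 3-5, 6-7) are illustrative only;
    the summary does not state how stages map to levels.
    """
    if not 1 <= stage <= 7:
        raise ValueError(f"stage must be in 1..7, got {stage}")
    if stage <= 2:
        return "safety"
    if stage <= 5:
        return "hazard"
    return "danger"


def assess(
    image_path: str,
    captioner: Callable[[str], str],
    scorer: Callable[[str], int],
) -> tuple[str, int, str]:
    """Caption the image, score the caption, and return (caption, stage, level)."""
    caption = captioner(image_path)  # stands in for the fine-tuned BLIP-2 step
    stage = scorer(caption)          # stands in for the BERT scoring step
    return caption, stage, risk_level(stage)


# Toy stand-ins so the sketch runs without any model (both hypothetical).
def toy_captioner(image_path: str) -> str:
    return "a person is climbing over a fence next to a road"


def toy_scorer(caption: str) -> int:
    return 6 if "climbing over a fence" in caption else 1


if __name__ == "__main__":
    caption, stage, level = assess("frame_0001.jpg", toy_captioner, toy_scorer)
    print(Sample("frame_0001.jpg", caption, stage), level)
```

Passing the captioner and scorer as callables keeps the control flow separate from any particular model, so real model wrappers could be substituted without changing `assess`.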