MSF-NET: Foreground Objects Detection With Fusion of Motion and Semantic Features

Visual surveillance requires robust detection of foreground objects under challenging environments of abrupt lighting variation, stationary foreground objects, dynamic background objects, and severe weather conditions. Most classical algorithms leverage background model images produced by statistica...

Bibliographic Details
Main Authors: Jae-Yeul Kim, Jong-Eun Ha
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Deep learning; foreground object detection; spatiotemporal information; visual surveillance
Online Access: https://ieeexplore.ieee.org/document/10371296/
_version_ 1797373110336356352
author Jae-Yeul Kim
Jong-Eun Ha
collection DOAJ
description Visual surveillance requires robust detection of foreground objects in challenging environments that involve abrupt lighting variation, stationary foreground objects, dynamic background objects, and severe weather conditions. Most classical algorithms rely on background model images produced by statistically modeling how brightness values change over time. Because they have difficulty using global features, many false detections occur in stationary foreground regions and on dynamic background objects. Recent deep learning-based methods capture global characteristics more easily than classical methods. However, they still need improvement in how they use spatiotemporal information. We propose an algorithm that uses spatiotemporal information efficiently by adopting a split-and-merge framework. First, we split the spatiotemporal information in multiple successive images into spatial and temporal parts using two sub-networks, a semantic network and a motion network. Then, the separated information is fused in a spatiotemporal fusion network. The proposed network thus consists of three sub-networks, and we call it MSF-NET (Motion and Semantic features Fusion NETwork). We also propose a method for training MSF-NET stably. Compared with the latest deep learning algorithms, MSF-NET achieves 9% and 13% higher FM on the LASIESTA and SBI datasets, respectively. We also designed MSF-NET to be lightweight so that it runs in real time on a desktop GPU.
first_indexed 2024-03-08T18:45:39Z
format Article
id doaj.art-0cee51602ba44baf89c00ab6a86020fe
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-08T18:45:39Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-0cee51602ba44baf89c00ab6a86020fe
2023-12-29T00:03:45Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
11
145551
145565
10.1109/ACCESS.2023.3345842
10371296
MSF-NET: Foreground Objects Detection With Fusion of Motion and Semantic Features
Jae-Yeul Kim 0 https://orcid.org/0000-0002-7765-4972
Jong-Eun Ha 1 https://orcid.org/0000-0002-4144-1000
Graduate School of Information and Communication Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea
Department of Mechanical and Automotive Engineering, Seoul National University of Science and Technology, Seoul, South Korea
https://ieeexplore.ieee.org/document/10371296/
Deep learning
foreground object detection
spatiotemporal information
visual surveillance
title MSF-NET: Foreground Objects Detection With Fusion of Motion and Semantic Features
topic Deep learning
foreground object detection
spatiotemporal information
visual surveillance
url https://ieeexplore.ieee.org/document/10371296/
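
The description above reports results as FM. Assuming FM denotes the F-measure commonly used to score foreground masks (the harmonic mean of pixel-wise precision and recall), the computation for a single frame can be sketched as follows. The function name `f_measure`, the nested-list mask representation, and the convention of returning 0.0 when there are no foreground pixels at all are illustrative assumptions, not details taken from the article.

```python
def f_measure(pred, truth):
    """Pixel-wise F-measure between two binary masks of equal shape.

    pred, truth: nested lists (rows of pixels), truthy = foreground.
    Hypothetical helper for illustration; not code from the article.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # foreground correctly detected
            elif p and not t:
                fp += 1          # background flagged as foreground
            elif t and not p:
                fn += 1          # foreground pixel missed
    denom = 2 * tp + fp + fn
    if denom == 0:
        # Assumption: no foreground in either mask scores 0.0.
        return 0.0
    # 2*TP / (2*TP + FP + FN) equals 2*P*R / (P + R).
    return 2 * tp / denom


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [1, 0]]
    print(f_measure(predicted, ground_truth))
```

In this toy case there is one true positive, one false positive, and one false negative, so precision and recall are both 0.5. How per-frame scores are aggregated over a dataset is not stated in this record and is left out of the sketch.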