MSF-NET: Foreground Objects Detection With Fusion of Motion and Semantic Features

Visual surveillance requires robust detection of foreground objects under challenging environments of abrupt lighting variation, stationary foreground objects, dynamic background objects, and severe weather conditions. Most classical algorithms leverage background model images produced by statistica...

Bibliographic Details
Main Authors:	Jae-Yeul Kim, Jong-Eun Ha
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Deep learning; foreground object detection; spatiotemporal information; visual surveillance
Online Access:	https://ieeexplore.ieee.org/document/10371296/

author	Jae-Yeul Kim, Jong-Eun Ha
collection	DOAJ
description	Visual surveillance requires robust detection of foreground objects under challenging environments of abrupt lighting variation, stationary foreground objects, dynamic background objects, and severe weather conditions. Most classical algorithms leverage background model images produced by statistical modeling of the change of brightness values over time. Since they have difficulties using global features, many false detections occur in stationary foreground regions and on dynamic background objects. Recent deep learning-based methods can reflect global characteristics more easily than classical methods. However, deep learning-based methods still need to be improved in utilizing spatiotemporal information. We propose an algorithm for efficiently using spatiotemporal information by adopting a split and merge framework. First, we split spatiotemporal information on successive multiple images into spatial and temporal parts using two sub-networks of semantic and motion networks. Finally, separated information is fused in a spatiotemporal fusion network. The proposed network consists of three sub-networks, which we denote as MSF-NET (Motion and Semantic features Fusion NETwork). Also, we propose a method to train the proposed MSF-NET stably. Compared to the latest deep learning algorithms, the proposed MSF-NET gives 9% and 13% higher F-measure (FM) on the LASIESTA and SBI datasets. Also, we designed the proposed MSF-NET to be lightweight to run in real-time on a desktop GPU.
first_indexed	2024-03-08T18:45:39Z
format	Article
id	doaj.art-0cee51602ba44baf89c00ab6a86020fe
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-03-08T18:45:39Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-0cee51602ba44baf89c00ab6a86020fe; 2023-12-29T00:03:45Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; vol. 11, pp. 145551-145565; doi:10.1109/ACCESS.2023.3345842; 10371296; MSF-NET: Foreground Objects Detection With Fusion of Motion and Semantic Features; Jae-Yeul Kim (https://orcid.org/0000-0002-7765-4972), Graduate School of Information and Communication Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea; Jong-Eun Ha (https://orcid.org/0000-0002-4144-1000), Department of Mechanical and Automotive Engineering, Seoul National University of Science and Technology, Seoul, South Korea; https://ieeexplore.ieee.org/document/10371296/; Deep learning; foreground object detection; spatiotemporal information; visual surveillance
title	MSF-NET: Foreground Objects Detection With Fusion of Motion and Semantic Features
topic	Deep learning; foreground object detection; spatiotemporal information; visual surveillance
url	https://ieeexplore.ieee.org/document/10371296/

Similar Items