MSF-NET: Foreground Objects Detection With Fusion of Motion and Semantic Features
Main Authors: | Jae-Yeul Kim, Jong-Eun Ha |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10371296/ |
_version_ | 1797373110336356352 |
---|---|
author | Jae-Yeul Kim, Jong-Eun Ha |
collection | DOAJ |
description | Visual surveillance requires robust detection of foreground objects under challenging conditions such as abrupt lighting variation, stationary foreground objects, dynamic background objects, and severe weather. Most classical algorithms rely on background model images produced by statistically modeling how brightness values change over time. Because they have difficulty using global features, they produce many false detections in stationary foreground regions and on dynamic background objects. Recent deep learning-based methods capture global characteristics more readily than classical methods, but they still leave room for improvement in exploiting spatiotemporal information. We propose an algorithm that uses spatiotemporal information efficiently by adopting a split-and-merge framework. First, two sub-networks, a semantic network and a motion network, split the spatiotemporal information in multiple successive images into spatial and temporal parts. Then, a spatiotemporal fusion network fuses the separated information. The proposed network thus consists of three sub-networks, which we call MSF-NET (Motion and Semantic features Fusion NETwork). We also propose a method for training MSF-NET stably. Compared to the latest deep learning algorithms, MSF-NET achieves 9% and 13% higher F-measure (FM) on the LASIESTA and SBI datasets, respectively. Finally, MSF-NET is designed to be lightweight enough to run in real time on a desktop GPU. |
first_indexed | 2024-03-08T18:45:39Z |
format | Article |
id | doaj.art-0cee51602ba44baf89c00ab6a86020fe |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-08T18:45:39Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
record updated | 2023-12-29T00:03:45Z |
doi | 10.1109/ACCESS.2023.3345842 |
volume | 11 |
pages | 145551–145565 |
affiliation | Jae-Yeul Kim (https://orcid.org/0000-0002-7765-4972): Graduate School of Information and Communication Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea; Jong-Eun Ha (https://orcid.org/0000-0002-4144-1000): Department of Mechanical and Automotive Engineering, Seoul National University of Science and Technology, Seoul, South Korea |
title | MSF-NET: Foreground Objects Detection With Fusion of Motion and Semantic Features |
topic | Deep learning; foreground object detection; spatiotemporal information; visual surveillance |
url | https://ieeexplore.ieee.org/document/10371296/ |
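The abstract reports its gains as FM, the F-measure used throughout foreground-detection benchmarks: the harmonic mean of pixel-wise precision P and recall R, FM = 2PR / (P + R). The following is an illustrative sketch under stated assumptions, not the authors' evaluation code: it assumes masks are flattened lists of 0 (background) and 1 (foreground) labels, scores a single frame, and ignores any "unknown" or out-of-ROI labels a benchmark may define. How the paper aggregates FM across frames or sequences is not stated in this record, and the function name `f_measure` is hypothetical.

```python
from typing import Sequence


def f_measure(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Pixel-wise F-measure (FM) of a predicted binary mask against ground truth.

    Hypothetical helper: masks are flattened sequences of 0 (background)
    and 1 (foreground). Not the MSF-NET authors' evaluation code.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of pixels")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 1)
    # Precision: fraction of predicted foreground pixels that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of true foreground pixels that were detected.
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # FM is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f_measure([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

For example, `f_measure([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])` counts 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and FM is about 0.667. Returning 0.0 when neither mask contains any foreground is a convention chosen for this sketch; some evaluations treat that case differently.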