End-to-End Background Subtraction via a Multi-Scale Spatio-Temporal Model

Bibliographic Details
Main Authors: Yizhong Yang, Tao Zhang, Jinzhao Hu, Dong Xu, Guangjun Xie
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8768285/
Description
Summary: Background subtraction is an important task in computer vision. Traditional approaches usually rely on low-level visual features such as color, texture, or edges to build background models. Lacking deep features, they often perform poorly on complex video scenes involving illumination changes, background or camera motion, camouflage effects, and shadows. Recently, deep learning has been shown to perform well at extracting deep features. To improve the robustness of background subtraction, in this paper we propose an end-to-end multi-scale spatio-temporal (MS-ST) method that extracts deep features from video sequences. First, a video clip is fed into a convolutional neural network to extract multi-scale spatial features. Then, to exploit temporal information, temporal sampling operations are combined with ConvLSTM modules to extract multi-scale temporal contextual information. Finally, the segmentation result is generated by fusing the multi-scale spatio-temporal features. Experimental results on the CDnet-2014 and LASIESTA datasets demonstrate the effectiveness and superiority of the proposed method.
ISSN:2169-3536
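
The abstract describes a three-stage pipeline: per-frame multi-scale CNN features, ConvLSTM modules over temporally sampled frames at each scale, and a fusion step that produces the segmentation mask. As a rough illustration only, the following PyTorch sketch shows one way such a pipeline could be wired together; the layer sizes, the scale set, the ConvLSTMCell helper, and the fusion head are all hypothetical assumptions for this sketch, not the authors' published architecture.

```python
# Hypothetical sketch of an MS-ST-style pipeline (not the authors' exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: convolutional gates over spatial feature maps."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution produces all four gates (input, forget, output, cell).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class MSST(nn.Module):
    """Sketch: CNN features at several scales, a ConvLSTM over time at each
    scale, then upsample-and-concatenate fusion into a foreground mask."""

    def __init__(self, scales=(1, 2, 4), feat_ch=32):  # assumed scale set
        super().__init__()
        self.scales = scales
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.lstms = nn.ModuleList(ConvLSTMCell(feat_ch, feat_ch) for _ in scales)
        # Fusion head: 1x1 conv over concatenated per-scale features.
        self.head = nn.Conv2d(feat_ch * len(scales), 1, 1)

    def forward(self, clip):
        # clip: (B, T, 3, H, W) video clip.
        B, T, _, H, W = clip.shape
        fused = []
        for s, lstm in zip(self.scales, self.lstms):
            h = torch.zeros(B, lstm.hid_ch, H // s, W // s, device=clip.device)
            c = torch.zeros_like(h)
            for t in range(T):  # run the ConvLSTM over the sampled frames
                x = F.avg_pool2d(clip[:, t], s) if s > 1 else clip[:, t]
                h, c = lstm(self.encoder(x), (h, c))
            # Upsample the final hidden state back to full resolution.
            fused.append(F.interpolate(h, size=(H, W), mode="bilinear",
                                       align_corners=False))
        # Fuse multi-scale spatio-temporal features into a foreground mask.
        return torch.sigmoid(self.head(torch.cat(fused, dim=1)))


if __name__ == "__main__":
    model = MSST()
    mask = model(torch.randn(2, 5, 3, 64, 64))  # a 5-frame toy clip
    print(mask.shape)  # torch.Size([2, 1, 64, 64])
```

In this sketch, temporal context is carried by the ConvLSTM hidden state at each scale, and coarser scales see larger effective receptive fields, which is one plausible reading of how multi-scale spatial and temporal features could be combined before fusion.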