A Novel Spatio-Temporal 3D Convolutional Encoder-Decoder Network for Dynamic Saliency Prediction

Bibliographic Details
Main Authors:	Hao Li, Fei Qi, Guangming Shi
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Visual attention; dynamic saliency prediction; 3D fully convolutional networks; spatio-temporal features
Online Access:	https://ieeexplore.ieee.org/document/9367171/

author	Hao Li; Fei Qi; Guangming Shi
collection	DOAJ
description	Because humans live in a constantly changing environment, predicting saliency maps from dynamic visual stimuli is important for modeling the human visual system. Compared with human behavior, recent models based on LSTMs and 3D CNNs still fall short because of their limited spatio-temporal feature representation. This paper proposes a novel 3D convolutional encoder-decoder architecture for saliency prediction in dynamic scenes. The encoder consists of two subnetworks that extract spatial and temporal features, respectively, in parallel with intermediate fusion. The decoder produces the saliency map by first enlarging features in the spatial dimensions and then aggregating temporal information. Specially designed structures transfer pooling indices from the encoder to the decoder, which helps generate location-aware saliency maps. The network can be trained and run for inference end to end. Experimental results on the DHF1K benchmark show that the proposed model achieves state-of-the-art performance on key metrics, including normalized scanpath saliency (NSS) and Pearson's correlation coefficient (CC).
first_indexed	2024-12-16T16:15:34Z
format	Article
id	doaj.art-52e6f3e821814147ba648b01530a6b21
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-12-16T16:15:34Z
publishDate	2021-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-52e6f3e821814147ba648b01530a6b21; 2022-12-21T22:25:05Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01; vol. 9, pp. 36328-36341; DOI 10.1109/ACCESS.2021.3063372; IEEE document 9367171; Hao Li; Fei Qi (https://orcid.org/0000-0002-2161-1551); Guangming Shi (https://orcid.org/0000-0003-2179-3292); all authors: School of Artificial Intelligence, Xidian University, Xi’an, China; https://ieeexplore.ieee.org/document/9367171/; keywords: Visual attention; dynamic saliency prediction; 3D fully convolutional networks; spatio-temporal features
title	A Novel Spatio-Temporal 3D Convolutional Encoder-Decoder Network for Dynamic Saliency Prediction
topic	Visual attention; dynamic saliency prediction; 3D fully convolutional networks; spatio-temporal features
url	https://ieeexplore.ieee.org/document/9367171/
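The record's description names normalized scanpath saliency (NSS) and Pearson's correlation coefficient (CC) as the key DHF1K metrics. Below is a minimal pure-Python sketch of the commonly used per-frame definitions of these two metrics. It is not the authors' evaluation code. It makes several assumptions: maps are nested lists of floats, fixations are (row, col) pixel coordinates, NSS uses the population standard deviation, and constant maps score 0.0 instead of raising an error. The names `nss`, `cc`, `_flatten` and `SaliencyMap` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumption (illustrative only): a saliency map is a 2D grid of floats stored
# as nested lists (rows of columns); fixations are (row, col) pixel coordinates.
SaliencyMap = List[List[float]]


def _flatten(m: SaliencyMap) -> List[float]:
    """Return all pixel values of a 2D map as one flat list."""
    return [v for row in m for v in row]


def nss(pred: SaliencyMap, fixations: Sequence[Tuple[int, int]]) -> float:
    """Normalized scanpath saliency.

    Standardizes the predicted map to zero mean and unit (population)
    standard deviation, then averages the standardized values at the
    fixated pixels. Returns 0.0 for a constant map or no fixations.
    """
    vals = _flatten(pred)
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    if std == 0 or not fixations:
        return 0.0
    return sum((pred[r][c] - mean) / std for r, c in fixations) / len(fixations)


def cc(pred: SaliencyMap, gt: SaliencyMap) -> float:
    """Pearson's correlation coefficient between a predicted map and a
    ground-truth continuous saliency (fixation density) map of equal size.
    Returns 0.0 if either map is constant.
    """
    x, y = _flatten(pred), _flatten(gt)
    if len(x) != len(y):
        raise ValueError("maps must have the same number of pixels")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


if __name__ == "__main__":
    pred = [[0.0, 0.0], [0.0, 1.0]]
    print(nss(pred, [(1, 1)]))  # about 1.732 (the square root of 3)
    print(cc(pred, pred))       # about 1.0
```

Higher is better for both metrics. For a video benchmark such as DHF1K, these per-frame scores would be aggregated over frames and videos. This record does not state the exact aggregation the authors used.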

Similar Items