Gaze Estimation Based on Convolutional Structure and Sliding Window-Based Attention Mechanism

The direction of human gaze is an important indicator of human behavior, reflecting the level of attention and cognitive state towards various visual stimuli in the environment. Convolutional neural networks have achieved good performance in gaze estimation tasks, but their global modeling capabilit...


Bibliographic Details
Main Authors:	Yujie Li, Jiahui Chen, Jiaxin Ma, Xiwen Wang, Wei Zhang
Format:	Article
Language:	English
Published:	MDPI AG 2023-07-01
Series:	Sensors
Subjects:	gaze estimation; swin transformer; convolutional neural networks (CNN); deep learning; self-attention mechanism
Online Access:	https://www.mdpi.com/1424-8220/23/13/6226

author	Yujie Li Jiahui Chen Jiaxin Ma Xiwen Wang Wei Zhang
author_facet	Yujie Li Jiahui Chen Jiaxin Ma Xiwen Wang Wei Zhang
author_sort	Yujie Li
collection	DOAJ
description	The direction of human gaze is an important indicator of human behavior, reflecting the level of attention and cognitive state towards various visual stimuli in the environment. Convolutional neural networks have achieved good performance in gaze estimation tasks, but their global modeling capability is limited, making it difficult to further improve prediction performance. In recent years, transformer models have been introduced for gaze estimation and have achieved state-of-the-art performance. However, their slicing-and-mapping mechanism for processing local image patches can compromise local spatial information. Moreover, the single down-sampling rate and fixed-size tokens are not suitable for multiscale feature learning in gaze estimation tasks. To overcome these limitations, this study introduces a Swin Transformer for gaze estimation and designs two network architectures: a pure Swin Transformer gaze estimation model (SwinT-GE) and a hybrid gaze estimation model that combines convolutional structures with SwinT-GE (Res-Swin-GE). SwinT-GE uses the tiny version of the Swin Transformer for gaze estimation. Res-Swin-GE replaces the slicing-and-mapping mechanism of SwinT-GE with convolutional structures. Experimental results demonstrate that Res-Swin-GE significantly outperforms SwinT-GE, exhibiting strong competitiveness on the MpiiFaceGaze dataset and achieving a 7.5% performance improvement over existing state-of-the-art methods on the Eyediap dataset.
first_indexed	2024-03-11T01:28:34Z
format	Article
id	doaj.art-9f1578cc7686434da9f755714fea56bd
institution	Directory Open Access Journal
issn	1424-8220
language	English
last_indexed	2024-03-11T01:28:34Z
publishDate	2023-07-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-9f1578cc7686434da9f755714fea56bd; 2023-11-18T17:32:24Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2023-07-01; vol. 23, no. 13, art. 6226; doi:10.3390/s23136226; Gaze Estimation Based on Convolutional Structure and Sliding Window-Based Attention Mechanism; Yujie Li, Jiahui Chen, Jiaxin Ma, Xiwen Wang, Wei Zhang (all: School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China); https://www.mdpi.com/1424-8220/23/13/6226; gaze estimation; swin transformer; convolutional neural networks (CNN); deep learning; self-attention mechanism
title	Gaze Estimation Based on Convolutional Structure and Sliding Window-Based Attention Mechanism
topic	gaze estimation; swin transformer; convolutional neural networks (CNN); deep learning; self-attention mechanism
url	https://www.mdpi.com/1424-8220/23/13/6226


Similar Items