Gaze Estimation via Strip Pooling and Multi-Criss-Cross Attention Networks

Bibliographic Details
Main Authors:	Chao Yan, Weiguo Pan, Cheng Xu, Songyin Dai, Xuewei Li
Format:	Article
Language:	English
Published:	MDPI AG 2023-05-01
Series:	Applied Sciences
Subjects:	gaze estimation; deep learning; strip pooling; multi-criss-cross attention
Online Access:	https://www.mdpi.com/2076-3417/13/10/5901

author	Chao Yan, Weiguo Pan, Cheng Xu, Songyin Dai, Xuewei Li
collection	DOAJ
description	Deep learning techniques for gaze estimation usually determine gaze direction directly from face images. These algorithms perform well because face images contain more feature information than eye images alone. However, face images also contain a substantial amount of redundant information that may interfere with gaze prediction and limit further performance gains. To address this, we model long-range dependencies between the eyes with Strip Pooling and Multi-Criss-Cross Attention Networks (SPMCCA-Net), a network built from two newly designed modules. The first module is a feature-enhancement bottleneck block based on strip pooling. By incorporating strip pooling, this residual module enlarges its receptive field to capture long-range dependencies between the eyes, assigns greater weight to important features, and reduces interference from redundant, gaze-irrelevant information. The second module is a multi-criss-cross attention network, which uses a criss-cross attention mechanism to further strengthen long-range dependencies between the eyes by incorporating the distribution of eye-gaze features, providing additional gaze cues that improve estimation accuracy. The network is trained with a multi-loss function that combines smooth L1 loss and cross-entropy loss, which speeds up training convergence and improves gaze estimation precision. Extensive experiments show that SPMCCA-Net outperforms several state-of-the-art methods, achieving mean angular errors of 10.13° on the Gaze360 dataset and 6.61° on the RT-GENE dataset.
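The description credits strip pooling with enlarging the receptive field so that features from both eyes can interact. The following is a minimal sketch of that gating idea, assuming a single-channel H x W map stored as a list of rows: each position is scaled by a sigmoid of its pooled row feature plus its pooled column feature. The fixed 3-tap kernel stands in for the learned 1-D convolutions of a real strip pooling layer, the channel dimension and any 1x1 fusion convolution are omitted, and the function names are hypothetical; this is not the SPMCCA-Net implementation.

```python
import math


def _smooth1d(values, kernel=(0.25, 0.5, 0.25)):
    """Zero-padded 1-D convolution with a fixed 3-tap kernel.

    Stands in for the learned 3x1 / 1x3 convolutions of a real strip pooling layer.
    """
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - 1  # neighbour index: i - 1, i, i + 1
            if 0 <= j < n:
                acc += weight * values[j]
        out.append(acc)
    return out


def strip_pooling_gate(x):
    """Gate an H x W feature map (list of rows) with horizontal and vertical strip pooling.

    Each output value is scaled by a sigmoid of its row's and its column's pooled
    features, so every position sees the full width and the full height of the map.
    """
    h, w = len(x), len(x[0])
    row_means = [sum(row) / w for row in x]                              # H x 1 strip
    col_means = [sum(x[i][j] for i in range(h)) / h for j in range(w)]   # 1 x W strip
    row_feat = _smooth1d(row_means)  # mixes neighbouring rows
    col_feat = _smooth1d(col_means)  # mixes neighbouring columns
    # x * sigmoid(z) written as x / (1 + exp(-z))
    return [
        [x[i][j] / (1.0 + math.exp(-(row_feat[i] + col_feat[j]))) for j in range(w)]
        for i in range(h)
    ]
```

Because the row mean covers the full width, a change anywhere in a row (for example, at the other eye) changes the gate at every position in that row, which is the long-range effect the description refers to. In a residual bottleneck block the gated map would typically be combined with a shortcut branch; that wiring is likewise an assumption here.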
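Criss-cross attention, as introduced in CCNet, restricts each position's attention to the positions in its own row and column instead of the whole image. The sketch below computes one such pass over an H x W grid of feature vectors, assuming identity query/key/value projections and a fixed residual scale `gamma` (learnable in CCNet). Reading "multi" in the module name as repeated criss-cross passes is an assumption, not something the record states; the function names are hypothetical.

```python
import math


def _softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def criss_cross_attention(feat, gamma=1.0):
    """One criss-cross attention pass over an H x W grid of D-dimensional feature vectors.

    Position (i, j) attends only to the H + W - 1 positions in its own row and column.
    Queries, keys and values are the raw features (identity projections).
    """
    h, w = len(feat), len(feat[0])
    out = []
    for i in range(h):
        out_row = []
        for j in range(w):
            # Criss-cross neighbourhood: all of row i, plus column j without repeating (i, j).
            coords = [(i, jj) for jj in range(w)] + [(ii, j) for ii in range(h) if ii != i]
            query = feat[i][j]
            scores = [sum(q * k for q, k in zip(query, feat[ii][jj])) for ii, jj in coords]
            weights = _softmax(scores)
            context = [
                sum(wt * feat[ii][jj][d] for wt, (ii, jj) in zip(weights, coords))
                for d in range(len(query))
            ]
            # Residual connection: gamma * context + input (gamma is learnable in CCNet).
            out_row.append([gamma * c + v for c, v in zip(context, query)])
        out.append(out_row)
    return out
```

One pass links a position only to its row and column, so in a 2 x 2 grid position (0, 0) is unaffected by (1, 1). A second pass propagates that information through (0, 1) and (1, 0), which is why stacking passes gives every position full-image context.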
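The record says only that training combines smooth L1 loss and cross-entropy loss. One common way to combine them for gaze angles, assumed here rather than taken from the paper, is to classify each angle into bins with cross-entropy and to regress the softmax-weighted expectation of the bin centres with smooth L1. The bin layout, the weight `alpha`, and the function names are hypothetical; a full loss would apply this to yaw and pitch separately and sum the results.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss: quadratic for |error| < beta, linear beyond it."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


def binned_gaze_loss(logits, target_deg, bin_centers, alpha=1.0):
    """Cross-entropy on the target's angle bin plus smooth L1 on the expected angle.

    logits: unnormalised scores, one per bin.
    bin_centers: angle (degrees) represented by each bin.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Classification target: the bin whose centre is closest to the true angle.
    target_bin = min(range(len(bin_centers)), key=lambda k: abs(bin_centers[k] - target_deg))
    ce = -math.log(probs[target_bin])
    # Regression target: the softmax-weighted expectation of the bin centres.
    expected = sum(p * c for p, c in zip(probs, bin_centers))
    return ce + alpha * smooth_l1(expected, target_deg)
```

The cross-entropy term pushes probability mass into the correct bin, while the smooth L1 term refines the continuous estimate and, unlike a squared loss, grows only linearly for large errors.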
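The reported figures (10.13° on Gaze360, 6.61° on RT-GENE) are mean angular errors: the average angle between predicted and ground-truth 3-D gaze directions. A minimal sketch, assuming gaze is given as (pitch, yaw) in radians and converted to a unit vector with one common sign convention; negating all three components, as some codebases do, leaves the angle between two vectors unchanged. The function names are hypothetical.

```python
import math


def gaze_vector(pitch, yaw):
    """Unit 3-D gaze direction from pitch/yaw in radians (one common convention)."""
    return (-math.cos(pitch) * math.sin(yaw),
            -math.sin(pitch),
            -math.cos(pitch) * math.cos(yaw))


def angular_error_deg(pred, true):
    """Angle in degrees between two (pitch, yaw) gaze directions."""
    a, b = gaze_vector(*pred), gaze_vector(*true)
    cos_sim = sum(x * y for x, y in zip(a, b))
    cos_sim = max(-1.0, min(1.0, cos_sim))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_sim))


def mean_angular_error(preds, trues):
    """Average angular error in degrees over paired predictions and ground truths."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, trues)]
    return sum(errors) / len(errors)
```

For example, a prediction of (0, 0) against a ground truth of (0, π/2) is 90° off, so averaging it with one exact prediction gives a mean angular error of 45°.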
first_indexed	2024-03-11T03:59:05Z
format	Article
id	doaj.art-d754463db6ef4fc8bf43f1ba495497cc
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-11T03:59:05Z
publishDate	2023-05-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-d754463db6ef4fc8bf43f1ba495497cc | 2023-11-18T00:17:29Z | eng | MDPI AG | Applied Sciences | ISSN 2076-3417 | 2023-05-01 | vol. 13, no. 10, art. 5901 | doi:10.3390/app13105901 | Gaze Estimation via Strip Pooling and Multi-Criss-Cross Attention Networks | Chao Yan, Weiguo Pan, Cheng Xu, Songyin Dai, Xuewei Li (all five authors: Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China) | https://www.mdpi.com/2076-3417/13/10/5901 | gaze estimation; deep learning; strip pooling; multi-criss-cross attention
title	Gaze Estimation via Strip Pooling and Multi-Criss-Cross Attention Networks
topic	gaze estimation; deep learning; strip pooling; multi-criss-cross attention
url	https://www.mdpi.com/2076-3417/13/10/5901
