Person-Specific Gaze Estimation from Low-Quality Webcam Images

Gaze estimation is an established research problem in computer vision. It has various applications in real life, from human–computer interactions to health care and virtual reality, making it more viable for the research community. Due to the significant success of deep learning techniques in other...

Bibliographic Details
Main Authors: Mohd Faizan Ansari, Pawel Kasprowski, Peter Peer
Format: Article
Language: English
Published: MDPI AG 2023-04-01
Series: Sensors
Subjects: gaze estimation; convolution neural network; computer vision; deep learning
Online Access: https://www.mdpi.com/1424-8220/23/8/4138
_version_ 1797603532744949760
author Mohd Faizan Ansari
Pawel Kasprowski
Peter Peer
collection DOAJ
description Gaze estimation is an established research problem in computer vision. Its applications range from human–computer interaction to health care and virtual reality, which makes it attractive to the research community. Following the success of deep learning in other computer vision tasks, such as image classification, object detection, object segmentation, and object tracking, deep learning-based gaze estimation has also received more attention in recent years. This paper uses a convolutional neural network (CNN) for person-specific gaze estimation. Person-specific gaze estimation uses a single model trained for one individual user, in contrast to the commonly used generalized models trained on data from multiple people. We used only low-quality images collected directly from a standard desktop webcam, so our method can be applied to any computer system equipped with such a camera, without additional hardware. First, we used the webcam to collect a dataset of face and eye images. Then, we tested different combinations of CNN hyperparameters, including the learning rate and the dropout rate. Our findings show that a person-specific eye-tracking model with well-chosen hyperparameters produces better results than universal models trained on multiple users' data. In particular, the best results were a mean absolute error (MAE) of 38.20 pixels for the left eye, 36.01 pixels for the right eye, 51.18 pixels for both eyes combined, and 30.09 pixels for the whole face, equivalent to approximately 1.45 degrees for the left eye, 1.37 degrees for the right eye, 1.98 degrees for both eyes combined, and 1.14 degrees for full-face images.
first_indexed 2024-03-11T04:33:28Z
format Article
id doaj.art-dc3b103b7f664e0d89cbf064384e7b09
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-03-11T04:33:28Z
publishDate 2023-04-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-dc3b103b7f664e0d89cbf064384e7b09; 2023-11-17T21:19:37Z; eng; MDPI AG; Sensors; 1424-8220; 2023-04-01; vol. 23, no. 8, article 4138; 10.3390/s23084138
Person-Specific Gaze Estimation from Low-Quality Webcam Images
Mohd Faizan Ansari (Department of Applied Informatics, Silesian University of Technology, 44-100 Gliwice, Poland)
Pawel Kasprowski (Department of Applied Informatics, Silesian University of Technology, 44-100 Gliwice, Poland)
Peter Peer (Faculty of Computer and Information Science, University of Ljubljana, Večna Pot 113, SI-1000 Ljubljana, Slovenia)
https://www.mdpi.com/1424-8220/23/8/4138
gaze estimation; convolution neural network; computer vision; deep learning
title Person-Specific Gaze Estimation from Low-Quality Webcam Images
topic gaze estimation
convolution neural network
computer vision
deep learning
url https://www.mdpi.com/1424-8220/23/8/4138
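The description in this record reports each error twice, once as MAE in pixels and once in degrees of visual angle, but the record itself does not give the screen geometry behind that conversion. The sketch below is only an illustration: it computes the pixels-per-degree ratio implied by the four reported pairs, and shows the standard visual-angle formula that such a conversion typically relies on. The function names and the pixel pitch and viewing distance in the usage lines are hypothetical and are not taken from the paper.

```python
import math

# Values copied from the description field: (MAE in pixels, approximate degrees).
REPORTED = {
    "left eye": (38.20, 1.45),
    "right eye": (36.01, 1.37),
    "both eyes": (51.18, 1.98),
    "full face": (30.09, 1.14),
}


def implied_pixels_per_degree(reported):
    """Ratio of the reported pixel error to the reported angular error."""
    return {name: px / deg for name, (px, deg) in reported.items()}


def pixels_to_degrees(error_px, pixel_pitch_mm, viewing_distance_mm):
    """Visual angle subtended by an on-screen distance centred on the line of sight.

    pixel_pitch_mm and viewing_distance_mm are assumptions supplied by the
    caller; the catalog record does not state the values used by the authors.
    """
    error_mm = error_px * pixel_pitch_mm
    return math.degrees(2.0 * math.atan(error_mm / (2.0 * viewing_distance_mm)))


if __name__ == "__main__":
    for name, ratio in implied_pixels_per_degree(REPORTED).items():
        print(f"{name}: about {ratio:.1f} pixels per degree")

    # HYPOTHETICAL geometry, for illustration only: 0.25 mm pixels viewed from 600 mm.
    angle = pixels_to_degrees(38.20, pixel_pitch_mm=0.25, viewing_distance_mm=600.0)
    print(f"38.20 px under the assumed geometry: {angle:.2f} degrees")
```

The four implied ratios agree closely with one another, which suggests a single fixed screen setup was used for the conversion. Reproducing the reported degree values exactly would require the actual monitor and viewing distance from the full article.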