Evaluation Toolkit for Adaptable Automatic Gaze Estimation

Cognitive development researchers have long been interested in how infants learn to perceive and understand the world [11, 9, 7]. One technique for investigating infant cognition involves presenting stimuli to an infant and observing the direction and duration of the infant's gaze [5]. Experiments of this type currently require either human annotators or special hardware, such as infrared eye trackers, to label video of the infants' faces. The MIT Early Childhood Cognition Lab developed Lookit, a platform that allows volunteers to participate in preferential looking studies from home [10]. In these studies, the stimuli are presented on a laptop screen and the infants' reactions are recorded with a web camera. Although this platform removes some bottlenecks from the data collection process, data generated from Lookit still require human annotators to determine the infant's gaze direction and duration. Other researchers, such as Virginia A. Marchman and her associates at the Stanford Language Learning Lab, have recorded videos that differ notably in participant position, video color, and video resolution. Recent advances in computer vision have enabled automatic gaze tracking from video. Preliminary results suggest that iCatcher+, a convolutional neural network (CNN) based gaze estimation model, can be trained to infer gaze direction with near-human accuracy [4, 2]. Cognitive development researchers care about several metrics in addition to accuracy. I created a suite of tools for analyzing video data sets and evaluating the performance of gaze tracking models. These tools include calculation and visualization of key performance metrics as well as video data set analysis. They can be used to aid the development of a general-purpose gaze detection model that can be adapted to perform well across diverse video attributes.
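
The toolkit described above centers on comparing a model's per-frame gaze predictions against human annotations. As a minimal sketch of that kind of metric calculation (assuming per-frame categorical labels such as "left", "right", and "away", the scheme iCatcher+-style annotations typically use; the function and label names here are illustrative, not taken from the thesis):

    def frame_agreement(predicted, annotated, classes=("left", "right", "away")):
        """Overall and per-class agreement between two equal-length
        sequences of per-frame gaze labels (model vs. human coder)."""
        if len(predicted) != len(annotated):
            raise ValueError("label sequences must cover the same frames")
        pairs = list(zip(predicted, annotated))
        overall = sum(p == a for p, a in pairs) / len(pairs) if pairs else None
        # Per-class recall: of the frames the human labeled c, how many
        # did the model also label c?
        per_class = {}
        for c in classes:
            frames_c = [p for p, a in pairs if a == c]
            per_class[c] = sum(p == c for p in frames_c) / len(frames_c) if frames_c else None
        return {"overall": overall, "per_class": per_class}

    # Example: model output vs. a human coder over four frames.
    print(frame_agreement(["left", "left", "away", "right"],
                          ["left", "right", "away", "right"]))

The thesis's toolkit also covers visualization and data set analysis, which are omitted from this sketch.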

Bibliographic Details
Main Author: Hart, Peter
Other Authors: Tenenbaum, Joshua
Format: Thesis
Degree: M.Eng.
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/143331