Evaluation Toolkit for Adaptable Automatic Gaze Estimation

Cognitive development researchers have long been interested in how infants learn to perceive and understand the world [11, 9, 7]. One technique for investigating infant cognition involves presenting stimuli to an infant and observing the direction and duration of the infant's gaze [5]. Experiments of this type currently require either human annotators or special hardware, such as infrared eye trackers, to label video of the infants' faces. The MIT Early Childhood Cognition Lab developed Lookit, a platform that allows volunteers to participate in preferential looking studies from home [10]. In these studies, the stimuli are presented on a laptop screen and the infants' reactions are recorded with a web camera. Although this platform removes some bottlenecks from the data collection process, data generated from Lookit still require human annotators to determine the infant's gaze direction and duration. Other researchers, such as Virginia A. Marchman and her associates at the Stanford Language Learning Lab, have recorded videos that differ notably in participant position, video color, and video resolution. Recent advances in computer vision have enabled automatic gaze tracking from video. Preliminary results suggest that iCatcher+, a convolutional neural network (CNN) based gaze estimation model, can be trained to infer gaze direction with near-human accuracy [4, 2]. Cognitive development researchers care about several metrics in addition to accuracy. I created a suite of tools for analyzing video data sets and evaluating the performance of gaze tracking models. These tools include calculation and visualization of key performance metrics as well as video data set analysis. They can be used to aid the development of a general-purpose gaze detection model that can be adapted to perform well across diverse video attributes.
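
The toolkit described above centers on comparing a model's per-frame gaze predictions against human annotations. As a minimal sketch of that kind of metric calculation (assuming per-frame categorical labels such as "left", "right", and "away", the scheme iCatcher+-style annotations typically use; the function and label names here are illustrative, not taken from the thesis):

    def frame_agreement(predicted, annotated, classes=("left", "right", "away")):
        """Overall and per-class agreement between two equal-length
        sequences of per-frame gaze labels (model vs. human coder)."""
        if len(predicted) != len(annotated):
            raise ValueError("label sequences must cover the same frames")
        pairs = list(zip(predicted, annotated))
        overall = sum(p == a for p, a in pairs) / len(pairs) if pairs else None
        # Per-class recall: of the frames the human labeled c, how many
        # did the model also label c?
        per_class = {}
        for c in classes:
            frames_c = [p for p, a in pairs if a == c]
            per_class[c] = sum(p == c for p in frames_c) / len(frames_c) if frames_c else None
        return {"overall": overall, "per_class": per_class}

    # Example: model output vs. a human coder over four frames.
    print(frame_agreement(["left", "left", "away", "right"],
                          ["left", "right", "away", "right"]))

The thesis's toolkit also covers visualization and data set analysis, which are omitted from this sketch.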

Bibliographic Details
Main Author: Hart, Peter
Other Authors: Tenenbaum, Joshua
Format: Thesis
Degree: M.Eng.
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/143331