Evaluation Toolkit for Adaptable Automatic Gaze Estimation
Main Author: | Hart, Peter |
---|---|
Other Authors: | Tenenbaum, Joshua |
Format: | Thesis (M.Eng.) |
Published: | Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2022 |
Online Access: | https://hdl.handle.net/1721.1/143331 |
Description: | Cognitive development researchers have long been interested in understanding how infants learn to perceive and understand the world [11, 9, 7]. One technique for investigating infant cognition involves presenting stimuli and observing the direction and duration of the infants’ gaze [5]. Experiments of this type currently require human annotation or special hardware, such as infrared eye tracking, to annotate video of the infants’ faces.
The MIT Early Childhood Cognition Lab developed Lookit, a platform that allows volunteers to participate in preferential looking studies from home [10]. In these studies, the stimuli are presented on a laptop screen and the infants’ reactions are recorded with a web camera. Although this platform removes some bottlenecks from the data collection process, data collected through Lookit still require human annotators to determine the infant’s gaze direction and duration. Other researchers, such as Virginia A. Marchman and her associates at the Stanford Language Learning Lab, have recorded videos that differ notably in participant position, video color, and video resolution.
Recent developments in computer vision have enabled advances in automatic gaze tracking from video. Preliminary results suggest that iCatcher+, a convolutional neural network (CNN) based gaze estimation model, can be trained to infer gaze direction with near-human accuracy [4, 2]. Cognitive development researchers, however, care about several metrics in addition to accuracy. I created a suite of tools for analyzing video data sets and evaluating the performance of gaze tracking models. These tools include calculation and visualization of key performance metrics as well as video data analysis, and they can be used to aid the development of a general-purpose gaze detection model that can be adapted to perform well across diverse video attributes. |
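The toolkit itself is not reproduced in this record. As a minimal sketch of the kind of frame-level agreement metrics such an evaluation might compute (assuming categorical gaze labels such as left/right/away; the function names and data below are hypothetical, not taken from the thesis), a comparison of model output against human annotation could look like this:

```python
from collections import Counter
from typing import Sequence

def frame_accuracy(human: Sequence[str], model: Sequence[str]) -> float:
    """Fraction of frames on which the model label matches the human label."""
    assert len(human) == len(model) and len(human) > 0
    return sum(h == m for h, m in zip(human, model)) / len(human)

def cohens_kappa(human: Sequence[str], model: Sequence[str]) -> float:
    """Chance-corrected agreement (Cohen's kappa) between two label sequences."""
    assert len(human) == len(model) and len(human) > 0
    n = len(human)
    observed = sum(h == m for h, m in zip(human, model)) / n
    h_counts, m_counts = Counter(human), Counter(model)
    # Expected agreement if both coders labelled frames independently at
    # their observed marginal rates.
    expected = sum((h_counts[lab] / n) * (m_counts[lab] / n)
                   for lab in set(h_counts) | set(m_counts))
    return 1.0 if expected == 1.0 else (observed - expected) / (1.0 - expected)

# Toy example: per-frame gaze labels for a short clip (hypothetical data).
human = ["left", "left", "right", "away", "right", "right"]
model = ["left", "right", "right", "away", "right", "left"]
print(f"frame accuracy: {frame_accuracy(human, model):.2f}")  # 0.67
print(f"Cohen's kappa:  {cohens_kappa(human, model):.2f}")    # 0.45
```

Cohen's kappa is used here only as an illustrative chance-corrected complement to raw accuracy; the thesis may report a different set of metrics.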