When Computer Vision Gazes at Cognition

Bibliographic Details
Main Authors: Gao, Tao; Harari, Daniel; Tenenbaum, Joshua; Ullman, Shimon
Format: Technical Report
Language: en_US
Published: Center for Brains, Minds and Machines (CBMM), arXiv, 2015
Subjects: Computer vision; Gaze; Social Interaction; Artificial Intelligence
Online Access: http://hdl.handle.net/1721.1/100190

Description
Joint attention is a core, early-developing form of social interaction. It is based on our ability to discriminate the third-party objects that other people are looking at. While it has been shown that people can accurately determine whether another person is looking directly at them or away, little is known about the human ability to discriminate a third person's gaze directed toward objects that are farther away, especially in unconstrained cases where the looker can move her head and eyes freely. In this paper we address this question by jointly exploring human psychophysics and a cognitively motivated computer vision model that can detect the 3D direction of gaze from 2D face images. The synthesis of behavioral study and computer vision yields several interesting discoveries: (1) human accuracy in discriminating targets 8°-10° of visual angle apart is around 40% in a free-looking gaze task; (2) the ability to interpret the gaze of different lookers varies dramatically; (3) this variance can be captured by the computational model; and (4) humans significantly outperform the current model. These results collectively show that the acuity of human joint attention is indeed highly impressive, given the computational challenge of the natural looking task. Moreover, the gap between human and model performance, as well as the variability of gaze interpretation across different lookers, calls for a better understanding of the mechanisms humans use in this challenging task.

This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.

Additional Details
Collection: MIT
Institution: Massachusetts Institute of Technology
Record ID: mit-1721.1/100190
Series: CBMM Memo Series; 025
arXiv: 1412.2672
Type: Technical Report; Working Paper
Date: 2014-12-12
Added to repository: 2015-12-11
File format: application/pdf
License: Attribution-NonCommercial 3.0 United States (http://creativecommons.org/licenses/by-nc/3.0/us/)