Following Gaze in Video

Following the gaze of people inside videos is an important signal for understanding people and their actions. In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame. We collect VideoGaze, a...

Bibliographic Details
Main Authors:	Recasens Continente, Adria; Vondrick, Carl Martin; Khosla, Aditya; Torralba, Antonio
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	English
Published:	Institute of Electrical and Electronics Engineers 2019
Online Access:	https://hdl.handle.net/1721.1/122778

author	Recasens Continente, Adria; Vondrick, Carl Martin; Khosla, Aditya; Torralba, Antonio
author2	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Recasens Continente, Adria; Vondrick, Carl Martin; Khosla, Aditya; Torralba, Antonio
author_sort	Recasens Continente, Adria
collection	MIT
description	Following the gaze of people inside videos is an important signal for understanding people and their actions. In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame. We collect VideoGaze, a new dataset that we use as a benchmark to both train and evaluate models. Given one frame with a person in it, our model estimates a density for gaze location in every frame and the probability that the person is looking in that particular frame. A key aspect of our approach is an end-to-end model that jointly estimates saliency, gaze pose, and geometric relationships between views while using only gaze as supervision. Visualizations suggest that the model learns to internally solve these intermediate tasks automatically without additional supervision. Experiments show that our approach follows gaze in video better than existing approaches, enabling a richer understanding of human activities in video. Keywords: Motion pictures, Head, Three-dimensional displays, Predictive models, Geometry, Semantics, gaze tracking, learning (artificial intelligence), video signal processing
first_indexed	2024-09-23T08:45:48Z
format	Article
id	mit-1721.1/122778
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T08:45:48Z
publishDate	2019
publisher	Institute of Electrical and Electronics Engineers
record_format	dspace
spelling	mit-1721.1/122778 2022-09-23T14:22:58Z Following Gaze in Video Recasens Continente, Adria; Vondrick, Carl Martin; Khosla, Aditya; Torralba, Antonio Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science 2019-11-06T20:19:42Z 2019-11-06T20:19:42Z 2017-12 2019-07-11T16:25:39Z Article http://purl.org/eprint/type/ConferencePaper 2380-7504 https://hdl.handle.net/1721.1/122778 Recasens Continente, Adria et al. "Following Gaze in Video," 2017 IEEE International Conference on Computer Vision (ICCV), October 2017, Venice, Italy, Institute of Electrical and Electronics Engineers, December 2017 ©IEEE en http://dx.doi.org/10.1109/iccv.2017.160 2017 IEEE International Conference on Computer Vision (ICCV) Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Institute of Electrical and Electronics Engineers MIT web domain
title	Following Gaze in Video
url	https://hdl.handle.net/1721.1/122778

Similar Items