Discovering, Learning, and Exploiting Visual Cues

Animals have evolved over millions of years to exploit the faintest visual cues for perception, navigation, and survival. The complex and intricate vision systems found in animals, such as bee eyes, exploit cues like the polarization of light relative to the Sun's position to navigate, and can process motion at one three-hundredth of a second. In humans, the evolution of the eyes and the processing of visual cues are also tightly intertwined. Babies develop depth perception at around six months, are often scared of their own shadows, and confuse reflections with the real world. As the infant matures into an adult, they intuitively learn from experience that these cues carry valuable hidden information about their environment and can be exploited for tasks such as depth perception and driving.

Inspired by our own use of visual cues, this thesis explores visual cues in the modern context of data-driven imaging. We first explore how visual cues can be learned and exploited by combining physics-based forward models with data-driven AI systems: we map the space of physics-based and data-driven systems and argue that the future of vision lies at the intersection of the two regimes. Next, we show how shadows can be exploited to image and reconstruct in 3D the hidden parts of a scene. We then exploit multi-view reflections to convert household objects into radiance-field cameras that image the world in 5D from the object's perspective, enabling occlusion imaging, beyond-field-of-view novel-view synthesis, and depth estimation from objects to their environments. Finally, we discuss how current approaches rely on humans to design imaging systems that learn and exploit visual cues; as sensing across space, time, and modalities becomes ubiquitous, relying on human-designed systems is not sufficient to build complex vision systems. We therefore propose a technique that combines reinforcement learning with computer vision to automatically learn which cues to exploit for a given task, without human intervention. In one such scenario, we show that agents automatically learn to use multiple cameras and the triangulation cue to estimate the depth of an unknown object in the scene, without access to prior information about the camera, the algorithm, or the object.
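
To make the triangulation cue named in the abstract concrete, here is a minimal sketch (not taken from the thesis, which learns this behavior via reinforcement learning rather than hard-coding it): for two rectified pinhole cameras with focal length f and baseline B, a point imaged with disparity d has depth Z = f·B/d. The function name and parameter values below are illustrative assumptions.

```python
# Minimal sketch of the triangulation cue for depth (illustrative, not from the thesis).
# Assumes two rectified pinhole cameras with focal length focal_px (pixels) and
# baseline baseline_m (meters); a scene point seen at column x_left in one image and
# x_right in the other has disparity d = x_left - x_right and depth Z = f * B / d.

def depth_from_disparity(x_left: float, x_right: float,
                         focal_px: float, baseline_m: float) -> float:
    """Depth (meters) of a point from its pixel columns in a rectified stereo pair."""
    disparity = x_left - x_right  # pixels; larger disparity means a closer point
    if disparity <= 0:
        raise ValueError("point must have positive disparity (lie in front of both cameras)")
    return focal_px * baseline_m / disparity

# Example: f = 800 px, cameras 10 cm apart, disparity of 16 px  ->  Z = 800 * 0.1 / 16 = 5 m
print(depth_from_disparity(x_left=412.0, x_right=396.0, focal_px=800.0, baseline_m=0.10))
```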

Bibliographic Details
Main Author: Tiwary, Kushagra
Other Authors: Raskar, Ramesh
Department: Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Degree: S.M.
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s) (https://rightsstatements.org/page/InC-EDU/1.0/)
Online Access: https://hdl.handle.net/1721.1/152014
https://orcid.org/0000-0003-3964-8771