Discovering, Learning, and Exploiting Visual Cues

Animals have evolved over millions of years to exploit the faintest visual cues for perception, navigation, and survival. The complex and intricate vision systems found in animals, such as bee eyes, exploit cues like the polarization of light relative to the Sun's position to navigate, and can process motion at one three-hundredth of a second. In humans, the evolution of the eyes and the processing of visual cues are also tightly intertwined. Babies develop depth perception at around six months, are often scared of their own shadows, and confuse reflections with the real world. As the infant matures into an adult, they intuitively learn from experience that these cues carry valuable hidden information about their environment and can be exploited for tasks such as depth perception and driving.

Inspired by our own use of visual cues, this thesis explores visual cues in the modern context of data-driven imaging. We first explore how visual cues can be learned and exploited by combining physics-based forward models with data-driven AI systems: we map the space of physics-based and data-driven systems and argue that the future of vision lies at the intersection of the two regimes. Next, we show how shadows can be exploited to image and reconstruct in 3D the hidden parts of a scene. We then exploit multi-view reflections to convert household objects into radiance-field cameras that image the world in 5D from the object's perspective, enabling occlusion imaging, beyond-field-of-view novel-view synthesis, and depth estimation from objects to their environments. Finally, we discuss how current approaches rely on humans to design imaging systems that learn and exploit visual cues; as sensing across space, time, and modalities becomes ubiquitous, relying on human-designed systems is not sufficient to build complex vision systems. We therefore propose a technique that combines reinforcement learning with computer vision to automatically learn which cues to exploit for a given task, without human intervention. In one such scenario, we show that agents automatically learn to use multiple cameras and the triangulation cue to estimate the depth of an unknown object in the scene, without access to prior information about the camera, the algorithm, or the object.
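
To make the triangulation cue named in the abstract concrete, here is a minimal sketch (not taken from the thesis, which learns this behavior via reinforcement learning rather than hard-coding it): for two rectified pinhole cameras with focal length f and baseline B, a point imaged with disparity d has depth Z = f·B/d. The function name and parameter values below are illustrative assumptions.

```python
# Minimal sketch of the triangulation cue for depth (illustrative, not from the thesis).
# Assumes two rectified pinhole cameras with focal length focal_px (pixels) and
# baseline baseline_m (meters); a scene point seen at column x_left in one image and
# x_right in the other has disparity d = x_left - x_right and depth Z = f * B / d.

def depth_from_disparity(x_left: float, x_right: float,
                         focal_px: float, baseline_m: float) -> float:
    """Depth (meters) of a point from its pixel columns in a rectified stereo pair."""
    disparity = x_left - x_right  # pixels; larger disparity means a closer point
    if disparity <= 0:
        raise ValueError("point must have positive disparity (lie in front of both cameras)")
    return focal_px * baseline_m / disparity

# Example: f = 800 px, cameras 10 cm apart, disparity of 16 px  ->  Z = 800 * 0.1 / 16 = 5 m
print(depth_from_disparity(x_left=412.0, x_right=396.0, focal_px=800.0, baseline_m=0.10))
```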

Bibliographic Details
Main Author: Tiwary, Kushagra
Other Authors: Raskar, Ramesh
Department: Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Degree: S.M.
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s) (https://rightsstatements.org/page/InC-EDU/1.0/)
Online Access: https://hdl.handle.net/1721.1/152014
https://orcid.org/0000-0003-3964-8771