Discovering, Learning, and Exploiting Visual Cues
Animals have evolved over millions of years to exploit the faintest visual cues for perception, navigation, and survival. Complex and intricate vision systems found in animals, such as bee eyes, exploit cues like polarization of light relative to the Sun’s position to navigate and process motion at...
Main Author: | Tiwary, Kushagra |
---|---|
Other Authors: | Raskar, Ramesh |
Format: | Thesis |
Published: | Massachusetts Institute of Technology 2023 |
Online Access: | https://hdl.handle.net/1721.1/152014 https://orcid.org/0000-0003-3964-8771 |
_version_ | 1826204570930380800 |
---|---|
author | Tiwary, Kushagra |
author2 | Raskar, Ramesh |
author_facet | Raskar, Ramesh Tiwary, Kushagra |
author_sort | Tiwary, Kushagra |
collection | MIT |
description | Animals have evolved over millions of years to exploit the faintest visual cues for perception, navigation, and survival. Complex and intricate vision systems found in animals, such as bee eyes, exploit cues like polarization of light relative to the Sun’s position to navigate and process motion at one three-hundredth of a second. In humans, the evolution of the eyes and the processing of visual cues are also tightly intertwined. Babies develop depth-of-field at 6 months, are often scared of their own shadows, and confuse their reflections with the real world. As the infant matures into an adult, they intuitively learn from their experiences how these cues instead provide valuable hidden information about their environments and can be exploited for depth perception and driving.
Inspired by our use of visual cues, this thesis explores them in the modern context of data-driven imaging techniques. We begin by exploring how visual cues can be learned from and exploited by combining physics-based forward models with data-driven AI systems. We first map the space of physics-based and data-driven systems and show that the future of vision lies in the intersection of both regimes. Next, we show how shadows can be exploited to image and reconstruct in 3D the hidden parts of the scene. We then exploit multi-view reflections to convert household objects into radiance-field cameras that can image the world from the object's perspective in 5D. This enables applications such as occlusion imaging, beyond-field-of-view novel-view synthesis, and depth estimation from objects to their environments.
Finally, we discuss how current approaches rely on humans to design imaging systems that can learn and exploit visual cues. However, as sensing in space, time, and different modalities becomes ubiquitous, relying on human-designed systems is not sufficient to build complex vision systems. We then propose a technique that combines reinforcement learning with computer vision to automatically learn which cues to exploit to accomplish the task without human intervention. We show how, in one such scenario, agents can start to automatically learn to use multiple cameras and the triangulation cue to estimate the depth of an unknown object in the scene without access to prior information about the camera, the algorithm, or the object. |
first_indexed | 2024-09-23T12:57:34Z |
format | Thesis |
id | mit-1721.1/152014 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T12:57:34Z |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/152014 2023-09-01T03:28:51Z Discovering, Learning, and Exploiting Visual Cues Tiwary, Kushagra Raskar, Ramesh Program in Media Arts and Sciences (Massachusetts Institute of Technology) S.M. 2023-08-30T16:00:00Z 2023-08-30T16:00:00Z 2023-06 2023-08-16T20:34:33.582Z Thesis https://hdl.handle.net/1721.1/152014 https://orcid.org/0000-0003-3964-8771 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Tiwary, Kushagra Discovering, Learning, and Exploiting Visual Cues |
title | Discovering, Learning, and Exploiting Visual Cues |
title_full | Discovering, Learning, and Exploiting Visual Cues |
title_fullStr | Discovering, Learning, and Exploiting Visual Cues |
title_full_unstemmed | Discovering, Learning, and Exploiting Visual Cues |
title_short | Discovering, Learning, and Exploiting Visual Cues |
title_sort | discovering learning and exploiting visual cues |
url | https://hdl.handle.net/1721.1/152014 https://orcid.org/0000-0003-3964-8771 |
work_keys_str_mv | AT tiwarykushagra discoveringlearningandexploitingvisualcues |