Detecting incidents, accelerating dataset annotation, and estimating depth with multi-view invariants

Bibliographic Details
Main Author: Weber, Ethan
Other Authors: Torralba, Antonio
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/139162
Description: Computer vision has seen incredible growth since the introduction of large datasets, deep neural networks, and modern computing resources. Current algorithms can perform scene understanding, or the ability to understand and interpret the world through visual perception (e.g., images or videos). In this thesis, we push the boundaries of current scene understanding algorithms with three distinct projects. (1) In the first project, we address limitations of current algorithms to understand natural disasters, damage, and incidents through images. To do this, we create the Incidents Dataset, train a detection model, and present applications to identify incidents in social media streams to inform emergency responders during disaster relief situations. (2) In the second project, we address the issue of costly dataset construction and present a novel framework that reduces the cost of creating large-scale instance annotation datasets. (3) In the third and final project, we move to 3D scene understanding and present an intuitive technique to train monocular depth estimation networks by enforcing consistency of multi-view geometric invariants between image pairs observing the same scene or objects from the same category.
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: M.Eng.
Date: 2021-06
Record Timestamps: 2021-06-17T20:14:42.007Z, 2022-01-14T14:53:49Z, 2022-01-15T03:57:04Z
Rights: In Copyright - Educational Use Permitted; Copyright MIT
Rights URI: http://rightsstatements.org/page/InC-EDU/1.0/
File Format: application/pdf