Detecting incidents, accelerating dataset annotation, and estimating depth with multi-view invariants
Computer vision has seen incredible growth since the introduction of large datasets, deep neural networks, and modern computing resources. Current algorithms can perform scene understanding, or the ability to understand and interpret the world through visual perception (e.g., images or videos). In t...
Main Author: | Weber, Ethan |
---|---|
Other Authors: | Torralba, Antonio |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2022 |
Online Access: | https://hdl.handle.net/1721.1/139162 |
_version_ | 1826194386523783168 |
---|---|
author | Weber, Ethan |
author2 | Torralba, Antonio |
author_facet | Torralba, Antonio Weber, Ethan |
author_sort | Weber, Ethan |
collection | MIT |
description | Computer vision has seen incredible growth since the introduction of large datasets, deep neural networks, and modern computing resources. Current algorithms can perform scene understanding, or the ability to understand and interpret the world through visual perception (e.g., images or videos). In this thesis, we push the boundaries of current scene understanding algorithms with three distinct projects. (1) In the first project, we address limitations of current algorithms to understand natural disasters, damage, and incidents through images. To do this, we create the Incidents Dataset, train a detection model, and present applications to identify incidents in social media streams to inform emergency responders during disaster relief situations. (2) In the second project, we address the issue of costly dataset construction and present a novel framework that reduces the cost of creating large-scale instance annotation datasets. (3) In the third and final project, we move to 3D scene understanding and present an intuitive technique to train monocular depth estimation networks by enforcing consistency of multi-view geometric invariants between image pairs observing the same scene or objects from the same category. |
first_indexed | 2024-09-23T09:54:53Z |
format | Thesis |
id | mit-1721.1/139162 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T09:54:53Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/139162; 2022-01-15T03:57:04Z; Detecting incidents, accelerating dataset annotation, and estimating depth with multi-view invariants; Weber, Ethan; Torralba, Antonio; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; M.Eng.; 2022-01-14T14:53:49Z; 2022-01-14T14:53:49Z; 2021-06; 2021-06-17T20:14:42.007Z; Thesis; https://hdl.handle.net/1721.1/139162; In Copyright - Educational Use Permitted; Copyright MIT; http://rightsstatements.org/page/InC-EDU/1.0/; application/pdf; Massachusetts Institute of Technology |
title | Detecting incidents, accelerating dataset annotation, and estimating depth with multi-view invariants |
url | https://hdl.handle.net/1721.1/139162 |