Living in a dynamic world: semantic segmentation of large scale 3D environments

<p>As we navigate the world, for example when driving a car from our home to the work place, we continuously perceive the 3D structure of our surroundings and intuitively recognise the objects we see. Such capabilities help us in our everyday lives and enable free and accurate movement even in...

Full description

Bibliographic Details
Main Author: Miksik, O
Other Authors: Torr, P
Format: Thesis
Language:English
Published: 2017
Subjects:
_version_ 1826315985530912768
author Miksik, O
author2 Torr, P
author_facet Torr, P
Miksik, O
author_sort Miksik, O
collection OXFORD
description <p>As we navigate the world, for example when driving a car from our home to the work place, we continuously perceive the 3D structure of our surroundings and intuitively recognise the objects we see. Such capabilities help us in our everyday lives and enable free and accurate movement even in completely unfamiliar places. We largely take these abilities for granted, but for robots, the task of understanding large outdoor scenes remains extremely challenging.</p> <p>In this thesis, I develop novel algorithms for (near) real-time dense 3D reconstruction and semantic segmentation of large-scale outdoor scenes from passive cameras. Motivated by "smart glasses" for partially sighted users, I show how such modeling can be integrated into an interactive augmented reality system which puts the user in the loop and allows her to physically interact with the world to learn personalized semantically segmented dense 3D models. In the next part, I show how sparse but very accurate 3D measurements can be incorporated directly into the dense depth estimation process and propose a probabilistic model for incremental dense scene reconstruction. To relax the assumption of a stereo camera, I address dense 3D reconstruction in its monocular form and show how the local model can be improved by joint optimization over depth and pose.</p> <p>The world around us is not stationary. However, reconstructing dynamically moving and potentially non-rigidly deforming texture-less objects typically require "contour correspondences" for shape-from-silhouettes. Hence, I propose a video segmentation model which encodes a single object instance as a closed curve, maintains correspondences across time and provide very accurate segmentation close to object boundaries.</p> <p>Finally, instead of evaluating the performance in an isolated setup (IoU scores) which does not measure the impact on decision-making, I show how semantic 3D reconstruction can be incorporated into standard Deep Q-learning to improve decision-making of agents navigating complex 3D environments.</p>
first_indexed 2024-03-06T20:03:10Z
format Thesis
id oxford-uuid:28050b9e-5e42-46b5-9a54-004450f812ec
institution University of Oxford
language English
last_indexed 2024-12-09T03:36:01Z
publishDate 2017
record_format dspace
spelling oxford-uuid:28050b9e-5e42-46b5-9a54-004450f812ec2024-12-01T19:36:29ZLiving in a dynamic world: semantic segmentation of large scale 3D environmentsThesishttp://purl.org/coar/resource_type/c_db06uuid:28050b9e-5e42-46b5-9a54-004450f812ecMachine learningMobile roboticsComputer visionEnglishORA Deposit2017Miksik, OTorr, PPerez, P<p>As we navigate the world, for example when driving a car from our home to the work place, we continuously perceive the 3D structure of our surroundings and intuitively recognise the objects we see. Such capabilities help us in our everyday lives and enable free and accurate movement even in completely unfamiliar places. We largely take these abilities for granted, but for robots, the task of understanding large outdoor scenes remains extremely challenging.</p> <p>In this thesis, I develop novel algorithms for (near) real-time dense 3D reconstruction and semantic segmentation of large-scale outdoor scenes from passive cameras. Motivated by "smart glasses" for partially sighted users, I show how such modeling can be integrated into an interactive augmented reality system which puts the user in the loop and allows her to physically interact with the world to learn personalized semantically segmented dense 3D models. In the next part, I show how sparse but very accurate 3D measurements can be incorporated directly into the dense depth estimation process and propose a probabilistic model for incremental dense scene reconstruction. To relax the assumption of a stereo camera, I address dense 3D reconstruction in its monocular form and show how the local model can be improved by joint optimization over depth and pose.</p> <p>The world around us is not stationary. However, reconstructing dynamically moving and potentially non-rigidly deforming texture-less objects typically require "contour correspondences" for shape-from-silhouettes. Hence, I propose a video segmentation model which encodes a single object instance as a closed curve, maintains correspondences across time and provide very accurate segmentation close to object boundaries.</p> <p>Finally, instead of evaluating the performance in an isolated setup (IoU scores) which does not measure the impact on decision-making, I show how semantic 3D reconstruction can be incorporated into standard Deep Q-learning to improve decision-making of agents navigating complex 3D environments.</p>
spellingShingle Machine learning
Mobile robotics
Computer vision
Miksik, O
Living in a dynamic world: semantic segmentation of large scale 3D environments
title Living in a dynamic world: semantic segmentation of large scale 3D environments
title_full Living in a dynamic world: semantic segmentation of large scale 3D environments
title_fullStr Living in a dynamic world: semantic segmentation of large scale 3D environments
title_full_unstemmed Living in a dynamic world: semantic segmentation of large scale 3D environments
title_short Living in a dynamic world: semantic segmentation of large scale 3D environments
title_sort living in a dynamic world semantic segmentation of large scale 3d environments
topic Machine learning
Mobile robotics
Computer vision
work_keys_str_mv AT miksiko livinginadynamicworldsemanticsegmentationoflargescale3denvironments