Living in a dynamic world: semantic segmentation of large scale 3D environments

<p>As we navigate the world, for example when driving a car from our home to the work place, we continuously perceive the 3D structure of our surroundings and intuitively recognise the objects we see. Such capabilities help us in our everyday lives and enable free and accurate movement even in...

Full description

Bibliographic Details
Main Author:	Miksik, O
Other Authors:	Torr, P
Format:	Thesis
Language:	English
Published:	2017
Subjects:	Machine learning Mobile robotics Computer vision

_version_	1826315985530912768
author	Miksik, O
author2	Torr, P
author_facet	Torr, P Miksik, O
author_sort	Miksik, O
collection	OXFORD
description	<p>As we navigate the world, for example when driving a car from our home to the work place, we continuously perceive the 3D structure of our surroundings and intuitively recognise the objects we see. Such capabilities help us in our everyday lives and enable free and accurate movement even in completely unfamiliar places. We largely take these abilities for granted, but for robots, the task of understanding large outdoor scenes remains extremely challenging.</p> <p>In this thesis, I develop novel algorithms for (near) real-time dense 3D reconstruction and semantic segmentation of large-scale outdoor scenes from passive cameras. Motivated by "smart glasses" for partially sighted users, I show how such modeling can be integrated into an interactive augmented reality system which puts the user in the loop and allows her to physically interact with the world to learn personalized semantically segmented dense 3D models. In the next part, I show how sparse but very accurate 3D measurements can be incorporated directly into the dense depth estimation process and propose a probabilistic model for incremental dense scene reconstruction. To relax the assumption of a stereo camera, I address dense 3D reconstruction in its monocular form and show how the local model can be improved by joint optimization over depth and pose.</p> <p>The world around us is not stationary. However, reconstructing dynamically moving and potentially non-rigidly deforming texture-less objects typically require "contour correspondences" for shape-from-silhouettes. Hence, I propose a video segmentation model which encodes a single object instance as a closed curve, maintains correspondences across time and provide very accurate segmentation close to object boundaries.</p> <p>Finally, instead of evaluating the performance in an isolated setup (IoU scores) which does not measure the impact on decision-making, I show how semantic 3D reconstruction can be incorporated into standard Deep Q-learning to improve decision-making of agents navigating complex 3D environments.</p>
first_indexed	2024-03-06T20:03:10Z
format	Thesis
id	oxford-uuid:28050b9e-5e42-46b5-9a54-004450f812ec
institution	University of Oxford
language	English
last_indexed	2024-12-09T03:36:01Z
publishDate	2017
record_format	dspace
spelling	oxford-uuid:28050b9e-5e42-46b5-9a54-004450f812ec2024-12-01T19:36:29ZLiving in a dynamic world: semantic segmentation of large scale 3D environmentsThesishttp://purl.org/coar/resource_type/c_db06uuid:28050b9e-5e42-46b5-9a54-004450f812ecMachine learningMobile roboticsComputer visionEnglishORA Deposit2017Miksik, OTorr, PPerez, P<p>As we navigate the world, for example when driving a car from our home to the work place, we continuously perceive the 3D structure of our surroundings and intuitively recognise the objects we see. Such capabilities help us in our everyday lives and enable free and accurate movement even in completely unfamiliar places. We largely take these abilities for granted, but for robots, the task of understanding large outdoor scenes remains extremely challenging.</p> <p>In this thesis, I develop novel algorithms for (near) real-time dense 3D reconstruction and semantic segmentation of large-scale outdoor scenes from passive cameras. Motivated by "smart glasses" for partially sighted users, I show how such modeling can be integrated into an interactive augmented reality system which puts the user in the loop and allows her to physically interact with the world to learn personalized semantically segmented dense 3D models. In the next part, I show how sparse but very accurate 3D measurements can be incorporated directly into the dense depth estimation process and propose a probabilistic model for incremental dense scene reconstruction. To relax the assumption of a stereo camera, I address dense 3D reconstruction in its monocular form and show how the local model can be improved by joint optimization over depth and pose.</p> <p>The world around us is not stationary. However, reconstructing dynamically moving and potentially non-rigidly deforming texture-less objects typically require "contour correspondences" for shape-from-silhouettes. Hence, I propose a video segmentation model which encodes a single object instance as a closed curve, maintains correspondences across time and provide very accurate segmentation close to object boundaries.</p> <p>Finally, instead of evaluating the performance in an isolated setup (IoU scores) which does not measure the impact on decision-making, I show how semantic 3D reconstruction can be incorporated into standard Deep Q-learning to improve decision-making of agents navigating complex 3D environments.</p>
spellingShingle	Machine learning Mobile robotics Computer vision Miksik, O Living in a dynamic world: semantic segmentation of large scale 3D environments
title	Living in a dynamic world: semantic segmentation of large scale 3D environments
title_full	Living in a dynamic world: semantic segmentation of large scale 3D environments
title_fullStr	Living in a dynamic world: semantic segmentation of large scale 3D environments
title_full_unstemmed	Living in a dynamic world: semantic segmentation of large scale 3D environments
title_short	Living in a dynamic world: semantic segmentation of large scale 3D environments
title_sort	living in a dynamic world semantic segmentation of large scale 3d environments
topic	Machine learning Mobile robotics Computer vision
work_keys_str_mv	AT miksiko livinginadynamicworldsemanticsegmentationoflargescale3denvironments

Living in a dynamic world: semantic segmentation of large scale 3D environments

Similar Items