Effective representations for road scene understanding with scalable learning

Autonomous vehicles require an accurate understanding of the scene for safe operation in real-world driving scenarios. This thesis examines and offers effective representations for road scene understanding based on in-situ perception, which can be employed directly for planning and decision making in a variety of complex urban environments and under wide-ranging environmental conditions.

A common, versatile scene representation is the pixel-wise semantic segmentation produced by deep neural networks. However, this representation is limited in its direct usefulness for several reasons. Firstly, pixel-wise semantic segmentation does not naturally support the high-level reasoning required for complex driving manoeuvres. This thesis resolves that limitation by focussing on a hierarchical, graph-based representation, the scene graph, which combines segmented entities and object-centric perception in bird's-eye view at an abstraction level suitable for decision making. Road markings are a crucial prerequisite for this representation, as their underlying meaning dictates the desired driving behaviour. Secondly, semantic segmentation often lacks this semantic understanding of road markings because of the inordinate cost of labelling adequate training data. To address this, the thesis presents and compares a model-driven and a data-driven approach to self-supervised road marking classification: whereas the former leverages additional sensor modalities and domain knowledge, the latter employs state-of-the-art image-to-image translation techniques to synthesize training data. Thirdly, semantic segmentation is commonly performed in the front-facing perspective, which neither explicitly encodes distances nor links directly to the vehicle's action space. We tackle this problem by learning an improved bird's-eye-view mapping, called "boosted IPM", which aids scene graph generation in real-world scenarios.
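For intuition, classical inverse perspective mapping (IPM), the baseline that "boosted IPM" improves upon, warps the front-facing camera image onto an assumed flat ground plane with a single homography. The sketch below (Python with OpenCV) is purely illustrative and not code from the thesis; the point correspondences and output size are hypothetical placeholders.

    # Illustrative classical IPM, not code from the thesis.
    # The correspondences and output size below are hypothetical.
    import cv2
    import numpy as np

    # Four road-plane points in the front-facing image (pixels)...
    src = np.float32([[550, 460], [730, 460], [1280, 720], [0, 720]])
    # ...and their targets on a metric bird's-eye-view grid (pixels).
    dst = np.float32([[300, 0], [500, 0], [500, 800], [300, 800]])

    H = cv2.getPerspectiveTransform(src, dst)  # 3x3 ground-plane homography

    def to_birds_eye(image):
        # Warp onto the assumed flat ground plane; anything violating the
        # flat-world assumption (e.g. vehicles) appears stretched, which is
        # one of the artefacts a learned bird's-eye-view mapping can reduce.
        return cv2.warpPerspective(image, H, (800, 800))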

In addressing the above, we introduce scalable, self-supervised learning techniques built on design principles such as transfer learning, leveraging domain knowledge, and data synthesis. Furthermore, we improve the robustness of the representations under wide-ranging environmental conditions through image restoration and by learning appearance-invariant representations. The presented methodologies serve as a valuable starting point for devising effective and efficient representations for road scene understanding based on in-situ perception in real-world scenarios.
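As a concrete illustration of the transfer-learning design principle mentioned above, a common recipe freezes a pretrained backbone and trains only a small task-specific head, for instance for road marking classification. The sketch below uses PyTorch/torchvision; the backbone choice and class count are assumptions, not details from the thesis.

    # Illustrative transfer-learning recipe, not the thesis's training code.
    # The backbone choice and the number of classes (8) are assumptions.
    import torch.nn as nn
    import torchvision.models as models

    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in backbone.parameters():
        p.requires_grad = False        # keep the pretrained features fixed

    # Replace the classification head for the downstream marking classes;
    # only this new layer's parameters receive gradients during training.
    backbone.fc = nn.Linear(backbone.fc.in_features, 8)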

Bibliographic Details
Main Author: Bruls, TAH
Other Authors: Newman, P
Format: Thesis
Language: English
Published: 2020
Institution: University of Oxford
Subjects: Robot vision; Automated vehicles