Learning to understand large-scale 3D point clouds


Bibliographic Details
Main Author: Qingyong, H
Other Authors: Trigoni, A
Format: Thesis
Language: English
Published: 2022
Subjects:
_version_ 1797109441585217536
author Qingyong, H
author2 Trigoni, A
author_facet Trigoni, A
Qingyong, H
author_sort Qingyong, H
collection OXFORD
description <p>Giving machines the ability to precisely perceive and understand the 3D visual world is a fundamental step towards allowing them to interact competently with our physical world. However, research on large-scale 3D scene understanding and perception is still in its infancy, owing to the complex geometric structure of 3D shapes and the limited availability of high-quality data. Among the various 3D representations, point clouds have attracted increasing attention due to their flexibility, compactness, and closeness to raw sensory data. Despite this, the semantic understanding of large-scale 3D point clouds remains challenging because of their orderless, unstructured, and non-uniform properties. To this end, this thesis makes three core contributions, progressing from a high-quality urban-scale dataset, through fully supervised semantic understanding, to weakly supervised, label-efficient learning of large-scale 3D point clouds.</p> <p>The main contributions of this thesis are three-fold. In Chapter 3, we start by building an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points. We also identify a number of open and unique challenges faced by urban-scale 3D scene understanding and conduct a comprehensive experimental analysis to address them. This work highlights the limitations of existing algorithms and offers several insightful conclusions for understanding large-scale 3D point clouds in urban environments.</p> <p>With high-quality large-scale 3D datasets in hand, we investigate the problem of efficient semantic segmentation of large-scale 3D point clouds in Chapter 4. We begin by analyzing the strengths and weaknesses of existing downsampling strategies and find that random sampling is a suitable component for efficient learning on large-scale point clouds. We then propose a local feature aggregation module that hierarchically enlarges the receptive field while preserving important features, and build an efficient, lightweight neural architecture called RandLA-Net that directly infers per-point semantics for large-scale point clouds containing millions of points spanning hundreds of meters.</p> <p>In Chapter 5, we go a step further and investigate label-efficient learning of large-scale 3D point clouds, that is, achieving high-quality semantic segmentation with limited annotations. We first examine key issues in weakly supervised learning on 3D point clouds, including different weak supervision schemes and the critical amount of weak annotation required. Through a pilot study, we find that dense 3D annotations are largely redundant and unnecessary. Motivated by this, we propose a novel weak supervision framework that implicitly augments the total amount of available supervision signal by leveraging the semantic homogeneity between neighboring points: a point neighborhood query allows the sparse training signals to be back-propagated to a wider spatial context.</p> <p>All of the algorithms and datasets presented in this thesis have been open-sourced on GitHub to facilitate future research. The RandLA-Net paper was recognized as one of the most influential papers at CVPR 2020, and the algorithm has been integrated into code libraries such as Open3D and TorchPoints3D. The SensatUrban dataset served as the platform for two Urban3D challenges, contributing to the advancement of semantic understanding of city-scale point clouds.</p> <p>Overall, this thesis presents a high-quality dataset and two novel data-driven algorithms, aiming at efficient, scalable, and effective learning-based semantic understanding of large-scale 3D point clouds, and ultimately at improving the real-time 3D perception capabilities of intelligent machines in practice.</p>
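The random-sampling strategy highlighted in the description is simple enough to sketch directly. Below is a minimal, illustrative Python version, not the thesis's actual implementation; the function name, `ratio` parameter, and toy data are assumptions for illustration only:

```python
import random

def random_downsample(points, ratio=0.25, seed=0):
    # Keep a fixed fraction of points, chosen uniformly at random.
    # Cost is O(n) regardless of point density, which is what makes
    # random sampling attractive for clouds with millions of points.
    rng = random.Random(seed)
    k = max(1, int(len(points) * ratio))
    return rng.sample(points, k)

# Toy cloud: 1,000 (x, y, z) tuples.
cloud = [(random.random(), random.random(), random.random())
         for _ in range(1000)]
subset = random_downsample(cloud, ratio=0.25)  # keeps 250 points
```

By contrast, alternatives such as farthest-point sampling are typically far more expensive per step, which matters once a single scene spans hundreds of meters and millions of points.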
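The Chapter 5 neighborhood-query idea can likewise be sketched in miniature. Here, sparse annotations are spread to each labeled point's k nearest neighbors under the local semantic-homogeneity assumption. This is a deliberate simplification (the framework described above propagates sparse training signals during back-propagation rather than copying hard labels), and every name in it is invented for the sketch:

```python
import math

def knn(points, query, k):
    # Brute-force k-nearest neighbours by Euclidean distance.
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[i], query))
    return order[:k]

def propagate_labels(points, sparse_labels, k=3):
    # Spread each annotated point's label to its k nearest neighbours,
    # assuming points that are close together share a semantic class.
    dense = dict(sparse_labels)
    for idx, label in sparse_labels.items():
        for j in knn(points, points[idx], k):
            dense.setdefault(j, label)
    return dense

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
dense = propagate_labels(pts, {0: "road", 2: "building"}, k=2)
```

A real system would replace the brute-force search with a spatial index (e.g. a KD-tree), but the effect is the same: a handful of annotations supervises a much wider spatial context.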
first_indexed 2024-03-07T07:40:25Z
format Thesis
id oxford-uuid:19e2b720-831a-48e1-9db3-b1a614a2a336
institution University of Oxford
language English
last_indexed 2024-03-07T07:40:25Z
publishDate 2022
record_format dspace
spelling oxford-uuid:19e2b720-831a-48e1-9db3-b1a614a2a336 2023-04-28T07:55:22Z Learning to understand large-scale 3D point clouds Thesis http://purl.org/coar/resource_type/c_db06 uuid:19e2b720-831a-48e1-9db3-b1a614a2a336 Robotics Deep learning (Machine learning) Computer vision English Hyrax Deposit 2022 Qingyong, H Trigoni, A Markham, A Kwiatkowska, M Calway, A
spellingShingle Robotics
Deep learning (Machine learning)
Computer vision
Qingyong, H
Learning to understand large-scale 3D point clouds
title Learning to understand large-scale 3D point clouds
title_full Learning to understand large-scale 3D point clouds
title_fullStr Learning to understand large-scale 3D point clouds
title_full_unstemmed Learning to understand large-scale 3D point clouds
title_short Learning to understand large-scale 3D point clouds
title_sort learning to understand large scale 3d point clouds
topic Robotics
Deep learning (Machine learning)
Computer vision
work_keys_str_mv AT qingyongh learningtounderstandlargescale3dpointclouds