Learning dense prediction: from correspondence to segmentation

Bibliographic Details
Main Author: Zhang, F
Other Authors: Torr, P
Format: Thesis
Language: English
Published: 2022
Subjects: Computer vision; Deep learning (Machine learning)
_version_ 1797109591735009280
author Zhang, F
author2 Torr, P
author_facet Torr, P
Zhang, F
author_sort Zhang, F
collection OXFORD
description <p>Dense prediction is the task of predicting a label for each pixel in an image. Given 3D data (point clouds or RGB-D images) as input, dense prediction can also be extended to 3D space, assigning a label to each 3D point or location. According to the label type, dense prediction is mainly categorized into depth estimation, motion prediction, segmentation, and other related tasks. There are four major challenges in learning dense predictions: i) improving accuracy and resolving ambiguous regions, ii) high memory and computational costs, iii) dependence on large amounts of labeled training data, and iv) poor cross-domain generalization to novel datasets.</p>
<p>This integrated thesis focuses on dense prediction tasks, from correspondence estimation (stereo matching and optical flow) to 2D/3D semantic segmentation. Seven robust deep neural network models are proposed to achieve state-of-the-art accuracy, to enable effective training with only synthetic data or unlabeled real data, and to improve cross-domain generalization to various unseen datasets.</p>
<p>For the first task, traditional 3D geometry constraints are embedded into end-to-end trainable stereo matching networks, which achieved state-of-the-art accuracy on two stereo matching benchmarks at the time of publication. Building on this work, a domain-invariant stereo matching network is proposed. It is trained on synthetic data but outperforms many models fine-tuned on real data. For the second task, a Separable Flow network is developed for optical flow estimation; it ranked first on two standard optical flow benchmarks at the time of publication. It is also one of the best methods for predicting optical flow on various unseen datasets. In addition, research is conducted on unsupervised pre-training and domain adaptation for semantic image segmentation. Finally, 2D image segmentation knowledge is leveraged to tackle 3D segmentation. The proposed 3D segmentation networks achieved the leading position on large-scale point-cloud segmentation benchmarks at the time of publication.</p>
first_indexed 2024-03-07T07:42:20Z
format Thesis
id oxford-uuid:51cb4805-f932-41dd-9dab-d2c64b932ce7
institution University of Oxford
language English
last_indexed 2024-03-07T07:42:20Z
publishDate 2022
record_format dspace
spelling oxford-uuid:51cb4805-f932-41dd-9dab-d2c64b932ce7; 2023-05-15T14:11:01Z; Learning dense prediction: from correspondence to segmentation; Thesis; http://purl.org/coar/resource_type/c_db06; uuid:51cb4805-f932-41dd-9dab-d2c64b932ce7; Computer vision; Deep learning (Machine learning); English; Hyrax Deposit; 2022; Zhang, F; Torr, P; Prisacariu, V
spellingShingle Computer vision
Deep learning (Machine learning)
Zhang, F
Learning dense prediction: from correspondence to segmentation
title Learning dense prediction: from correspondence to segmentation
title_sort learning dense prediction from correspondence to segmentation
topic Computer vision
Deep learning (Machine learning)
work_keys_str_mv AT zhangf learningdensepredictionfromcorrespondencetosegmentation