Self-Supervised Monocular Depth Learning in Low-Texture Areas
For the task of monocular depth estimation, self-supervised learning supervises training by computing the photometric difference between the target image and a warped reference image, achieving results comparable to fully supervised methods. However, problematic pixels in low-texture regions are typically ignored: with stereo pairs as input, most researchers assume that no pixels violate the camera-motion assumption, which leaves these regions poorly optimized.
Main Authors: | Wanpeng Xu, Ling Zou, Lingda Wu, Zhipeng Fu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2021-04-01 |
Series: | Remote Sensing |
Subjects: | self-supervised; depth estimation; low-texture |
Online Access: | https://www.mdpi.com/2072-4292/13/9/1673 |
_version_ | 1797536330560831488 |
---|---|
author | Wanpeng Xu Ling Zou Lingda Wu Zhipeng Fu |
author_facet | Wanpeng Xu Ling Zou Lingda Wu Zhipeng Fu |
author_sort | Wanpeng Xu |
collection | DOAJ |
description | For the task of monocular depth estimation, self-supervised learning supervises training by computing the photometric difference between the target image and a warped reference image, achieving results comparable to fully supervised methods. However, problematic pixels in low-texture regions are typically ignored: with stereo pairs as input, most researchers assume that no pixels violate the camera-motion assumption, which leaves these regions poorly optimized. To tackle this problem, we instead compute the photometric loss on the lowest-level feature maps and apply first- and second-order smoothing to the depth, ensuring consistent gradients during optimization. Given the shortcomings of ResNet as a backbone, we propose a new depth estimation network architecture that improves edge localization accuracy and recovers clear outlines even across smoothed low-texture boundaries. To obtain more stable and reliable quantitative evaluations, we introduce a virtual data set into the self-supervised task, since virtual data sets provide dense, pixel-by-pixel ground-truth depth maps. With stereo pairs as input, our method outperforms prior methods on both the Eigen split of KITTI and the VKITTI2 data set. |
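The two loss terms named in the abstract can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the authors' implementation: `photometric_l1` is a plain per-pixel L1 difference (the paper applies it to low-level feature maps, and published self-supervised methods usually combine it with SSIM), and `smoothness` is one common edge-aware form of first- plus second-order depth smoothing, with image gradients chosen here as the edge weights.

```python
import numpy as np

def photometric_l1(target, warped):
    """Mean absolute difference between the target image and the
    warped reference (hedged stand-in for the paper's feature-level loss)."""
    return float(np.mean(np.abs(target - warped)))

def smoothness(depth, image):
    """First- and second-order depth smoothness, down-weighted at image edges.
    depth, image: 2-D arrays of the same shape (single-channel for simplicity)."""
    # first-order depth gradients
    d_dx = np.abs(depth[:, 1:] - depth[:, :-1])
    d_dy = np.abs(depth[1:, :] - depth[:-1, :])
    # second-order terms: gradients of the gradients
    d_dxx = np.abs(d_dx[:, 1:] - d_dx[:, :-1])
    d_dyy = np.abs(d_dy[1:, :] - d_dy[:-1, :])
    # edge-aware weights: allow depth discontinuities where the image has edges
    w_x = np.exp(-np.abs(image[:, 1:] - image[:, :-1]))
    w_y = np.exp(-np.abs(image[1:, :] - image[:-1, :]))
    first = np.mean(d_dx * w_x) + np.mean(d_dy * w_y)
    second = np.mean(d_dxx * w_x[:, 1:]) + np.mean(d_dyy * w_y[1:, :])
    return float(first + second)
```

A perfectly flat depth map incurs zero smoothness penalty, and identical target/warped images incur zero photometric loss; in training, both terms would be weighted and summed into the total objective.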
first_indexed | 2024-03-10T11:58:11Z |
format | Article |
id | doaj.art-a6df0433cace480f8d2dfc9bd799fb57 |
institution | Directory Open Access Journal |
issn | 2072-4292 |
language | English |
last_indexed | 2024-03-10T11:58:11Z |
publishDate | 2021-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj.art-a6df0433cace480f8d2dfc9bd799fb572023-11-21T17:10:04ZengMDPI AGRemote Sensing2072-42922021-04-01139167310.3390/rs13091673Self-Supervised Monocular Depth Learning in Low-Texture AreasWanpeng Xu0Ling Zou1Lingda Wu2Zhipeng Fu3Science and Technology on Complex Electronic System Simulation Laboratory, Space Engineering University, Beijing 101416, ChinaDigital Media School, Beijing Film Academy, Beijing 100088, ChinaScience and Technology on Complex Electronic System Simulation Laboratory, Space Engineering University, Beijing 101416, ChinaPeng Cheng Laboratory, Shenzhen 518055, ChinaFor the task of monocular depth estimation, self-supervised learning supervises training by calculating the pixel difference between the target image and the warped reference image, obtaining results comparable to those with full supervision. However, the problematic pixels in low-texture regions are ignored, since most researchers think that no pixels violate the assumption of camera motion, taking stereo pairs as the input in self-supervised learning, which leads to the optimization problem in these regions. To tackle this problem, we perform photometric loss using the lowest-level feature maps instead and implement first- and second-order smoothing to the depth, ensuring consistent gradients during optimization. Given the shortcomings of ResNet as the backbone, we propose a new depth estimation network architecture to improve edge location accuracy and obtain clear outline information even in smoothed low-texture boundaries. To acquire more stable and reliable quantitative evaluation results, we introduce a virtual data set in the self-supervised task because these have dense depth maps corresponding to pixel by pixel. We achieve performance that exceeds that of the prior methods on both the Eigen Splits of the KITTI and VKITTI2 data sets taking stereo pairs as the input.https://www.mdpi.com/2072-4292/13/9/1673self-superviseddepth estimationlow-texture |
spellingShingle | Wanpeng Xu Ling Zou Lingda Wu Zhipeng Fu Self-Supervised Monocular Depth Learning in Low-Texture Areas Remote Sensing self-supervised depth estimation low-texture |
title | Self-Supervised Monocular Depth Learning in Low-Texture Areas |
title_full | Self-Supervised Monocular Depth Learning in Low-Texture Areas |
title_fullStr | Self-Supervised Monocular Depth Learning in Low-Texture Areas |
title_full_unstemmed | Self-Supervised Monocular Depth Learning in Low-Texture Areas |
title_short | Self-Supervised Monocular Depth Learning in Low-Texture Areas |
title_sort | self supervised monocular depth learning in low texture areas |
topic | self-supervised depth estimation low-texture |
url | https://www.mdpi.com/2072-4292/13/9/1673 |
work_keys_str_mv | AT wanpengxu selfsupervisedmonoculardepthlearninginlowtextureareas AT lingzou selfsupervisedmonoculardepthlearninginlowtextureareas AT lingdawu selfsupervisedmonoculardepthlearninginlowtextureareas AT zhipengfu selfsupervisedmonoculardepthlearninginlowtextureareas |