Learning Depth for Scene Reconstruction Using an Encoder-Decoder Model

Depth estimation has received considerable attention and is often applied to visual simultaneous localization and mapping (SLAM) for scene reconstruction. To the best of our knowledge, existing methods fail to provide sufficiently reliable depth for monocular depth estimation-based SLAM because new image fe...

Bibliographic Details
Main Authors:	Xiaohan Tu, Cheng Xu, Siping Liu, Guoqi Xie, Jing Huang, Renfa Li, Junsong Yuan
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Convolutional neural networks; depth estimation; decoder; encoder; simultaneous localization and mapping
Online Access:	https://ieeexplore.ieee.org/document/9091077/

author	Xiaohan Tu, Cheng Xu, Siping Liu, Guoqi Xie, Jing Huang, Renfa Li, Junsong Yuan
collection	DOAJ
description	Depth estimation has received considerable attention and is often applied to visual simultaneous localization and mapping (SLAM) for scene reconstruction. To the best of our knowledge, previous depth estimation methods fail to provide sufficiently reliable depth for monocular depth estimation-based SLAM because they rarely re-exploit new image features effectively, easily lose local features, and readily ignore relative depth relationships among pixels. Built on inaccurate monocular depth estimation, SLAM still faces scale ambiguity problems. To achieve accurate scene reconstruction based on monocular depth estimation, this paper makes three contributions. (1) We design a depth estimation model (DEM) consisting of a precise encoder that re-exploits new features and a decoder that learns local features effectively. (2) We propose a loss function that uses the depth relationships among pixels to guide the training of the DEM. (3) We design a modular SLAM system containing the DEM, feature detection, descriptor computation, feature matching, pose prediction, keyframe extraction, loop closure detection, and pose-graph optimization for pixel-level scene reconstruction. Extensive experiments demonstrate that the DEM and the DEM-based SLAM system are effective. (1) On public datasets, our DEM predicts more reliable depth than state-of-the-art methods when the inputs are RGB images, sparse depth, or the fusion of both. (2) The DEM-based SLAM system achieves accuracy comparable to that of well-known modular SLAM systems.
first_indexed	2024-12-14T15:08:28Z
format	Article
id	doaj.art-092e86a694be4cf58fd61a3d11cd4983
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-12-14T15:08:28Z
publishDate	2020-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-092e86a694be4cf58fd61a3d11cd4983 | 2022-12-21T22:56:38Z | eng | IEEE | IEEE Access | 2169-3536 | 2020-01-01 | 8 | 89300 | 89317 | 10.1109/ACCESS.2020.2993494 | 9091077 | Learning Depth for Scene Reconstruction Using an Encoder-Decoder Model | Xiaohan Tu (0) https://orcid.org/0000-0002-4330-240X | Cheng Xu (1) https://orcid.org/0000-0002-1323-3175 | Siping Liu (2) https://orcid.org/0000-0003-0019-5154 | Guoqi Xie (3) https://orcid.org/0000-0001-6625-0350 | Jing Huang (4) https://orcid.org/0000-0001-8812-2691 | Renfa Li (5) https://orcid.org/0000-0003-4573-7375 | Junsong Yuan (6) https://orcid.org/0000-0002-7901-8793 | Key Laboratory for Embedded and Network Computing of Hunan Province, Changsha, China | Key Laboratory for Embedded and Network Computing of Hunan Province, Changsha, China | Key Laboratory for Embedded and Network Computing of Hunan Province, Changsha, China | Key Laboratory for Embedded and Network Computing of Hunan Province, Changsha, China | Key Laboratory for Embedded and Network Computing of Hunan Province, Changsha, China | Key Laboratory for Embedded and Network Computing of Hunan Province, Changsha, China | Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, USA | https://ieeexplore.ieee.org/document/9091077/ | Convolutional neural networks | depth estimation | decoder | encoder | simultaneous localization and mapping
title	Learning Depth for Scene Reconstruction Using an Encoder-Decoder Model
topic	Convolutional neural networks; depth estimation; decoder; encoder; simultaneous localization and mapping
url	https://ieeexplore.ieee.org/document/9091077/

Similar Items