Deep Monocular Depth Estimation Based on Content and Contextual Features

Recently, significant progress has been achieved in developing deep learning-based approaches for estimating depth maps from monocular images. However, many existing methods rely on content and structure information extracted from RGB photographs, which often results in inaccurate depth estimation,...

Full description

Bibliographic Details
Main Authors:	Saddam Abdulwahab, Hatem A. Rashwan, Najwa Sharaf, Saif Khalid, Domenec Puig
Format:	Article
Language:	English
Published:	MDPI AG 2023-03-01
Series:	Sensors
Subjects:	deep learning monocular depth estimation autoencoder network contextual semantic information
Online Access:	https://www.mdpi.com/1424-8220/23/6/2919

_version_	1797609149221044224
author	Saddam Abdulwahab Hatem A. Rashwan Najwa Sharaf Saif Khalid Domenec Puig
author_facet	Saddam Abdulwahab Hatem A. Rashwan Najwa Sharaf Saif Khalid Domenec Puig
author_sort	Saddam Abdulwahab
collection	DOAJ
description	Recently, significant progress has been achieved in developing deep learning-based approaches for estimating depth maps from monocular images. However, many existing methods rely on content and structure information extracted from RGB photographs, which often results in inaccurate depth estimation, particularly for regions with low texture or occlusions. To overcome these limitations, we propose a novel method that exploits contextual semantic information to predict precise depth maps from monocular images. Our approach leverages a deep autoencoder network incorporating high-quality semantic features from the state-of-the-art HRNet-v2 semantic segmentation model. By feeding the autoencoder network with these features, our method can effectively preserve the discontinuities of the depth images and enhance monocular depth estimation. Specifically, we exploit the semantic features related to the localization and boundaries of the objects in the image to improve the accuracy and robustness of the depth estimation. To validate the effectiveness of our approach, we tested our model on two publicly available datasets, NYU Depth v2 and SUN RGB-D. Our method outperformed several state-of-the-art monocular depth estimation techniques, achieving an accuracy of <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>85</mn><mo>%</mo></mrow></semantics></math></inline-formula>, while minimizing the error <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>R</mi><mi>e</mi><mi>l</mi></mrow></semantics></math></inline-formula> by <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0.12</mn></mrow></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>R</mi><mi>M</mi><mi>S</mi></mrow></semantics></math></inline-formula> by <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0.523</mn></mrow></semantics></math></inline-formula>, and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub></mrow></semantics></math></inline-formula> by <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0.0527</mn></mrow></semantics></math></inline-formula>. Our approach also demonstrated exceptional performance in preserving object boundaries and faithfully detecting small object structures in the scene.
first_indexed	2024-03-11T05:56:30Z
format	Article
id	doaj.art-b30a2329672d4c058cdf84dad908f0e6
institution	Directory Open Access Journal
issn	1424-8220
language	English
last_indexed	2024-03-11T05:56:30Z
publishDate	2023-03-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-b30a2329672d4c058cdf84dad908f0e62023-11-17T13:43:22ZengMDPI AGSensors1424-82202023-03-01236291910.3390/s23062919Deep Monocular Depth Estimation Based on Content and Contextual FeaturesSaddam Abdulwahab0Hatem A. Rashwan1Najwa Sharaf2Saif Khalid3Domenec Puig4Department of Computer Engineering and Mathematics, Universitat Rovira i Virgil, Campus Sescelades, Avinguda dels Paisos Catalans, 26, 43007 Tarragona, SpainDepartment of Computer Engineering and Mathematics, Universitat Rovira i Virgil, Campus Sescelades, Avinguda dels Paisos Catalans, 26, 43007 Tarragona, SpainDepartment of Computer Engineering and Mathematics, Universitat Rovira i Virgil, Campus Sescelades, Avinguda dels Paisos Catalans, 26, 43007 Tarragona, SpainDepartment of Computer Engineering and Mathematics, Universitat Rovira i Virgil, Campus Sescelades, Avinguda dels Paisos Catalans, 26, 43007 Tarragona, SpainDepartment of Computer Engineering and Mathematics, Universitat Rovira i Virgil, Campus Sescelades, Avinguda dels Paisos Catalans, 26, 43007 Tarragona, SpainRecently, significant progress has been achieved in developing deep learning-based approaches for estimating depth maps from monocular images. However, many existing methods rely on content and structure information extracted from RGB photographs, which often results in inaccurate depth estimation, particularly for regions with low texture or occlusions. To overcome these limitations, we propose a novel method that exploits contextual semantic information to predict precise depth maps from monocular images. Our approach leverages a deep autoencoder network incorporating high-quality semantic features from the state-of-the-art HRNet-v2 semantic segmentation model. By feeding the autoencoder network with these features, our method can effectively preserve the discontinuities of the depth images and enhance monocular depth estimation. Specifically, we exploit the semantic features related to the localization and boundaries of the objects in the image to improve the accuracy and robustness of the depth estimation. To validate the effectiveness of our approach, we tested our model on two publicly available datasets, NYU Depth v2 and SUN RGB-D. Our method outperformed several state-of-the-art monocular depth estimation techniques, achieving an accuracy of <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>85</mn><mo>%</mo></mrow></semantics></math></inline-formula>, while minimizing the error <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>R</mi><mi>e</mi><mi>l</mi></mrow></semantics></math></inline-formula> by <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0.12</mn></mrow></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>R</mi><mi>M</mi><mi>S</mi></mrow></semantics></math></inline-formula> by <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0.523</mn></mrow></semantics></math></inline-formula>, and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub></mrow></semantics></math></inline-formula> by <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0.0527</mn></mrow></semantics></math></inline-formula>. Our approach also demonstrated exceptional performance in preserving object boundaries and faithfully detecting small object structures in the scene.https://www.mdpi.com/1424-8220/23/6/2919deep learningmonocular depth estimationautoencoder networkcontextual semantic information
spellingShingle	Saddam Abdulwahab Hatem A. Rashwan Najwa Sharaf Saif Khalid Domenec Puig Deep Monocular Depth Estimation Based on Content and Contextual Features Sensors deep learning monocular depth estimation autoencoder network contextual semantic information
title	Deep Monocular Depth Estimation Based on Content and Contextual Features
title_full	Deep Monocular Depth Estimation Based on Content and Contextual Features
title_fullStr	Deep Monocular Depth Estimation Based on Content and Contextual Features
title_full_unstemmed	Deep Monocular Depth Estimation Based on Content and Contextual Features
title_short	Deep Monocular Depth Estimation Based on Content and Contextual Features
title_sort	deep monocular depth estimation based on content and contextual features
topic	deep learning monocular depth estimation autoencoder network contextual semantic information
url	https://www.mdpi.com/1424-8220/23/6/2919
work_keys_str_mv	AT saddamabdulwahab deepmonoculardepthestimationbasedoncontentandcontextualfeatures AT hatemarashwan deepmonoculardepthestimationbasedoncontentandcontextualfeatures AT najwasharaf deepmonoculardepthestimationbasedoncontentandcontextualfeatures AT saifkhalid deepmonoculardepthestimationbasedoncontentandcontextualfeatures AT domenecpuig deepmonoculardepthestimationbasedoncontentandcontextualfeatures

Deep Monocular Depth Estimation Based on Content and Contextual Features

Similar Items