Deep Monocular Depth Estimation Based on Content and Contextual Features

Recently, significant progress has been achieved in developing deep learning-based approaches for estimating depth maps from monocular images. However, many existing methods rely on content and structure information extracted from RGB photographs, which often results in inaccurate depth estimation,...


Bibliographic Details
Main Authors: Saddam Abdulwahab, Hatem A. Rashwan, Najwa Sharaf, Saif Khalid, Domenec Puig
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: Sensors
Subjects: deep learning; monocular depth estimation; autoencoder network; contextual semantic information
Online Access: https://www.mdpi.com/1424-8220/23/6/2919
_version_ 1797609149221044224
author Saddam Abdulwahab
Hatem A. Rashwan
Najwa Sharaf
Saif Khalid
Domenec Puig
author_sort Saddam Abdulwahab
collection DOAJ
description Recently, significant progress has been achieved in developing deep learning-based approaches for estimating depth maps from monocular images. However, many existing methods rely on content and structure information extracted from RGB photographs, which often results in inaccurate depth estimation, particularly for regions with low texture or occlusions. To overcome these limitations, we propose a novel method that exploits contextual semantic information to predict precise depth maps from monocular images. Our approach leverages a deep autoencoder network incorporating high-quality semantic features from the state-of-the-art HRNet-v2 semantic segmentation model. By feeding the autoencoder network with these features, our method can effectively preserve the discontinuities of the depth images and enhance monocular depth estimation. Specifically, we exploit the semantic features related to the localization and boundaries of the objects in the image to improve the accuracy and robustness of the depth estimation. To validate the effectiveness of our approach, we tested our model on two publicly available datasets, NYU Depth v2 and SUN RGB-D. Our method outperformed several state-of-the-art monocular depth estimation techniques, achieving an accuracy of 85%, while minimizing the error Rel by 0.12, RMS by 0.523, and log10 by 0.0527. Our approach also demonstrated exceptional performance in preserving object boundaries and faithfully detecting small object structures in the scene.
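The description reports accuracy, Rel, RMS, and log10 figures without defining them. The sketch below shows how these quantities are commonly computed in the monocular depth estimation literature; it is not taken from the article. The function name `depth_metrics`, the sample values, and the 1.25 ratio threshold for accuracy are assumptions made for illustration, and the record does not state which threshold corresponds to the reported 85%.

```python
import math

def depth_metrics(pred, truth, threshold=1.25):
    """Compute common depth-error metrics over paired depth values.

    pred, truth: equal-length sequences of strictly positive depths.
    threshold: ratio bound for the accuracy metric (1.25 is an assumption).
    """
    if len(pred) != len(truth) or len(pred) == 0:
        raise ValueError("pred and truth must be non-empty and equally long")
    if any(p <= 0 or t <= 0 for p, t in zip(pred, truth)):
        raise ValueError("depths must be strictly positive")

    n = len(pred)
    pairs = list(zip(pred, truth))
    # Rel: mean absolute error relative to the ground-truth depth.
    rel = sum(abs(p - t) / t for p, t in pairs) / n
    # RMS: root of the mean squared error.
    rms = math.sqrt(sum((p - t) ** 2 for p, t in pairs) / n)
    # log10: mean absolute difference of base-10 logarithms.
    log10 = sum(abs(math.log10(p) - math.log10(t)) for p, t in pairs) / n
    # Accuracy: fraction of values whose ratio to the truth is below threshold.
    accuracy = sum(max(p / t, t / p) < threshold for p, t in pairs) / n
    return {"rel": rel, "rms": rms, "log10": log10, "accuracy": accuracy}

# Hypothetical sample depths, not data from the article.
metrics = depth_metrics([1.0, 2.0, 4.0], [1.0, 2.2, 2.0])
```

Lower Rel, RMS, and log10 values indicate smaller errors, while a higher accuracy value indicates that more predictions fall within the chosen ratio of the ground truth.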
first_indexed 2024-03-11T05:56:30Z
format Article
id doaj.art-b30a2329672d4c058cdf84dad908f0e6
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-03-11T05:56:30Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-b30a2329672d4c058cdf84dad908f0e6
2023-11-17T13:43:22Z
eng
MDPI AG
Sensors
1424-8220
2023-03-01
Volume 23, Issue 6, Article 2919
10.3390/s23062919
Deep Monocular Depth Estimation Based on Content and Contextual Features
Saddam Abdulwahab, Hatem A. Rashwan, Najwa Sharaf, Saif Khalid, Domenec Puig
Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili, Campus Sescelades, Avinguda dels Paisos Catalans, 26, 43007 Tarragona, Spain (all authors)
title Deep Monocular Depth Estimation Based on Content and Contextual Features
title_sort deep monocular depth estimation based on content and contextual features
topic deep learning
monocular depth estimation
autoencoder network
contextual semantic information
url https://www.mdpi.com/1424-8220/23/6/2919