Deep Monocular Depth Estimation Based on Content and Contextual Features
Recently, significant progress has been achieved in developing deep learning-based approaches for estimating depth maps from monocular images. However, many existing methods rely on content and structure information extracted from RGB photographs, which often results in inaccurate depth estimation,...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2023-03-01
|
Series: | Sensors |
Subjects: | |
Online Access: | https://www.mdpi.com/1424-8220/23/6/2919 |
_version_ | 1797609149221044224 |
---|---|
author | Saddam Abdulwahab Hatem A. Rashwan Najwa Sharaf Saif Khalid Domenec Puig |
author_facet | Saddam Abdulwahab Hatem A. Rashwan Najwa Sharaf Saif Khalid Domenec Puig |
author_sort | Saddam Abdulwahab |
collection | DOAJ |
description | Recently, significant progress has been achieved in developing deep learning-based approaches for estimating depth maps from monocular images. However, many existing methods rely on content and structure information extracted from RGB photographs, which often results in inaccurate depth estimation, particularly for regions with low texture or occlusions. To overcome these limitations, we propose a novel method that exploits contextual semantic information to predict precise depth maps from monocular images. Our approach leverages a deep autoencoder network incorporating high-quality semantic features from the state-of-the-art HRNet-v2 semantic segmentation model. By feeding the autoencoder network with these features, our method can effectively preserve the discontinuities of the depth images and enhance monocular depth estimation. Specifically, we exploit the semantic features related to the localization and boundaries of the objects in the image to improve the accuracy and robustness of the depth estimation. To validate the effectiveness of our approach, we tested our model on two publicly available datasets, NYU Depth v2 and SUN RGB-D. Our method outperformed several state-of-the-art monocular depth estimation techniques, achieving an accuracy of <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>85</mn><mo>%</mo></mrow></semantics></math></inline-formula>, while minimizing the error <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>R</mi><mi>e</mi><mi>l</mi></mrow></semantics></math></inline-formula> by <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0.12</mn></mrow></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>R</mi><mi>M</mi><mi>S</mi></mrow></semantics></math></inline-formula> by <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0.523</mn></mrow></semantics></math></inline-formula>, and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub></mrow></semantics></math></inline-formula> by <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0.0527</mn></mrow></semantics></math></inline-formula>. Our approach also demonstrated exceptional performance in preserving object boundaries and faithfully detecting small object structures in the scene. |
first_indexed | 2024-03-11T05:56:30Z |
format | Article |
id | doaj.art-b30a2329672d4c058cdf84dad908f0e6 |
institution | Directory Open Access Journal |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-11T05:56:30Z |
publishDate | 2023-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-b30a2329672d4c058cdf84dad908f0e62023-11-17T13:43:22ZengMDPI AGSensors1424-82202023-03-01236291910.3390/s23062919Deep Monocular Depth Estimation Based on Content and Contextual FeaturesSaddam Abdulwahab0Hatem A. Rashwan1Najwa Sharaf2Saif Khalid3Domenec Puig4Department of Computer Engineering and Mathematics, Universitat Rovira i Virgil, Campus Sescelades, Avinguda dels Paisos Catalans, 26, 43007 Tarragona, SpainDepartment of Computer Engineering and Mathematics, Universitat Rovira i Virgil, Campus Sescelades, Avinguda dels Paisos Catalans, 26, 43007 Tarragona, SpainDepartment of Computer Engineering and Mathematics, Universitat Rovira i Virgil, Campus Sescelades, Avinguda dels Paisos Catalans, 26, 43007 Tarragona, SpainDepartment of Computer Engineering and Mathematics, Universitat Rovira i Virgil, Campus Sescelades, Avinguda dels Paisos Catalans, 26, 43007 Tarragona, SpainDepartment of Computer Engineering and Mathematics, Universitat Rovira i Virgil, Campus Sescelades, Avinguda dels Paisos Catalans, 26, 43007 Tarragona, SpainRecently, significant progress has been achieved in developing deep learning-based approaches for estimating depth maps from monocular images. However, many existing methods rely on content and structure information extracted from RGB photographs, which often results in inaccurate depth estimation, particularly for regions with low texture or occlusions. To overcome these limitations, we propose a novel method that exploits contextual semantic information to predict precise depth maps from monocular images. Our approach leverages a deep autoencoder network incorporating high-quality semantic features from the state-of-the-art HRNet-v2 semantic segmentation model. By feeding the autoencoder network with these features, our method can effectively preserve the discontinuities of the depth images and enhance monocular depth estimation. Specifically, we exploit the semantic features related to the localization and boundaries of the objects in the image to improve the accuracy and robustness of the depth estimation. To validate the effectiveness of our approach, we tested our model on two publicly available datasets, NYU Depth v2 and SUN RGB-D. Our method outperformed several state-of-the-art monocular depth estimation techniques, achieving an accuracy of <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>85</mn><mo>%</mo></mrow></semantics></math></inline-formula>, while minimizing the error <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>R</mi><mi>e</mi><mi>l</mi></mrow></semantics></math></inline-formula> by <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0.12</mn></mrow></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>R</mi><mi>M</mi><mi>S</mi></mrow></semantics></math></inline-formula> by <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0.523</mn></mrow></semantics></math></inline-formula>, and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>l</mi><mi>o</mi><msub><mi>g</mi><mn>10</mn></msub></mrow></semantics></math></inline-formula> by <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>0.0527</mn></mrow></semantics></math></inline-formula>. Our approach also demonstrated exceptional performance in preserving object boundaries and faithfully detecting small object structures in the scene.https://www.mdpi.com/1424-8220/23/6/2919deep learningmonocular depth estimationautoencoder networkcontextual semantic information |
spellingShingle | Saddam Abdulwahab Hatem A. Rashwan Najwa Sharaf Saif Khalid Domenec Puig Deep Monocular Depth Estimation Based on Content and Contextual Features Sensors deep learning monocular depth estimation autoencoder network contextual semantic information |
title | Deep Monocular Depth Estimation Based on Content and Contextual Features |
title_full | Deep Monocular Depth Estimation Based on Content and Contextual Features |
title_fullStr | Deep Monocular Depth Estimation Based on Content and Contextual Features |
title_full_unstemmed | Deep Monocular Depth Estimation Based on Content and Contextual Features |
title_short | Deep Monocular Depth Estimation Based on Content and Contextual Features |
title_sort | deep monocular depth estimation based on content and contextual features |
topic | deep learning monocular depth estimation autoencoder network contextual semantic information |
url | https://www.mdpi.com/1424-8220/23/6/2919 |
work_keys_str_mv | AT saddamabdulwahab deepmonoculardepthestimationbasedoncontentandcontextualfeatures AT hatemarashwan deepmonoculardepthestimationbasedoncontentandcontextualfeatures AT najwasharaf deepmonoculardepthestimationbasedoncontentandcontextualfeatures AT saifkhalid deepmonoculardepthestimationbasedoncontentandcontextualfeatures AT domenecpuig deepmonoculardepthestimationbasedoncontentandcontextualfeatures |