Encoder-Decoder Structure With the Feature Pyramid for Depth Estimation From a Single Image
This paper addresses depth estimation from a single monocular image, an ill-posed and inherently ambiguous problem. We propose an encoder-decoder structure with the feature pyramid to predict the depth map from a single RGB imag...
Main Authors: | Mengxia Tang, Songnan Chen, Ruifang Dong, Jiangming Kan |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
Subjects: | Depth prediction, encoder-decoder, feature pyramid, single image |
Online Access: | https://ieeexplore.ieee.org/document/9340320/ |
_version_ | 1818664393889546240 |
---|---|
author | Mengxia Tang, Songnan Chen, Ruifang Dong, Jiangming Kan |
author_facet | Mengxia Tang, Songnan Chen, Ruifang Dong, Jiangming Kan |
author_sort | Mengxia Tang |
collection | DOAJ |
description | This paper addresses depth estimation from a single monocular image, an ill-posed and inherently ambiguous problem. We propose an encoder-decoder structure with the feature pyramid to predict the depth map from a single RGB image. The feature pyramid is used to detect objects at different scales in the image. The encoder extracts the most representative information from the original image through a series of convolution operations while reducing the resolution of the input; we adopt Res2-50 as the encoder. The decoder uses a novel upsampling structure to improve the output resolution. We also propose a novel loss function that adds a gradient loss and a surface normal loss to the depth loss, so that the network predicts not only the global depth but also the depth of fuzzy edges and small objects. We use Adam as the optimizer to train the network and speed up convergence. Our extensive experimental evaluation demonstrates the efficiency and effectiveness of the method, which is competitive with previous methods on the Make3D dataset and outperforms state-of-the-art methods on the NYU Depth v2 dataset. |
first_indexed | 2024-12-17T05:32:02Z |
format | Article |
id | doaj.art-1e8106baf5c946328a4fb1f0d3794bd4 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-17T05:32:02Z |
publishDate | 2021-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-1e8106baf5c946328a4fb1f0d3794bd4; 2022-12-21T22:01:42Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01; vol. 9, pp. 22640-22650; DOI 10.1109/ACCESS.2021.3055497; document 9340320; Encoder-Decoder Structure With the Feature Pyramid for Depth Estimation From a Single Image; Mengxia Tang (https://orcid.org/0000-0003-4504-7163), Songnan Chen (https://orcid.org/0000-0003-0314-1194), Ruifang Dong (https://orcid.org/0000-0001-7247-4131), Jiangming Kan (https://orcid.org/0000-0002-7326-7078), all at the School of Technology, Beijing Forestry University, Beijing, China; https://ieeexplore.ieee.org/document/9340320/; Depth prediction, encoder-decoder, feature pyramid, single image |
spellingShingle | Mengxia Tang, Songnan Chen, Ruifang Dong, Jiangming Kan; Encoder-Decoder Structure With the Feature Pyramid for Depth Estimation From a Single Image; IEEE Access; Depth prediction, encoder-decoder, feature pyramid, single image |
title | Encoder-Decoder Structure With the Feature Pyramid for Depth Estimation From a Single Image |
title_sort | encoder decoder structure with the feature pyramid for depth estimation from a single image |
topic | Depth prediction, encoder-decoder, feature pyramid, single image |
url | https://ieeexplore.ieee.org/document/9340320/ |
work_keys_str_mv | AT mengxiatang encoderdecoderstructurewiththefeaturepyramidfordepthestimationfromasingleimage AT songnanchen encoderdecoderstructurewiththefeaturepyramidfordepthestimationfromasingleimage AT ruifangdong encoderdecoderstructurewiththefeaturepyramidfordepthestimationfromasingleimage AT jiangmingkan encoderdecoderstructurewiththefeaturepyramidfordepthestimationfromasingleimage |
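
The description above mentions a loss that adds a gradient loss and a surface normal loss to the depth loss. The record does not give the formulas, so the following is only a minimal sketch under stated assumptions: the depth term is a mean absolute error, gradients are forward finite differences, normals are taken as (-dz/dx, -dz/dy, 1), and the function name `composite_depth_loss` and the weights `w_grad` and `w_normal` are hypothetical. It is not the authors' implementation.

```python
import math


def _gradients(depth):
    """Forward finite differences (dx, dy) on the (rows-1) x (cols-1) grid."""
    rows, cols = len(depth), len(depth[0])
    dx = [[depth[r][c + 1] - depth[r][c] for c in range(cols - 1)]
          for r in range(rows - 1)]
    dy = [[depth[r + 1][c] - depth[r][c] for c in range(cols - 1)]
          for r in range(rows - 1)]
    return dx, dy


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def composite_depth_loss(pred, target, w_grad=1.0, w_normal=1.0):
    """Depth + gradient + surface-normal loss for 2-D depth maps (lists of rows).

    Assumed forms (not taken from the paper): L1 depth error, L1 gradient
    error, and 1 - cosine similarity between surface normals.
    Both maps must have the same shape, at least 2 x 2.
    """
    # Depth term: mean absolute difference over all pixels.
    depth_term = _mean(abs(p - t)
                       for p_row, t_row in zip(pred, target)
                       for p, t in zip(p_row, t_row))

    pdx, pdy = _gradients(pred)
    tdx, tdy = _gradients(target)

    grad_terms, normal_terms = [], []
    for r in range(len(pdx)):
        for c in range(len(pdx[0])):
            # Gradient term: penalises mismatched depth edges.
            grad_terms.append(abs(pdx[r][c] - tdx[r][c]) +
                              abs(pdy[r][c] - tdy[r][c]))
            # Normal term: angle between normals built from the gradients.
            n_p = (-pdx[r][c], -pdy[r][c], 1.0)
            n_t = (-tdx[r][c], -tdy[r][c], 1.0)
            dot = sum(a * b for a, b in zip(n_p, n_t))
            norm = (math.sqrt(sum(a * a for a in n_p)) *
                    math.sqrt(sum(b * b for b in n_t)))
            normal_terms.append(1.0 - dot / norm)

    return depth_term + w_grad * _mean(grad_terms) + w_normal * _mean(normal_terms)
```

With these assumptions, a prediction that differs from the target only by a constant offset is penalised by the depth term alone, while a prediction with a spurious edge is also penalised by the gradient and normal terms, which is the intuition behind adding them for edges and small objects.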