Encoder-Decoder Structure With the Feature Pyramid for Depth Estimation From a Single Image

We address the problem of depth estimation from a single monocular image in this paper. Depth estimation from a single image is an ill-posed and inherently ambiguous problem. We propose an encoder-decoder structure with a feature pyramid to predict the depth map from a single RGB imag...

Bibliographic Details
Main Authors: Mengxia Tang, Songnan Chen, Ruifang Dong, Jiangming Kan
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Depth prediction, encoder-decoder, feature pyramid, single image
Online Access: https://ieeexplore.ieee.org/document/9340320/
author Mengxia Tang
Songnan Chen
Ruifang Dong
Jiangming Kan
collection DOAJ
description This paper addresses depth estimation from a single monocular image, an ill-posed and inherently ambiguous problem. We propose an encoder-decoder structure with a feature pyramid to predict the depth map from a single RGB image. The feature pyramid is used to detect objects of different scales in the image. The encoder extracts the most representative information from the original image through a series of convolution operations and reduces the resolution of the input image. We adopt Res2-50 as the encoder to extract important features. The decoder uses a novel upsampling structure to improve the output resolution. We also propose a novel loss function that adds a gradient loss and a surface normal loss to the depth loss, so that the network predicts not only the global depth but also the depth of fuzzy edges and small objects. We use Adam as the optimizer to train the network and speed up convergence. Extensive experimental evaluation shows that the method is efficient and effective: it is competitive with previous methods on the Make3D dataset and outperforms state-of-the-art methods on the NYU Depth v2 dataset.
first_indexed 2024-12-17T05:32:02Z
format Article
id doaj.art-1e8106baf5c946328a4fb1f0d3794bd4
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-17T05:32:02Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-1e8106baf5c946328a4fb1f0d3794bd4
2022-12-21T22:01:42Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2021-01-01, vol. 9, pp. 22640-22650
DOI 10.1109/ACCESS.2021.3055497
9340320
Encoder-Decoder Structure With the Feature Pyramid for Depth Estimation From a Single Image
Mengxia Tang, https://orcid.org/0000-0003-4504-7163
Songnan Chen, https://orcid.org/0000-0003-0314-1194
Ruifang Dong, https://orcid.org/0000-0001-7247-4131
Jiangming Kan, https://orcid.org/0000-0002-7326-7078
School of Technology, Beijing Forestry University, Beijing, China (all four authors)
https://ieeexplore.ieee.org/document/9340320/
Depth prediction
encoder-decoder
feature pyramid
single image
title Encoder-Decoder Structure With the Feature Pyramid for Depth Estimation From a Single Image
topic Depth prediction
encoder-decoder
feature pyramid
single image
url https://ieeexplore.ieee.org/document/9340320/
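
Illustrative note on the loss described in the abstract. The description above says the loss adds a gradient term and a surface normal term to the depth term, but this record does not give the formulas. The sketch below is one common way to build such a loss and is not taken from the paper: the mean absolute error for the depth term, forward differences for the gradients, normals of the form (-dx, -dy, 1), and the weights w_grad and w_normal are all assumptions, as are the function names.

```python
import math


def gradients(depth):
    """Forward-difference gradients (dx, dy) of a 2D depth map given as a list of rows."""
    h, w = len(depth), len(depth[0])
    dx = [depth[i][j + 1] - depth[i][j] for i in range(h - 1) for j in range(w - 1)]
    dy = [depth[i + 1][j] - depth[i][j] for i in range(h - 1) for j in range(w - 1)]
    return dx, dy


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def normal_cosine(ax, ay, bx, by):
    """Cosine between the normals (-ax, -ay, 1) and (-bx, -by, 1)."""
    dot = ax * bx + ay * by + 1.0
    norm_a = math.sqrt(ax * ax + ay * ay + 1.0)
    norm_b = math.sqrt(bx * bx + by * by + 1.0)
    return dot / (norm_a * norm_b)


def combined_loss(pred, target, w_grad=1.0, w_normal=1.0):
    """Depth loss plus weighted gradient and surface normal losses (assumed forms)."""
    # Depth term: mean absolute error over all pixels.
    l_depth = mean(abs(p - t)
                   for p_row, t_row in zip(pred, target)
                   for p, t in zip(p_row, t_row))

    pdx, pdy = gradients(pred)
    tdx, tdy = gradients(target)

    # Gradient term: penalizes differences in local depth change (edges).
    l_grad = mean(abs(a - b) + abs(c - d)
                  for a, b, c, d in zip(pdx, tdx, pdy, tdy))

    # Normal term: 1 - cosine between predicted and target surface normals.
    l_normal = mean(1.0 - normal_cosine(a, c, b, d)
                    for a, b, c, d in zip(pdx, tdx, pdy, tdy))

    return l_depth + w_grad * l_grad + w_normal * l_normal


if __name__ == "__main__":
    target = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
    shifted = [[v + 0.5 for v in row] for row in target]
    print(combined_loss(target, target))
    print(combined_loss(shifted, target))
```

In this sketch a constant offset in the prediction is penalized only by the depth term, because the gradients and normals of the two maps are identical; the extra terms respond only when local structure such as edges or small objects differs.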