Latent 3D Volume for Joint Depth Estimation and Semantic Segmentation from a Single Image

This paper proposes a novel 3D representation, namely, a latent 3D volume, for joint depth estimation and semantic segmentation. Most previous studies encoded an input scene (typically given as a 2D image) into a set of feature vectors arranged over a 2D plane. However, considering the real world is...

Bibliographic Details
Main Authors:	Seiya Ito, Naoshi Kaneko, Kazuhiko Sumi
Format:	Article
Language:	English
Published:	MDPI AG 2020-10-01
Series:	Sensors
Subjects:	multi-task learning; latent 3D volume; depth estimation; semantic segmentation
Online Access:	https://www.mdpi.com/1424-8220/20/20/5765

DOI:	10.3390/s20205765
ISSN:	1424-8220
Volume:	20
Issue:	20
Article Number:	5765
Collection:	DOAJ (Directory of Open Access Journals)
Record ID:	doaj.art-f9c0e02cc1e0413faf9ebdbc328bc0ec
First Indexed:	2024-03-10T15:43:01Z
Last Indexed:	2024-03-10T15:43:01Z

Author Affiliations
Seiya Ito:	Graduate School of Science and Engineering, Aoyama Gakuin University, 5-10-1 Fuchinobe, Chuo-ku, Sagamihara, Kanagawa 252-5258, Japan
Naoshi Kaneko:	Department of Integrated Information Technology, Aoyama Gakuin University, 5-10-1 Fuchinobe, Chuo-ku, Sagamihara, Kanagawa 252-5258, Japan
Kazuhiko Sumi:	Department of Integrated Information Technology, Aoyama Gakuin University, 5-10-1 Fuchinobe, Chuo-ku, Sagamihara, Kanagawa 252-5258, Japan

Abstract
This paper proposes a novel 3D representation, namely a latent 3D volume, for joint depth estimation and semantic segmentation. Most previous studies encoded an input scene (typically given as a 2D image) into a set of feature vectors arranged over a 2D plane. However, because the real world is three-dimensional, this 2D arrangement discards one dimension and may limit the capacity of the feature representation. In contrast, we examine the idea of arranging the feature vectors in 3D space rather than on a 2D plane. We refer to this 3D volumetric arrangement as a latent 3D volume. We show that the latent 3D volume benefits depth estimation and semantic segmentation because both tasks require an understanding of the 3D structure of the scene. Our network first constructs an initial 3D volume from image features and then generates the latent 3D volume by passing the initial 3D volume through several 3D convolutional layers. We perform depth regression and semantic segmentation by projecting the latent 3D volume onto a 2D plane. The evaluation results show that our method outperforms previous approaches on the NYU Depth v2 dataset.
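
The abstract's final step, projecting a 3D volume onto a 2D plane to obtain depth and class labels, can be illustrated for a single pixel. The following is a minimal sketch and not the authors' implementation: the record does not say how the projection is computed, so a soft-argmax over depth planes is assumed here, and every name (`project_depth`, `project_semantics`, `plane_depths`, the score values) is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project_depth(plane_scores, plane_depths):
    """Collapse one pixel's column of per-plane scores into a single depth.

    Assumption: soft-argmax, i.e. the expected depth under a softmax
    taken along the depth axis of the volume.
    """
    probs = softmax(plane_scores)
    return sum(p * d for p, d in zip(probs, plane_depths))


def project_semantics(plane_scores, class_scores_per_plane):
    """Collapse per-plane class scores into one class index for the pixel.

    Assumption: class scores are pooled along depth using the same
    softmax weights as the depth projection, then the best class is taken.
    """
    probs = softmax(plane_scores)
    num_classes = len(class_scores_per_plane[0])
    pooled = [
        sum(p * plane[c] for p, plane in zip(probs, class_scores_per_plane))
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=lambda c: pooled[c])


# Hypothetical column for one pixel: 4 depth planes, 3 classes.
plane_depths = [1.0, 2.0, 3.0, 4.0]      # assumed plane positions in metres
plane_scores = [0.0, 0.0, 0.0, 0.0]      # uniform: no plane is preferred
class_scores = [[1.0, 0.0, 0.0]] * 2 + [[0.0, 2.0, 0.0]] * 2

depth = project_depth(plane_scores, plane_depths)
label = project_semantics(plane_scores, class_scores)
```

With uniform plane scores the regressed depth is the mean of the plane positions; a strongly peaked column instead pulls the depth toward the position of the dominant plane. In a full network this per-pixel reduction would be applied to every column of the volume produced by the 3D convolutional layers.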
