Self‐supervised monocular depth estimation via asymmetric convolution block

Abstract: Self‐supervised learning is a promising way to train monocular depth estimation without depth ground truth. It builds its own supervision signal with the help of other tools, such as view synthesis and pose networks. However, more training parameters and time cons...

Bibliographic Details
Main Authors: Lingling Hu, Hao Zhang, Zhuping Wang, Chao Huang, Changzhu Zhang
Format: Article
Language: English
Published: Wiley 2022-06-01
Series: IET Cyber-systems and Robotics
Subjects: asymmetric convolution block (ACB); KITTI dataset; self‐supervised depth estimation
Online Access: https://doi.org/10.1049/csy2.12051
author Lingling Hu
Hao Zhang
Zhuping Wang
Chao Huang
Changzhu Zhang
collection DOAJ
description Self‐supervised learning is a promising way to train monocular depth estimation without depth ground truth. It builds its own supervision signal with the help of other tools, such as view synthesis and pose networks. However, this may involve more training parameters and greater time consumption. This paper proposes a monocular depth prediction framework that jointly learns the depth value and the pose transformation between images in an end‐to‐end manner. During training, the depth network replaces every square-kernel layer with an asymmetric convolution block to strengthen its ability to extract image features. At inference time, the asymmetric kernels are fused and converted back to the original network, so the model predicts more accurate image depth with no extra computation. The network is trained and tested on the KITTI monocular dataset. The evaluation results show that the depth model outperforms some state-of-the-art (SOTA) approaches and can reduce the inference time of depth prediction. Additionally, the proposed model adapts well to the Make3D dataset.
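The kernel fusion mentioned in the description can be illustrated with a minimal sketch. It assumes a block with three parallel branches (3x3, 1x3 and 3x1 kernels) whose outputs are summed, and it leaves out batch normalization for brevity. The function names `conv2d_same`, `fuse_acb` and `acb_branches` are hypothetical and are not taken from the paper. This is not the authors' implementation, only a demonstration that summing the branch outputs equals one convolution with a fused 3x3 kernel.

```python
def conv2d_same(image, kernel):
    """Cross-correlate a 2D list with a kernel, zero-padded to keep the input size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # kernel centre offsets
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += image[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def acb_branches(image, square, horizontal, vertical):
    """Training-time view: run the three branches separately and sum their outputs."""
    outs = [conv2d_same(image, k) for k in (square, horizontal, vertical)]
    h, w = len(image), len(image[0])
    return [[sum(o[i][j] for o in outs) for j in range(w)] for i in range(h)]


def fuse_acb(square, horizontal, vertical):
    """Inference-time view: fold the 1x3 and 3x1 kernels into the 3x3 kernel."""
    fused = [row[:] for row in square]
    for b in range(3):
        fused[1][b] += horizontal[0][b]  # 1x3 kernel lands on the middle row
    for a in range(3):
        fused[a][1] += vertical[a][0]    # 3x1 kernel lands on the middle column
    return fused


# Small made-up inputs, chosen only for illustration.
image = [[1, 2, 0, 3], [4, 1, 2, 0], [0, 5, 1, 2], [3, 0, 2, 1]]
square = [[1, 0, -1], [2, 1, 0], [0, -1, 1]]
horizontal = [[1, 2, -1]]
vertical = [[2], [0], [1]]

branch_out = acb_branches(image, square, horizontal, vertical)
fused_out = conv2d_same(image, fuse_acb(square, horizontal, vertical))
max_diff = max(abs(p - q) for r1, r2 in zip(branch_out, fused_out) for p, q in zip(r1, r2))
print(max_diff < 1e-9)
```

Because convolution is linear in the kernel, the fused layer has the same cost as a plain 3x3 layer, which is consistent with the claim of no extra computation at inference.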
first_indexed 2024-04-13T19:49:02Z
format Article
id doaj.art-3e0f67bfef7e447e8f86f827004dd99e
institution Directory Open Access Journal
issn 2631-6315
language English
last_indexed 2024-04-13T19:49:02Z
publishDate 2022-06-01
publisher Wiley
record_format Article
series IET Cyber-systems and Robotics
spelling doaj.art-3e0f67bfef7e447e8f86f827004dd99e; 2022-12-22T02:32:36Z; eng; Wiley; IET Cyber-systems and Robotics; 2631-6315; 2022-06-01; volume 4, issue 2, pages 131-138; 10.1049/csy2.12051; Self‐supervised monocular depth estimation via asymmetric convolution block; Lingling Hu, Hao Zhang, Zhuping Wang, Chao Huang, Changzhu Zhang (all: Department of Control Science and Engineering, Tongji University, Shanghai, China); https://doi.org/10.1049/csy2.12051; asymmetric convolution block (ACB); KITTI dataset; self‐supervised depth estimation
title Self‐supervised monocular depth estimation via asymmetric convolution block
topic asymmetric convolution block (ACB)
KITTI dataset
self‐supervised depth estimation
url https://doi.org/10.1049/csy2.12051