Self‐supervised monocular depth estimation via asymmetric convolution block
Abstract: Because it does not depend on depth ground truth, self‐supervised learning is a promising alternative for training monocular depth estimation. It builds its own supervision signal with the help of other tools, such as view synthesis and pose networks. However, more training parameters and time cons...
Main Authors: | Lingling Hu, Hao Zhang, Zhuping Wang, Chao Huang, Changzhu Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2022-06-01 |
Series: | IET Cyber-systems and Robotics |
Subjects: | asymmetric convolution block (ACB); KITTI dataset; self‐supervised depth estimation |
Online Access: | https://doi.org/10.1049/csy2.12051 |
_version_ | 1811344553928556544 |
---|---|
author | Lingling Hu Hao Zhang Zhuping Wang Chao Huang Changzhu Zhang |
collection | DOAJ |
description | Abstract: Because it does not depend on depth ground truth, self‐supervised learning is a promising alternative for training monocular depth estimation. It builds its own supervision signal with the help of other tools, such as view synthesis and pose networks. However, this may involve more training parameters and time consumption. This paper proposes a monocular depth prediction framework that jointly learns the depth values and the pose transformation between images in an end‐to‐end manner. The depth network replaces every square‐kernel layer with an asymmetric convolution block to strengthen feature extraction during training. At inference time, the asymmetric kernels are fused and converted back into the original network, which predicts more accurate image depth without any extra computation. The network is trained and tested on the KITTI monocular dataset. The evaluation results show that the depth model outperforms some state‐of‐the‐art (SOTA) approaches and can reduce the inference time of depth prediction. Additionally, the proposed model adapts well to the Make3D dataset. |
first_indexed | 2024-04-13T19:49:02Z |
format | Article |
id | doaj.art-3e0f67bfef7e447e8f86f827004dd99e |
institution | Directory Open Access Journal |
issn | 2631-6315 |
language | English |
last_indexed | 2024-04-13T19:49:02Z |
publishDate | 2022-06-01 |
publisher | Wiley |
record_format | Article |
series | IET Cyber-systems and Robotics |
spelling | doaj.art-3e0f67bfef7e447e8f86f827004dd99e; 2022-12-22T02:32:36Z; eng; Wiley; IET Cyber-systems and Robotics; 2631-6315; 2022-06-01; volume 4, issue 2, pages 131-138; 10.1049/csy2.12051; Self‐supervised monocular depth estimation via asymmetric convolution block; Lingling Hu, Hao Zhang, Zhuping Wang, Chao Huang, Changzhu Zhang (all: Department of Control Science and Engineering, Tongji University, Shanghai, China); https://doi.org/10.1049/csy2.12051; asymmetric convolution block (ACB); KITTI dataset; self‐supervised depth estimation |
title | Self‐supervised monocular depth estimation via asymmetric convolution block |
topic | asymmetric convolution block (ACB) KITTI dataset self‐supervised depth estimation |
url | https://doi.org/10.1049/csy2.12051 |
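
The description above says that the asymmetric kernels are fused back into the original network at inference time, so that no extra computation is needed. The record does not give the authors' implementation; the following is only a minimal pure-Python sketch of why such a fusion can work, assuming a single channel, zero padding, and a block made of one 3x3, one 1x3, and one 3x1 branch whose outputs are summed. All function names are hypothetical, and any normalization layers a real block might contain are left out.

```python
# Illustrative sketch only; not the paper's code. Single channel, no normalization.
# All names (conv2d_same, add_maps, fuse_acb_kernels) are hypothetical.

def conv2d_same(image, kernel):
    """Cross-correlate a 2D list with an odd-sized kernel, zero-padded to keep the size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += image[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def add_maps(*maps):
    """Element-wise sum of equally sized 2D lists (the training-time branch sum)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]


def fuse_acb_kernels(square, horizontal, vertical):
    """Fold a 1x3 and a 3x1 kernel into a copy of the 3x3 kernel.

    The 1x3 kernel is added onto the middle row and the 3x1 kernel onto the
    middle column, which is where they act when centered on the same pixel.
    """
    fused = [row[:] for row in square]
    for b in range(3):
        fused[1][b] += horizontal[0][b]
    for a in range(3):
        fused[a][1] += vertical[a][0]
    return fused


# Toy data, chosen arbitrarily for the demonstration.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
square = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]   # 3x3 branch
horizontal = [[1, 2, 3]]                        # 1x3 branch
vertical = [[-1], [4], [2]]                     # 3x1 branch

# Training-time view: three separate branches, outputs summed.
three_branches = add_maps(
    conv2d_same(image, square),
    conv2d_same(image, horizontal),
    conv2d_same(image, vertical),
)

# Inference-time view: one convolution with the fused 3x3 kernel.
single_kernel = conv2d_same(image, fuse_acb_kernels(square, horizontal, vertical))

assert three_branches == single_kernel
```

Because convolution is linear in the kernel, the three-branch sum and the single fused convolution give identical outputs, which is consistent with the claim that the extra branches cost nothing at inference. A practical implementation would work on multi-channel tensors in a deep learning framework rather than nested lists.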