Switchable-Encoder-Based Self-Supervised Learning Framework for Monocular Depth and Pose Estimation

Bibliographic Details
Main Authors:	Junoh Kim, Rui Gao, Jisun Park, Jinsoo Yoon, Kyungeun Cho
Format:	Article
Language:	English
Published:	MDPI AG 2023-12-01
Series:	Remote Sensing
Subjects:	structure from motion; self-supervised learning; monocular depth estimation
Online Access:	https://www.mdpi.com/2072-4292/15/24/5739

Description:	Monocular depth prediction research is essential for expanding meaning from 2D to 3D. Recent studies have focused on the application of a newly proposed encoder; however, the development within the self-supervised learning framework remains unexplored, an aspect critical for advancing foundational models of 3D semantic interpretation. Addressing the dynamic nature of encoder-based research, especially in performance evaluations for feature extraction and pre-trained models, this research proposes the switchable encoder learning framework (SELF). SELF enhances versatility by enabling the seamless integration of diverse encoders in a self-supervised learning context for depth prediction. This integration is realized through the direct transfer of feature information from the encoder and by standardizing the input structure of the decoder to accommodate various encoder architectures. Furthermore, the framework is extended and incorporated into an adaptable decoder for depth prediction and camera pose learning, employing standard loss functions. Comparative experiments with previous frameworks using the same encoder reveal that SELF achieves a 7% reduction in parameters while enhancing performance. Remarkably, substituting newly proposed algorithms in place of an encoder improves the outcomes as well as significantly decreases the number of parameters by 23%. The experimental findings highlight the ability of SELF to broaden depth factors, such as depth consistency. This framework facilitates the objective selection of algorithms as a backbone for extended research in monocular depth prediction.
Affiliations:	Junoh Kim, Rui Gao, Jisun Park: Department of Multimedia Engineering, Dongguk University-Seoul, 30, Pildongro-1-gil, Jung-gu, Seoul 04620, Republic of Korea
	Jinsoo Yoon: Autonomous Driving Research Department, KoROAD (Korea Road Traffic Authority), 2, Hyeoksin-ro, Wonju-si, Gangwon-do 26466, Republic of Korea
	Kyungeun Cho: Division of AI Software Convergence, Dongguk University-Seoul, 30, Pildongro-1-gil, Jung-gu, Seoul 04620, Republic of Korea
Citation:	Remote Sensing, vol. 15, no. 24, article 5739 (2023-12-01)
DOI:	10.3390/rs15245739
ISSN:	2072-4292
Collection:	DOAJ (Directory of Open Access Journals)
Record ID:	doaj.art-9d44aa917a794da692037e23f90755d5
First Indexed:	2024-03-08T20:24:18Z