Monocular Depth Estimation with Self-Supervised Learning for Vineyard Unmanned Agricultural Vehicle
Main Authors: | Xue-Zhi Cui, Quan Feng, Shu-Zhi Wang, Jian-Hua Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-01-01 |
Series: | Sensors |
Subjects: | edge computing device; monocular depth estimation; self-supervised learning; vineyard scene |
Online Access: | https://www.mdpi.com/1424-8220/22/3/721 |
author | Xue-Zhi Cui; Quan Feng; Shu-Zhi Wang; Jian-Hua Zhang |
collection | DOAJ |
description | To find an economical solution for inferring the depth of the surrounding environment of unmanned agricultural vehicles (UAVs), a lightweight depth estimation model called MonoDA, based on a convolutional neural network, is proposed. Sequences of frames from monocular videos are used to train the model. The model is composed of two subnetworks: a depth estimation subnetwork, a modified U-Net with a reduced number of bridges, and a pose estimation subnetwork, which uses EfficientNet-B0 as its backbone to extract features from sequential frames and predict the pose transformations between them. A self-supervised strategy is adopted during training, so depth labels for the frames are not needed. Instead, adjacent frames in the image sequence and the reprojection relation defined by the predicted pose are used to train the model: the subnetworks’ outputs (depth map and relative pose) are used to reconstruct the input frame, a self-supervised loss between the reconstructed input and the original input is calculated, and this loss updates the parameters of both subnetworks through the backward pass. Experiments show that MonoDA achieves competitive accuracy on the KITTI raw dataset as well as on our vineyard dataset, and the method is not sensitive to color. On the computing platform of our UAV’s environment perception system, an NVIDIA Jetson TX2, the model runs at 18.92 FPS. In summary, the approach provides an economical solution for depth estimation with monocular cameras, achieves a good trade-off between accuracy and speed, and can serve as a novel auxiliary depth detection paradigm for UAVs. |
format | Article |
id | doaj.art-18845d06c7ed4ebfb315bbf403b18a26 |
institution | Directory Open Access Journal |
issn | 1424-8220 |
language | English |
publishDate | 2022-01-01 |
publisher | MDPI AG |
series | Sensors |
doi | 10.3390/s22030721 |
citation | Sensors, vol. 22, no. 3, article 721, 2022 |
author affiliations | Xue-Zhi Cui and Quan Feng: School of Mechanical and Electrical Engineering, Gansu Agriculture University, Lanzhou 730070, China; Shu-Zhi Wang: College of Electrical Engineering, Northwest University for Nationalities, Lanzhou 730030, China; Jian-Hua Zhang: Agricultural Information Institute of CAAS, Beijing 100081, China |
title | Monocular Depth Estimation with Self-Supervised Learning for Vineyard Unmanned Agricultural Vehicle |
topic | edge computing device; monocular depth estimation; self-supervised learning; vineyard scene |
url | https://www.mdpi.com/1424-8220/22/3/721 |
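The description above specifies the training signal but not its implementation: the depth subnetwork’s depth map and the pose subnetwork’s relative pose are combined to reconstruct the current frame from an adjacent one, and the reconstruction error supervises both subnetworks. Below is a minimal sketch of that reprojection step in PyTorch. It is not the authors’ code: the function names (`backproject`, `warp_source_to_target`, `photometric_loss`), the pinhole intrinsics `K`, the pose convention assumed for `T`, and the plain L1 reconstruction loss are illustrative assumptions; the paper may use a richer photometric loss (e.g. with an SSIM term) on top of this warping scheme.

```python
# Sketch of self-supervised training by view reconstruction, assuming a
# pinhole camera model. In the setup the abstract describes, depth would come
# from the depth subnetwork (U-Net style) and T from the pose subnetwork
# (EfficientNet-B0 backbone); both names here are illustrative.
import torch
import torch.nn.functional as F


def backproject(depth, K_inv):
    """Lift every pixel of a (B, 1, H, W) depth map to 3-D camera coordinates."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=depth.dtype, device=depth.device),
        torch.arange(w, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(1, 3, -1)
    rays = K_inv @ pix                       # (B, 3, H*W) viewing rays
    return rays * depth.reshape(b, 1, -1)    # scale rays by predicted depth


def warp_source_to_target(src_img, depth, T, K, K_inv):
    """Reconstruct the target frame by sampling an adjacent (source) frame.

    src_img: (B, 3, H, W) adjacent frame
    depth:   (B, 1, H, W) predicted depth of the target frame
    T:       (B, 4, 4)    predicted pose taking target-camera points into the
                          source camera frame (assumed convention)
    K, K_inv:(B, 3, 3)    camera intrinsics and their inverse
    """
    b, _, h, w = src_img.shape
    cam = backproject(depth, K_inv)                                  # (B, 3, H*W)
    ones = torch.ones(b, 1, h * w, dtype=cam.dtype, device=cam.device)
    src_points = (T @ torch.cat([cam, ones], dim=1))[:, :3]          # (B, 3, H*W)
    proj = K @ src_points
    pix = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)                  # pixel coords
    # Normalise to [-1, 1] for grid_sample.
    x = pix[:, 0] / (w - 1) * 2 - 1
    y = pix[:, 1] / (h - 1) * 2 - 1
    grid = torch.stack([x, y], dim=-1).reshape(b, h, w, 2)
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)


def photometric_loss(target_img, reconstructed_img):
    """L1 difference between the original frame and its reconstruction."""
    return (target_img - reconstructed_img).abs().mean()
```

During training, `photometric_loss(target, warp_source_to_target(source, depth, T, K, K_inv))` would be backpropagated through both subnetworks, which is the self-supervised update loop the abstract outlines; no ground-truth depth enters the loss.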