Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor

We propose a completely unsupervised approach to simultaneously estimate scene depth, ego-pose, ground segmentation and ground normal vector from monocular RGB video sequences alone. In our approach, the estimates of different scene structures benefit one another through joint optimization....

Bibliographic Details
Main Authors: Lu Xiong, Yongkun Wen, Yuyao Huang, Junqiao Zhao, Wei Tian
Format: Article
Language: English
Published: MDPI AG 2020-07-01
Series: Sensors
Subjects: unsupervised learning; scene depth; ego-motion; ground segmentation; ground normal vector
Online Access: https://www.mdpi.com/1424-8220/20/13/3737
author Lu Xiong
Yongkun Wen
Yuyao Huang
Junqiao Zhao
Wei Tian
collection DOAJ
description We propose a completely unsupervised approach to simultaneously estimate scene depth, ego-pose, ground segmentation and ground normal vector from monocular RGB video sequences alone. In our approach, the estimates of different scene structures benefit one another through joint optimization. Specifically, we pre-train the ground segmentation network with a mutual information loss and then add the corresponding self-learning labels obtained by a geometric method. By exploiting the static nature of the ground and its normal vector, scene depth and ego-motion can be learned efficiently through the self-supervised learning procedure. Extensive experimental results on both the Cityscapes and KITTI benchmarks demonstrate a significant improvement in the estimation accuracy of both scene depth and ego-pose with our approach. We also achieve an average error of about 3° for the estimated ground normal vectors. By deploying our proposed geometric constraints, the IoU accuracy of unsupervised ground segmentation is increased by 35% on the Cityscapes dataset.
first_indexed 2024-03-10T18:41:45Z
format Article
id doaj.art-a1db9a45186d463d91df6ccafe4dc7e9
institution Directory of Open Access Journals
issn 1424-8220
language English
last_indexed 2024-03-10T18:41:45Z
publishDate 2020-07-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-a1db9a45186d463d91df6ccafe4dc7e9; 2023-11-20T05:49:09Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2020-07-01; vol. 20, no. 13, art. 3737; doi:10.3390/s20133737
Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor
Lu Xiong, Yongkun Wen, Yuyao Huang, Junqiao Zhao, Wei Tian (all at the Institute of Intelligent Vehicles, School of Automotive Studies, Tongji University, Shanghai 201804, China)
https://www.mdpi.com/1424-8220/20/13/3737
unsupervised learning; scene depth; ego-motion; ground segmentation; ground normal vector
title Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor
topic unsupervised learning
scene depth
ego-motion
ground segmentation
ground normal vector
url https://www.mdpi.com/1424-8220/20/13/3737