Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor
We propose a completely unsupervised approach to simultaneously estimate scene depth, ego-pose, ground segmentation and ground normal vector from only monocular RGB video sequences. In our approach, estimation for different scene structures can mutually benefit each other by the joint optimization....
Main Authors: | Lu Xiong, Yongkun Wen, Yuyao Huang, Junqiao Zhao, Wei Tian |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-07-01 |
Series: | Sensors |
Subjects: | unsupervised learning, scene depth, ego-motion, ground segmentation, ground normal vector |
Online Access: | https://www.mdpi.com/1424-8220/20/13/3737 |
_version_ | 1797563373636812800 |
---|---|
author | Lu Xiong, Yongkun Wen, Yuyao Huang, Junqiao Zhao, Wei Tian |
author_facet | Lu Xiong, Yongkun Wen, Yuyao Huang, Junqiao Zhao, Wei Tian |
author_sort | Lu Xiong |
collection | DOAJ |
description | We propose a completely unsupervised approach to simultaneously estimate scene depth, ego-pose, ground segmentation and ground normal vector from only monocular RGB video sequences. In our approach, the estimates for different scene structures mutually benefit each other through joint optimization. Specifically, we use a mutual information loss to pre-train the ground segmentation network before adding the corresponding self-learning labels obtained by a geometric method. By exploiting the static nature of the ground and its normal vector, the scene depth and ego-motion can be efficiently learned by the self-supervised learning procedure. Extensive experimental results on both the Cityscapes and KITTI benchmarks demonstrate a significant improvement in the estimation accuracy of both scene depth and ego-pose by our approach. We also achieve an average error of about 3° for estimated ground normal vectors. By deploying our proposed geometric constraints, the IoU accuracy of unsupervised ground segmentation is increased by 35% on the Cityscapes dataset. |
first_indexed | 2024-03-10T18:41:45Z |
format | Article |
id | doaj.art-a1db9a45186d463d91df6ccafe4dc7e9 |
institution | Directory Open Access Journal |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-10T18:41:45Z |
publishDate | 2020-07-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-a1db9a45186d463d91df6ccafe4dc7e9; 2023-11-20T05:49:09Z; eng; MDPI AG; Sensors; 1424-8220; 2020-07-01; vol. 20, no. 13, art. 3737; 10.3390/s20133737; Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor; Lu Xiong, Yongkun Wen, Yuyao Huang, Junqiao Zhao, Wei Tian (all: Institute of Intelligent Vehicles, School of Automotive Studies, Tongji University, Shanghai 201804, China); We propose a completely unsupervised approach to simultaneously estimate scene depth, ego-pose, ground segmentation and ground normal vector from only monocular RGB video sequences. In our approach, the estimates for different scene structures mutually benefit each other through joint optimization. Specifically, we use a mutual information loss to pre-train the ground segmentation network before adding the corresponding self-learning labels obtained by a geometric method. By exploiting the static nature of the ground and its normal vector, the scene depth and ego-motion can be efficiently learned by the self-supervised learning procedure. Extensive experimental results on both the Cityscapes and KITTI benchmarks demonstrate a significant improvement in the estimation accuracy of both scene depth and ego-pose by our approach. We also achieve an average error of about 3° for estimated ground normal vectors. By deploying our proposed geometric constraints, the IoU accuracy of unsupervised ground segmentation is increased by 35% on the Cityscapes dataset.; https://www.mdpi.com/1424-8220/20/13/3737; unsupervised learning, scene depth, ego-motion, ground segmentation, ground normal vector |
spellingShingle | Lu Xiong Yongkun Wen Yuyao Huang Junqiao Zhao Wei Tian Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor Sensors unsupervised learning scene depth ego-motion ground segmentation ground normal vector |
title | Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor |
title_full | Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor |
title_fullStr | Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor |
title_full_unstemmed | Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor |
title_short | Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor |
title_sort | joint unsupervised learning of depth pose ground normal vector and ground segmentation by a monocular camera sensor |
topic | unsupervised learning scene depth ego-motion ground segmentation ground normal vector |
url | https://www.mdpi.com/1424-8220/20/13/3737 |
work_keys_str_mv | AT luxiong jointunsupervisedlearningofdepthposegroundnormalvectorandgroundsegmentationbyamonocularcamerasensor AT yongkunwen jointunsupervisedlearningofdepthposegroundnormalvectorandgroundsegmentationbyamonocularcamerasensor AT yuyaohuang jointunsupervisedlearningofdepthposegroundnormalvectorandgroundsegmentationbyamonocularcamerasensor AT junqiaozhao jointunsupervisedlearningofdepthposegroundnormalvectorandgroundsegmentationbyamonocularcamerasensor AT weitian jointunsupervisedlearningofdepthposegroundnormalvectorandgroundsegmentationbyamonocularcamerasensor |