MonoDCN: Monocular 3D object detection based on dynamic convolution.

3D object detection is vital to environment perception in autonomous driving. Current monocular 3D object detection methods mainly take RGB images or pseudo radar point clouds as input. Methods that take RGB images as input must learn with geometric constraints and ignore the depth information in the image, which makes them overly complicated and inefficient. Although some image-based methods use depth-map information for post-calibration and correction, they usually require a high-precision depth estimation network. Methods that take a pseudo radar point cloud as input easily introduce noise when converting depth information into the point cloud, which causes large deviations during detection, and they also ignore semantic information. We introduce dynamic convolution guided by the depth map into the feature extraction network: the convolution kernels of the dynamic convolution are learned automatically from the depth map of the image. This resolves the problem that depth information and semantic information cannot be used simultaneously and improves the accuracy of monocular 3D object detection. MonoDCN significantly improves performance on both the monocular 3D object detection and Bird's Eye View tasks of the KITTI urban autonomous driving dataset.
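To make the core idea concrete, below is a minimal, hypothetical sketch of a depth-guided dynamic convolution layer in PyTorch. It is not the authors' implementation; the class name DepthGuidedDynamicConv, the number of candidate kernels, and the pooling-based attention branch are illustrative assumptions, following the common "attention over convolution kernels" formulation, with the per-sample mixing weights predicted from the depth map rather than from the RGB features.

```python
# Illustrative sketch only: a depth-guided dynamic convolution layer.
# Assumptions (not from the paper): K candidate kernels are mixed per sample
# using attention weights predicted from a 1-channel depth map.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthGuidedDynamicConv(nn.Module):
    """Mixes K candidate conv kernels with weights predicted from the depth
    map, then applies the mixed kernel to the RGB feature map."""

    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4):
        super().__init__()
        self.num_kernels = num_kernels
        # K candidate kernels, each of shape (out_ch, in_ch, k, k)
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.02
        )
        # Attention branch: global-pool the depth map, predict K mixing weights
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(1, num_kernels, kernel_size=1),
        )
        self.padding = kernel_size // 2

    def forward(self, feat, depth):
        # feat:  (B, in_ch, H, W) RGB features; depth: (B, 1, H, W) depth map
        B = feat.size(0)
        pi = F.softmax(self.attn(depth).view(B, self.num_kernels), dim=1)  # (B, K)
        # Per-sample mixed kernel: (B, out_ch, in_ch, k, k)
        K, O, I, kh, kw = self.weight.shape
        mixed = torch.einsum("bk,koihw->boihw", pi, self.weight)
        # Grouped-conv trick: apply a different mixed kernel to each sample
        out = F.conv2d(
            feat.reshape(1, B * I, feat.size(2), feat.size(3)),
            mixed.reshape(B * O, I, kh, kw),
            padding=self.padding,
            groups=B,
        )
        return out.reshape(B, O, out.size(2), out.size(3))


if __name__ == "__main__":
    layer = DepthGuidedDynamicConv(in_ch=64, out_ch=64)
    rgb_feat = torch.randn(2, 64, 48, 160)   # features from an RGB backbone
    depth_map = torch.rand(2, 1, 48, 160)    # per-pixel depth estimate
    print(layer(rgb_feat, depth_map).shape)  # torch.Size([2, 64, 48, 160])
```

In this reading, the depth map only steers which mixture of kernels is applied, while the mixed kernel still convolves the semantic (RGB) features, which is one way of using depth and semantic information simultaneously.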

Bibliographic Details
Main Authors: Shenming Qu, Xinyu Yang, Yiming Gao, Shengbin Liang
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2022-01-01
Series: PLoS ONE, Vol 17, Iss 10, e0275438 (2022)
ISSN: 1932-6203
Online Access: https://doi.org/10.1371/journal.pone.0275438