MultiNet-GS: Structured Road Perception Model Based on Multi-Task Convolutional Neural Network

In order to address the issue of environmental perception in autonomous driving on structured roads, we propose MultiNet-GS, a convolutional neural network model based on an encoder–decoder architecture that tackles multiple tasks simultaneously. We use the main structure of the latest object detection...

Full description

Bibliographic Details
Main Authors: Ang Li, Zhaoyang Zhang, Shijie Sun, Mingtao Feng, Chengzhong Wu
Format: Article
Language: English
Published: MDPI AG 2023-09-01
Series: Electronics
Subjects:
Online Access: https://www.mdpi.com/2079-9292/12/19/3994
_version_ 1797576049901436928
author Ang Li
Zhaoyang Zhang
Shijie Sun
Mingtao Feng
Chengzhong Wu
author_facet Ang Li
Zhaoyang Zhang
Shijie Sun
Mingtao Feng
Chengzhong Wu
author_sort Ang Li
collection DOAJ
description In order to address the issue of environmental perception in autonomous driving on structured roads, we propose MultiNet-GS, a convolutional neural network model based on an encoder–decoder architecture that tackles multiple tasks simultaneously. We use the main structure of the latest object detection model, YOLOv8, as the encoder of our model. In the feature extraction part, we introduce a new dynamic sparse attention mechanism, BiFormer, which allows more flexible allocation of computing resources and significantly improves computational efficiency while incurring only a small computational overhead. In the feature fusion part, we introduce a lightweight convolution, GSConv, to rebuild the neck into a new slim-neck structure, reducing the computational complexity and inference time of the detector. We also add a detector for tiny objects to the conventional three-head detector structure. Finally, in the lane detection part, we introduce a guide-line-based method that aggregates lane feature information into multiple key points, obtains the lane heat-map response through conditional convolution, and then describes the lane line with an adaptive decoder, effectively compensating for the shortcomings of traditional lane detection methods. Our comparative experiments on the BDD100K dataset, run on the embedded platform NVIDIA Jetson TX2, show that compared with the SOTA model (YOLOPv2), our model reaches 82.1% mAP@0.5 in traffic object detection (up 2.7%), 93.2% accuracy in drivable area detection (up 0.5%), and 85.7% accuracy in lane detection (up 4.3%). The Params and FLOPs of the model are 47.5 M and 117.5, reduced by 6.6 M and 8.3, respectively, and the model runs at 72 FPS, up 5 FPS. Our MultiNet-GS model achieves the highest detection accuracy among the current mainstream models while maintaining a good detection speed.
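The GSConv block named in the description (the "GS" in MultiNet-GS) combines a standard convolution producing half the output channels, a depthwise convolution on that result, concatenation, and a channel shuffle that mixes the two halves. The following is a minimal NumPy sketch of that structure, not the paper's implementation: for illustration it uses 1x1 kernels, a per-channel scaling as a stand-in for the depthwise convolution, and hypothetical helper names `gsconv` and `channel_shuffle`.

```python
import numpy as np

def channel_shuffle(x, groups=2):
    # x: (C, H, W); interleave the channel groups, as in ShuffleNet
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def gsconv(x, w_dense, w_dw):
    """Structural sketch of GSConv: a dense 1x1 conv producing half the
    output channels, a depthwise step (here a per-channel scale) on that
    result, concatenation, and a channel shuffle to mix the halves."""
    dense = np.einsum('oi,ihw->ohw', w_dense, x)  # standard 1x1 conv, c_out//2 filters
    dw = dense * w_dw[:, None, None]              # depthwise 1x1 "conv": one weight per channel
    out = np.concatenate([dense, dw], axis=0)     # (c_out, H, W)
    return channel_shuffle(out, groups=2)         # let dense and depthwise info permeate

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))      # toy feature map: 8 input channels
w_dense = rng.standard_normal((4, 8))   # 4 = c_out // 2 output filters
w_dw = rng.standard_normal(4)           # one scale per channel (depthwise)
y = gsconv(x, w_dense, w_dw)
print(y.shape)  # (8, 4, 4): same channel count, half computed densely
```

The design intent, as the description states, is a cheaper neck: only half the output channels pass through the dense convolution, while the shuffle keeps the depthwise half exchanging information with it.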
first_indexed 2024-03-10T21:47:02Z
format Article
id doaj.art-c67a09a987114b2caec5991b5cb17db2
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-10T21:47:02Z
publishDate 2023-09-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-c67a09a987114b2caec5991b5cb17db2 2023-11-19T14:15:43Z eng MDPI AG Electronics 2079-9292 2023-09-01 12(19):3994 10.3390/electronics12193994
MultiNet-GS: Structured Road Perception Model Based on Multi-Task Convolutional Neural Network
Ang Li (School of Information Engineering, Chang’an University, Xi’an 710064, China)
Zhaoyang Zhang (School of Information Engineering, Chang’an University, Xi’an 710064, China)
Shijie Sun (School of Information Engineering, Chang’an University, Xi’an 710064, China)
Mingtao Feng (School of Computer Science and Technology, Xidian University, Xi’an 710126, China)
Chengzhong Wu (National Engineering Laboratory of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China)
https://www.mdpi.com/2079-9292/12/19/3994
object detection; semantic segmentation; lane detection; multi-task model
spellingShingle Ang Li
Zhaoyang Zhang
Shijie Sun
Mingtao Feng
Chengzhong Wu
MultiNet-GS: Structured Road Perception Model Based on Multi-Task Convolutional Neural Network
Electronics
object detection
semantic segmentation
lane detection
multi-task model
title MultiNet-GS: Structured Road Perception Model Based on Multi-Task Convolutional Neural Network
title_full MultiNet-GS: Structured Road Perception Model Based on Multi-Task Convolutional Neural Network
title_fullStr MultiNet-GS: Structured Road Perception Model Based on Multi-Task Convolutional Neural Network
title_full_unstemmed MultiNet-GS: Structured Road Perception Model Based on Multi-Task Convolutional Neural Network
title_short MultiNet-GS: Structured Road Perception Model Based on Multi-Task Convolutional Neural Network
title_sort multinet gs structured road perception model based on multi task convolutional neural network
topic object detection
semantic segmentation
lane detection
multi-task model
url https://www.mdpi.com/2079-9292/12/19/3994
work_keys_str_mv AT angli multinetgsstructuredroadperceptionmodelbasedonmultitaskconvolutionalneuralnetwork
AT zhaoyangzhang multinetgsstructuredroadperceptionmodelbasedonmultitaskconvolutionalneuralnetwork
AT shijiesun multinetgsstructuredroadperceptionmodelbasedonmultitaskconvolutionalneuralnetwork
AT mingtaofeng multinetgsstructuredroadperceptionmodelbasedonmultitaskconvolutionalneuralnetwork
AT chengzhongwu multinetgsstructuredroadperceptionmodelbasedonmultitaskconvolutionalneuralnetwork