Efficient perception methods for autonomous driving via bird's-eye-view representation
The advent of autonomous driving technologies marks a significant leap forward in the evolution of transportation systems, promising to enhance vehicle safety, efficiency and navigation capabilities. Central to these advancements is the development of sophisticated perception systems capable of interpreting complex and dynamic environments.
Main Author: | Li, Yu Xin |
---|---|
Other Authors: | Yeo Chai Kiat |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2025 |
Subjects: | Computer and Information Science; Bird's-eye-view; Efficient perception |
Online Access: | https://hdl.handle.net/10356/182732 |
_version_ | 1826112744573632512 |
---|---|
author | Li, Yu Xin |
author2 | Yeo Chai Kiat |
author_facet | Yeo Chai Kiat Li, Yu Xin |
author_sort | Li, Yu Xin |
collection | NTU |
description | The advent of autonomous driving technologies marks a significant leap forward in the evolution of transportation systems, promising to enhance vehicle safety, efficiency and navigation capabilities. Central to these advancements is the development of sophisticated perception systems capable of interpreting complex and dynamic environments. Bird’s-Eye-View (BEV) perception, in particular, has emerged as a pivotal technology due to its ability to amalgamate data from multiple sensors into a coherent top-down view of the vehicle’s surroundings. This thesis addresses the critical challenges associated with BEV perception, particularly the computational demands and efficiency of integrating data from diverse
sensor modalities in real-world automotive applications.
The initial study presented in this thesis, BEVENet, challenges the traditional reliance on Vision Transformers (ViT), which, despite their ability to capture global semantic information, impose significant computational burdens. BEVENet advocates for a convolutional neural network (CNN)-based approach, tailored to enhance computational efficiency without compromising the accuracy and speed
required for real-time perception in autonomous vehicles. By redesigning the BEV perception framework to utilize CNNs exclusively, this study achieves substantial reductions in GPU memory usage and computational complexity. The results demonstrate that BEVENet not only matches but often surpasses the performance metrics of state-of-the-art methods, achieving superior inference speeds and reducing computational overhead, thereby making it well-suited for deployment in vehicles with limited computational resources.
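To make the convolution-only design concrete, here is a minimal sketch of the pattern it describes: per-camera convolutional encoders, a convolutional BEV fusion stage, and a dense prediction head, with no attention layers anywhere. This is not the thesis's BEVENet; the class name, all layer sizes, and the naive channel-stacking stand-in for a geometric view transform are assumptions made purely for illustration.

```python
# Illustrative CNN-only BEV pipeline (a sketch, not the actual BEVENet).
# Layer sizes, the camera count, and the channel-stacking "view projection"
# are all assumptions made for demonstration.
import torch
import torch.nn as nn

class ConvOnlyBEV(nn.Module):
    def __init__(self, n_cams=6, bev_ch=64, n_classes=10):
        super().__init__()
        # Per-camera convolutional image encoder (stands in for a ViT backbone).
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Fuse stacked camera features into one top-down grid with convolutions.
        self.bev_encoder = nn.Sequential(
            nn.Conv2d(64 * n_cams, bev_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(bev_ch, bev_ch, 3, padding=1), nn.ReLU(),
        )
        # Dense per-cell prediction head (e.g., detection heatmaps).
        self.head = nn.Conv2d(bev_ch, n_classes, 1)

    def forward(self, imgs):  # imgs: (B, n_cams, 3, H, W)
        b, n, c, h, w = imgs.shape
        feats = self.img_encoder(imgs.view(b * n, c, h, w))  # (B*n, 64, H/4, W/4)
        feats = feats.view(b, -1, h // 4, w // 4)            # stack cameras on channels
        return self.head(self.bev_encoder(feats))            # (B, n_classes, H/4, W/4)

out = ConvOnlyBEV()(torch.randn(2, 6, 3, 256, 256))
print(out.shape)  # torch.Size([2, 10, 64, 64])
```

A real BEV pipeline would replace the channel stacking with an explicit camera-to-ground projection; the sketch only shows where transformer stages can, in principle, be swapped for convolutions.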
Building on the groundwork laid by BEVENet, the second part of this thesis, BEVPruner, advances sensor fusion techniques within the BEV framework. This study introduces a novel approach to data pruning, which strategically processes inputs from multimodal sensors to eliminate redundant data without sacrificing the quality of perception. This content-aware pruning method significantly reduces the computational load by selectively focusing on regions of the environment that are crucial for the perception tasks at hand, thereby optimizing the efficiency of data integration and processing. Experimental results indicate that this approach can reduce model complexity by 35% while maintaining competitive performance with state-of-the-art systems, suggesting a scalable solution for enhancing the computational efficiency of sensor fusion in autonomous vehicles.
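The pruning idea can be sketched as a lightweight scorer that ranks flattened sensor features and keeps only the most salient fraction, so downstream fusion runs on fewer inputs. The scorer, the token layout, and the keep ratio (chosen here to echo the roughly 35% reduction cited above) are illustrative assumptions, not the actual BEVPruner design.

```python
# Minimal sketch of content-aware token pruning (not the actual BEVPruner).
import torch
import torch.nn as nn

class TokenPruner(nn.Module):
    """Keep only the top-scoring fraction of flattened sensor features."""
    def __init__(self, dim=64, keep_ratio=0.65):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)   # lightweight importance predictor
        self.keep_ratio = keep_ratio      # keep ~65%, i.e. drop ~35% of tokens

    def forward(self, tokens):            # tokens: (B, N, dim)
        scores = self.scorer(tokens).squeeze(-1)                 # (B, N)
        k = max(1, int(tokens.shape[1] * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices                      # salient token indices
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        return tokens.gather(1, idx)                             # (B, k, dim)

kept = TokenPruner()(torch.randn(2, 1000, 64))
print(kept.shape)  # torch.Size([2, 650, 64])
```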
The third and final study, QuadBEV, explores the integration of multiple perception tasks into a single, cohesive BEV framework. This multitask learning approach addresses the challenge of operational inefficiency in traditional systems by combining tasks such as 3D object detection, lane detection, map segmentation, and occupancy prediction into one unified system. QuadBEV leverages shared spatial
and contextual information across these tasks to minimize redundant computations and optimize overall system performance. A tailored training strategy is employed to manage the unique learning rate sensitivities and potential conflicts between different task objectives, facilitating a harmonious integration that enhances the overall efficacy and robustness of the perception system. The framework's effectiveness is validated through extensive testing, confirming its capability to operate effectively in real-world scenarios.
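A common way to realize such a setup, and one plausible reading of the training strategy described above, is a shared BEV backbone with one head per task, a weighted joint loss, and separate optimizer parameter groups so each task can have its own learning rate. Everything below (head names, loss weights, learning rates, the placeholder losses) is an assumption for illustration, not QuadBEV's actual configuration.

```python
# Sketch of a shared-backbone multitask setup with per-task learning rates.
import torch
import torch.nn as nn

backbone = nn.Conv2d(64, 64, 3, padding=1)   # shared BEV feature extractor
heads = nn.ModuleDict({
    "detection": nn.Conv2d(64, 10, 1),   # 3D object detection heatmaps
    "lane":      nn.Conv2d(64, 2, 1),    # lane detection
    "map_seg":   nn.Conv2d(64, 4, 1),    # map segmentation
    "occupancy": nn.Conv2d(64, 1, 1),    # occupancy prediction
})
loss_weights = {"detection": 1.0, "lane": 0.5, "map_seg": 0.5, "occupancy": 0.3}

# One optimizer, but a separate parameter group (and learning rate) per head:
# a simple way to accommodate per-task learning-rate sensitivity.
optimizer = torch.optim.AdamW(
    [{"params": backbone.parameters(), "lr": 1e-4}]
    + [{"params": h.parameters(), "lr": 2e-4} for h in heads.values()]
)

bev = backbone(torch.randn(2, 64, 128, 128))                   # shared features
targets = {k: torch.randn_like(heads[k](bev)) for k in heads}  # dummy targets
loss = sum(loss_weights[k] * nn.functional.mse_loss(heads[k](bev), targets[k])
           for k in heads)                                     # weighted joint loss
loss.backward()
optimizer.step()
```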
In summary, this thesis presents a series of innovative studies that collectively enhance the efficiency, accuracy and practicality of BEV-based perception systems for autonomous driving. Through methodological advancements and rigorous testing, it establishes new benchmarks for the deployment of these technologies in real-world settings, significantly contributing to the field of autonomous vehicle
perception and paving the way for future research and development in this critical area of automotive technology. |
first_indexed | 2025-03-09T10:40:23Z |
format | Thesis-Doctor of Philosophy |
id | ntu-10356/182732 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2025-03-09T10:40:23Z |
publishDate | 2025 |
publisher | Nanyang Technological University |
record_format | dspace |
spelling | ntu-10356/182732 2025-03-04T02:57:33Z Efficient perception methods for autonomous driving via bird's-eye-view representation Li, Yu Xin Yeo Chai Kiat College of Computing and Data Science ASCKYEO@ntu.edu.sg Computer and Information Science Bird's-eye-view Efficient perception The advent of autonomous driving technologies marks a significant leap forward in the evolution of transportation systems, promising to enhance vehicle safety, efficiency and navigation capabilities. Central to these advancements is the development of sophisticated perception systems capable of interpreting complex and dynamic environments. Bird’s-Eye-View (BEV) perception, in particular, has emerged as a pivotal technology due to its ability to amalgamate data from multiple sensors into a coherent top-down view of the vehicle’s surroundings. This thesis addresses the critical challenges associated with BEV perception, particularly the computational demands and efficiency of integrating data from diverse sensor modalities in real-world automotive applications. The initial study presented in this thesis, BEVENet, challenges the traditional reliance on Vision Transformers (ViT), which, despite their ability to capture global semantic information, impose significant computational burdens. BEVENet advocates for a convolutional neural network (CNN)-based approach, tailored to enhance computational efficiency without compromising the accuracy and speed required for real-time perception in autonomous vehicles. By redesigning the BEV perception framework to utilize CNNs exclusively, this study achieves substantial reductions in GPU memory usage and computational complexity. The results demonstrate that BEVENet not only matches but often surpasses the performance metrics of state-of-the-art methods, achieving superior inference speeds and reducing computational overhead, thereby making it well-suited for deployment in vehicles with limited computational resources. Building on the groundwork laid by BEVENet, the second part of this thesis, BEVPruner, advances sensor fusion techniques within the BEV framework. This study introduces a novel approach to data pruning, which strategically processes inputs from multimodal sensors to eliminate redundant data without sacrificing the quality of perception. This content-aware pruning method significantly reduces the computational load by selectively focusing on regions of the environment that are crucial for the perception tasks at hand, thereby optimizing the efficiency of data integration and processing. Experimental results indicate that this approach can reduce model complexity by 35% while maintaining competitive performance with state-of-the-art systems, suggesting a scalable solution for enhancing the computational efficiency of sensor fusion in autonomous vehicles. The third and final study, QuadBEV, explores the integration of multiple perception tasks into a single, cohesive BEV framework. This multitask learning approach addresses the challenge of operational inefficiency in traditional systems by combining tasks such as 3D object detection, lane detection, map segmentation, and occupancy prediction into one unified system. QuadBEV leverages shared spatial and contextual information across these tasks to minimize redundant computations and optimize overall system performance. A tailored training strategy is employed to manage the unique learning rate sensitivities and potential conflicts between different task objectives, facilitating a harmonious integration that enhances the overall efficacy and robustness of the perception system. The framework’s effectiveness is validated through extensive testing, confirming its capability to operate effectively in real-world scenarios. In summary, this thesis presents a series of innovative studies that collectively enhance the efficiency, accuracy and practicality of BEV-based perception systems for autonomous driving. Through methodological advancements and rigorous testing, it establishes new benchmarks for the deployment of these technologies in real-world settings, significantly contributing to the field of autonomous vehicle perception and paving the way for future research and development in this critical area of automotive technology. Doctor of Philosophy 2025-02-20T06:47:31Z 2025-02-20T06:47:31Z 2025 Thesis-Doctor of Philosophy Li, Y. X. (2025). Efficient perception methods for autonomous driving via bird's-eye-view representation. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182732 https://hdl.handle.net/10356/182732 10.32657/10356/182732 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |
spellingShingle | Computer and Information Science Bird's-eye-view Efficient perception Li, Yu Xin Efficient perception methods for autonomous driving via bird's-eye-view representation |
title | Efficient perception methods for autonomous driving via bird's-eye-view representation |
title_full | Efficient perception methods for autonomous driving via bird's-eye-view representation |
title_fullStr | Efficient perception methods for autonomous driving via bird's-eye-view representation |
title_full_unstemmed | Efficient perception methods for autonomous driving via bird's-eye-view representation |
title_short | Efficient perception methods for autonomous driving via bird's-eye-view representation |
title_sort | efficient perception methods for autonomous driving via bird s eye view representation |
topic | Computer and Information Science Bird's-eye-view Efficient perception |
url | https://hdl.handle.net/10356/182732 |
work_keys_str_mv | AT liyuxin efficientperceptionmethodsforautonomousdrivingviabirdseyeviewrepresentation |