Efficient perception methods for autonomous driving via bird's-eye-view representation
The advent of autonomous driving technologies marks a significant leap forward in the evolution of transportation systems, promising to enhance vehicle safety, efficiency and navigation capabilities. Central to these advancements is the development of sophisticated perception systems capable of interpreting complex and dynamic environments.
Main Author: | Li, Yu Xin |
---|---|
Other Authors: | Yeo Chai Kiat |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2025 |
Subjects: | Computer and Information Science; Bird's-eye-view; Efficient perception |
Online Access: | https://hdl.handle.net/10356/182732 |
_version_ | 1826112744573632512 |
---|---|
author | Li, Yu Xin |
author2 | Yeo Chai Kiat |
author_facet | Yeo Chai Kiat Li, Yu Xin |
author_sort | Li, Yu Xin |
collection | NTU |
description | The advent of autonomous driving technologies marks a significant leap forward in the evolution of transportation systems, promising to enhance vehicle safety, efficiency and navigation capabilities. Central to these advancements is the development of sophisticated perception systems capable of interpreting complex and dynamic environments. Bird’s-Eye-View (BEV) perception, in particular, has emerged as a pivotal technology due to its ability to amalgamate data from multiple sensors into a coherent top-down view of the vehicle’s surroundings. This thesis addresses the critical challenges associated with BEV perception, particularly the computational demands and efficiency of integrating data from diverse
sensor modalities in real-world automotive applications.
The initial study presented in this thesis, BEVENet, challenges the traditional reliance on Vision Transformers (ViT), which, despite their ability to capture global semantic information, impose significant computational burdens. BEVENet advocates for a convolutional neural network (CNN)-based approach, tailored to enhance computational efficiency without compromising the accuracy and speed
required for real-time perception in autonomous vehicles. By redesigning the BEV perception framework to utilize CNNs exclusively, this study achieves substantial reductions in GPU memory usage and computational complexity. The results demonstrate that BEVENet not only matches but often surpasses the performance metrics of state-of-the-art methods, achieving superior inference speeds and reducing computational overhead, thereby making it well-suited for deployment in vehicles with limited computational resources.
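To make the convolution-only design concrete, here is a minimal sketch of the pattern it describes: per-camera convolutional encoders, a convolutional BEV fusion stage, and a dense prediction head, with no attention layers anywhere. This is not the thesis's BEVENet; the class name, all layer sizes, and the naive channel-stacking stand-in for a geometric view transform are assumptions made purely for illustration.

```python
# Illustrative CNN-only BEV pipeline (a sketch, not the actual BEVENet).
# Layer sizes, the camera count, and the channel-stacking "view projection"
# are all assumptions made for demonstration.
import torch
import torch.nn as nn

class ConvOnlyBEV(nn.Module):
    def __init__(self, n_cams=6, bev_ch=64, n_classes=10):
        super().__init__()
        # Per-camera convolutional image encoder (stands in for a ViT backbone).
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Fuse stacked camera features into one top-down grid with convolutions.
        self.bev_encoder = nn.Sequential(
            nn.Conv2d(64 * n_cams, bev_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(bev_ch, bev_ch, 3, padding=1), nn.ReLU(),
        )
        # Dense per-cell prediction head (e.g., detection heatmaps).
        self.head = nn.Conv2d(bev_ch, n_classes, 1)

    def forward(self, imgs):  # imgs: (B, n_cams, 3, H, W)
        b, n, c, h, w = imgs.shape
        feats = self.img_encoder(imgs.view(b * n, c, h, w))  # (B*n, 64, H/4, W/4)
        feats = feats.view(b, -1, h // 4, w // 4)            # stack cameras on channels
        return self.head(self.bev_encoder(feats))            # (B, n_classes, H/4, W/4)

out = ConvOnlyBEV()(torch.randn(2, 6, 3, 256, 256))
print(out.shape)  # torch.Size([2, 10, 64, 64])
```

A real BEV pipeline would replace the channel stacking with an explicit camera-to-ground projection; the sketch only shows where transformer stages can, in principle, be swapped for convolutions.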
Building on the groundwork laid by BEVENet, the second part of this thesis, BEVPruner, advances sensor fusion techniques within the BEV framework. This study introduces a novel approach to data pruning, which strategically processes inputs from multimodal sensors to eliminate redundant data without sacrificing the quality of perception. This content-aware pruning method significantly reduces the computational load by selectively focusing on regions of the environment that are crucial for the perception tasks at hand, thereby optimizing the efficiency of data integration and processing. Experimental results indicate that this approach can reduce model complexity by 35% while maintaining competitive performance with state-of-the-art systems, suggesting a scalable solution for enhancing the computational efficiency of sensor fusion in autonomous vehicles.
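The pruning idea can be sketched as a lightweight scorer that ranks flattened sensor features and keeps only the most salient fraction, so downstream fusion runs on fewer inputs. The scorer, the token layout, and the keep ratio (chosen here to echo the roughly 35% reduction cited above) are illustrative assumptions, not the actual BEVPruner design.

```python
# Minimal sketch of content-aware token pruning (not the actual BEVPruner).
import torch
import torch.nn as nn

class TokenPruner(nn.Module):
    """Keep only the top-scoring fraction of flattened sensor features."""
    def __init__(self, dim=64, keep_ratio=0.65):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)   # lightweight importance predictor
        self.keep_ratio = keep_ratio      # keep ~65%, i.e. drop ~35% of tokens

    def forward(self, tokens):            # tokens: (B, N, dim)
        scores = self.scorer(tokens).squeeze(-1)                 # (B, N)
        k = max(1, int(tokens.shape[1] * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices                      # salient token indices
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        return tokens.gather(1, idx)                             # (B, k, dim)

kept = TokenPruner()(torch.randn(2, 1000, 64))
print(kept.shape)  # torch.Size([2, 650, 64])
```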
The third and final study, QuadBEV, explores the integration of multiple perception tasks into a single, cohesive BEV framework. This multitask learning approach addresses the challenge of operational inefficiency in traditional systems by combining tasks such as 3D object detection, lane detection, map segmentation, and occupancy prediction into one unified system. QuadBEV leverages shared spatial
and contextual information across these tasks to minimize redundant computations and optimize overall system performance. A tailored training strategy is employed to manage the unique learning rate sensitivities and potential conflicts between different task objectives, facilitating a harmonious integration that enhances the overall efficacy and robustness of the perception system. The framework's effectiveness is validated through extensive testing, confirming its capability to operate effectively in real-world scenarios.
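A common way to realize such a setup, and one plausible reading of the training strategy described above, is a shared BEV backbone with one head per task, a weighted joint loss, and separate optimizer parameter groups so each task can have its own learning rate. Everything below (head names, loss weights, learning rates, the placeholder losses) is an assumption for illustration, not QuadBEV's actual configuration.

```python
# Sketch of a shared-backbone multitask setup with per-task learning rates.
import torch
import torch.nn as nn

backbone = nn.Conv2d(64, 64, 3, padding=1)   # shared BEV feature extractor
heads = nn.ModuleDict({
    "detection": nn.Conv2d(64, 10, 1),   # 3D object detection heatmaps
    "lane":      nn.Conv2d(64, 2, 1),    # lane detection
    "map_seg":   nn.Conv2d(64, 4, 1),    # map segmentation
    "occupancy": nn.Conv2d(64, 1, 1),    # occupancy prediction
})
loss_weights = {"detection": 1.0, "lane": 0.5, "map_seg": 0.5, "occupancy": 0.3}

# One optimizer, but a separate parameter group (and learning rate) per head:
# a simple way to accommodate per-task learning-rate sensitivity.
optimizer = torch.optim.AdamW(
    [{"params": backbone.parameters(), "lr": 1e-4}]
    + [{"params": h.parameters(), "lr": 2e-4} for h in heads.values()]
)

bev = backbone(torch.randn(2, 64, 128, 128))                   # shared features
targets = {k: torch.randn_like(heads[k](bev)) for k in heads}  # dummy targets
loss = sum(loss_weights[k] * nn.functional.mse_loss(heads[k](bev), targets[k])
           for k in heads)                                     # weighted joint loss
loss.backward()
optimizer.step()
```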
In summary, this thesis presents a series of innovative studies that collectively enhance the efficiency, accuracy and practicality of BEV-based perception systems for autonomous driving. Through methodological advancements and rigorous testing, it establishes new benchmarks for the deployment of these technologies in real-world settings, significantly contributing to the field of autonomous vehicle
perception and paving the way for future research and development in this critical area of automotive technology. |
first_indexed | 2025-03-09T10:40:23Z |
format | Thesis-Doctor of Philosophy |
id | ntu-10356/182732 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2025-03-09T10:40:23Z |
publishDate | 2025 |
publisher | Nanyang Technological University |
record_format | dspace |
spelling | ntu-10356/182732 2025-03-04T02:57:33Z Efficient perception methods for autonomous driving via bird's-eye-view representation Li, Yu Xin Yeo Chai Kiat College of Computing and Data Science ASCKYEO@ntu.edu.sg Computer and Information Science Bird's-eye-view Efficient perception The advent of autonomous driving technologies marks a significant leap forward in the evolution of transportation systems, promising to enhance vehicle safety, efficiency and navigation capabilities. Central to these advancements is the development of sophisticated perception systems capable of interpreting complex and dynamic environments. Bird’s-Eye-View (BEV) perception, in particular, has emerged as a pivotal technology due to its ability to amalgamate data from multiple sensors into a coherent top-down view of the vehicle’s surroundings. This thesis addresses the critical challenges associated with BEV perception, particularly the computational demands and efficiency of integrating data from diverse sensor modalities in real-world automotive applications. The initial study presented in this thesis, BEVENet, challenges the traditional reliance on Vision Transformers (ViT), which, despite their ability to capture global semantic information, impose significant computational burdens. BEVENet advocates for a convolutional neural network (CNN)-based approach, tailored to enhance computational efficiency without compromising the accuracy and speed required for real-time perception in autonomous vehicles. By redesigning the BEV perception framework to utilize CNNs exclusively, this study achieves substantial reductions in GPU memory usage and computational complexity. The results demonstrate that BEVENet not only matches but often surpasses the performance metrics of state-of-the-art methods, achieving superior inference speeds and reducing computational overhead, thereby making it well-suited for deployment in vehicles with limited computational resources. Building on the groundwork laid by BEVENet, the second part of this thesis, BEVPruner, advances sensor fusion techniques within the BEV framework. This study introduces a novel approach to data pruning, which strategically processes inputs from multimodal sensors to eliminate redundant data without sacrificing the quality of perception. This content-aware pruning method significantly reduces the computational load by selectively focusing on regions of the environment that are crucial for the perception tasks at hand, thereby optimizing the efficiency of data integration and processing. Experimental results indicate that this approach can reduce model complexity by 35% while maintaining competitive performance with state-of-the-art systems, suggesting a scalable solution for enhancing the computational efficiency of sensor fusion in autonomous vehicles. The third and final study, QuadBEV, explores the integration of multiple perception tasks into a single, cohesive BEV framework. This multitask learning approach addresses the challenge of operational inefficiency in traditional systems by combining tasks such as 3D object detection, lane detection, map segmentation, and occupancy prediction into one unified system. QuadBEV leverages shared spatial and contextual information across these tasks to minimize redundant computations and optimize overall system performance. A tailored training strategy is employed to manage the unique learning rate sensitivities and potential conflicts between different task objectives, facilitating a harmonious integration that enhances the overall efficacy and robustness of the perception system. The framework’s effectiveness is validated through extensive testing, confirming its capability to operate effectively in real-world scenarios. In summary, this thesis presents a series of innovative studies that collectively enhance the efficiency, accuracy and practicality of BEV-based perception systems for autonomous driving. Through methodological advancements and rigorous testing, it establishes new benchmarks for the deployment of these technologies in real-world settings, significantly contributing to the field of autonomous vehicle perception and paving the way for future research and development in this critical area of automotive technology. Doctor of Philosophy 2025-02-20T06:47:31Z 2025-02-20T06:47:31Z 2025 Thesis-Doctor of Philosophy Li, Y. X. (2025). Efficient perception methods for autonomous driving via bird's-eye-view representation. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182732 https://hdl.handle.net/10356/182732 10.32657/10356/182732 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |
spellingShingle | Computer and Information Science Bird's-eye-view Efficient perception Li, Yu Xin Efficient perception methods for autonomous driving via bird's-eye-view representation |
title | Efficient perception methods for autonomous driving via bird's-eye-view representation |
title_full | Efficient perception methods for autonomous driving via bird's-eye-view representation |
title_fullStr | Efficient perception methods for autonomous driving via bird's-eye-view representation |
title_full_unstemmed | Efficient perception methods for autonomous driving via bird's-eye-view representation |
title_short | Efficient perception methods for autonomous driving via bird's-eye-view representation |
title_sort | efficient perception methods for autonomous driving via bird s eye view representation |
topic | Computer and Information Science Bird's-eye-view Efficient perception |
url | https://hdl.handle.net/10356/182732 |
work_keys_str_mv | AT liyuxin efficientperceptionmethodsforautonomousdrivingviabirdseyeviewrepresentation |