Multi-modal semantic segmentation in poor lighting conditions

Semantic segmentation is a complicated dense prediction task that consumes significant computational resources, and the use of multi-modal RGB-T data makes its computational burden even more severe. This dissertation presents a novel and lightweight network for RGB-T semantic segmentation with a para...


Bibliographic Details
Main Author: Li, Zifeng
Other Authors: Wang Dan Wei
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2023
Subjects: Engineering::Computer science and engineering::Computer applications
Online Access: https://hdl.handle.net/10356/169137
_version_ 1811694711541334016
author Li, Zifeng
author2 Wang Dan Wei
author_facet Wang Dan Wei
Li, Zifeng
author_sort Li, Zifeng
collection NTU
description Semantic segmentation is a complicated dense prediction task that consumes significant computational resources, and the use of multi-modal RGB-T data makes its computational burden even more severe. This dissertation presents a novel and lightweight network for RGB-T semantic segmentation with a parameter-free feature fusion module that facilitates efficient fusion between modalities. The proposed method integrates both modalities by leveraging multi-scale features from both RGB and thermal (T) domains in different feature extraction stages. Specifically, we employ a dual-encoder architecture to extract RGB-T features and fuse them with a parameter-free cross-modal attention mechanism, taking advantage of the complementary information provided by the two modalities to improve segmentation accuracy. We further investigate the impact of different pretraining strategies on the performance of the model. We evaluate our approach on several benchmark datasets, including the MFNet and PST900 datasets. Experimental results show that our approach outperforms real-time state-of-the-art methods in the literature while showing performance comparable to state-of-the-art methods that require up to 100 times the computational complexity. Our findings demonstrate the effectiveness of a lightweight RGB-T model for semantic segmentation and highlight the potential of this approach for various real-world applications.
first_indexed 2024-10-01T07:11:55Z
format Thesis-Master by Coursework
id ntu-10356/169137
institution Nanyang Technological University
language English
last_indexed 2024-10-01T07:11:55Z
publishDate 2023
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/1691372023-07-04T15:09:46Z Multi-modal semantic segmentation in poor lighting conditions Li, Zifeng Wang Dan Wei School of Electrical and Electronic Engineering EDWWANG@ntu.edu.sg Engineering::Computer science and engineering::Computer applications Semantic segmentation is a complicated dense prediction task that consumes significant computational resources, and the use of multi-modal RGB-T data makes its computational burden even more severe. This dissertation presents a novel and lightweight network for RGB-T semantic segmentation with a parameter-free feature fusion module that facilitates efficient fusion between modalities. The proposed method integrates both modalities by leveraging multi-scale features from both RGB and thermal (T) domains in different feature extraction stages. Specifically, we employ a dual-encoder architecture to extract RGB-T features and fuse them with a parameter-free cross-modal attention mechanism, taking advantage of the complementary information provided by the two modalities to improve segmentation accuracy. We further investigate the impact of different pretraining strategies on the performance of the model. We evaluate our approach on several benchmark datasets, including the MFNet and PST900 datasets. Experimental results show that our approach outperforms real-time state-of-the-art methods in the literature while showing performance comparable to state-of-the-art methods that require up to 100 times the computational complexity. Our findings demonstrate the effectiveness of a lightweight RGB-T model for semantic segmentation and highlight the potential of this approach for various real-world applications. Master of Science (Computer Control and Automation) 2023-07-03T06:44:22Z 2023-07-03T06:44:22Z 2023 Thesis-Master by Coursework Li, Z. (2023). Multi-modal semantic segmentation in poor lighting conditions. Master's thesis, Nanyang Technological University, Singapore. 
https://hdl.handle.net/10356/169137 https://hdl.handle.net/10356/169137 en application/pdf Nanyang Technological University
spellingShingle Engineering::Computer science and engineering::Computer applications
Li, Zifeng
Multi-modal semantic segmentation in poor lighting conditions
title Multi-modal semantic segmentation in poor lighting conditions
title_full Multi-modal semantic segmentation in poor lighting conditions
title_fullStr Multi-modal semantic segmentation in poor lighting conditions
title_full_unstemmed Multi-modal semantic segmentation in poor lighting conditions
title_short Multi-modal semantic segmentation in poor lighting conditions
title_sort multi modal semantic segmentation in poor lighting conditions
topic Engineering::Computer science and engineering::Computer applications
url https://hdl.handle.net/10356/169137
work_keys_str_mv AT lizifeng multimodalsemanticsegmentationinpoorlightingconditions
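
Illustrative note on the abstract: the description mentions a dual-encoder network whose RGB and thermal features are fused at several stages by a parameter-free cross-modal attention mechanism, but this record does not give the module's design. The sketch below is therefore only an assumption about what "parameter-free" fusion can look like, not the thesis's method: each channel's fusion weights are a softmax over the two modalities' global average activations, so nothing is learned. The function names `global_avg`, `fuse_parameter_free`, and `fuse_multi_scale` are hypothetical.

```python
import math


def global_avg(channel):
    """Mean of one 2-D feature map given as a list of rows."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def fuse_parameter_free(rgb_feat, thermal_feat):
    """Fuse two [C][H][W] feature maps without learnable parameters.

    Assumed scheme (not taken from the thesis): per channel, a softmax
    over the two modalities' global averages gives the mixing weights.
    """
    if len(rgb_feat) != len(thermal_feat):
        raise ValueError("both modalities need the same number of channels")
    fused = []
    for rgb_ch, th_ch in zip(rgb_feat, thermal_feat):
        a, b = global_avg(rgb_ch), global_avg(th_ch)
        m = max(a, b)  # subtract the max for numerical stability
        ea, eb = math.exp(a - m), math.exp(b - m)
        w_rgb, w_th = ea / (ea + eb), eb / (ea + eb)
        fused.append([
            [w_rgb * r + w_th * t for r, t in zip(rgb_row, th_row)]
            for rgb_row, th_row in zip(rgb_ch, th_ch)
        ])
    return fused


def fuse_multi_scale(rgb_stages, thermal_stages):
    """Apply the same fusion to the features of every encoder stage."""
    return [fuse_parameter_free(r, t)
            for r, t in zip(rgb_stages, thermal_stages)]


# Toy input: one channel, 2x2 map; the RGB branch is dark, the thermal one is not.
rgb = [[[0.0, 0.0], [0.0, 0.0]]]
thermal = [[[2.0, 2.0], [2.0, 2.0]]]
fused = fuse_parameter_free(rgb, thermal)
```

In this toy case the thermal channel has the larger average response, so it receives the larger weight, which loosely mirrors the low-light setting named in the title. A real implementation would operate on tensors inside a deep learning framework rather than nested lists.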