Vehicle Detection Based on Adaptive Multimodal Feature Fusion and Cross-Modal Vehicle Index Using RGB-T Images

Target detection is a critical task in interpreting aerial images, and detecting small targets such as vehicles is particularly challenging. Illumination conditions affect vehicle detection accuracy. For example, vehicles are difficult to distinguish from the background in red, green, blue (RGB) images...


Bibliographic Details
Main Authors:	Yuanfeng Wu, Xinran Guan, Boya Zhao, Li Ni, Min Huang
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:	Adaptive feature fusion; aerial images; channel attention; cross-modal vehicle index; vehicle detection
Online Access:	https://ieeexplore.ieee.org/document/10179923/

collection	DOAJ
description	Target detection is a critical task in interpreting aerial images, and detecting small targets such as vehicles is particularly challenging. Illumination conditions affect vehicle detection accuracy. For example, vehicles are difficult to distinguish from the background in red, green, blue (RGB) images under low illumination. In contrast, under high illumination, vehicles differ little from the background in color and texture in thermal infrared (TIR) images. To improve vehicle detection accuracy under various illumination conditions, we propose an adaptive multimodal feature fusion and cross-modal vehicle index (AFFCM) model. Built on a single-stage object detection model, AFFCM takes both RGB and TIR images as input. It comprises three parts: 1) a softpooling channel attention (SCA) mechanism that computes cross-modal feature weights for the RGB and TIR features using global weighted pooling and a fully connected layer; 2) a multimodal adaptive feature fusion (MAFF) module, driven by the SCA weights, that selects high-weight features, compresses redundant low-weight features, and performs adaptive fusion through a multiscale feature pyramid; and 3) a cross-modal vehicle index that extracts the target area, suppresses complex background information, and minimizes false alarms. On the Drone Vehicle dataset, the mean average precision (mAP) is 14.44% and 5.02% higher than that obtained using only RGB or only TIR images, respectively, and 2.63% higher than that of state-of-the-art methods that use both RGB and TIR images.
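The abstract describes the SCA and MAFF steps only at a high level. The following is a minimal pure-Python sketch of the general idea under stated assumptions, not the authors' implementation: SoftPool-style softmax-weighted pooling is assumed to be the "global weighted pooling", a single fully connected layer with a sigmoid is assumed to produce one weight per RGB channel and per TIR channel, and the two modalities are fused by a per-channel normalized weighted sum as a simplification of MAFF. The function names (`softpool`, `sca_weights`, `fuse`), the random FC parameters, and the toy feature maps are hypothetical; the multiscale feature pyramid and the cross-modal vehicle index are not modeled.

```python
import math
import random
from typing import List, Tuple

# A feature map is modeled as C channels, each a flattened list of H*W activations.
Channel = List[float]
FeatureMap = List[Channel]


def softpool(values: Channel) -> float:
    """Softmax-weighted average of one channel (assumed SoftPool-style global pooling)."""
    m = max(values)  # shift by the max so exp() cannot overflow; ratios are unchanged
    weights = [math.exp(v - m) for v in values]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(x: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def sca_weights(rgb: FeatureMap, tir: FeatureMap,
                weight: List[List[float]], bias: List[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical SCA step: pool both modalities, then one FC layer + sigmoid.

    Returns one weight in (0, 1) per RGB channel and one per TIR channel.
    """
    descriptor = [softpool(ch) for ch in rgb] + [softpool(ch) for ch in tir]
    scores = [sigmoid(z) for z in linear(descriptor, weight, bias)]
    c = len(rgb)
    return scores[:c], scores[c:]


def fuse(rgb: FeatureMap, tir: FeatureMap,
         w_rgb: List[float], w_tir: List[float]) -> FeatureMap:
    """Hypothetical fusion: per-channel weighted sum, normalized so the two weights sum to 1."""
    fused = []
    for ch_r, ch_t, a_r, a_t in zip(rgb, tir, w_rgb, w_tir):
        total = a_r + a_t  # positive, since sigmoid outputs are > 0
        fused.append([(a_r * r + a_t * t) / total for r, t in zip(ch_r, ch_t)])
    return fused


if __name__ == "__main__":
    random.seed(0)
    C, HW = 4, 9  # toy sizes: 4 channels on a 3x3 grid
    rgb = [[random.random() for _ in range(HW)] for _ in range(C)]
    tir = [[random.random() for _ in range(HW)] for _ in range(C)]
    # Random FC parameters stand in for trained ones (2C inputs -> 2C outputs).
    weight = [[random.gauss(0.0, 0.1) for _ in range(2 * C)] for _ in range(2 * C)]
    bias = [0.0] * (2 * C)
    w_rgb, w_tir = sca_weights(rgb, tir, weight, bias)
    fused = fuse(rgb, tir, w_rgb, w_tir)
    print("RGB channel weights:", [round(w, 3) for w in w_rgb])
    print("TIR channel weights:", [round(w, 3) for w in w_tir])
    print("fused map shape:", len(fused), "x", len(fused[0]))
```

Normalizing the two weights per channel keeps the fused activations on the same scale as the inputs, so a channel where one modality scores much higher is dominated by that modality. A trained model would learn the FC parameters instead of sampling them, and the MAFF module described in the abstract additionally fuses features across the levels of a multiscale feature pyramid.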
issn	2151-1535
volume	16
pages	8166-8177
doi	10.1109/JSTARS.2023.3294624
orcid	Yuanfeng Wu: https://orcid.org/0000-0001-8427-9851; Xinran Guan: https://orcid.org/0000-0001-7131-4531; Boya Zhao: https://orcid.org/0000-0001-5620-406X
affiliation	Key Laboratory of Computational Optical Imaging Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China (all five authors)


Similar Items