Cross-modality feature fusion for night pedestrian detection

Night pedestrian detection using visible images alone suffers from high miss rates due to poor illumination. Cross-modality fusion can ameliorate this problem because infrared and visible images provide complementary information to each other. In this paper, we propose a cross-modal fusion framework based on YOLOv5, aimed at addressing the challenges of night pedestrian detection under low-light conditions. The framework employs a dual-stream architecture that processes visible and infrared images separately. Through the Cross-Modal Feature Rectification Module (CMFRM), visible and infrared features are finely tuned at a granular level, leveraging their spatial correlations to focus on complementary information and substantially reduce uncertainty and noise from the different modalities. Additionally, we introduce a two-stage Feature Fusion Module (FFM): the first stage applies a cross-attention mechanism for cross-modal global reasoning, and the second stage uses a mixed channel embedding to produce enhanced feature outputs. Moreover, our method involves multi-dimensional interaction, not only rectifying feature maps along the channel and spatial dimensions but also applying cross-attention at the sequence level, which is critical for the effective generalization of cross-modal feature combinations. In summary, our research significantly enhances the accuracy and robustness of nighttime pedestrian detection, offering new perspectives and technical pathways for visual information processing in low-light environments.
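
As a rough illustration of the rectification idea in the abstract, the sketch below shows how a CMFRM-style block could exchange channel and spatial statistics between the visible (RGB) and infrared streams. This is a minimal PyTorch sketch inferred from the abstract only; the layer choices, reduction ratio, and residual weighting are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of cross-modal feature rectification
# in the spirit of CMFRM, assuming PyTorch. Shapes and layer choices are illustrative.
import torch
import torch.nn as nn


class FeatureRectification(nn.Module):
    """Rectify RGB and IR feature maps with each other's channel and spatial
    statistics, then add the cross-modal correction back as a residual."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel attention: pooled statistics of both modalities -> per-channel weights.
        self.channel_mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )
        # Spatial attention: concatenated maps -> one mask per modality.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor):
        b, c, _, _ = rgb.shape
        # --- channel rectification ---
        pooled = torch.cat([rgb.mean(dim=(2, 3)), ir.mean(dim=(2, 3))], dim=1)  # (B, 2C)
        w_rgb, w_ir = self.channel_mlp(pooled).view(b, 2, c).unbind(dim=1)
        rgb_c = rgb * w_rgb.view(b, c, 1, 1)
        ir_c = ir * w_ir.view(b, c, 1, 1)
        # --- spatial rectification ---
        masks = self.spatial_conv(torch.cat([rgb, ir], dim=1))  # (B, 2, H, W)
        m_rgb, m_ir = masks[:, 0:1], masks[:, 1:2]
        # Each stream is corrected by the other modality's rectified features.
        rgb_out = rgb + ir_c * m_ir
        ir_out = ir + rgb_c * m_rgb
        return rgb_out, ir_out
```

In a dual-stream YOLOv5-style backbone, such a block would typically be applied at each shared feature scale before the fused maps are handed to the detection neck and head.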

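The two-stage FFM is described as cross-attention for global reasoning followed by a mixed channel embedding. The sketch below interprets that as bidirectional cross-attention over flattened feature tokens, followed by a convolutional channel mixer over the concatenated streams; again, this is an assumed reading of the abstract, not the published code, and all names and hyperparameters are illustrative.

```python
# Minimal sketch (not the authors' implementation) of a two-stage fusion module:
# cross-attention between the two token sequences, then a channel-mixing step
# over the concatenated features. Assumes PyTorch.
import torch
import torch.nn as nn


class TwoStageFusion(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm_rgb = nn.LayerNorm(channels)
        self.norm_ir = nn.LayerNorm(channels)
        # Stage 1: bidirectional cross-attention over flattened feature tokens.
        self.attn_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_ir = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Stage 2: mix the concatenated channels back into a single fused map.
        self.mixer = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb.shape
        # Flatten H*W positions into token sequences: (B, HW, C).
        rgb_seq = rgb.flatten(2).transpose(1, 2)
        ir_seq = ir.flatten(2).transpose(1, 2)
        q_rgb, q_ir = self.norm_rgb(rgb_seq), self.norm_ir(ir_seq)
        # Each modality queries the other for global context (residual form).
        rgb_seq = rgb_seq + self.attn_rgb(q_rgb, q_ir, q_ir, need_weights=False)[0]
        ir_seq = ir_seq + self.attn_ir(q_ir, q_rgb, q_rgb, need_weights=False)[0]
        # Back to (B, C, H, W) and mix the channels of both streams.
        rgb_map = rgb_seq.transpose(1, 2).reshape(b, c, h, w)
        ir_map = ir_seq.transpose(1, 2).reshape(b, c, h, w)
        return self.mixer(torch.cat([rgb_map, ir_map], dim=1))
```

A fused map produced this way has the same shape as either input stream, so it can drop into the detector wherever a single-modality feature map was expected.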

Bibliographic Details
Main Authors: Yong Feng, Enbo Luo, Hai Lu, SuWei Zhai
Format: Article
Language: English
Published: Frontiers Media S.A., 2024-03-01
Series: Frontiers in Physics
ISSN: 2296-424X
Subjects: pedestrian detection; YOLOv5; vision transformer; CNNs; feature fusion
Online Access: https://www.frontiersin.org/articles/10.3389/fphy.2024.1356248/full