Toward reliable fusion object detection based on dilated pyramid and semantic attention

Abstract Object detection on fused visible and infrared images is important for many applications, such as surveillance and rescue in low-light conditions. However, current detectors struggle to detect objects robustly in fused images, for two main reasons. First, objects are...

Bibliographic Details
Main Authors: Rong Chang, Shan Gao, Hao Li, Shan Zhao, Yang Yang
Format: Article
Language: English
Published: Wiley 2024-02-01
Series: Engineering Reports
Subjects: attention mechanism; fused image; object detection
Online Access: https://doi.org/10.1002/eng2.12714
author Rong Chang
Shan Gao
Hao Li
Shan Zhao
Yang Yang
collection DOAJ
description Abstract Object detection on fused visible and infrared images is important for many applications, such as surveillance and rescue in low-light conditions. However, current detectors struggle to detect objects robustly in fused images, for two main reasons. First, objects appear in various shapes and sizes, so some hard samples cannot be localized accurately. Second, objects of the same category can look different in fused images because of changing weather conditions, temperature, and intrinsic heat. This inconsistency degrades the classification task of a detection network, since the network cannot merge commonalities and distinguish differences well. In this paper, we propose to reconstruct the detection pipeline of current detectors and enhance their detection ability on difficult samples in fused images. Specifically, a Dilation Pyramid Network (DPN) is designed at the lateral connection to generate and aggregate features with various receptive fields without adding pyramid layers. To strengthen classification, a Semantic Category Attention Module (SCAM) is proposed to capture the attention centers of semantics in fused images rather than object centers. Extensive experiments on two fusion datasets show that the proposed method achieves satisfactory performance and that both modules can greatly improve current generic detectors on fused images.
first_indexed 2024-03-08T08:35:07Z
format Article
id doaj.art-3a2160e058b44468958d047617efe70a
institution Directory Open Access Journal
issn 2577-8196
language English
last_indexed 2024-03-08T08:35:07Z
publishDate 2024-02-01
publisher Wiley
record_format Article
series Engineering Reports
spelling doaj.art-3a2160e058b44468958d047617efe70a 2024-02-02T01:25:53Z eng Wiley Engineering Reports 2577-8196 2024-02-01 6 2 n/a n/a 10.1002/eng2.12714
Toward reliable fusion object detection based on dilated pyramid and semantic attention
Rong Chang (0), Shan Gao (1), Hao Li (2), Shan Zhao (3), Yang Yang (4)
0: Yuxi Power Supply Bureau Yunnan Power Grid Co., LTD of Kunming Yunnan China
1: Guangzhou Jianruan Technology Co., LTD Guangzhou China
2: School of Information Science and Technology Yunnan Normal University Yunnan China
3: School of Information Science and Technology Yunnan Normal University Yunnan China
4: School of Information Science and Technology Yunnan Normal University Yunnan China
https://doi.org/10.1002/eng2.12714
attention mechanism; fused image; object detection
title Toward reliable fusion object detection based on dilated pyramid and semantic attention
topic attention mechanism
fused image
object detection
url https://doi.org/10.1002/eng2.12714
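
The abstract describes the Dilation Pyramid Network as generating and aggregating features of various receptive fields without adding pyramid layers. This record does not give the module's actual design, so the following is only a minimal sketch of the general idea of parallel dilated convolutions, written in plain Python on a 1D signal. The dilation rates (1, 2, 3), the kernel size of 3, the summation used for aggregation, and all function names are assumptions made for illustration, not details taken from the paper.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span, in input samples, covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated correlation (odd kernel length)."""
    half = (len(kernel) // 2) * dilation
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(w * padded[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def dilated_branches_sum(signal, kernel, dilations):
    """Run parallel branches at several dilation rates and sum them element-wise."""
    outputs = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*outputs)]


# Hypothetical dilation rates; the paper's actual rates are not stated in this record.
DILATIONS = (1, 2, 3)

# Each branch sees a wider span of the input with the same number of weights.
spans = [effective_kernel(3, d) for d in DILATIONS]

# A unit impulse shows which input positions each output position can "see".
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
response = dilated_branches_sum(impulse, [1.0, 1.0, 1.0], DILATIONS)

print(spans)     # [3, 5, 7]
print(response)  # [1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0]
```

With a kernel of size 3, the three branches cover spans of 3, 5, and 7 samples on the same feature level, which is the sense in which the receptive field can vary without extra pyramid layers. A real detector would apply 2D convolutions with learned weights, and the aggregation used in the paper may differ from a plain sum.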