Toward reliable fusion object detection based on dilated pyramid and semantic attention

Abstract Object detection on fused visible and infrared images is important for many applications, for example, surveillance and rescue in low‐light conditions. However, current detectors struggle to detect objects robustly in fused images, for two main reasons. First, objects are...


Bibliographic Details
Main Authors:	Rong Chang, Shan Gao, Hao Li, Shan Zhao, Yang Yang
Format:	Article
Language:	English
Published:	Wiley 2024-02-01
Series:	Engineering Reports
Subjects:	attention mechanism; fused image; object detection
Online Access:	https://doi.org/10.1002/eng2.12714

author	Rong Chang, Shan Gao, Hao Li, Shan Zhao, Yang Yang
author_sort	Rong Chang
collection	DOAJ
description	Abstract Object detection on fused visible and infrared images is important for many applications, for example, surveillance and rescue in low‐light conditions. However, current detectors struggle to detect objects robustly in fused images, for two main reasons. First, objects appear in various shapes and sizes, so some hard samples cannot be localized accurately. Second, the same object category can look different across fused images because of changing weather conditions, temperature, and intrinsic heat. This variation degrades the classification task of a detection network, since the network cannot merge commonalities and distinguish differences well. In this paper, we propose to reconstruct the detection pipeline of current detectors and enhance their detection ability on difficult samples in fused images. Specifically, a Dilation Pyramid Network (DPN) is designed at the lateral connections to generate and aggregate features with various receptive fields, without adding pyramid layers. To strengthen classification, a Semantic Category Attention Module (SCAM) is proposed to capture the attention centers of semantics in fused images, rather than object centers. Extensive experiments on two fusion datasets show that the proposed method achieves satisfactory performance, and that both modules can greatly improve current generic detectors on fused images.
first_indexed	2024-03-08T08:35:07Z
format	Article
id	doaj.art-3a2160e058b44468958d047617efe70a
institution	Directory of Open Access Journals
issn	2577-8196
language	English
last_indexed	2024-03-08T08:35:07Z
publishDate	2024-02-01
publisher	Wiley
record_format	Article
series	Engineering Reports
spelling	doaj.art-3a2160e058b44468958d047617efe70a; 2024-02-02T01:25:53Z; eng; Wiley; Engineering Reports; 2577-8196; 2024-02-01; vol. 6; no. 2; pp. n/a; 10.1002/eng2.12714; Toward reliable fusion object detection based on dilated pyramid and semantic attention; authors: Rong Chang [0], Shan Gao [1], Hao Li [2], Shan Zhao [3], Yang Yang [4]; affiliations, in record order: Yuxi Power Supply Bureau Yunnan Power Grid Co., LTD of Kunming Yunnan China; Guangzhou Jianruan Technology Co., LTD Guangzhou China; School of Information Science and Technology Yunnan Normal University Yunnan China; School of Information Science and Technology Yunnan Normal University Yunnan China; School of Information Science and Technology Yunnan Normal University Yunnan China; https://doi.org/10.1002/eng2.12714; attention mechanism; fused image; object detection
title	Toward reliable fusion object detection based on dilated pyramid and semantic attention
topic	attention mechanism; fused image; object detection
url	https://doi.org/10.1002/eng2.12714

Toward reliable fusion object detection based on dilated pyramid and semantic attention
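
The abstract in this record says the Dilation Pyramid Network aggregates features of various receptive fields without adding pyramid layers, but the record gives no architectural details. The sketch below only illustrates the general idea behind dilated convolution: with kernel size k and dilation d, one kernel spans k + (k - 1)(d - 1) input samples, so parallel branches with different dilations see different context at the same resolution. The function names, the 1-D setting, the dilation rates, and the summation used to aggregate branches are all assumptions made for illustration, not the authors' design.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d_same(signal, kernel, dilation):
    """Zero-padded 1-D dilated cross-correlation; output length equals input length."""
    if len(kernel) % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel length")
    pad = dilation * (len(kernel) - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(w * padded[pos + i * dilation] for i, w in enumerate(kernel))
        for pos in range(len(signal))
    ]


def aggregate_branches(signal, kernel, dilations):
    """Sum same-length branch outputs (hypothetical aggregation; the paper's may differ)."""
    branches = [dilated_conv1d_same(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    # Assumed dilation rates, chosen only to show how the span grows.
    for d in (1, 2, 4):
        print(f"dilation={d}: spans {effective_kernel_size(3, d)} samples")
    print(aggregate_branches([1, 2, 3, 4, 5], [1, 1, 1], dilations=(1, 2)))
```

Because every branch keeps the input length, the outputs can be combined element by element, which is one way multi-scale context can be mixed without adding extra pyramid levels.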

Similar Items