A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation
The increasing number and increasingly complex content of aerial images mean that some recent deep-learning-based methods no longer fit different aerial image processing tasks well. The coarse-grained feature representations these methods produce are not discriminative enough. Moreover, the confounding factor...
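The abstract (quoted in full in the description field below) outlines a three-branch design: a global branch over the whole image, a GRU-based local branch that scores and weights image regions, and a confounder-free object branch, whose outputs are fused into a single representation. The following is a minimal PyTorch-style sketch of that fusion idea only; the module names, dimensions, and the simple softmax region attention are illustrative assumptions, and the object branch is a plain projection rather than the paper's causal-intervention mechanism.

```python
# Minimal sketch (not the authors' code) of a three-branch fusion:
# a global branch, a GRU-based local branch that weights regions,
# and an object-level branch, with the outputs concatenated.
# Backbone choice, dimensions, and module names are illustrative assumptions.
import torch
import torch.nn as nn


class ThreeBranchFusion(nn.Module):
    def __init__(self, region_dim=512, object_dim=512, hidden=256, fused_dim=768):
        super().__init__()
        # Global branch: pool a backbone feature map into one whole-image vector.
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.global_fc = nn.Linear(region_dim, hidden)
        # Local branch: a GRU reads the sequence of region features and an
        # attention head scores the importance of each region.
        self.region_gru = nn.GRU(region_dim, hidden, batch_first=True)
        self.region_score = nn.Linear(hidden, 1)
        # Object branch: a plain projection here; the paper's confounder-free
        # attention would replace this placeholder.
        self.object_fc = nn.Linear(object_dim, hidden)
        self.fuse = nn.Linear(3 * hidden, fused_dim)

    def forward(self, feat_map, region_feats, object_feats):
        # feat_map: (B, C, H, W); region_feats: (B, R, C); object_feats: (B, O, C)
        g = self.global_fc(self.global_pool(feat_map).flatten(1))
        h, _ = self.region_gru(region_feats)               # (B, R, hidden)
        w = torch.softmax(self.region_score(h), dim=1)     # per-region importance weights
        l = (w * h).sum(dim=1)                             # weighted region summary
        o = self.object_fc(object_feats).mean(dim=1)       # object-level summary
        return self.fuse(torch.cat([g, l, o], dim=-1))     # fused feature representation


# Example with random tensors standing in for backbone outputs.
net = ThreeBranchFusion()
fused = net(torch.randn(2, 512, 7, 7), torch.randn(2, 9, 512), torch.randn(2, 5, 512))
print(fused.shape)  # torch.Size([2, 768])
```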
Main Authors: | Wei Xiong, Zhenyu Xiong, Yaqi Cui |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
Subjects: | Aerial image processing; causal inference; feature representation; visual attention |
Online Access: | https://ieeexplore.ieee.org/document/9817622/ |
_version_ | 1828330357163819008 |
---|---|
author | Wei Xiong Zhenyu Xiong Yaqi Cui |
author_facet | Wei Xiong Zhenyu Xiong Yaqi Cui |
author_sort | Wei Xiong |
collection | DOAJ |
description | The increasing number and increasingly complex content of aerial images mean that some recent deep-learning-based methods no longer fit different aerial image processing tasks well. The coarse-grained feature representations these methods produce are not discriminative enough. Moreover, the confounding factors in the datasets and the long-tailed distribution of the training data lead to biased and spurious associations among the objects in aerial images. This study proposes a confounder-free fusion network (CFF-NET) to address these challenges. Global and local feature extraction branches are designed to capture comprehensive and fine-grained deep features from the whole image. Specifically, to extract discriminative local features and exploit the contextual information across different regions, models based on gated recurrent units are constructed to extract the features of each image region and to output an importance weight for each region. Furthermore, a confounder-free object feature extraction branch is proposed to generate reasonable visual attention and provide additional multigrained image information; it also eliminates spurious and biased visual relationships of the image at the object level. Finally, the outputs of the three branches are combined to obtain the fused feature representation. Extensive experiments are conducted on three popular aerial image processing tasks: 1) image classification, 2) image retrieval, and 3) image captioning. The proposed CFF-NET achieves reasonable, state-of-the-art results, including on high-level tasks such as aerial image captioning. |
first_indexed | 2024-04-13T20:38:20Z |
format | Article |
id | doaj.art-aa8c80450eb5481ab25e6fbc283b7a35 |
institution | Directory Open Access Journal |
issn | 2151-1535 |
language | English |
last_indexed | 2024-04-13T20:38:20Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
spelling | doaj.art-aa8c80450eb5481ab25e6fbc283b7a35; 2022-12-22T02:30:58Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; 2151-1535; 2022-01-01; vol. 15, pp. 5440-5454; doi:10.1109/JSTARS.2022.3189052; article no. 9817622; A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation; Wei Xiong, Zhenyu Xiong (https://orcid.org/0000-0002-1277-7875), Yaqi Cui, all with the Research Institute of Information Fusion, Naval Aviation University, Yantai, China; The increasing number and increasingly complex content of aerial images mean that some recent deep-learning-based methods no longer fit different aerial image processing tasks well. The coarse-grained feature representations these methods produce are not discriminative enough. Moreover, the confounding factors in the datasets and the long-tailed distribution of the training data lead to biased and spurious associations among the objects in aerial images. This study proposes a confounder-free fusion network (CFF-NET) to address these challenges. Global and local feature extraction branches are designed to capture comprehensive and fine-grained deep features from the whole image. Specifically, to extract discriminative local features and exploit the contextual information across different regions, models based on gated recurrent units are constructed to extract the features of each image region and to output an importance weight for each region. Furthermore, a confounder-free object feature extraction branch is proposed to generate reasonable visual attention and provide additional multigrained image information; it also eliminates spurious and biased visual relationships of the image at the object level. Finally, the outputs of the three branches are combined to obtain the fused feature representation. Extensive experiments are conducted on three popular aerial image processing tasks: 1) image classification, 2) image retrieval, and 3) image captioning. The proposed CFF-NET achieves reasonable, state-of-the-art results, including on high-level tasks such as aerial image captioning.; https://ieeexplore.ieee.org/document/9817622/; Aerial image processing; causal inference; feature representation; visual attention |
spellingShingle | Wei Xiong Zhenyu Xiong Yaqi Cui A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Aerial image processing causal inference feature representation visual attention |
title | A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation |
title_full | A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation |
title_fullStr | A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation |
title_full_unstemmed | A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation |
title_short | A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation |
title_sort | confounder free fusion network for aerial image scene feature representation |
topic | Aerial image processing; causal inference; feature representation; visual attention |
url | https://ieeexplore.ieee.org/document/9817622/ |
work_keys_str_mv | AT weixiong aconfounderfreefusionnetworkforaerialimagescenefeaturerepresentation AT zhenyuxiong aconfounderfreefusionnetworkforaerialimagescenefeaturerepresentation AT yaqicui aconfounderfreefusionnetworkforaerialimagescenefeaturerepresentation AT weixiong confounderfreefusionnetworkforaerialimagescenefeaturerepresentation AT zhenyuxiong confounderfreefusionnetworkforaerialimagescenefeaturerepresentation AT yaqicui confounderfreefusionnetworkforaerialimagescenefeaturerepresentation |