A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation

The growing volume and complex content of aerial images mean that some recent deep-learning methods do not transfer well across different aerial image processing tasks. The coarse-grained feature representations these methods produce are not discriminative enough. Besides, the confounding factor...

Bibliographic Details
Main Authors: Wei Xiong, Zhenyu Xiong, Yaqi Cui
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Aerial image processing; causal inference; feature representation; visual attention
Online Access: https://ieeexplore.ieee.org/document/9817622/
author Wei Xiong
Zhenyu Xiong
Yaqi Cui
collection DOAJ
description The growing volume and complex content of aerial images mean that some recent deep-learning methods do not transfer well across different aerial image processing tasks. The coarse-grained feature representations these methods produce are not discriminative enough. In addition, confounding factors in the datasets and the long-tailed distribution of the training data lead to biased and spurious associations among objects in aerial images. This study proposes a confounder-free fusion network (CFF-NET) to address these challenges. Global and local feature extraction branches are designed to capture comprehensive and fine-grained deep features from the whole image. Specifically, to extract discriminative local features and exploit contextual information across regions, models based on gated recurrent units extract the features of each image region and output an importance weight for each region. Furthermore, a confounder-free object feature extraction branch is proposed to generate reasonable visual attention and provide more multigrained image information; it also eliminates spurious and biased visual relationships at the object level. Finally, the outputs of the three branches are combined to obtain the fused feature representation. Extensive experiments are conducted on three popular aerial image processing tasks: 1) image classification, 2) image retrieval, and 3) image captioning. The proposed CFF-NET achieves reasonable and state-of-the-art results, including on high-level tasks such as aerial image captioning.
first_indexed 2024-04-13T20:38:20Z
format Article
id doaj.art-aa8c80450eb5481ab25e6fbc283b7a35
institution Directory Open Access Journal
issn 2151-1535
language English
last_indexed 2024-04-13T20:38:20Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj.art-aa8c80450eb5481ab25e6fbc283b7a35
2022-12-22T02:30:58Z
eng
IEEE
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ISSN 2151-1535
2022-01-01
vol. 15, pp. 5440-5454
DOI 10.1109/JSTARS.2022.3189052
IEEE Xplore document 9817622
A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation
Wei Xiong (Research Institute of Information Fusion, Naval Aviation University, Yantai, China)
Zhenyu Xiong (https://orcid.org/0000-0002-1277-7875; Research Institute of Information Fusion, Naval Aviation University, Yantai, China)
Yaqi Cui (Research Institute of Information Fusion, Naval Aviation University, Yantai, China)
https://ieeexplore.ieee.org/document/9817622/
Aerial image processing
causal inference
feature representation
visual attention
title A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation
topic Aerial image processing
causal inference
feature representation
visual attention
url https://ieeexplore.ieee.org/document/9817622/
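
Illustrative sketch (not part of the catalog record, and not the authors' implementation). The description field says that each image region receives an importance weight and that the outputs of three branches are combined. The Python below shows one common way such a step can be written: softmax-weighted pooling of region features, followed by fusion of the three branch outputs. Everything specific here is an assumption made for illustration: the function names softmax, weighted_pool and fuse are hypothetical, the region scores are supplied by hand rather than produced by a gated recurrent unit, and concatenation is assumed as the combination rule because the abstract does not state one.

```python
import math


def softmax(scores):
    """Turn raw region scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_pool(region_features, region_scores):
    """Pool per-region feature vectors into one local feature vector.

    region_features: list of equal-length lists, one per image region.
    region_scores: one raw score per region (hand-supplied in this sketch;
    the paper derives region weights from GRU-based models).
    """
    weights = softmax(region_scores)
    dim = len(region_features[0])
    pooled = [
        sum(w * feat[i] for w, feat in zip(weights, region_features))
        for i in range(dim)
    ]
    return pooled, weights


def fuse(global_feat, local_feat, object_feat):
    """Combine three branch outputs by concatenation (an assumed rule)."""
    return list(global_feat) + list(local_feat) + list(object_feat)


# Toy example: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.5, 0.5]
local_feat, weights = weighted_pool(regions, scores)
fused = fuse([0.2, 0.4], local_feat, [0.9])
print(len(fused), [round(w, 3) for w in weights])
```

The region with the highest score dominates the pooled local feature, which is the effect an importance weight is meant to have. A learned scorer and a learned fusion layer would replace the hand-set scores and the plain concatenation in a real model.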