A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation
The increasing number and increasingly complex content of aerial images mean that some recent deep-learning-based methods no longer fit different aerial image processing tasks well. The coarse-grained feature representations these methods produce are not discriminative enough. Moreover, the confounding factor...
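The abstract (quoted in full in the description field below) outlines a three-branch design: a global branch over the whole image, a GRU-based local branch that scores and weights image regions, and a confounder-free object branch, whose outputs are fused into a single representation. The following is a minimal PyTorch-style sketch of that fusion idea only; the module names, dimensions, and the simple softmax region attention are illustrative assumptions, and the object branch is a plain projection rather than the paper's causal-intervention mechanism.

```python
# Minimal sketch (not the authors' code) of a three-branch fusion:
# a global branch, a GRU-based local branch that weights regions,
# and an object-level branch, with the outputs concatenated.
# Backbone choice, dimensions, and module names are illustrative assumptions.
import torch
import torch.nn as nn


class ThreeBranchFusion(nn.Module):
    def __init__(self, region_dim=512, object_dim=512, hidden=256, fused_dim=768):
        super().__init__()
        # Global branch: pool a backbone feature map into one whole-image vector.
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.global_fc = nn.Linear(region_dim, hidden)
        # Local branch: a GRU reads the sequence of region features and an
        # attention head scores the importance of each region.
        self.region_gru = nn.GRU(region_dim, hidden, batch_first=True)
        self.region_score = nn.Linear(hidden, 1)
        # Object branch: a plain projection here; the paper's confounder-free
        # attention would replace this placeholder.
        self.object_fc = nn.Linear(object_dim, hidden)
        self.fuse = nn.Linear(3 * hidden, fused_dim)

    def forward(self, feat_map, region_feats, object_feats):
        # feat_map: (B, C, H, W); region_feats: (B, R, C); object_feats: (B, O, C)
        g = self.global_fc(self.global_pool(feat_map).flatten(1))
        h, _ = self.region_gru(region_feats)               # (B, R, hidden)
        w = torch.softmax(self.region_score(h), dim=1)     # per-region importance weights
        l = (w * h).sum(dim=1)                             # weighted region summary
        o = self.object_fc(object_feats).mean(dim=1)       # object-level summary
        return self.fuse(torch.cat([g, l, o], dim=-1))     # fused feature representation


# Example with random tensors standing in for backbone outputs.
net = ThreeBranchFusion()
fused = net(torch.randn(2, 512, 7, 7), torch.randn(2, 9, 512), torch.randn(2, 5, 512))
print(fused.shape)  # torch.Size([2, 768])
```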
Main Authors: | Wei Xiong, Zhenyu Xiong, Yaqi Cui |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
Subjects: | Aerial image processing; causal inference; feature representation; visual attention |
Online Access: | https://ieeexplore.ieee.org/document/9817622/ |
_version_ | 1828330357163819008 |
---|---|
author | Wei Xiong Zhenyu Xiong Yaqi Cui |
author_facet | Wei Xiong Zhenyu Xiong Yaqi Cui |
author_sort | Wei Xiong |
collection | DOAJ |
description | The increasing number and increasingly complex content of aerial images mean that some recent deep-learning-based methods no longer fit different aerial image processing tasks well. The coarse-grained feature representations these methods produce are not discriminative enough. Moreover, the confounding factors in the datasets and the long-tailed distribution of the training data lead to biased and spurious associations among the objects in aerial images. This study proposes a confounder-free fusion network (CFF-NET) to address these challenges. Global and local feature extraction branches are designed to capture comprehensive and fine-grained deep features from the whole image. Specifically, to extract discriminative local features and exploit the contextual information across different regions, models based on gated recurrent units are constructed to extract the features of each image region and to output an importance weight for each region. Furthermore, a confounder-free object feature extraction branch is proposed to generate reasonable visual attention and provide additional multigrained image information; it also eliminates spurious and biased visual relationships of the image at the object level. Finally, the outputs of the three branches are combined to obtain the fused feature representation. Extensive experiments are conducted on three popular aerial image processing tasks: 1) image classification, 2) image retrieval, and 3) image captioning. The proposed CFF-NET achieves reasonable, state-of-the-art results, including on high-level tasks such as aerial image captioning. |
first_indexed | 2024-04-13T20:38:20Z |
format | Article |
id | doaj.art-aa8c80450eb5481ab25e6fbc283b7a35 |
institution | Directory Open Access Journal |
issn | 2151-1535 |
language | English |
last_indexed | 2024-04-13T20:38:20Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
spelling | doaj.art-aa8c80450eb5481ab25e6fbc283b7a35; 2022-12-22T02:30:58Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; 2151-1535; 2022-01-01; vol. 15, pp. 5440-5454; doi:10.1109/JSTARS.2022.3189052; article no. 9817622; A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation; Wei Xiong, Zhenyu Xiong (https://orcid.org/0000-0002-1277-7875), Yaqi Cui, all with the Research Institute of Information Fusion, Naval Aviation University, Yantai, China; The increasing number and increasingly complex content of aerial images mean that some recent deep-learning-based methods no longer fit different aerial image processing tasks well. The coarse-grained feature representations these methods produce are not discriminative enough. Moreover, the confounding factors in the datasets and the long-tailed distribution of the training data lead to biased and spurious associations among the objects in aerial images. This study proposes a confounder-free fusion network (CFF-NET) to address these challenges. Global and local feature extraction branches are designed to capture comprehensive and fine-grained deep features from the whole image. Specifically, to extract discriminative local features and exploit the contextual information across different regions, models based on gated recurrent units are constructed to extract the features of each image region and to output an importance weight for each region. Furthermore, a confounder-free object feature extraction branch is proposed to generate reasonable visual attention and provide additional multigrained image information; it also eliminates spurious and biased visual relationships of the image at the object level. Finally, the outputs of the three branches are combined to obtain the fused feature representation. Extensive experiments are conducted on three popular aerial image processing tasks: 1) image classification, 2) image retrieval, and 3) image captioning. The proposed CFF-NET achieves reasonable, state-of-the-art results, including on high-level tasks such as aerial image captioning.; https://ieeexplore.ieee.org/document/9817622/; Aerial image processing; causal inference; feature representation; visual attention |
spellingShingle | Wei Xiong Zhenyu Xiong Yaqi Cui A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Aerial image processing causal inference feature representation visual attention |
title | A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation |
title_full | A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation |
title_fullStr | A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation |
title_full_unstemmed | A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation |
title_short | A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation |
title_sort | confounder free fusion network for aerial image scene feature representation |
topic | Aerial image processing; causal inference; feature representation; visual attention |
url | https://ieeexplore.ieee.org/document/9817622/ |
work_keys_str_mv | AT weixiong aconfounderfreefusionnetworkforaerialimagescenefeaturerepresentation AT zhenyuxiong aconfounderfreefusionnetworkforaerialimagescenefeaturerepresentation AT yaqicui aconfounderfreefusionnetworkforaerialimagescenefeaturerepresentation AT weixiong confounderfreefusionnetworkforaerialimagescenefeaturerepresentation AT zhenyuxiong confounderfreefusionnetworkforaerialimagescenefeaturerepresentation AT yaqicui confounderfreefusionnetworkforaerialimagescenefeaturerepresentation |