Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction
Automatically extracting 2D buildings from high-resolution remote sensing images is among the most popular research directions in the area of remote sensing information extraction. Semantic segmentation based on a CNN or transformer has greatly improved building extraction accuracy. A CNN is good at...
Main Authors: | Liegang Xia, Shulin Mi, Junxia Zhang, Jiancheng Luo, Zhanfeng Shen, Yubin Cheng |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-05-01 |
Series: | Remote Sensing |
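Subjects: | building extraction; convolutional neural network (CNN); high-resolution remote sensing image; transformer; semantic segmentation |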
Online Access: | https://www.mdpi.com/2072-4292/15/10/2689 |
_version_ | 1797598489222316032 |
---|---|
author | Liegang Xia, Shulin Mi, Junxia Zhang, Jiancheng Luo, Zhanfeng Shen, Yubin Cheng |
author_facet | Liegang Xia, Shulin Mi, Junxia Zhang, Jiancheng Luo, Zhanfeng Shen, Yubin Cheng |
author_sort | Liegang Xia |
collection | DOAJ |
description | Automatically extracting 2D buildings from high-resolution remote sensing images is among the most popular research directions in the area of remote sensing information extraction. Semantic segmentation based on a CNN or transformer has greatly improved building extraction accuracy. A CNN is good at local feature extraction, but its ability to acquire global features is poor, which can lead to incorrect and missed detection of buildings. The advantage of transformer models lies in their global receptive field, but they do not perform well in extracting local features, resulting in poor local detail for building extraction. In this paper, we propose a CNN-based and transformer-based dual-stream feature extraction network (DSFENet) for accurate building extraction. In the encoder, convolution extracts the local features for buildings, and the transformer realizes the global representation of the buildings. The effective combination of local and global features greatly enhances the network’s feature extraction ability. We validated the capability of DSFENet on the Google Image dataset and the ISPRS Vaihingen dataset. DSFENet achieved the best accuracy performance compared to other state-of-the-art models. |
first_indexed | 2024-03-11T03:21:49Z |
format | Article |
id | doaj.art-d0005de06b204a6eacaeeff4729b4d4b |
institution | Directory of Open Access Journals |
issn | 2072-4292 |
language | English |
last_indexed | 2024-03-11T03:21:49Z |
publishDate | 2023-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj.art-d0005de06b204a6eacaeeff4729b4d4b; 2023-11-18T03:08:47Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2023-05-01; 15; 10; 2689; 10.3390/rs15102689; Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction; Liegang Xia 0; Shulin Mi 1; Junxia Zhang 2; Jiancheng Luo 3; Zhanfeng Shen 4; Yubin Cheng 5; College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100875, China; Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100875, China; College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; Automatically extracting 2D buildings from high-resolution remote sensing images is among the most popular research directions in the area of remote sensing information extraction. Semantic segmentation based on a CNN or transformer has greatly improved building extraction accuracy. A CNN is good at local feature extraction, but its ability to acquire global features is poor, which can lead to incorrect and missed detection of buildings. The advantage of transformer models lies in their global receptive field, but they do not perform well in extracting local features, resulting in poor local detail for building extraction. In this paper, we propose a CNN-based and transformer-based dual-stream feature extraction network (DSFENet) for accurate building extraction. In the encoder, convolution extracts the local features for buildings, and the transformer realizes the global representation of the buildings. The effective combination of local and global features greatly enhances the network’s feature extraction ability. We validated the capability of DSFENet on the Google Image dataset and the ISPRS Vaihingen dataset. DSFENet achieved the best accuracy performance compared to other state-of-the-art models.; https://www.mdpi.com/2072-4292/15/10/2689; building extraction; convolutional neural network (CNN); high-resolution remote sensing image; transformer; semantic segmentation |
spellingShingle | Liegang Xia; Shulin Mi; Junxia Zhang; Jiancheng Luo; Zhanfeng Shen; Yubin Cheng; Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction; Remote Sensing; building extraction; convolutional neural network (CNN); high-resolution remote sensing image; transformer; semantic segmentation |
title | Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction |
title_full | Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction |
title_fullStr | Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction |
title_full_unstemmed | Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction |
title_short | Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction |
title_sort | dual stream feature extraction network based on cnn and transformer for building extraction |
topic | building extraction; convolutional neural network (CNN); high-resolution remote sensing image; transformer; semantic segmentation |
url | https://www.mdpi.com/2072-4292/15/10/2689 |