Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction

Automatically extracting 2D buildings from high-resolution remote sensing images is among the most popular research directions in the area of remote sensing information extraction. Semantic segmentation based on a CNN or transformer has greatly improved building extraction accuracy. A CNN is good at...

Bibliographic Details
Main Authors: Liegang Xia, Shulin Mi, Junxia Zhang, Jiancheng Luo, Zhanfeng Shen, Yubin Cheng
Format: Article
Language: English
Published: MDPI AG 2023-05-01
Series: Remote Sensing
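Subjects: building extraction; convolutional neural network (CNN); high-resolution remote sensing image; transformer; semantic segmentation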
Online Access: https://www.mdpi.com/2072-4292/15/10/2689
author Liegang Xia
Shulin Mi
Junxia Zhang
Jiancheng Luo
Zhanfeng Shen
Yubin Cheng
collection DOAJ
description Automatically extracting 2D buildings from high-resolution remote sensing images is among the most popular research directions in the area of remote sensing information extraction. Semantic segmentation based on a CNN or transformer has greatly improved building extraction accuracy. A CNN is good at local feature extraction, but its ability to acquire global features is poor, which can lead to incorrect and missed detections of buildings. The advantage of transformer models lies in their global receptive field, but they do not perform well in extracting local features, resulting in poor local detail in the extracted buildings. In this paper, we propose DSFENet, a dual-stream feature extraction network that combines a CNN and a transformer for accurate building extraction. In the encoder, convolution extracts the local features of buildings, and the transformer provides their global representation. The effective combination of local and global features greatly enhances the network’s feature extraction ability. We validated DSFENet on the Google Image dataset and the ISPRS Vaihingen dataset, where it achieved the best accuracy among the state-of-the-art models compared.
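The dual-stream idea in the description can be illustrated with a toy example. The sketch below is not the DSFENet architecture, whose layers are not given in this record. It assumes a 3x3 mean filter as a stand-in for the convolutional (local) stream, single-head self-attention with identity projections as a stand-in for the transformer (global) stream, and element-wise averaging as the fusion step. All function names are hypothetical.

```python
import math


def local_stream(img):
    """Local branch (assumed stand-in): 3x3 mean filter with zero padding."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += img[y][x]
            out[i][j] = total / 9.0
    return out


def global_stream(img):
    """Global branch (assumed stand-in): self-attention over all pixels.

    Each pixel is one scalar token; query, key and value are the token
    itself (identity projections), so every output mixes all pixels.
    """
    w = len(img[0])
    tokens = [v for row in img for v in row]
    out = []
    for q in tokens:
        scores = [q * k for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(wt * v for wt, v in zip(weights, tokens)) / z)
    return [out[r * w:(r + 1) * w] for r in range(len(img))]


def fuse(local_feat, global_feat):
    """Fusion (assumption): element-wise average of the two feature maps."""
    return [[0.5 * (a + b) for a, b in zip(lr, gr)]
            for lr, gr in zip(local_feat, global_feat)]


def dual_stream_encode(img):
    """Run both streams on the same input and fuse their outputs."""
    return fuse(local_stream(img), global_stream(img))


if __name__ == "__main__":
    image = [[0.0, 1.0, 0.0],
             [1.0, 1.0, 1.0],
             [0.0, 1.0, 0.0]]
    for row in dual_stream_encode(image):
        print([round(v, 3) for v in row])
```

The local stream only sees a pixel's immediate neighbours, while the global stream weights every pixel in the image, which is the complementarity the description attributes to CNNs and transformers.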
first_indexed 2024-03-11T03:21:49Z
format Article
id doaj.art-d0005de06b204a6eacaeeff4729b4d4b
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-11T03:21:49Z
publishDate 2023-05-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-d0005de06b204a6eacaeeff4729b4d4b
2023-11-18T03:08:47Z
eng
MDPI AG
Remote Sensing
2072-4292
2023-05-01
Volume 15, Issue 10, Article 2689
10.3390/rs15102689
Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction
Liegang Xia (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
Shulin Mi (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
Junxia Zhang (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
Jiancheng Luo (Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100875, China)
Zhanfeng Shen (Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100875, China)
Yubin Cheng (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
title Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction
topic building extraction
convolutional neural network (CNN)
high-resolution remote sensing image
transformer
semantic segmentation
url https://www.mdpi.com/2072-4292/15/10/2689