Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction

Bibliographic Details
Main Authors:	Liegang Xia, Shulin Mi, Junxia Zhang, Jiancheng Luo, Zhanfeng Shen, Yubin Cheng
Format:	Article
Language:	English
Published:	MDPI AG 2023-05-01
Series:	Remote Sensing
Subjects:	building extraction; convolutional neural network (CNN); high-resolution remote sensing image; transformer; semantic segmentation
Online Access:	https://www.mdpi.com/2072-4292/15/10/2689

author	Liegang Xia, Shulin Mi, Junxia Zhang, Jiancheng Luo, Zhanfeng Shen, Yubin Cheng
collection	DOAJ
description	Automatically extracting 2D buildings from high-resolution remote sensing images is among the most popular research directions in remote sensing information extraction. Semantic segmentation based on CNNs or transformers has greatly improved building extraction accuracy. A CNN is good at local feature extraction, but its ability to capture global features is poor, which can lead to incorrect and missed detections of buildings. Transformer models have the advantage of a global receptive field, but they do not perform well at extracting local features, resulting in poor local detail in the extracted buildings. In this paper, we propose a dual-stream feature extraction network (DSFENet) based on a CNN and a transformer for accurate building extraction. In the encoder, convolution extracts the local features of buildings, and the transformer produces a global representation of them. The effective combination of local and global features greatly enhances the network’s feature extraction ability. We validated DSFENet on the Google Image dataset and the ISPRS Vaihingen dataset, where it achieved the best accuracy among the state-of-the-art models compared.
first_indexed	2024-03-11T03:21:49Z
format	Article
id	doaj.art-d0005de06b204a6eacaeeff4729b4d4b
institution	Directory of Open Access Journals
issn	2072-4292
language	English
last_indexed	2024-03-11T03:21:49Z
publishDate	2023-05-01
publisher	MDPI AG
record_format	Article
series	Remote Sensing
spelling	doaj.art-d0005de06b204a6eacaeeff4729b4d4b 2023-11-18T03:08:47Z; eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2023-05-01; vol. 15, no. 10, art. 2689; doi:10.3390/rs15102689; Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction; Liegang Xia¹, Shulin Mi¹, Junxia Zhang¹, Jiancheng Luo², Zhanfeng Shen², Yubin Cheng¹; ¹College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; ²Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100875, China; https://www.mdpi.com/2072-4292/15/10/2689; building extraction; convolutional neural network (CNN); high-resolution remote sensing image; transformer; semantic segmentation
title	Dual-Stream Feature Extraction Network Based on CNN and Transformer for Building Extraction
topic	building extraction; convolutional neural network (CNN); high-resolution remote sensing image; transformer; semantic segmentation
url	https://www.mdpi.com/2072-4292/15/10/2689
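The description field above summarizes DSFENet's core idea. In the encoder, a convolution stream captures local building features, a transformer stream captures a global representation, and the two are combined. The toy NumPy sketch below illustrates only that general idea. It is not the authors' implementation. Every concrete choice here is a hypothetical illustration: the 3x3 convolution, the single-head self-attention, the concatenation-plus-projection fusion, the channel sizes, and the function names (`conv3x3`, `self_attention`, `dual_stream_block`). The record does not say how DSFENet actually fuses its streams. To show the difference in receptive field, the sketch perturbs one far-away pixel. The local stream's output at (0, 0) is unaffected, while the global stream's output changes.

```python
import numpy as np

# Toy sketch of a dual-stream block: a local 3x3 convolution stream and a
# global self-attention stream over the same feature map, fused by channel
# concatenation plus a 1x1 projection. The fusion rule, layer sizes and all
# names here are illustrative assumptions, not DSFENet's published design.

rng = np.random.default_rng(0)


def conv3x3(x, w):
    """Local stream: zero-padded 3x3 convolution followed by ReLU.

    x: (H, W, C_in) feature map; w: (3, 3, C_in, C_out) kernel.
    """
    h, wd, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            # Each output pixel sees only its own 3x3 neighbourhood.
            out += padded[dy:dy + h, dx:dx + wd] @ w[dy, dx]
    return np.maximum(out, 0.0)


def self_attention(x, wq, wk, wv):
    """Global stream: single-head self-attention across all H*W positions."""
    h, wd, c = x.shape
    tokens = x.reshape(h * wd, c)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)  # softmax over every position
    return (attn @ v).reshape(h, wd, -1)


def init_params(c_in, c_out, scale=0.1):
    """Random weights for the hypothetical block (no training involved)."""
    return {
        "conv": rng.normal(0.0, scale, (3, 3, c_in, c_out)),
        "wq": rng.normal(0.0, scale, (c_in, c_out)),
        "wk": rng.normal(0.0, scale, (c_in, c_out)),
        "wv": rng.normal(0.0, scale, (c_in, c_out)),
        "proj": rng.normal(0.0, scale, (2 * c_out, c_out)),
    }


def dual_stream_block(x, p):
    """Run both streams on x and fuse their outputs."""
    local = conv3x3(x, p["conv"])
    glob = self_attention(x, p["wq"], p["wk"], p["wv"])
    fused = np.concatenate([local, glob], axis=-1)  # (H, W, 2 * C_out)
    return fused @ p["proj"]  # 1x1 projection back to C_out channels


x = rng.normal(size=(8, 8, 4))
params = init_params(c_in=4, c_out=6)
y = dual_stream_block(x, params)

# Perturb the pixel farthest from (0, 0) and see which stream notices.
x_far = x.copy()
x_far[7, 7] += 5.0
local_same = np.allclose(conv3x3(x, params["conv"])[0, 0],
                         conv3x3(x_far, params["conv"])[0, 0])
attn_args = (params["wq"], params["wk"], params["wv"])
global_same = np.allclose(self_attention(x, *attn_args)[0, 0],
                          self_attention(x_far, *attn_args)[0, 0])

print(y.shape)      # (8, 8, 6)
print(local_same)   # True: the change is outside the 3x3 receptive field
print(global_same)  # False: attention spans the whole feature map
```

A trained network would stack blocks like this at several scales and learn the weights. The point of the sketch is only the contrast the description draws between a local receptive field and a global one.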
