Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network


Bibliographic Details
Main Authors: Dengji Zhou, Guizhou Wang, Guojin He, Tengfei Long, Ranyu Yin, Zhaoming Zhang, Sibao Chen, Bin Luo
Format: Article
Language: English
Published: MDPI AG 2020-12-01
Series: Sensors
Subjects: building extraction, high resolution image, semantic segmentation, deep learning
Online Access: https://www.mdpi.com/1424-8220/20/24/7241
_version_ 1797544346945323008
author Dengji Zhou
Guizhou Wang
Guojin He
Tengfei Long
Ranyu Yin
Zhaoming Zhang
Sibao Chen
Bin Luo
author_facet Dengji Zhou
Guizhou Wang
Guojin He
Tengfei Long
Ranyu Yin
Zhaoming Zhang
Sibao Chen
Bin Luo
author_sort Dengji Zhou
collection DOAJ
description Building extraction from high spatial resolution remote sensing images is a hot topic in remote sensing applications and computer vision. This paper presents a supervised semantic segmentation model named Pyramid Self-Attention Network (PISANet). Its structure is simple, because it contains only two parts: one is the backbone of the network, which learns the local features of buildings from the image (short-distance context information around each pixel); the other is the pyramid self-attention module, which obtains the global features (long-distance context information with respect to other pixels in the image) and the comprehensive features (color, texture, geometric, and high-level semantic features) of buildings. The network is end-to-end. In the training stage, the inputs are the remote sensing image and the corresponding label, and the output is a probability map (the probability that each pixel is or is not a building). In the prediction stage, the input is the remote sensing image, and the output is the building extraction result. The complexity of the network structure is reduced so that it is easy to implement. The proposed PISANet was tested on two datasets. The results show that the overall accuracy reached 94.50% and 96.15%, the intersection-over-union reached 77.45% and 87.97%, and the F1 score reached 87.27% and 93.55%, respectively. In experiments on different datasets, PISANet achieved high overall accuracy, a low error rate, and improved integrity of individual buildings.
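The abstract describes the architecture only at a high level: a convolutional backbone for local features, followed by a pyramid self-attention module that adds long-distance context, ending in a per-pixel building probability map. The sketch below is a minimal, hypothetical PyTorch rendering of that idea, not the authors' implementation; the layer sizes, the `grid_sizes` of the pyramid, and the class names `SelfAttention2d`, `PyramidSelfAttention`, and `PISANetSketch` are all assumptions.

```python
# Hypothetical sketch of a PISANet-style model (not the authors' code):
# a small convolutional backbone extracts local features, a pyramid of
# self-attention blocks adds global context at several pooled grid sizes,
# and a 1x1 head produces a per-pixel building probability map.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention2d(nn.Module):
    """Plain spatial self-attention (non-local) block with a residual connection."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)        # (b, hw, c')
        k = self.key(x).flatten(2)                           # (b, c', hw)
        v = self.value(x).flatten(2).transpose(1, 2)         # (b, hw, c)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out


class PyramidSelfAttention(nn.Module):
    """Run self-attention on copies of the feature map pooled to coarser grids."""

    def __init__(self, channels, grid_sizes=(8, 16, 32)):
        super().__init__()
        self.grid_sizes = grid_sizes
        self.blocks = nn.ModuleList([SelfAttention2d(channels) for _ in grid_sizes])
        self.fuse = nn.Conv2d(channels * (len(grid_sizes) + 1), channels, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x]                                          # keep local features
        for size, block in zip(self.grid_sizes, self.blocks):
            pooled = F.adaptive_avg_pool2d(x, size)          # coarse grid
            attended = block(pooled)                         # global context
            feats.append(F.interpolate(attended, size=(h, w),
                                       mode="bilinear", align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))


class PISANetSketch(nn.Module):
    def __init__(self, in_channels=3, width=64):
        super().__init__()
        self.backbone = nn.Sequential(                       # local features
            nn.Conv2d(in_channels, width, 3, stride=2, padding=1),
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, stride=2, padding=1),
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
        )
        self.pyramid = PyramidSelfAttention(width)           # global context
        self.head = nn.Conv2d(width, 1, 1)                   # building vs. background

    def forward(self, x):
        h, w = x.shape[2:]
        logits = self.head(self.pyramid(self.backbone(x)))
        logits = F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)
        return torch.sigmoid(logits)                         # per-pixel probability


if __name__ == "__main__":
    probs = PISANetSketch()(torch.randn(1, 3, 256, 256))
    print(probs.shape)  # torch.Size([1, 1, 256, 256])
```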
first_indexed 2024-03-10T13:59:10Z
format Article
id doaj.art-79f805ed900f430b84d37d2e6d357f69
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-03-10T13:59:10Z
publishDate 2020-12-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-79f805ed900f430b84d37d2e6d357f69 | 2023-11-21T01:17:48Z | eng | MDPI AG | Sensors | 1424-8220 | 2020-12-01 | Vol. 20, Iss. 24, Art. 7241 | doi:10.3390/s20247241 | Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network | Dengji Zhou, Guizhou Wang, Guojin He, Tengfei Long, Ranyu Yin, Zhaoming Zhang (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China); Sibao Chen, Bin Luo (MOE Key Lab of Signal Processing and Intelligent Computing, School of Computer Science and Technology, Anhui University, Hefei 230601, China) | Building extraction from high spatial resolution remote sensing images is a hot topic in remote sensing applications and computer vision. This paper presents a supervised semantic segmentation model named Pyramid Self-Attention Network (PISANet). Its structure is simple, because it contains only two parts: one is the backbone of the network, which learns the local features of buildings from the image (short-distance context information around each pixel); the other is the pyramid self-attention module, which obtains the global features (long-distance context information with respect to other pixels in the image) and the comprehensive features (color, texture, geometric, and high-level semantic features) of buildings. The network is end-to-end. In the training stage, the inputs are the remote sensing image and the corresponding label, and the output is a probability map (the probability that each pixel is or is not a building). In the prediction stage, the input is the remote sensing image, and the output is the building extraction result. The complexity of the network structure is reduced so that it is easy to implement. The proposed PISANet was tested on two datasets. The results show that the overall accuracy reached 94.50% and 96.15%, the intersection-over-union reached 77.45% and 87.97%, and the F1 score reached 87.27% and 93.55%, respectively. In experiments on different datasets, PISANet achieved high overall accuracy, a low error rate, and improved integrity of individual buildings. | https://www.mdpi.com/1424-8220/20/24/7241 | building extraction; high resolution image; semantic segmentation; deep learning
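The record reports pixel-wise overall accuracy, intersection-over-union, and F1 for the two test datasets. As a reference point, the snippet below computes those three metrics from a predicted probability map and a binary label mask using their standard confusion-matrix definitions; the 0.5 threshold and the helper name `building_metrics` are illustrative assumptions, and the authors' actual evaluation protocol is not part of this record.

```python
# Standard pixel-wise metrics for binary building extraction
# (overall accuracy, intersection-over-union, F1); illustrative only.
import numpy as np


def building_metrics(prob_map, label, threshold=0.5):
    """prob_map: predicted building probabilities in [0, 1]; label: 0/1 ground truth."""
    pred = prob_map >= threshold
    truth = label > 0

    tp = np.sum(pred & truth)        # building pixels correctly detected
    tn = np.sum(~pred & ~truth)      # background pixels correctly rejected
    fp = np.sum(pred & ~truth)       # background predicted as building
    fn = np.sum(~pred & truth)       # building pixels missed

    overall_accuracy = (tp + tn) / (tp + tn + fp + fn)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return overall_accuracy, iou, f1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.random((256, 256))
    labels = (rng.random((256, 256)) > 0.7).astype(np.uint8)
    print(building_metrics(probs, labels))
```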
spellingShingle Dengji Zhou
Guizhou Wang
Guojin He
Tengfei Long
Ranyu Yin
Zhaoming Zhang
Sibao Chen
Bin Luo
Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network
Sensors
building extraction
high resolution image
semantic segmentation
deep learning
title Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network
title_full Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network
title_fullStr Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network
title_full_unstemmed Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network
title_short Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network
title_sort robust building extraction for high spatial resolution remote sensing images with self attention network
topic building extraction
high resolution image
semantic segmentation
deep learning
url https://www.mdpi.com/1424-8220/20/24/7241
work_keys_str_mv AT dengjizhou robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork
AT guizhouwang robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork
AT guojinhe robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork
AT tengfeilong robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork
AT ranyuyin robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork
AT zhaomingzhang robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork
AT sibaochen robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork
AT binluo robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork