Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network
Building extraction from high spatial resolution remote sensing images is an active research topic in remote sensing applications and computer vision. This paper presents a supervised semantic segmentation model named Pyramid Self-Attention Network (PISANet). Its structure is simp...
Main Authors: | Dengji Zhou, Guizhou Wang, Guojin He, Tengfei Long, Ranyu Yin, Zhaoming Zhang, Sibao Chen, Bin Luo |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-12-01 |
Series: | Sensors |
Subjects: | building extraction; high resolution image; semantic segmentation; deep learning |
Online Access: | https://www.mdpi.com/1424-8220/20/24/7241 |
_version_ | 1797544346945323008 |
---|---|
author | Dengji Zhou; Guizhou Wang; Guojin He; Tengfei Long; Ranyu Yin; Zhaoming Zhang; Sibao Chen; Bin Luo |
author_facet | Dengji Zhou; Guizhou Wang; Guojin He; Tengfei Long; Ranyu Yin; Zhaoming Zhang; Sibao Chen; Bin Luo |
author_sort | Dengji Zhou |
collection | DOAJ |
description | Building extraction from high spatial resolution remote sensing images is an active research topic in remote sensing applications and computer vision. This paper presents a supervised semantic segmentation model named Pyramid Self-Attention Network (PISANet). Its structure is simple, containing only two parts: the network backbone, which learns the local features of buildings (short-distance context information around each pixel) from the image, and the pyramid self-attention module, which captures the global features (long-distance context information relating a pixel to other pixels in the image) and the comprehensive features (color, texture, geometry, and high-level semantics) of buildings. The network is trained end to end. In the training stage, the input is a remote sensing image and its corresponding label, and the output is a probability map (the probability that each pixel is or is not a building). In the prediction stage, the input is a remote sensing image, and the output is the building extraction result. The complexity of the network structure was reduced so that it is easy to implement. The proposed PISANet was tested on two datasets. The results show that the overall accuracy reached 94.50% and 96.15%, the intersection-over-union reached 77.45% and 87.97%, and the F1 score reached 87.27% and 93.55%, respectively. In experiments on different datasets, PISANet achieved high overall accuracy, a low error rate, and improved integrity of individual buildings. |
first_indexed | 2024-03-10T13:59:10Z |
format | Article |
id | doaj.art-79f805ed900f430b84d37d2e6d357f69 |
institution | Directory of Open Access Journals |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-10T13:59:10Z |
publishDate | 2020-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-79f805ed900f430b84d37d2e6d357f69; 2023-11-21T01:17:48Z; eng; MDPI AG; Sensors; 1424-8220; 2020-12-01; 20; 24; 7241; 10.3390/s20247241; Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network; Dengji Zhou (0), Guizhou Wang (1), Guojin He (2), Tengfei Long (3), Ranyu Yin (4), Zhaoming Zhang (5), Sibao Chen (6), Bin Luo (7); authors 0-5: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; authors 6-7: MOE Key Lab of Signal Processing and Intelligent Computing, School of Computer Science and Technology, Anhui University, Hefei 230601, China; Building extraction from high spatial resolution remote sensing images is an active research topic in remote sensing applications and computer vision. This paper presents a supervised semantic segmentation model named Pyramid Self-Attention Network (PISANet). Its structure is simple, containing only two parts: the network backbone, which learns the local features of buildings (short-distance context information around each pixel) from the image, and the pyramid self-attention module, which captures the global features (long-distance context information relating a pixel to other pixels in the image) and the comprehensive features (color, texture, geometry, and high-level semantics) of buildings. The network is trained end to end. In the training stage, the input is a remote sensing image and its corresponding label, and the output is a probability map (the probability that each pixel is or is not a building). In the prediction stage, the input is a remote sensing image, and the output is the building extraction result. The complexity of the network structure was reduced so that it is easy to implement. The proposed PISANet was tested on two datasets. The results show that the overall accuracy reached 94.50% and 96.15%, the intersection-over-union reached 77.45% and 87.97%, and the F1 score reached 87.27% and 93.55%, respectively. In experiments on different datasets, PISANet achieved high overall accuracy, a low error rate, and improved integrity of individual buildings.; https://www.mdpi.com/1424-8220/20/24/7241; building extraction; high resolution image; semantic segmentation; deep learning |
spellingShingle | Dengji Zhou; Guizhou Wang; Guojin He; Tengfei Long; Ranyu Yin; Zhaoming Zhang; Sibao Chen; Bin Luo; Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network; Sensors; building extraction; high resolution image; semantic segmentation; deep learning |
title | Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network |
title_full | Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network |
title_fullStr | Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network |
title_full_unstemmed | Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network |
title_short | Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network |
title_sort | robust building extraction for high spatial resolution remote sensing images with self attention network |
topic | building extraction; high resolution image; semantic segmentation; deep learning |
url | https://www.mdpi.com/1424-8220/20/24/7241 |
work_keys_str_mv | AT dengjizhou robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork AT guizhouwang robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork AT guojinhe robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork AT tengfeilong robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork AT ranyuyin robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork AT zhaomingzhang robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork AT sibaochen robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork AT binluo robustbuildingextractionforhighspatialresolutionremotesensingimageswithselfattentionnetwork |
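
The description reports three evaluation measures for the building class: overall accuracy, intersection-over-union, and F1. As a rough illustration of how such figures are usually derived from a per-pixel probability map, the sketch below thresholds the probabilities into a binary mask and computes the three measures from the confusion counts. The function names, the 0.5 threshold, and the toy data are assumptions made for this example; the record does not state the paper's exact evaluation protocol.

```python
def probability_to_mask(prob, threshold=0.5):
    """Turn per-pixel building probabilities into 0/1 labels.

    The 0.5 threshold is an assumption for illustration only.
    """
    return [1 if p >= threshold else 0 for p in prob]


def building_metrics(pred, truth):
    """Return (overall accuracy, IoU, F1) for the building class.

    pred and truth are equal-length sequences of 0/1 pixel labels
    (1 = building); flatten a 2-D mask before passing it in.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    # Confusion counts for the binary building / non-building problem.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    total = tp + fp + fn + tn
    overall_accuracy = (tp + tn) / total if total else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return overall_accuracy, iou, f1


# Toy example: eight pixels of a hypothetical probability map and label.
prob = [0.9, 0.8, 0.3, 0.6, 0.1, 0.2, 0.7, 0.4]
truth = [1, 1, 1, 0, 0, 0, 1, 0]
oa, iou, f1 = building_metrics(probability_to_mask(prob), truth)
print(f"OA={oa:.2f} IoU={iou:.2f} F1={f1:.2f}")
```

When all three values come from a single confusion matrix, F1 and IoU are tied by F1 = 2·IoU / (1 + IoU), so the two measures rank results in the same order; averaging per image or per tile before reporting can break that exact equality.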