Multi-Resolution Transformer Network for Building and Road Segmentation of Remote Sensing Image

Extracting buildings and roads from remote sensing images is important for land cover monitoring and is of great help to urban planning. Most current building and road extraction algorithms use deep learning. However, for existing semantic segmentatio...


Bibliographic Details
Main Authors: Zhongyu Sun, Wangping Zhou, Chen Ding, Min Xia
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: ISPRS International Journal of Geo-Information
Subjects: segmentation, high resolution, transformer, deep learning
Online Access: https://www.mdpi.com/2220-9964/11/3/165
author Zhongyu Sun
Wangping Zhou
Chen Ding
Min Xia
author_facet Zhongyu Sun
Wangping Zhou
Chen Ding
Min Xia
author_sort Zhongyu Sun
collection DOAJ
description Extracting buildings and roads from remote sensing images is important for land cover monitoring and is of great help to urban planning. Most current building and road extraction algorithms use deep learning. However, existing semantic segmentation methods have a limited receptive field on high-resolution remote sensing images, so they cannot capture long-distance scene context well during pixel classification, and image features are compressed during down-sampling, so detailed information is lost. To address these issues, this paper proposes the Hybrid Multi-resolution and Transformer semantic extraction Network (HMRT), which provides a global receptive field for each pixel, overcomes the small receptive field of convolutional neural networks (CNNs), and enhances scene understanding. First, we blend features across branches of different resolutions to keep high-resolution and multi-resolution representations during down-sampling and to fully retain feature information. Second, we introduce a Transformer sequence feature extraction network and use encoding and decoding so that each pixel has a global receptive field. The recall, F1, OA and MIoU of HMRT reach 85.32%, 84.88%, 85.99% and 74.19%, respectively, in the main experiment and 91.29%, 90.41%, 91.32% and 84.00%, respectively, in the generalization experiment, indicating that the proposed method outperforms the existing methods it was compared with.
first_indexed 2024-03-09T19:44:25Z
format Article
id doaj.art-6d96e29cd1104bbfa157e954f950fe17
institution Directory of Open Access Journals
issn 2220-9964
language English
last_indexed 2024-03-09T19:44:25Z
publishDate 2022-02-01
publisher MDPI AG
record_format Article
series ISPRS International Journal of Geo-Information
spelling doaj.art-6d96e29cd1104bbfa157e954f950fe17
2023-11-24T01:28:12Z
eng
MDPI AG
ISPRS International Journal of Geo-Information
2220-9964
2022-02-01
Volume 11, Issue 3, Article 165
10.3390/ijgi11030165
Multi-Resolution Transformer Network for Building and Road Segmentation of Remote Sensing Image
Zhongyu Sun (Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China)
Wangping Zhou (Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China)
Chen Ding (Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China)
Min Xia (Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China)
https://www.mdpi.com/2220-9964/11/3/165
segmentation
high resolution
transformer
deep learning
title Multi-Resolution Transformer Network for Building and Road Segmentation of Remote Sensing Image
topic segmentation
high resolution
transformer
deep learning
url https://www.mdpi.com/2220-9964/11/3/165