Multi-Resolution Transformer Network for Building and Road Segmentation of Remote Sensing Image

Extracting buildings and roads from remote sensing images is important for land cover monitoring and is of great help to urban planning. Most building and road extraction algorithms currently rely on deep learning. However, for existing semantic segmentatio...

Bibliographic Details
Main Authors:	Zhongyu Sun, Wangping Zhou, Chen Ding, Min Xia
Format:	Article
Language:	English
Published:	MDPI AG 2022-02-01
Series:	ISPRS International Journal of Geo-Information
Subjects:	segmentation; high resolution; transformer; deep learning
Online Access:	https://www.mdpi.com/2220-9964/11/3/165

_version_	1827648671919898624
author	Zhongyu Sun; Wangping Zhou; Chen Ding; Min Xia
collection	DOAJ
description	Extracting buildings and roads from remote sensing images is important for land cover monitoring and is of great help to urban planning. Most building and road extraction algorithms currently rely on deep learning. However, existing semantic segmentation networks have a limited receptive field on high-resolution remote sensing images, so they cannot capture long-distance scene context well during pixel classification, and image features are compressed during down-sampling, so detailed information is lost. To address these issues, this paper proposes the Hybrid Multi-resolution and Transformer semantic extraction Network (HMRT), which provides a global receptive field for each pixel, overcomes the small receptive field of convolutional neural networks (CNNs), and enhances scene understanding. First, features are blended across branches of different resolutions, so that high-resolution and multi-resolution representations are kept during down-sampling and feature information is fully retained. Second, a Transformer sequence feature extraction network is introduced, and encoding and decoding are used to give each pixel a global receptive field. The recall, F1, OA and MIoU of HMRT reach 85.32%, 84.88%, 85.99% and 74.19%, respectively, in the main experiment and 91.29%, 90.41%, 91.32% and 84.00%, respectively, in the generalization experiment, indicating that the proposed method outperforms the existing methods it is compared with.
first_indexed	2024-03-09T19:44:25Z
format	Article
id	doaj.art-6d96e29cd1104bbfa157e954f950fe17
institution	Directory Open Access Journal
issn	2220-9964
language	English
last_indexed	2024-03-09T19:44:25Z
publishDate	2022-02-01
publisher	MDPI AG
record_format	Article
series	ISPRS International Journal of Geo-Information
spelling	doaj.art-6d96e29cd1104bbfa157e954f950fe17; 2023-11-24T01:28:12Z; eng; MDPI AG; ISPRS International Journal of Geo-Information; ISSN 2220-9964; 2022-02-01; vol. 11, no. 3, art. 165; DOI 10.3390/ijgi11030165
	Title: Multi-Resolution Transformer Network for Building and Road Segmentation of Remote Sensing Image
	Authors: Zhongyu Sun (0), Wangping Zhou (1), Chen Ding (2), Min Xia (3)
	Affiliation 0: Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China
	Affiliation 1: Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China
	Affiliation 2: Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China
	Affiliation 3: Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China
	URL: https://www.mdpi.com/2220-9964/11/3/165
	Keywords: segmentation; high resolution; transformer; deep learning
title	Multi-Resolution Transformer Network for Building and Road Segmentation of Remote Sensing Image
topic	segmentation; high resolution; transformer; deep learning
url	https://www.mdpi.com/2220-9964/11/3/165
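
The description above reports recall, F1, OA and MIoU without defining them. As a rough illustration only, the sketch below computes these scores from a confusion matrix using their standard definitions. The function name `segmentation_metrics`, the macro averaging over classes, and the toy matrix are all assumptions made for this example; the record does not say how the paper averages its scores, and no numbers here come from the paper.

```python
from typing import Dict, List


def segmentation_metrics(conf: List[List[int]]) -> Dict[str, float]:
    """Return macro recall, macro F1, overall accuracy (OA) and mean IoU.

    conf[i][j] is the number of pixels with true class i and predicted class j.
    Macro averaging is an assumption, not something stated in the record.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[k][k] for k in range(n))
    recalls, f1s, ious = [], [], []
    for k in range(n):
        tp = conf[k][k]
        fn = sum(conf[k]) - tp                         # true k, predicted other
        fp = sum(conf[i][k] for i in range(n)) - tp    # predicted k, true other
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
        union = tp + fp + fn
        ious.append(tp / union if union else 0.0)
    return {
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        "oa": correct / total if total else 0.0,
        "miou": sum(ious) / n,
    }


# Toy two-class matrix (hypothetical counts): rows are truth, columns are predictions.
toy = [[50, 10],
       [5, 35]]
metrics = segmentation_metrics(toy)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

OA counts every pixel equally, while MIoU averages per-class overlap, so a class that covers few pixels (roads, for instance) weighs more heavily in MIoU than in OA. This is one plausible reason the MIoU figures in the description are lower than the OA figures.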

Similar Items