Multi-Scale Semantic Segmentation and Spatial Relationship Recognition of Remote Sensing Images Based on an Attention Model

A comprehensive interpretation of remote sensing images involves not only remote sensing object recognition but also the recognition of spatial relations between objects. Especially in the case of different objects with the same spectrum, the spatial relationship can help interpret remote sensing ob...

Full description

Bibliographic Details
Main Authors:	Wei Cui, Fei Wang, Xin He, Dongyou Zhang, Xuxiang Xu, Meng Yao, Ziwei Wang, Jiejun Huang
Format:	Article
Language:	English
Published:	MDPI AG 2019-05-01
Series:	Remote Sensing
Subjects:	multi-scale semantic segmentation image caption remote sensing LSTM U-Net upscaling downscaling
Online Access:	https://www.mdpi.com/2072-4292/11/9/1044

_version_	1818321467168784384
author	Wei Cui Fei Wang Xin He Dongyou Zhang Xuxiang Xu Meng Yao Ziwei Wang Jiejun Huang
author_facet	Wei Cui Fei Wang Xin He Dongyou Zhang Xuxiang Xu Meng Yao Ziwei Wang Jiejun Huang
author_sort	Wei Cui
collection	DOAJ
description	A comprehensive interpretation of remote sensing images involves not only remote sensing object recognition but also the recognition of spatial relations between objects. Especially in the case of different objects with the same spectrum, the spatial relationship can help interpret remote sensing objects more accurately. Compared with traditional remote sensing object recognition methods, deep learning has the advantages of high accuracy and strong generalizability regarding scene classification and semantic segmentation. However, it is difficult to simultaneously recognize remote sensing objects and their spatial relationship from end-to-end only relying on present deep learning networks. To address this problem, we propose a multi-scale remote sensing image interpretation network, called the MSRIN. The architecture of the MSRIN is a parallel deep neural network based on a fully convolutional network (FCN), a U-Net, and a long short-term memory network (LSTM). The MSRIN recognizes remote sensing objects and their spatial relationship through three processes. First, the MSRIN defines a multi-scale remote sensing image caption strategy and simultaneously segments the same image using the FCN and U-Net on different spatial scales so that a two-scale hierarchy is formed. The output of the FCN and U-Net are masked to obtain the location and boundaries of remote sensing objects. Second, using an attention-based LSTM, the remote sensing image captions include the remote sensing objects (nouns) and their spatial relationships described with natural language. Finally, we designed a remote sensing object recognition and correction mechanism to build the relationship between nouns in captions and object mask graphs using an attention weight matrix to transfer the spatial relationship from captions to objects mask graphs. In other words, the MSRIN simultaneously realizes the semantic segmentation of the remote sensing objects and their spatial relationship identification end-to-end. Experimental results demonstrated that the matching rate between samples and the mask graph increased by 67.37 percentage points, and the matching rate between nouns and the mask graph increased by 41.78 percentage points compared to before correction. The proposed MSRIN has achieved remarkable results.
first_indexed	2024-12-13T10:41:22Z
format	Article
id	doaj.art-aac1ebb2e0df4f858df8114eff38ca8d
institution	Directory Open Access Journal
issn	2072-4292
language	English
last_indexed	2024-12-13T10:41:22Z
publishDate	2019-05-01
publisher	MDPI AG
record_format	Article
series	Remote Sensing
spelling	doaj.art-aac1ebb2e0df4f858df8114eff38ca8d2022-12-21T23:50:29ZengMDPI AGRemote Sensing2072-42922019-05-01119104410.3390/rs11091044rs11091044Multi-Scale Semantic Segmentation and Spatial Relationship Recognition of Remote Sensing Images Based on an Attention ModelWei Cui0Fei Wang1Xin He2Dongyou Zhang3Xuxiang Xu4Meng Yao5Ziwei Wang6Jiejun Huang7School of Resources and Environmental Engineering, Wuhan University of Technology, Wuhan 430070, ChinaSchool of Resources and Environmental Engineering, Wuhan University of Technology, Wuhan 430070, ChinaSchool of Resources and Environmental Engineering, Wuhan University of Technology, Wuhan 430070, ChinaSchool of Resources and Environmental Engineering, Wuhan University of Technology, Wuhan 430070, ChinaSchool of Resources and Environmental Engineering, Wuhan University of Technology, Wuhan 430070, ChinaSchool of Resources and Environmental Engineering, Wuhan University of Technology, Wuhan 430070, ChinaSchool of Resources and Environmental Engineering, Wuhan University of Technology, Wuhan 430070, ChinaSchool of Resources and Environmental Engineering, Wuhan University of Technology, Wuhan 430070, ChinaA comprehensive interpretation of remote sensing images involves not only remote sensing object recognition but also the recognition of spatial relations between objects. Especially in the case of different objects with the same spectrum, the spatial relationship can help interpret remote sensing objects more accurately. Compared with traditional remote sensing object recognition methods, deep learning has the advantages of high accuracy and strong generalizability regarding scene classification and semantic segmentation. However, it is difficult to simultaneously recognize remote sensing objects and their spatial relationship from end-to-end only relying on present deep learning networks. To address this problem, we propose a multi-scale remote sensing image interpretation network, called the MSRIN. The architecture of the MSRIN is a parallel deep neural network based on a fully convolutional network (FCN), a U-Net, and a long short-term memory network (LSTM). The MSRIN recognizes remote sensing objects and their spatial relationship through three processes. First, the MSRIN defines a multi-scale remote sensing image caption strategy and simultaneously segments the same image using the FCN and U-Net on different spatial scales so that a two-scale hierarchy is formed. The output of the FCN and U-Net are masked to obtain the location and boundaries of remote sensing objects. Second, using an attention-based LSTM, the remote sensing image captions include the remote sensing objects (nouns) and their spatial relationships described with natural language. Finally, we designed a remote sensing object recognition and correction mechanism to build the relationship between nouns in captions and object mask graphs using an attention weight matrix to transfer the spatial relationship from captions to objects mask graphs. In other words, the MSRIN simultaneously realizes the semantic segmentation of the remote sensing objects and their spatial relationship identification end-to-end. Experimental results demonstrated that the matching rate between samples and the mask graph increased by 67.37 percentage points, and the matching rate between nouns and the mask graph increased by 41.78 percentage points compared to before correction. The proposed MSRIN has achieved remarkable results.https://www.mdpi.com/2072-4292/11/9/1044multi-scalesemantic segmentationimage captionremote sensingLSTMU-Netupscalingdownscaling
spellingShingle	Wei Cui Fei Wang Xin He Dongyou Zhang Xuxiang Xu Meng Yao Ziwei Wang Jiejun Huang Multi-Scale Semantic Segmentation and Spatial Relationship Recognition of Remote Sensing Images Based on an Attention Model Remote Sensing multi-scale semantic segmentation image caption remote sensing LSTM U-Net upscaling downscaling
title	Multi-Scale Semantic Segmentation and Spatial Relationship Recognition of Remote Sensing Images Based on an Attention Model
title_full	Multi-Scale Semantic Segmentation and Spatial Relationship Recognition of Remote Sensing Images Based on an Attention Model
title_fullStr	Multi-Scale Semantic Segmentation and Spatial Relationship Recognition of Remote Sensing Images Based on an Attention Model
title_full_unstemmed	Multi-Scale Semantic Segmentation and Spatial Relationship Recognition of Remote Sensing Images Based on an Attention Model
title_short	Multi-Scale Semantic Segmentation and Spatial Relationship Recognition of Remote Sensing Images Based on an Attention Model
title_sort	multi scale semantic segmentation and spatial relationship recognition of remote sensing images based on an attention model
topic	multi-scale semantic segmentation image caption remote sensing LSTM U-Net upscaling downscaling
url	https://www.mdpi.com/2072-4292/11/9/1044
work_keys_str_mv	AT weicui multiscalesemanticsegmentationandspatialrelationshiprecognitionofremotesensingimagesbasedonanattentionmodel AT feiwang multiscalesemanticsegmentationandspatialrelationshiprecognitionofremotesensingimagesbasedonanattentionmodel AT xinhe multiscalesemanticsegmentationandspatialrelationshiprecognitionofremotesensingimagesbasedonanattentionmodel AT dongyouzhang multiscalesemanticsegmentationandspatialrelationshiprecognitionofremotesensingimagesbasedonanattentionmodel AT xuxiangxu multiscalesemanticsegmentationandspatialrelationshiprecognitionofremotesensingimagesbasedonanattentionmodel AT mengyao multiscalesemanticsegmentationandspatialrelationshiprecognitionofremotesensingimagesbasedonanattentionmodel AT ziweiwang multiscalesemanticsegmentationandspatialrelationshiprecognitionofremotesensingimagesbasedonanattentionmodel AT jiejunhuang multiscalesemanticsegmentationandspatialrelationshiprecognitionofremotesensingimagesbasedonanattentionmodel

Multi-Scale Semantic Segmentation and Spatial Relationship Recognition of Remote Sensing Images Based on an Attention Model

Similar Items