Scene classification for remote sensing images with self‐attention augmented CNN

Bibliographic Details
Main Authors: Zongyin Liu, Anming Dong, Jiguo Yu, Yubing Han, You Zhou, Kai Zhao
Format: Article
Language: English
Published: Wiley 2022-09-01
Series: IET Image Processing
Online Access: https://doi.org/10.1049/ipr2.12540
Description
Summary: Remote sensing scene classification aims to automatically assign a specific semantic label to each image. Classifying remote sensing scene images is challenging because of the images' diversity and rich spatial information. Recently, convolutional neural networks such as the well-known Visual Geometry Group (VGG) network have been widely used to overcome these difficulties. However, the VGG network, with its local receptive fields, cannot model the global information of remote sensing images well. It also needs a large number of parameters and floating-point operations to achieve satisfactory accuracy. To overcome these challenges, we introduce the self‐attention mechanism into the VGG network. Specifically, we replace the last four convolutional layers in the VGG‐19 network with two cascaded self‐attention blocks, each consisting of two multi‐head self‐attention (MHSA) layers with a residual structure. The new structure can simultaneously explore the local and global information in remote sensing scenes. These changes not only reduce the number of model parameters but also improve classification performance. The effectiveness of the proposed method is validated through experiments on four public data sets: NaSC‐TG2, WHU‐RS19, AID and EuroSAT.
ISSN: 1751-9659
1751-9667
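
The summary's claim that swapping the last four VGG‐19 convolutional layers for four MHSA layers reduces parameters can be checked with rough arithmetic. The sketch below is an illustration only, not the authors' implementation: it assumes the replaced layers are the standard 3x3, 512-to-512-channel convolutions of VGG‐19's final block, and that each MHSA layer is a standard Transformer-style layer with four 512x512 linear projections (query, key, value, output) with biases. Positional encodings, normalization layers and any other details of the paper's blocks are not given in this record and are ignored. The function names are hypothetical.

```python
def conv2d_params(c_in, c_out, kernel=3, bias=True):
    """Weights (and optional biases) of one 2-D convolutional layer."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)


def mhsa_params(dim, bias=True):
    """Weights of one standard multi-head self-attention layer.

    Assumption: four dim x dim linear projections (query, key, value,
    output). The number of heads does not change this count, because
    the heads split the same projection matrices.
    """
    per_projection = dim * dim + (dim if bias else 0)
    return 4 * per_projection


CHANNELS = 512  # width of the final VGG-19 convolutional block

# Last four convolutional layers of VGG-19 (3x3, 512 -> 512 each).
replaced_conv = 4 * conv2d_params(CHANNELS, CHANNELS)

# Two cascaded blocks, each with two MHSA layers (assumed standard form).
attention = 2 * 2 * mhsa_params(CHANNELS)

print(f"replaced conv layers: {replaced_conv:,} parameters")
print(f"four MHSA layers:     {attention:,} parameters")
print(f"ratio:                {attention / replaced_conv:.2f}")
```

Under these assumptions the four attention layers carry fewer than half the parameters of the convolutions they replace, which is consistent in direction with the summary's claim; the paper's actual figures may differ.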