CDEST: Class Distinguishability-Enhanced Self-Training Method for Adopting Pre-Trained Models to Downstream Remote Sensing Image Semantic Segmentation


Bibliographic Details
Main Authors: Ming Zhang, Xin Gu, Ji Qi, Zhenshi Zhang, Hemeng Yang, Jun Xu, Chengli Peng, Haifeng Li
Format: Article
Language: English
Published: MDPI AG, 2024-04-01
Series: Remote Sensing
Subjects: semantic segmentation; remote sensing (RS); transfer learning; fine-tuning method; contrastive learning; self-training
Online Access: https://www.mdpi.com/2072-4292/16/7/1293
author Ming Zhang
Xin Gu
Ji Qi
Zhenshi Zhang
Hemeng Yang
Jun Xu
Chengli Peng
Haifeng Li
collection DOAJ
description The self-supervised learning (SSL) technique, driven by massive unlabeled data, is expected to be a promising solution for semantic segmentation of remote sensing images (RSIs) with limited labeled data, revolutionizing transfer learning. Traditional ‘local-to-local’ transfer from small, local datasets to another target dataset plays an ever-shrinking role due to RSIs’ diverse distribution shifts. Instead, SSL promotes a ‘global-to-local’ transfer paradigm, in which generalized models pre-trained on arbitrarily large unlabeled datasets are fine-tuned to the target dataset to overcome data distribution shifts. However, the SSL pre-trained models may contain both useful and useless features for the downstream semantic segmentation task, due to the gap between the SSL tasks and the downstream task. To adapt such pre-trained models to semantic segmentation tasks, traditional supervised fine-tuning methods that use only a small number of labeled samples may drop out useful features due to overfitting. The main reason behind this is that supervised fine-tuning aims to map a few training samples from the high-dimensional, sparse image space to the low-dimensional, compact semantic space defined by the downstream labels, resulting in a degradation of the distinguishability. To address the above issues, we propose a class distinguishability-enhanced self-training (CDEST) method to support global-to-local transfer. First, the self-training module in CDEST introduces a semi-supervised learning mechanism to fully utilize the large amount of unlabeled data in the downstream task to increase the size and diversity of the training data, thus alleviating the problem of biased overfitting of the model. Second, the supervised and semi-supervised contrastive learning modules of CDEST can explicitly enhance the class distinguishability of features, helping to preserve the useful features learned from pre-training while adapting to downstream tasks. 
We evaluate the proposed CDEST method on four RSI semantic segmentation datasets, and our method achieves optimal experimental results on all four datasets compared to supervised fine-tuning as well as three semi-supervised fine-tuning methods.
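The two mechanisms the abstract describes, a self-training module that converts confident model predictions on unlabeled data into pseudo-labels, and contrastive learning modules that explicitly enhance class distinguishability, can be sketched generically as below. This is an illustrative NumPy sketch, not the authors' released implementation; the function names, the 0.9 confidence threshold, and the temperature value are all assumptions for demonstration.

```python
import numpy as np

def pseudo_labels(probs: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Self-training step: turn softmax outputs on unlabeled samples into
    pseudo-labels, keeping only confident predictions (-1 marks ignored ones)."""
    labels = probs.argmax(axis=-1)
    labels[probs.max(axis=-1) < threshold] = -1
    return labels

def supervised_contrastive_loss(features: np.ndarray, labels: np.ndarray,
                                temperature: float = 0.1) -> float:
    """SupCon-style loss: for each anchor, pull same-class features together
    and push the rest apart, enhancing class distinguishability."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature          # pairwise cosine similarity / T
    n, total, count = len(labels), 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        log_denom = np.log(np.exp(sim[i, others]).sum())
        for j in others:
            if labels[j] == labels[i]:   # positive pair: same class
                total += log_denom - sim[i, j]
                count += 1
    return total / max(count, 1)
```

In a semi-supervised variant, the same contrastive loss would be applied with the pseudo-labels standing in for ground truth on the unlabeled pool, so both objectives act on the full downstream dataset rather than only the few labeled samples.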
first_indexed 2024-04-24T10:35:18Z
format Article
id doaj.art-23aa0c3aadeb4f0fbacc1655848c7594
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-04-24T10:35:18Z
publishDate 2024-04-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-23aa0c3aadeb4f0fbacc1655848c7594
doi 10.3390/rs16071293
citation Remote Sensing, vol. 16, no. 7, article 1293, MDPI AG, 2024-04-01
affiliation Ming Zhang: School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
affiliation Xin Gu: China Academy of Launch Vehicle Technology Research and Development Center, Beijing 100076, China
affiliation Ji Qi: School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
affiliation Zhenshi Zhang: Undergraduate School, National University of Defense Technology, Changsha 410080, China
affiliation Hemeng Yang: Tianjin Zhongwei Aerospace Data System Technology Co., Ltd., Tianjin 300301, China
affiliation Jun Xu: Electric Power Research Institute of State Grid Fujian Electric Power Co., Ltd., Fuzhou 350007, China
affiliation Chengli Peng: School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
affiliation Haifeng Li: School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
title CDEST: Class Distinguishability-Enhanced Self-Training Method for Adopting Pre-Trained Models to Downstream Remote Sensing Image Semantic Segmentation
topic semantic segmentation
remote sensing (RS)
transfer learning
fine-tuning method
contrastive learning
self-training
url https://www.mdpi.com/2072-4292/16/7/1293