CDEST: Class Distinguishability-Enhanced Self-Training Method for Adopting Pre-Trained Models to Downstream Remote Sensing Image Semantic Segmentation
Main Authors: Ming Zhang, Xin Gu, Ji Qi, Zhenshi Zhang, Hemeng Yang, Jun Xu, Chengli Peng, Haifeng Li
Format: Article
Language: English
Published: MDPI AG, 2024-04-01
Series: Remote Sensing
Subjects: semantic segmentation; remote sensing (RS); transfer learning; fine-tuning method; contrastive learning; self-training
Online Access: https://www.mdpi.com/2072-4292/16/7/1293
author | Ming Zhang, Xin Gu, Ji Qi, Zhenshi Zhang, Hemeng Yang, Jun Xu, Chengli Peng, Haifeng Li
collection | DOAJ |
description | The self-supervised learning (SSL) technique, driven by massive unlabeled data, is expected to be a promising solution for semantic segmentation of remote sensing images (RSIs) with limited labeled data, revolutionizing transfer learning. Traditional ‘local-to-local’ transfer from small, local datasets to another target dataset plays an ever-shrinking role due to RSIs’ diverse distribution shifts. Instead, SSL promotes a ‘global-to-local’ transfer paradigm, in which generalized models pre-trained on arbitrarily large unlabeled datasets are fine-tuned to the target dataset to overcome data distribution shifts. However, an SSL pre-trained model may contain both useful and useless features for the downstream semantic segmentation task, owing to the gap between the SSL tasks and the downstream task. When adapting such pre-trained models to semantic segmentation, traditional supervised fine-tuning methods that use only a small number of labeled samples may discard useful features due to overfitting. The main reason is that supervised fine-tuning maps a few training samples from the high-dimensional, sparse image space to the low-dimensional, compact semantic space defined by the downstream labels, degrading feature distinguishability. To address these issues, we propose a class distinguishability-enhanced self-training (CDEST) method to support global-to-local transfer. First, the self-training module in CDEST introduces a semi-supervised learning mechanism that fully utilizes the large amount of unlabeled data in the downstream task to increase the size and diversity of the training data, alleviating biased overfitting of the model. Second, the supervised and semi-supervised contrastive learning modules of CDEST explicitly enhance the class distinguishability of features, helping to preserve the useful features learned during pre-training while adapting to downstream tasks. We evaluate CDEST on four RSI semantic segmentation datasets; it achieves the best results on all four compared to supervised fine-tuning and three semi-supervised fine-tuning methods.
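The self-training module described above assigns pseudo-labels to unlabeled downstream images so they can augment the small labeled set. A common way to do this is confidence-thresholded pseudo-labeling; the sketch below is a minimal numpy illustration of that general mechanism, not the authors' implementation — the threshold of 0.9 and the ignore value 255 are illustrative assumptions.

```python
import numpy as np

def pseudo_label(probs, threshold=0.9, ignore_index=255):
    """Turn per-pixel class probabilities into pseudo-labels.

    probs: (H, W, C) softmax output of a teacher model on an unlabeled
    image. Pixels whose top-class confidence falls below `threshold`
    are marked with `ignore_index` so a student's loss can skip them.
    """
    conf = probs.max(axis=-1)        # per-pixel top confidence
    labels = probs.argmax(axis=-1)   # per-pixel predicted class
    return np.where(conf >= threshold, labels, ignore_index)

# Toy example: a 2x2 image with 3 classes.
probs = np.array([[[0.95, 0.03, 0.02],   # confident -> class 0
                   [0.40, 0.35, 0.25]],  # uncertain -> ignored
                  [[0.05, 0.92, 0.03],   # confident -> class 1
                   [0.10, 0.10, 0.80]]]) # below 0.9 -> ignored
print(pseudo_label(probs, threshold=0.9))
```

Only the confident pixels contribute to the student's training signal, which is how self-training grows the effective training set without propagating the teacher's most uncertain predictions.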
id | doaj.art-23aa0c3aadeb4f0fbacc1655848c7594 |
institution | Directory Open Access Journal |
issn | 2072-4292 |
doi | 10.3390/rs16071293 (Remote Sensing, vol. 16, no. 7, art. 1293, 2024-04-01)
affiliations | Ming Zhang, Ji Qi, Chengli Peng, Haifeng Li: School of Geosciences and Info-Physics, Central South University, Changsha 410083, China; Xin Gu: China Academy of Launch Vehicle Technology Research and Development Center, Beijing 100076, China; Zhenshi Zhang: Undergraduate School, National University of Defense Technology, Changsha 410080, China; Hemeng Yang: Tianjin Zhongwei Aerospace Data System Technology Co., Ltd., Tianjin 300301, China; Jun Xu: Electric Power Research Institute of State Grid Fujian Electric Power Co., Ltd., Fuzhou 350007, China
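The supervised contrastive module described in the abstract pulls together pixel features that share a class label and pushes apart features of different classes, which is what "enhancing class distinguishability" means concretely. The sketch below is a minimal numpy illustration of a generic supervised contrastive loss over pixel embeddings; the temperature value and function name are assumptions, not details from the paper.

```python
import numpy as np

def sup_contrastive_loss(feats, labels, tau=0.1):
    """Supervised contrastive loss over pixel embeddings.

    feats: (N, D) L2-normalized pixel feature vectors; labels: (N,)
    class ids. For each anchor, every other pixel of the same class is
    a positive; the loss is the mean negative log-probability of the
    positives under a temperature-scaled softmax over all pairs.
    """
    sim = feats @ feats.T / tau
    np.fill_diagonal(sim, -np.inf)              # exclude self-pairs
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    # mean log-prob of positives per anchor (anchors w/o positives contribute 0)
    per_anchor = np.where(pos, logp, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return -per_anchor.mean()

# Toy check: two tight feature clusters. When labels match the clusters
# the loss is near zero; when labels cut across them it is large.
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(sup_contrastive_loss(feats, np.array([0, 0, 1, 1])))  # small
print(sup_contrastive_loss(feats, np.array([0, 1, 0, 1])))  # large
```

In CDEST's semi-supervised variant the same idea would apply with pseudo-labels standing in for ground-truth labels on unlabeled pixels.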
topic | semantic segmentation; remote sensing (RS); transfer learning; fine-tuning method; contrastive learning; self-training
url | https://www.mdpi.com/2072-4292/16/7/1293 |