Self-Supervised Visual Representation Learning via Residual Momentum
Self-supervised learning (SSL) has emerged as a promising approach for learning representations from unlabeled data. Among the many SSL methods proposed in recent years, momentum-based contrastive frameworks such as MoCo-v3 have shown remarkable success. However, a significant representation gap exists between the online encoder (student) and the momentum encoder (teacher)…
Main Authors: | Trung Xuan Pham, Axi Niu, Kang Zhang, Tee Joshua Tian Jin, Ji Woo Hong, Chang D. Yoo |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | Contrastive learning; residual momentum; representation learning; self-supervised learning; knowledge distillation; teacher-student gap |
Online Access: | https://ieeexplore.ieee.org/document/10287941/ |
_version_ | 1797648645681577984 |
author | Trung Xuan Pham, Axi Niu, Kang Zhang, Tee Joshua Tian Jin, Ji Woo Hong, Chang D. Yoo |
author_sort | Trung Xuan Pham |
collection | DOAJ |
description | Self-supervised learning (SSL) has emerged as a promising approach for learning representations from unlabeled data. Among the many SSL methods proposed in recent years, momentum-based contrastive frameworks such as MoCo-v3 have shown remarkable success. However, a significant representation gap exists between the online encoder (student) and the momentum encoder (teacher) in these frameworks, limiting performance on downstream tasks. We identify this gap as a bottleneck often overlooked in existing frameworks and propose “residual momentum,” which explicitly reduces the gap during training to encourage the student to learn representations closer to the teacher’s. We also show that knowledge distillation (KD), a related technique that reduces the distribution gap with a cross-entropy-based loss in supervised learning, is ineffective in the SSL context, and demonstrate that the intra-representation gap measured by cosine similarity is crucial for EMA-based SSL. Extensive experiments on different benchmark datasets and architectures demonstrate the superiority of our method over state-of-the-art contrastive learning baselines. Specifically, our method outperforms MoCo-v3 by 0.7% top-1 accuracy on ImageNet and 2.82% on CIFAR-100, and by 1.8% AP and 3.0% AP75 on VOC detection when pre-trained on the COCO dataset; it also improves DenseCL by 0.5% AP (800 epochs) and 0.6% AP75 (1600 epochs). Our work highlights the importance of reducing the teacher-student intra-gap in momentum-based contrastive learning frameworks and provides a practical solution for improving the quality of learned representations. |
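To make the mechanism described in the abstract concrete, below is a minimal sketch of how a residual-momentum term could be wired into an EMA-based contrastive framework such as MoCo-v3. This is an illustration based only on the abstract, not the authors’ released code: the loss weight `lambda_rm`, the function names, and the exact placement of the extra term are assumptions.

```python
# Illustrative sketch, based only on the abstract: a MoCo-v3-style training
# step with an added "residual momentum" term that shrinks the student-teacher
# intra-representation gap via cosine similarity. Names such as lambda_rm and
# contrastive_loss are assumptions, not the paper's actual API.
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, m=0.99):
    # Standard momentum (EMA) update: teacher <- m * teacher + (1 - m) * student.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(m).add_(p_s, alpha=1.0 - m)

def intra_gap_loss(z_student, z_teacher):
    # Intra-representation gap measured by cosine similarity:
    # minimizing (1 - cos) pulls the student's features toward the teacher's.
    z_s = F.normalize(z_student, dim=-1)
    z_t = F.normalize(z_teacher.detach(), dim=-1)  # no gradient to the teacher
    return 1.0 - (z_s * z_t).sum(dim=-1).mean()

def training_step(student, teacher, contrastive_loss, x1, x2, lambda_rm=1.0):
    # Two augmented views, as in MoCo-v3, with the usual symmetric contrastive loss.
    z1_s, z2_s = student(x1), student(x2)
    with torch.no_grad():
        z1_t, z2_t = teacher(x1), teacher(x2)
    loss = contrastive_loss(z1_s, z2_t) + contrastive_loss(z2_s, z1_t)
    # Residual momentum: explicitly reduce the student-teacher gap.
    loss = loss + lambda_rm * (intra_gap_loss(z1_s, z1_t) +
                               intra_gap_loss(z2_s, z2_t))
    return loss
```

After each optimizer step the teacher would be refreshed with `ema_update(teacher, student)`, exactly as in standard momentum-based frameworks; the contrastive term is unchanged, so the added cost is one cosine-similarity computation per view.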
first_indexed | 2024-03-11T15:33:57Z |
format | Article |
id | doaj.art-40a6e2d1b58f4821815151218f40b9d6 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-11T15:33:57Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-40a6e2d1b58f4821815151218f40b9d6 (indexed 2023-10-26T23:01:26Z). Self-Supervised Visual Representation Learning via Residual Momentum. IEEE Access, vol. 11, pp. 116706–116720, 2023-01-01, ISSN 2169-3536, IEEE, English. DOI: 10.1109/ACCESS.2023.3325842; IEEE document 10287941. Authors: Trung Xuan Pham (https://orcid.org/0000-0003-4177-7054), Axi Niu (https://orcid.org/0000-0001-5238-9917), Kang Zhang (https://orcid.org/0000-0003-2761-9383), Tee Joshua Tian Jin (https://orcid.org/0009-0001-5119-2802), Ji Woo Hong (https://orcid.org/0000-0002-3758-0307), Chang D. Yoo (https://orcid.org/0000-0002-0756-7179). Affiliations: School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea (all authors except Axi Niu); School of Computer Science, Northwestern Polytechnical University, Xi’an, China (Axi Niu). |
title | Self-Supervised Visual Representation Learning via Residual Momentum |
title_sort | self supervised visual representation learning via residual momentum |
topic | Contrastive learning; residual momentum; representation learning; self-supervised learning; knowledge distillation; teacher-student gap |
url | https://ieeexplore.ieee.org/document/10287941/ |