Self-Supervised and Few-Shot Contrastive Learning Frameworks for Text Clustering

Bibliographic Details
Main Authors: Haoxiang Shi, Tetsuya Sakai
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10210342/
Description
Summary: Contrastive learning is a promising approach to unsupervised learning, as it inherits the advantages of well-studied deep models without a dedicated and complex model design. In this paper, based on bidirectional encoder representations from transformers (BERT) and long short-term memory (LSTM) neural networks, we propose self-supervised contrastive learning (SCL) as well as few-shot contrastive learning (FCL) with unsupervised data augmentation (UDA) for text clustering. BERT-SCL outperforms state-of-the-art unsupervised clustering approaches for both short and long texts in terms of several clustering evaluation measures. LSTM-SCL also performs well for short-text clustering. BERT-FCL achieves performance close to that of supervised learning, and BERT-FCL with UDA further improves performance for short texts. LSTM-FCL outperforms the supervised model in terms of several clustering evaluation measures. Our experimental results suggest that both SCL and FCL are effective for text clustering.
ISSN:2169-3536
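
Illustrative sketch: the SCL setup summarized above pairs a BERT (or LSTM) encoder with a contrastive objective and then clusters the learned embeddings. The Python sketch below shows that general idea under stated assumptions, using mean-pooled BERT embeddings and an InfoNCE-style loss over two augmented views of the same sentences; the model name, pooling, temperature, and augmentation choices are assumptions for illustration, not the authors' implementation.

# Minimal sketch of contrastive learning over BERT sentence embeddings.
# Assumptions: "bert-base-uncased", mean pooling, temperature 0.1, and
# paired "views" produced by any text augmentation (e.g. paraphrasing).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Mean-pool BERT token states into one vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)            # (B, H)

def contrastive_loss(z1, z2, temperature=0.1):
    """InfoNCE-style loss: each sentence's positive is its paired view,
    and the other sentences in the batch act as negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                     # (B, B) cosine similarities
    labels = torch.arange(z1.size(0))                      # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Usage: two augmented views of the same sentences form positive pairs;
# the trained embeddings would then be clustered (e.g. with k-means).
view_a = ["contrastive learning for text clustering", "bert encodes sentences"]
view_b = ["text clustering with contrastive learning", "sentences encoded by bert"]
loss = contrastive_loss(embed(view_a), embed(view_b))
loss.backward()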