Clustering together with learning representations

Bibliographic Details
Main Author: Yu, Shuaiqi
Other Authors: Lihui Chen
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/158048
collection NTU
description Document clustering is a practical machine learning technique with real-world applications such as search optimization, document recommendation, and tag generation for papers and records. It arranges a batch of PDF documents into separate subgroups. To cluster more efficiently, we introduce representation learning, an unsupervised approach that learns features from unlabeled data. In this project, we implement and study a series of representation learning methods suited to clustering web documents, such as the Reuters-10k dataset. Specifically, the deep fuzzy clustering method GrDNFCS has been implemented and explored to reproduce the automatic categorization of web documents reported in the paper. A new approach named CLDFC, which introduces a contrastive loss into GrDNFCS, is proposed and designed to improve clustering accuracy. Based on our preliminary study, CLDFC shows a 2.5% improvement in accuracy and reduces training time by an average of 60 s per epoch compared with GrDNFCS. Experiments on several other clustering models will be included for comparison.
first_indexed 2024-10-01T07:15:08Z
format Final Year Project (FYP)
id ntu-10356/158048
institution Nanyang Technological University
language English
last_indexed 2024-10-01T07:15:08Z
publishDate 2022
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/158048 2023-07-07T19:26:15Z Clustering together with learning representations Yu, Shuaiqi Lihui Chen School of Electrical and Electronic Engineering ELHCHEN@ntu.edu.sg Engineering::Electrical and electronic engineering Bachelor of Engineering (Electrical and Electronic Engineering) 2022-05-26T06:45:12Z 2022 Final Year Project (FYP) Yu, S. (2022). Clustering together with learning representations. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/158048 en application/pdf Nanyang Technological University
title Clustering together with learning representations
topic Engineering::Electrical and electronic engineering
url https://hdl.handle.net/10356/158048