Learning Distributed Representations and Deep Embedded Clustering of Texts

Instructors face significant time and effort constraints when grading students’ assessments on a large scale. Clustering similar assessments is a unique and effective technique that has the potential to significantly reduce the workload of instructors in online and large-scale learning environments....


Bibliographic Details
Main Authors: Shuang Wang, Amin Beheshti, Yufei Wang, Jianchao Lu, Quan Z. Sheng, Stephen Elbourn, Hamid Alinejad-Rokny
Format: Article
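Language: English
Published: MDPI AG 2023-03-01
Series: Algorithms
Subjects: distributed representation; deep clustering; data augmentation; contrastive learning; artificial intelligence
Online Access: https://www.mdpi.com/1999-4893/16/3/158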
author Shuang Wang
Amin Beheshti
Yufei Wang
Jianchao Lu
Quan Z. Sheng
Stephen Elbourn
Hamid Alinejad-Rokny
collection DOAJ
description Instructors face significant time and effort constraints when grading students’ assessments on a large scale. Clustering similar assessments is a unique and effective technique that has the potential to significantly reduce the workload of instructors in online and large-scale learning environments. By grouping together similar assessments, marking one assessment in a cluster can be scaled to other similar assessments, allowing for a more efficient and streamlined grading process. To address this issue, this paper focuses on text assessments and proposes a method for reducing the workload of instructors by clustering similar assessments. The proposed method involves the use of distributed representation to transform texts into vectors, and contrastive learning to improve the representation that distinguishes the differences among similar texts. The paper presents a general framework for clustering similar texts that includes label representation, K-means, and self-organization map algorithms, with the objective of improving clustering performance using Accuracy (ACC) and Normalized Mutual Information (NMI) metrics. The proposed framework is evaluated experimentally using two real datasets. The results show that self-organization maps and K-means algorithms with Pre-trained language models outperform label representation algorithms for different datasets.
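The description above names Accuracy (ACC) and Normalized Mutual Information (NMI) as the evaluation metrics but does not define them in this record. The following is a minimal sketch of how these two metrics are commonly computed for clustering output, not the authors' implementation. The function names are hypothetical, the brute-force cluster-to-class matching is only practical for a small number of clusters, and the arithmetic-mean normalization of NMI is an assumption that this record does not confirm.

```python
from collections import Counter
from itertools import permutations
from math import log


def clustering_accuracy(true_labels, cluster_ids):
    """ACC: fraction of items correct under the best one-to-one
    mapping of cluster ids to class labels (brute force)."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # Pad with None so surplus clusters can stay unmatched.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    pairs = Counter(zip(cluster_ids, true_labels))
    best = max(
        sum(pairs[(c, k)] for c, k in zip(clusters, perm))
        for perm in permutations(padded, len(clusters))
    )
    return best / len(true_labels)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(true_labels, cluster_ids):
    """NMI: mutual information divided by the arithmetic mean of the
    two entropies (assumed normalization)."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_ids))
    p_true = Counter(true_labels)
    p_clus = Counter(cluster_ids)
    mi = sum(
        (c / n) * log(c * n / (p_true[t] * p_clus[k]))
        for (t, k), c in joint.items()
    )
    denom = (entropy(true_labels) + entropy(cluster_ids)) / 2
    # Both labelings constant: treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    found = [1, 1, 0, 0, 2, 2]  # same grouping, different cluster ids
    print(clustering_accuracy(truth, found))
    print(normalized_mutual_info(truth, found))
```

Both metrics ignore the arbitrary numbering of clusters, so a clustering that reproduces the true grouping under different ids still scores as a perfect match.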
first_indexed 2024-03-11T07:02:34Z
format Article
id doaj.art-6f3cba7e1fa2435bb281dc1d67166cb5
institution Directory Open Access Journal
issn 1999-4893
language English
last_indexed 2024-03-11T07:02:34Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Algorithms
spelling doaj.art-6f3cba7e1fa2435bb281dc1d67166cb5; 2023-11-17T09:09:23Z; eng; MDPI AG; Algorithms; ISSN 1999-4893; 2023-03-01; vol. 16, no. 3, article 158; DOI 10.3390/a16030158
Learning Distributed Representations and Deep Embedded Clustering of Texts
Shuang Wang, Amin Beheshti, Yufei Wang, Jianchao Lu, Quan Z. Sheng, Stephen Elbourn, Hamid Alinejad-Rokny (all: School of Computing, Macquarie University, Sydney, NSW 2109, Australia)
title Learning Distributed Representations and Deep Embedded Clustering of Texts
topic distributed representation
deep clustering
data augmentation
contrastive learning
artificial intelligence
url https://www.mdpi.com/1999-4893/16/3/158