Improved Constrained <i>k</i>-Means Algorithm for Clustering with Domain Knowledge

Witnessing the tremendous development of machine learning technology, emerging machine learning applications impose challenges of using domain knowledge to improve the accuracy of clustering provided that clustering suffers a compromising accuracy rate despite its advantage of fast procession. In th...

Full description

Bibliographic Details
Main Authors:	Peihuang Huang, Pei Yao, Zhendong Hao, Huihong Peng, Longkun Guo
Format:	Article
Language:	English
Published:	MDPI AG 2021-09-01
Series:	Mathematics
Subjects:	constrained <i>k</i>-means minimum weight matching side information domain knowledge
Online Access:	https://www.mdpi.com/2227-7390/9/19/2390

_version_	1797516021904441344
author	Peihuang Huang Pei Yao Zhendong Hao Huihong Peng Longkun Guo
author_facet	Peihuang Huang Pei Yao Zhendong Hao Huihong Peng Longkun Guo
author_sort	Peihuang Huang
collection	DOAJ
description	Witnessing the tremendous development of machine learning technology, emerging machine learning applications impose challenges of using domain knowledge to improve the accuracy of clustering provided that clustering suffers a compromising accuracy rate despite its advantage of fast procession. In this paper, we model domain knowledge (i.e., background knowledge or side information), respecting some applications as must-link and cannot-link sets, for the sake of collaborating with <i>k</i>-means for better accuracy. We first propose an algorithm for constrained <i>k</i>-means, considering only must-links. The key idea is to consider a set of data points constrained by the must-links as a single data point with a weight equal to the weight sum of the constrained points. Then, for clustering the data points set with cannot-link, we employ minimum-weight matching to assign the data points to the existing clusters. At last, we carried out a numerical simulation to evaluate the proposed algorithms against the UCI datasets, demonstrating that our method outperforms the previous algorithms for constrained <i>k</i>-means as well as the traditional <i>k</i>-means regarding the clustering accuracy rate although with a slightly compromised practical runtime.
first_indexed	2024-03-10T06:55:26Z
format	Article
id	doaj.art-d56fed0eae344330b32f1b354dc65720
institution	Directory Open Access Journal
issn	2227-7390
language	English
last_indexed	2024-03-10T06:55:26Z
publishDate	2021-09-01
publisher	MDPI AG
record_format	Article
series	Mathematics
spelling	doaj.art-d56fed0eae344330b32f1b354dc657202023-11-22T16:29:38ZengMDPI AGMathematics2227-73902021-09-01919239010.3390/math9192390Improved Constrained <i>k</i>-Means Algorithm for Clustering with Domain KnowledgePeihuang Huang0Pei Yao1Zhendong Hao2Huihong Peng3Longkun Guo4College of Mathematics and Data Science, Minjiang University, Fuzhou 350116, ChinaCollege of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, ChinaCollege of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, ChinaCollege of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, ChinaSchool of Computer Science, Qilu University of Technology, Jinan 250353, ChinaWitnessing the tremendous development of machine learning technology, emerging machine learning applications impose challenges of using domain knowledge to improve the accuracy of clustering provided that clustering suffers a compromising accuracy rate despite its advantage of fast procession. In this paper, we model domain knowledge (i.e., background knowledge or side information), respecting some applications as must-link and cannot-link sets, for the sake of collaborating with <i>k</i>-means for better accuracy. We first propose an algorithm for constrained <i>k</i>-means, considering only must-links. The key idea is to consider a set of data points constrained by the must-links as a single data point with a weight equal to the weight sum of the constrained points. Then, for clustering the data points set with cannot-link, we employ minimum-weight matching to assign the data points to the existing clusters. At last, we carried out a numerical simulation to evaluate the proposed algorithms against the UCI datasets, demonstrating that our method outperforms the previous algorithms for constrained <i>k</i>-means as well as the traditional <i>k</i>-means regarding the clustering accuracy rate although with a slightly compromised practical runtime.https://www.mdpi.com/2227-7390/9/19/2390constrained <i>k</i>-meansminimum weight matchingside informationdomain knowledge
spellingShingle	Peihuang Huang Pei Yao Zhendong Hao Huihong Peng Longkun Guo Improved Constrained <i>k</i>-Means Algorithm for Clustering with Domain Knowledge Mathematics constrained <i>k</i>-means minimum weight matching side information domain knowledge
title	Improved Constrained <i>k</i>-Means Algorithm for Clustering with Domain Knowledge
title_full	Improved Constrained <i>k</i>-Means Algorithm for Clustering with Domain Knowledge
title_fullStr	Improved Constrained <i>k</i>-Means Algorithm for Clustering with Domain Knowledge
title_full_unstemmed	Improved Constrained <i>k</i>-Means Algorithm for Clustering with Domain Knowledge
title_short	Improved Constrained <i>k</i>-Means Algorithm for Clustering with Domain Knowledge
title_sort	improved constrained i k i means algorithm for clustering with domain knowledge
topic	constrained <i>k</i>-means minimum weight matching side information domain knowledge
url	https://www.mdpi.com/2227-7390/9/19/2390
work_keys_str_mv	AT peihuanghuang improvedconstrainedikimeansalgorithmforclusteringwithdomainknowledge AT peiyao improvedconstrainedikimeansalgorithmforclusteringwithdomainknowledge AT zhendonghao improvedconstrainedikimeansalgorithmforclusteringwithdomainknowledge AT huihongpeng improvedconstrainedikimeansalgorithmforclusteringwithdomainknowledge AT longkunguo improvedconstrainedikimeansalgorithmforclusteringwithdomainknowledge

Improved Constrained <i>k</i>-Means Algorithm for Clustering with Domain Knowledge

Similar Items