Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index
As a classical data mining technique, clustering is widely used in fields such as pattern recognition, machine learning, and artificial intelligence. Effective clustering analysis can identify the underlying structure of a dataset. As a commonly used partitional clustering algorithm, K-means is s...
Main Author: | ZHANG Ya-di, SUN Yue, LIU Feng, ZHU Er-zhou |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial office of Computer Science, 2022-01-01 |
Series: | Jisuanji kexue |
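Subjects: | clustering algorithm; clustering validity index; optimal clustering number; cluster center; data mining |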
Online Access: | https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-1-121.pdf |
_version_ | 1818995661519978496 |
---|---|
author | ZHANG Ya-di, SUN Yue, LIU Feng, ZHU Er-zhou |
author_facet | ZHANG Ya-di, SUN Yue, LIU Feng, ZHU Er-zhou |
author_sort | ZHANG Ya-di, SUN Yue, LIU Feng, ZHU Er-zhou |
collection | DOAJ |
description | As a classical data mining technique, clustering is widely used in fields such as pattern recognition, machine learning, and artificial intelligence. Effective clustering analysis can identify the underlying structure of a dataset. As a commonly used partitional clustering algorithm, K-means is simple to implement and efficient at classifying large-scale datasets. However, owing to its convergence rule, traditional K-means is sensitive to the initial cluster centers and cannot properly handle non-convex datasets or datasets with outliers. This paper proposes DC-Kmeans (density parameter and center replacement K-means), an improved K-means algorithm based on a density parameter and center replacement. Because it selects the initial cluster centers gradually and continuously replaces imprecise old centers, DC-Kmeans is more accurate than traditional K-means. Two further methods are proposed for optimal clustering: 1) a novel clustering validity index (CVI), SCVI (a CVI based on the sum of the inner-cluster compactness and the inter-cluster separateness), is proposed to evaluate the results of DC-Kmeans; 2) a new algorithm, OCNS (optimal clustering number determination based on SCVI), is designed to determine the optimal number of clusters for different datasets. Experimental results demonstrate that the proposed clustering method is effective for many kinds of datasets. |
first_indexed | 2024-12-20T21:17:24Z |
format | Article |
id | doaj.art-83bb5945f7934b90bcfd29e8c017d5c8 |
institution | Directory Open Access Journal |
issn | 1002-137X |
language | zho |
last_indexed | 2024-12-20T21:17:24Z |
publishDate | 2022-01-01 |
publisher | Editorial office of Computer Science |
record_format | Article |
series | Jisuanji kexue |
spelling | doaj.art-83bb5945f7934b90bcfd29e8c017d5c8; 2022-12-21T19:26:22Z; zho; Editorial office of Computer Science; Jisuanji kexue; 1002-137X; 2022-01-01; vol. 49, no. 1, pp. 121-132; 10.11896/jsjkx.201100148; Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index; ZHANG Ya-di, SUN Yue, LIU Feng, ZHU Er-zhou; School of Computer Science and Technology, Anhui University, Hefei 230601, China; https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-1-121.pdf; clustering algorithm|clustering validity index|optimal clustering number|cluster center|data mining |
spellingShingle | ZHANG Ya-di, SUN Yue, LIU Feng, ZHU Er-zhou Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index Jisuanji kexue clustering algorithm|clustering validity index|optimal clustering number|cluster center|data mining |
title | Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index |
title_full | Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index |
title_fullStr | Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index |
title_full_unstemmed | Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index |
title_short | Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index |
title_sort | study on density parameter and center replacement combined k means and new clustering validity index |
topic | clustering algorithm|clustering validity index|optimal clustering number|cluster center|data mining |
url | https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-1-121.pdf |
work_keys_str_mv | AT zhangyadisunyueliufengzhuerzhou studyondensityparameterandcenterreplacementcombinedkmeansandnewclusteringvalidityindex |
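
The abstract describes a three-part workflow (pick initial centers step by step, run K-means, then score each candidate cluster count with a validity index and keep the best) but gives no formulas. The sketch below only illustrates that general workflow and is not the paper's DC-Kmeans, SCVI, or OCNS. Everything in it is an assumption made for illustration: the farthest-first seeding stands in for the density-parameter selection, `toy_validity_index` (separation divided by compactness) stands in for SCVI, and all function names and the demo data are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_first_centers(points, k):
    """Pick k initial centers one at a time (assumed stand-in, not the
    paper's density-parameter rule): each new center is the point
    farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd-style K-means with the seeding above."""
    centers = farthest_first_centers(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [
            mean(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def toy_validity_index(centers, clusters):
    """Hypothetical index (not SCVI): minimum distance between centers
    divided by the mean point-to-center distance. Higher is better."""
    total = sum(dist(p, c) for c, pts in zip(centers, clusters) for p in pts)
    count = sum(len(pts) for pts in clusters)
    compactness = max(total / count, 1e-12)  # guard against division by zero
    separation = min(
        dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
    )
    return separation / compactness


def choose_k(points, k_min, k_max):
    """Return the cluster count in [k_min, k_max] with the highest index."""
    return max(
        range(k_min, k_max + 1),
        key=lambda k: toy_validity_index(*kmeans(points, k)),
    )


# Demo data: three small square blobs (made up for this example).
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
bases = [(0, 0), (10, 0), (5, 10)]
demo_points = [(bx + ox, by + oy) for bx, by in bases for ox, oy in offsets]
best_k = choose_k(demo_points, 2, 5)
```

On this toy data the index peaks at three clusters, matching the three blobs. A ratio-style index like this one is only a placeholder; the abstract states that SCVI is built from a sum of compactness and separateness, and its exact definition should be taken from the paper itself.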