CUBOS: An Internal Cluster Validity Index for Categorical Data
An internal cluster validity index is a powerful tool for evaluating clustering performance. Studying such indices for categorical data has been challenging because distances between categorical attribute values are difficult to measure. While some efforts have been mad...
Main Authors: | Xiaonan Gao, Sen Wu |
---|---|
Format: | Article |
Language: | English |
Published: | Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek, 2019-01-01 |
Series: | Tehnički Vjesnik |
Subjects: | categorical data, clustering, distance metric, evaluation, internal cluster validity index |
Online Access: | https://hrcak.srce.hr/file/320440 |
_version_ | 1797207438252834816 |
---|---|
author | Xiaonan Gao, Sen Wu |
author_facet | Xiaonan Gao, Sen Wu |
author_sort | Xiaonan Gao |
collection | DOAJ |
description | An internal cluster validity index is a powerful tool for evaluating clustering performance. Studying such indices for categorical data has been challenging because distances between categorical attribute values are difficult to measure. Existing efforts ignore the relationship between different categorical attribute values and the detailed distribution information between data objects. To solve these problems, we propose a novel index called Categorical data cluster Utility Based On Silhouette (CUBOS). Specifically, we first clarify the superiority of the Silhouette index paradigm in exploring the details of clustering results. Then, inspired by Category Distance, we propose the Improved Distance metric for Categorical data (IDC) to measure the distance between categorical data accurately. Finally, the Silhouette index paradigm and IDC are combined to construct CUBOS, which overcomes the aforementioned shortcomings and produces more accurate evaluation results than other baselines, as shown by experimental results on several UCI datasets. |
first_indexed | 2024-04-24T09:22:55Z |
format | Article |
id | doaj.art-c887a7ec4bb7440d8ee9c38cdbdf2d1b |
institution | Directory Open Access Journal |
issn | 1330-3651 1848-6339 |
language | English |
last_indexed | 2024-04-24T09:22:55Z |
publishDate | 2019-01-01 |
publisher | Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek |
record_format | Article |
series | Tehnički Vjesnik |
spelling | doaj.art-c887a7ec4bb7440d8ee9c38cdbdf2d1b; 2024-04-15T15:31:07Z; eng; Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek; Tehnički Vjesnik; ISSN 1330-3651, 1848-6339; 2019-01-01; vol. 26, no. 2, pp. 486-494; DOI 10.17559/TV-20190109015453; CUBOS: An Internal Cluster Validity Index for Categorical Data; Xiaonan Gao, Sen Wu (both: Donlinks School of Economics and Management, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083, China); https://hrcak.srce.hr/file/320440; categorical data; clustering; distance metric; evaluation; internal cluster validity index |
title | CUBOS: An Internal Cluster Validity Index for Categorical Data |
topic | categorical data; clustering; distance metric; evaluation; internal cluster validity index |
url | https://hrcak.srce.hr/file/320440 |
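
The description in this record says CUBOS combines the Silhouette index paradigm with a distance metric for categorical data (IDC). The record does not define IDC, so the following is only a minimal sketch of the general idea, assuming the simple matching (Hamming) distance as a stand-in. The names `matching_distance` and `silhouette`, and the toy data, are hypothetical and are not taken from the article.

```python
from collections import defaultdict


def matching_distance(x, y):
    """Fraction of attributes on which two categorical objects differ.

    Stand-in only: this is NOT the IDC metric from the article.
    """
    return sum(a != b for a, b in zip(x, y)) / len(x)


def silhouette(data, labels, dist=matching_distance):
    """Mean silhouette width over all objects for any pairwise distance."""
    clusters = defaultdict(list)
    for i, label in enumerate(labels):
        clusters[label].append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention: a singleton scores 0
            continue
        # a: mean distance to the other members of the object's own cluster
        a = sum(dist(data[i], data[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(data[i], data[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy categorical objects: (colour, size, shape)
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]
good = silhouette(data, [0, 0, 1, 1])  # groups similar objects together
bad = silhouette(data, [0, 1, 0, 1])   # splits similar objects apart
```

Because `dist` is a parameter, a more refined categorical distance could be passed in without changing the Silhouette computation, which is the kind of separation the description suggests.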