CUBOS: An Internal Cluster Validity Index for Categorical Data


Bibliographic Details
Main Authors: Xiaonan Gao, Sen Wu
Format: Article
Language: English
Published: Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek 2019-01-01
Series: Tehnički Vjesnik
Online Access: https://hrcak.srce.hr/file/320440
Collection: DOAJ
Description: Internal cluster validity indices are a powerful tool for evaluating clustering performance. Studying such indices for categorical data is challenging because the distance between categorical attribute values is difficult to measure. Existing efforts ignore the relationship between different categorical attribute values and the detailed distribution information between data objects. To solve these problems, we propose a novel index called Categorical data cluster Utility Based On Silhouette (CUBOS). Specifically, we first establish the superiority of the Silhouette index paradigm in exploring the details of clustering results. Then, inspired by Category Distance, we propose the Improved Distance metric for Categorical data (IDC) to measure the distance between categorical data accurately. Finally, the Silhouette index paradigm and IDC are combined to construct CUBOS, which overcomes the aforementioned shortcomings and produces more accurate evaluation results than the baselines, as shown by experiments on several UCI datasets.
Subjects: categorical data; clustering; distance metric; evaluation; internal cluster validity index
Author Affiliation: Donlinks School of Economics and Management, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083, China (both authors)
Volume/Issue: 26(2)
Pages: 486-494
DOI: 10.17559/TV-20190109015453
ISSN: 1330-3651, 1848-6339
Institution: Directory of Open Access Journals
Record ID: doaj.art-c887a7ec4bb7440d8ee9c38cdbdf2d1b