Volume and Surface Area-Based Cluster Validity Index

Bibliographic Details
Main Authors: Qi Li, Shihong Yue, Mingliang Ding
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8967111/
Description
Summary: Cluster validity indices play an important role in assessing the quality of clustering results. However, most existing validity indices rely on a trial-and-error strategy, and their correctness depends not only on the measurements of intra- and inter-cluster distances but also on the specific clustering algorithms and data structures. Consequently, the applications of these indices are limited in practice. In this paper, we first define the total surface area and volume of all clusters in a 2-dimensional data space, thereby recovering their natural interrelation among various numbers of clusters. On this basis, a novel validity index is proposed to directly assess the clustering results of any dataset; it does not require any trial-and-error process, clustering algorithms, data structures, or measurements of intra- and inter-cluster distances. In the case of a high-dimensional data space, all clusters are transformed into spherical clusters of normalized size in a 2-dimensional data space through a multidimensional scaling transformation. Two groups of typical synthetic datasets and real datasets with various characteristics are used to validate the novel validity index.
ISSN: 2169-3536