Summary: | Cluster validity indices play an important role in assessing the quality of clustering results. However, most existing validity indices rely on a trial-and-error strategy, and their correctness depends not only on the measurements of intra- and inter-cluster distances but also on the specific clustering algorithm and data structure. Consequently, the practical applications of these indices are limited. In this paper, we first define the total surface area and the total volume of all clusters in a 2-dimensional data space, thereby recovering the natural interrelation between them across various numbers of clusters. On this basis, we propose a novel validity index that directly assesses the clustering results of any dataset. It requires no trial-and-error process and does not depend on the clustering algorithm, the data structure, or measurements of intra- and inter-cluster distances. For a high-dimensional data space, all clusters are transformed into spherical clusters of normalized size in a 2-dimensional data space through a multidimensional scaling transformation. Two groups of typical synthetic and real datasets with various characteristics are used to validate the proposed index.
|