Efficiency of Cluster Validity Indexes in Fuzzy Clusterwise Generalized Structured Component Analysis

Bibliographic Details
Main Authors: Ji Hoon Ryoo, Seohee Park, Seongeun Kim, Hyun Suk Ryoo
Format: Article
Language: English
Published: MDPI AG 2020-09-01
Series: Symmetry
Subjects: cluster validity problem; FIT-FHV method; fuzzy clustering; fuzzy hypervolume validity index; generalized structured component analysis; structural equation modeling
Online Access: https://www.mdpi.com/2073-8994/12/9/1514
DOI: 10.3390/sym12091514
ISSN: 2073-8994
Volume: 12
Issue: 9
Article Number: 1514
Collection: DOAJ (Directory of Open Access Journals)

Affiliations
Ji Hoon Ryoo: Department of Education, College of Educational Sciences, Yonsei University, Seoul 03722, Korea
Seohee Park: Department of Educational Measurement and Statistics, College of Education, University of Iowa, Iowa, IA 52242, USA
Seongeun Kim: Department of Educational Research Methodology, School of Education, University of North Carolina at Greensboro, Greensboro, NC 27412, USA
Hyun Suk Ryoo: Department of Computer Science, College of Arts and Science, University of Virginia, Charlottesville, VA 22904, USA

Abstract
Fuzzy clustering has been widely used to classify data into K clusters by assigning each data point membership probabilities that reflect its closeness to each of the K centroids. This approach has also been applied to characterize clusters associated with a statistical model such as structural equation modeling. The characteristics identified by the statistical model further define the clusters as heterogeneous groups selected from a population. Recently, such a statistical model has been formulated as fuzzy clusterwise generalized structured component analysis (fuzzy clusterwise GSCA). As in fuzzy clustering, the clusters in fuzzy clusterwise GSCA are enumerated to infer the population and its parameters. However, identifying clusters in fuzzy clustering is difficult because classification indexes are data-dependent; this is known as the cluster validity problem. We examined the cluster validity problem within the fuzzy clusterwise GSCA framework and proposed a new criterion for selecting the optimal number of clusters that uses both the fit indexes of GSCA and the fuzzy validity indexes of fuzzy clustering. The criterion, named the FIT-FHV method, combines a fit index from GSCA (FIT) with a cluster validity measure from fuzzy clustering (the fuzzy hypervolume, FHV); it performed better than the other indices used in fuzzy clusterwise GSCA.
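The abstract names two ingredients from fuzzy clustering, membership degrees relative to K centroids and the fuzzy hypervolume (FHV) validity index, without giving formulas. The following is a minimal pure-Python sketch of those two pieces only, assuming the standard fuzzy c-means updates (fuzzifier m = 2) and the commonly used FHV definition, the sum over clusters of the square root of the determinant of each cluster's fuzzy covariance matrix, where a smaller value is usually read as more compact clusters. It does not implement GSCA, the FIT index, or the paper's FIT-FHV combination. The function names, the deterministic farthest-point initialization, and the toy data are hypothetical choices made for this illustration.

```python
import math


def _det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= f * a[c][cc]
    return result


def _centroids(points, u, m):
    """Cluster centers as weighted means, using u[i][j] ** m as weights."""
    d = len(points[0])
    cents = []
    for row in u:
        w = [uij ** m for uij in row]
        tw = sum(w)
        cents.append([sum(wj * p[a] for wj, p in zip(w, points)) / tw for a in range(d)])
    return cents


def _memberships(points, cents, m):
    """Standard FCM update: u_ij = 1 / sum_l (d_ij / d_lj) ** (2 / (m - 1))."""
    k, n = len(cents), len(points)
    u = [[0.0] * n for _ in range(k)]
    for j, p in enumerate(points):
        dist = [math.dist(p, c) for c in cents]
        if min(dist) == 0.0:
            # Point sits exactly on a centroid: full membership in that cluster.
            u[dist.index(0.0)][j] = 1.0
            continue
        for i in range(k):
            u[i][j] = 1.0 / sum((dist[i] / dl) ** (2.0 / (m - 1.0)) for dl in dist)
    return u


def fcm(points, k, m=2.0, iters=100):
    """Hypothetical helper: fuzzy c-means with farthest-point seeding (an assumption)."""
    cents = [list(points[0])]
    while len(cents) < k:
        # Next seed: the data point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in cents))
        cents.append(list(far))
    for _ in range(iters):
        u = _memberships(points, cents, m)
        cents = _centroids(points, u, m)  # centers stay consistent with u
    return cents, u


def fuzzy_hypervolume(points, cents, u, m=2.0):
    """FHV = sum_i sqrt(det(F_i)), F_i = sum_j u_ij^m (x_j - v_i)(x_j - v_i)^T / sum_j u_ij^m."""
    d = len(points[0])
    total = 0.0
    for row, v in zip(u, cents):
        w = [uij ** m for uij in row]
        tw = sum(w)
        cov = [[0.0] * d for _ in range(d)]
        for wj, p in zip(w, points):
            diff = [p[a] - v[a] for a in range(d)]
            for a in range(d):
                for b in range(d):
                    cov[a][b] += wj * diff[a] * diff[b] / tw
        total += math.sqrt(max(_det(cov), 0.0))
    return total


# Toy data (hypothetical): three 3x3 grids of points around well-separated centers.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(cx + dx, cy + dy) for cx, cy in centers
        for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]

scores = {}
for k in range(2, 6):
    cents, u = fcm(data, k)
    scores[k] = fuzzy_hypervolume(data, cents, u)
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

In this sketch each candidate K is fitted separately and scored by FHV alone. Per the abstract, the FIT-FHV criterion also brings in the GSCA fit index FIT, which this fuzzy-clustering-only example does not attempt to reproduce.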