An observation of different clustering algorithms and clustering evaluation criteria for a feature selection based on linear discriminant analysis

Linear discriminant analysis (LDA) is a very popular method for dimensionality reduction in machine learning. Yet, the LDA cannot be implemented directly on unsupervised data as it requires the presence of class labels to train the algorithm. Thus, a clustering algorithm is needed to predict the cla...


Bibliographic Details
Main Authors: Tie, K. H., A., Senawi, Chuan, Z. L.
Format: Book Chapter
Language: English
Published: Springer Nature Singapore Pte. Ltd. 2022
Subjects: HD28 Management. Industrial Management; TJ Mechanical engineering and machinery
Online Access: http://umpir.ump.edu.my/id/eprint/35517/1/FULL%20TEXT%20PAPER.pdf
collection UMP
description Linear discriminant analysis (LDA) is a widely used method for dimensionality reduction in machine learning. However, LDA cannot be applied directly to unlabelled data because it requires class labels for training. A clustering algorithm is therefore needed to predict the class labels before LDA can be used. Different clustering algorithms, however, have different parameters that need to be specified. The objective of this paper is to investigate how these parameters behave with respect to a measurement criterion for feature selection, namely the total error reduction ratio (TERR). k-means and the Gaussian mixture distribution were adopted as the clustering algorithms, and each algorithm was tested on four datasets with four distinct clustering evaluation criteria: Calinski-Harabasz, Davies-Bouldin, Gap and Silhouette. Overall, k-means outperforms the Gaussian mixture distribution in selecting smaller feature subsets. When a given TERR threshold is set and the k-means algorithm is applied, the Calinski-Harabasz, Davies-Bouldin, and Silhouette criteria yield the same number of selected features, which is smaller than the feature subset size given by the Gap criterion. When the Gaussian mixture distribution algorithm is adopted, none of the criteria consistently selects the smallest number of features. The higher the TERR threshold, the larger the feature subset, regardless of which clustering algorithm and clustering evaluation criterion are used. These results are essential for guiding future work on designing a robust unsupervised feature selection method based on LDA.
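The cluster-then-LDA step described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and a synthetic dataset. It is not the authors' implementation: the data, the `pick_k` helper, and the candidate range of k are all hypothetical. The Gap criterion and the Gaussian mixture distribution are left out for brevity, and the TERR is not computed because the abstract does not define it.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic stand-in data (assumption): three well-separated groups, five features.
centers = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [10.0, 10.0, 0.0, 0.0, 0.0],
    [-10.0, 10.0, 0.0, 0.0, 0.0],
])
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)


def pick_k(X, k_values, score_fn, higher_is_better=True):
    """Hypothetical helper: return the k whose k-means partition scores best."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = score_fn(X, labels)
    chooser = max if higher_is_better else min
    return chooser(scores, key=scores.get)


k_values = range(2, 7)
chosen = {
    "Calinski-Harabasz": pick_k(X, k_values, calinski_harabasz_score),
    # Davies-Bouldin is a "lower is better" index.
    "Davies-Bouldin": pick_k(X, k_values, davies_bouldin_score, higher_is_better=False),
    "Silhouette": pick_k(X, k_values, silhouette_score),
}

# Use the predicted cluster labels as surrogate class labels for LDA.
k = chosen["Silhouette"]
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
lda = LinearDiscriminantAnalysis().fit(X, labels)
X_reduced = lda.transform(X)

print(chosen, X_reduced.shape)
```

Each criterion may suggest a different number of clusters on real data, which changes the surrogate labels LDA is trained on; that dependence is what the chapter examines.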
first_indexed 2024-03-06T13:01:04Z
id UMPir35517
institution Universiti Malaysia Pahang
last_indexed 2024-03-06T13:01:04Z
record_format dspace
spelling UMPir35517 2022-10-31T03:12:35Z http://umpir.ump.edu.my/id/eprint/35517/
2022-05-15 Book Chapter PeerReviewed pdf en
Tie, K. H. and A., Senawi and Chuan, Z. L. (2022) An observation of different clustering algorithms and clustering evaluation criteria for a feature selection based on linear discriminant analysis. In: Enabling Industry 4.0 through Advances in Mechatronics. Lecture Notes in Electrical Engineering book series (LNEE), 800. Springer Nature Singapore Pte. Ltd., Singapore, pp. 497-505. ISBN 978-981-19-2094-3 (Printed); 978-981-19-2095-0 (Online) https://doi.org/10.1007/978-981-19-2095-0_42
topic HD28 Management. Industrial Management
TJ Mechanical engineering and machinery