Improved clustering using robust and classical principal component

The k-means algorithm is a popular data clustering algorithm. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster. Finding the appropriate number of clusters for a given data set...

Bibliographic Details
Main Author: Hassn, Ahmed Kadom
Format: Thesis
Language: English
Published: 2017
Subjects: Algorithms
Online Access: http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf
collection UPM
description The k-means algorithm is a popular data clustering algorithm. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster. Finding the appropriate number of clusters for a given data set is generally a trial-and-error process, which is made more difficult by the subjective nature of deciding what constitutes a ‘correct’ clustering. When the dimension of the data is large, it is often difficult to apply the k-means clustering algorithm because it requires a great deal of computation time. To remedy this problem, we propose to integrate principal component analysis (PCA), which is useful for reducing the dimensionality of a dataset, with the k-means clustering algorithm. We call our proposed method k-means by principal components (pc1). In this study, the kernels created by the k-means method are replaced with kernels created by the PCA method, which reduces the dimensionality of the data. The results of the study show that k-means by PCA is faster and more efficient than the classical k-means algorithm. Both the classical k-means algorithm and the k-means by PCA algorithm are very sensitive to the presence of outliers. Hence, k-means by robust PCA is developed to rectify the problem of outliers in the dataset. The findings indicate that, in the absence of outliers, the two methods, k-means by PCA and k-means by robust PCA, perform equally well. In the presence of outliers, however, k-means by robust PCA is much less affected than k-means by classical PCA.
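The general idea in the description, reducing the dimensionality with PCA and then clustering the reduced data with k-means, can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: it assumes NumPy is available, it uses classical PCA computed from an SVD together with plain Lloyd's k-means, and the function names (pca_reduce, kmeans, make_demo_data) and the synthetic two-blob data are hypothetical. The robust PCA variant mentioned in the description is not shown, because the record does not say which robust estimator the thesis uses.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its leading principal components (classical PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every observation to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its observations
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def make_demo_data(seed=0):
    """Hypothetical data: two well-separated Gaussian blobs in 20 dimensions."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 1.0, size=(50, 20))
    b = rng.normal(8.0, 1.0, size=(50, 20))
    return np.vstack([a, b])

X = make_demo_data()
Z = pca_reduce(X, n_components=2)   # 20 features reduced to 2 scores
labels, centers = kmeans(Z, k=2)    # cluster in the reduced space
```

Clustering the two-column score matrix Z instead of the 20-column X makes each distance computation in the assignment step cheaper, which is the kind of saving the description attributes to k-means by PCA.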
first_indexed 2024-03-06T10:06:09Z
id upm.eprints-70922
institution Universiti Putra Malaysia
last_indexed 2024-03-06T10:06:09Z
record_format dspace
spelling upm.eprints-70922 2022-07-07T03:07:15Z http://psasir.upm.edu.my/id/eprint/70922/ 2017-06 Thesis NonPeerReviewed text en Hassn, Ahmed Kadom (2017) Improved clustering using robust and classical principal component. Masters thesis, Universiti Putra Malaysia. Algorithms
topic Algorithms