Improved clustering using robust and classical principal component
The k-means algorithm is a popular data clustering algorithm. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. Finding the appropriate number of clusters for a given data set...
Main Author: | Hassn, Ahmed Kadom |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2017 |
Subjects: | Algorithms |
Online Access: | http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf |
_version_ | 1796979248440803328 |
---|---|
author | Hassn, Ahmed Kadom |
collection | UPM |
description | The k-means algorithm is a popular data clustering algorithm. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. Finding the appropriate number of clusters for a given data set is generally a trial-and-error process, which is made more difficult by the subjective nature of deciding what constitutes ‘correct’ clustering. When the dimension of the data is large, it is often difficult to apply the k-means clustering algorithm because it requires a great deal of computation time. To remedy this problem, we propose to integrate principal component analysis (PCA), which is useful for reducing the dimensionality of a dataset, with the k-means clustering algorithm. We call our proposed method k-means by principal components (pc1). In this study, the kernels created by the k-means method are replaced with kernels created by the PCA method, which reduces the dimensionality of the data. The results of the study show that k-means by PCA is faster and more efficient than the classical k-means algorithm. Both the classical k-means algorithm and the k-means by PCA algorithm are very sensitive to the presence of outliers. Hence, k-means by robust PCA is developed to rectify the problem of outliers in the dataset. The findings indicate that, in the absence of outliers, k-means by PCA and k-means by robust PCA perform equally well. Nonetheless, k-means by robust PCA is not much affected by outliers, unlike k-means by classical PCA. |
first_indexed | 2024-03-06T10:06:09Z |
format | Thesis |
id | upm.eprints-70922 |
institution | Universiti Putra Malaysia |
language | English |
last_indexed | 2024-03-06T10:06:09Z |
publishDate | 2017 |
record_format | dspace |
spelling | upm.eprints-70922 2022-07-07T03:07:15Z http://psasir.upm.edu.my/id/eprint/70922/ Improved clustering using robust and classical principal component Hassn, Ahmed Kadom 2017-06 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf Hassn, Ahmed Kadom (2017) Improved clustering using robust and classical principal component. Masters thesis, Universiti Putra Malaysia. Algorithms |
spellingShingle | Algorithms Hassn, Ahmed Kadom Improved clustering using robust and classical principal component |
title | Improved clustering using robust and classical principal component |
title_sort | improved clustering using robust and classical principal component |
topic | Algorithms |
url | http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf |
work_keys_str_mv | AT hassnahmedkadom improvedclusteringusingrobustandclassicalprincipalcomponent |
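
The description above outlines reducing dimensionality with PCA before running k-means. The record does not give the thesis's actual procedure, so the following is only a minimal sketch under stated assumptions: it assumes "pc1" means projecting each observation onto the first principal component, it uses power iteration as a stand-in for a full eigendecomposition, and all function names and the demo data are hypothetical.

```python
import math


def leading_component(rows, iters=200):
    """Return (column means, unit vector of the first principal component).

    Assumption: power iteration on the sample covariance matrix; this is an
    illustrative choice, not the method documented in the record.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Non-uniform start vector; power iteration can stall if the start
    # happens to be orthogonal to the leading eigenvector.
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all observations identical: no direction to find
            break
        v = [x / norm for x in w]
    return mean, v


def assign(scores, centers):
    """Label each score with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: (x - centers[c]) ** 2)
            for x in scores]


def kmeans_1d(scores, k, iters=100):
    """Lloyd's algorithm on one-dimensional scores (requires k <= len(scores))."""
    ordered = sorted(scores)
    # Deterministic start: centers spread evenly over the sorted scores.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        labels = assign(scores, centers)
        updated = []
        for c in range(k):
            members = [x for x, lab in zip(scores, labels) if lab == c]
            # Keep the old center if a cluster becomes empty.
            updated.append(sum(members) / len(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return assign(scores, centers), centers


def kmeans_by_pc1(rows, k):
    """Project observations onto the first principal component, then cluster."""
    mean, v = leading_component(rows)
    scores = [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]
    return kmeans_1d(scores, k)


# Hypothetical demo data: two well-separated groups in three dimensions.
demo_rows = [
    [0.0, 0.1, 0.0], [0.2, 0.0, 0.1], [0.1, 0.2, 0.0],
    [5.0, 5.1, 5.0], [5.2, 5.0, 4.9], [4.9, 5.2, 5.1],
]
labels, centers = kmeans_by_pc1(demo_rows, k=2)
print(labels)
```

Clustering the one-dimensional scores instead of the full observations is what makes the distance computations cheaper. The mean and covariance used here are the classical estimates, which outliers can distort; a robust variant would presumably substitute robust location and scatter estimates, but the record does not say which estimator the thesis uses.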