A modified weighted support vector machine (WSVM) to reduce noise data in classification problem

Classification refers to a predictive modeling problem where a class label is predicted for a given example of input data. Data is everywhere and the amount of digital data that exists is growing exponentially. However, data is rarely perfect and there are many inconsistencies that affect data quali...

Bibliographic Details
Main Author: Mohd Dzulkifli, Syarizul Amri
Format: Thesis
Language: English
Published: 2021
Subjects: T Technology (General)
Online Access: http://eprints.uthm.edu.my/10805/1/24p%20SYARIZUL%20AMRI%20MOHD%20DZULKIFLI.pdf
http://eprints.uthm.edu.my/10805/2/SYARIZUL%20AMRI%20MOHD%20DZULKIFLI%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/10805/3/SYARIZUL%20AMRI%20MOHD%20DZULKIFLI%20WATERMARK.pdf
_version_ 1811134671828811776
author Mohd Dzulkifli, Syarizul Amri
author_facet Mohd Dzulkifli, Syarizul Amri
author_sort Mohd Dzulkifli, Syarizul Amri
collection UTHM
description Classification is a predictive modeling problem in which a class label is predicted for a given example of input data. Data is everywhere, and the amount of digital data is growing exponentially. However, data is rarely perfect, and many inconsistencies, such as noisy data, affect data quality. SVM is currently a very promising approach for big data classification. SVM provides a global solution for data classification, but it is highly sensitive to noisy data and may not be effective when the noise level is high. When noise exists in the training data, the SVM decision boundary can deviate severely from the optimal hyperplane. To overcome this drawback, WSVM using the KPCM algorithm has been used, but WSVM with a kernel-based learning algorithm such as KPCM suffers from high training complexity, expensive computation time and large memory requirements when noisy data contaminates the training data. WKM-SVM was therefore proposed, which applies a simple pruning and speed-up method based on clustering. However, WKM-SVM has several limitations related to k-means clustering. One of them is that the cluster centers may not suitably represent the original data structure, which can lead to poor prediction results. Therefore, this research proposes a modified WSVM that combines an instance selection method with weighted learning to improve WSVM training and classification accuracy. The modified WSVM reduces noisy data by producing multiple hyperplanes and selecting the optimal hyperplane based on the lowest amount of noisy data. Overall, the results show that the proposed method outperforms WSVM, OWSVM and WKM-SVM on all datasets in terms of classification accuracy. Specifically, the proposed method achieves a classification accuracy of 85% or higher on three datasets and lower than 85% on six datasets. However, the performance of the proposed method on test data may not be as good as anticipated, since most of the datasets yielded a classification accuracy lower than 85%.
first_indexed 2024-09-24T00:08:36Z
format Thesis
id uthm.eprints-10805
institution Universiti Tun Hussein Onn Malaysia
language English
last_indexed 2024-09-24T00:08:36Z
publishDate 2021
record_format dspace
spelling uthm.eprints-10805 2024-05-13T06:56:15Z http://eprints.uthm.edu.my/10805/ A modified weighted support vector machine (WSVM) to reduce noise data in classification problem Mohd Dzulkifli, Syarizul Amri T Technology (General) 2021-12 Thesis NonPeerReviewed text en http://eprints.uthm.edu.my/10805/1/24p%20SYARIZUL%20AMRI%20MOHD%20DZULKIFLI.pdf text en http://eprints.uthm.edu.my/10805/2/SYARIZUL%20AMRI%20MOHD%20DZULKIFLI%20COPYRIGHT%20DECLARATION.pdf text en http://eprints.uthm.edu.my/10805/3/SYARIZUL%20AMRI%20MOHD%20DZULKIFLI%20WATERMARK.pdf Mohd Dzulkifli, Syarizul Amri (2021) A modified weighted support vector machine (WSVM) to reduce noise data in classification problem. Doctoral thesis, Universiti Tun Hussein Onn Malaysia.
spellingShingle T Technology (General)
Mohd Dzulkifli, Syarizul Amri
A modified weighted support vector machine (WSVM) to reduce noise data in classification problem
title A modified weighted support vector machine (WSVM) to reduce noise data in classification problem
title_full A modified weighted support vector machine (WSVM) to reduce noise data in classification problem
title_fullStr A modified weighted support vector machine (WSVM) to reduce noise data in classification problem
title_full_unstemmed A modified weighted support vector machine (WSVM) to reduce noise data in classification problem
title_short A modified weighted support vector machine (WSVM) to reduce noise data in classification problem
title_sort modified weighted support vector machine wsvm to reduce noise data in classification problem
topic T Technology (General)
url http://eprints.uthm.edu.my/10805/1/24p%20SYARIZUL%20AMRI%20MOHD%20DZULKIFLI.pdf
http://eprints.uthm.edu.my/10805/2/SYARIZUL%20AMRI%20MOHD%20DZULKIFLI%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/10805/3/SYARIZUL%20AMRI%20MOHD%20DZULKIFLI%20WATERMARK.pdf
work_keys_str_mv AT mohddzulkiflisyarizulamri amodifiedweightedsupportvectormachinewsvmtoreducenoisedatainclassificationproblem
AT mohddzulkiflisyarizulamri modifiedweightedsupportvectormachinewsvmtoreducenoisedatainclassificationproblem