An Investigation on Disparity Responds of Machine Learning Algorithms to Data Normalization Method
Data normalization can be useful in eliminating the effect of inconsistent ranges in some machine learning (ML) techniques and in speeding up the optimization process in others. Many studies apply different methods of data normalization with an aim to reduce or eliminate the impact of data variance...
Main Authors: | Haval A. Ahmed, Peshawa J. Muhammad Ali, Abdulbasit K. Faeq, Saman M. Abdullah |
---|---|
Format: | Article |
Language: | English |
Published: | Koya University, 2022-09-01 |
Series: | ARO-The Scientific Journal of Koya University |
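Subjects: | Min-max normalization; Support vector machine; Artificial neural network; Euclidean-based K-nearest neighbor; Mean squared error |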
Online Access: | https://aro.koyauniversity.org/index.php/aro/article/view/970 |
_version_ | 1797761377228554240 |
---|---|
author | Haval A. Ahmed; Peshawa J. Muhammad Ali; Abdulbasit K. Faeq; Saman M. Abdullah |
author_sort | Haval A. Ahmed |
collection | DOAJ |
description | Data normalization can be useful in eliminating the effect of inconsistent ranges in some machine learning (ML) techniques and in speeding up the optimization process in others. Many studies apply different data normalization methods with the aim of reducing or eliminating the impact of data variance on the accuracy of ML-based models. However, how the significance of this impact relates to the mathematical concepts underlying the ML algorithms still needs more investigation and testing. To address this, this work proposes an investigation methodology involving three ML algorithms: support vector machine (SVM), artificial neural network (ANN), and Euclidean-based K-nearest neighbor (E-KNN). Five datasets are used, each taken from a different application field and with different statistical properties. Although many data normalization methods are available, this work focuses on the min-max method because it actively eliminates the effect of inconsistent ranges in the datasets. Moreover, other factors that challenge the min-max normalization process, such as including or excluding outliers or the least significant feature, are also considered. The findings show that each ML technique responds differently to min-max normalization. The performance of SVM models improved, while the performance of ANN models showed no significant improvement. It is concluded that the performance of E-KNN models may improve or degrade with min-max normalization, depending on the statistical properties of the dataset. |
first_indexed | 2024-03-12T19:13:08Z |
format | Article |
id | doaj.art-684ae4a99c1a439aad37bd92caaa5dce |
institution | Directory Open Access Journal |
issn | 2410-9355 2307-549X |
language | English |
last_indexed | 2024-03-12T19:13:08Z |
publishDate | 2022-09-01 |
publisher | Koya University |
record_format | Article |
series | ARO-The Scientific Journal of Koya University |
spelling | doaj.art-684ae4a99c1a439aad37bd92caaa5dce; 2023-08-02T05:44:11Z; eng; Koya University; ARO-The Scientific Journal of Koya University; 2410-9355; 2307-549X; 2022-09-01; 10; 2; 10.14500/aro.10970; An Investigation on Disparity Responds of Machine Learning Algorithms to Data Normalization Method; Haval A. Ahmed (Department of Software Engineering, Faculty of Engineering, Koya University, Koya KOY45, Kurdistan Region, F.R. Iraq); Peshawa J. Muhammad Ali (Department of Software Engineering, Faculty of Engineering, Koya University, Koya KOY45, Kurdistan Region, F.R. Iraq); Abdulbasit K. Faeq (Department of Software Engineering, Faculty of Engineering, Koya University, Koya KOY45, Kurdistan Region, F.R. Iraq); Saman M. Abdullah ((1) Department of Software Engineering, Faculty of Engineering, Koya University, Koya KOY45, Kurdistan Region, F.R. Iraq; (2) Department of Computer Engineering, Faculty of Engineering, Tishk International University, Erbil, Kurdistan Region, F.R. Iraq); https://aro.koyauniversity.org/index.php/aro/article/view/970; Min-max normalization; Support vector machine; Artificial neural network; Euclidean-based K-nearest neighbor; Mean squared error |
title | An Investigation on Disparity Responds of Machine Learning Algorithms to Data Normalization Method |
topic | Min-max normalization; Support vector machine; Artificial neural network; Euclidean-based K-nearest neighbor; Mean squared error |
url | https://aro.koyauniversity.org/index.php/aro/article/view/970 |
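
The description above names min-max normalization and notes that a Euclidean-based K-nearest neighbor model may respond to it either way. The following is a minimal illustrative sketch of that idea, not the authors' code: it assumes the common form x' = (x - min) / (max - min) applied per feature, and the toy data, the function names `min_max_normalize` and `euclidean`, and the feature meanings are all hypothetical. It shows how a feature with a large range can dominate Euclidean distance until the ranges are made consistent.

```python
import math


def min_max_normalize(rows):
    """Rescale each column of `rows` to [0, 1] via x' = (x - min) / (max - min)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has max == min; map it to 0.0 to avoid dividing by zero.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return scaled


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: [age in years, income in currency units].
data = [
    [25.0, 30000.0],  # query point
    [60.0, 31000.0],  # very different age, similar income
    [26.0, 40000.0],  # similar age, very different income
]
scaled = min_max_normalize(data)

# Index of the nearest neighbor of row 0 among rows 1 and 2, before and after scaling.
raw_nearest = min((1, 2), key=lambda i: euclidean(data[0], data[i]))
scaled_nearest = min((1, 2), key=lambda i: euclidean(scaled[0], scaled[i]))

print("nearest on raw data:", raw_nearest)
print("nearest after min-max:", scaled_nearest)
```

On the raw values the income column decides the neighbor almost alone; after scaling both columns contribute on the same [0, 1] range and the nearest neighbor changes. Whether such a change helps or hurts accuracy depends on the data, which is in line with the record's statement that the E-KNN result depends on the statistical properties of the dataset.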