An Investigation on Disparity Responds of Machine Learning Algorithms to Data Normalization Method

Data normalization can be useful in eliminating the effect of inconsistent ranges in some machine learning (ML) techniques and in speeding up the optimization process in others. Many studies apply different methods of data normalization with an aim to reduce or eliminate the impact of data variance...

Bibliographic Details
Main Authors: Haval A. Ahmed, Peshawa J. Muhammad Ali, Abdulbasit K. Faeq, Saman M. Abdullah
Format: Article
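Language: English
Published: Koya University 2022-09-01
Series: ARO-The Scientific Journal of Koya University
Subjects: Min-max normalization; Support vector machine; Artificial neural network; Euclidean-based K-nearest neighbor; Mean squared error
Online Access: https://aro.koyauniversity.org/index.php/aro/article/view/970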
_version_ 1797761377228554240
author Haval A. Ahmed
Peshawa J. Muhammad Ali
Abdulbasit K. Faeq
Saman M. Abdullah
collection DOAJ
description Data normalization can eliminate the effect of inconsistent feature ranges in some machine learning (ML) techniques and speed up the optimization process in others. Many studies apply data normalization methods to reduce or eliminate the impact of data variance on the accuracy of ML-based models. However, how significant this impact is, and how it relates to the mathematical basis of each ML algorithm, still needs further investigation and testing. To address this, the work proposes an investigation methodology involving three ML algorithms: support vector machine (SVM), artificial neural network (ANN), and Euclidean-based K-nearest neighbor (E-KNN). Five datasets are used, each taken from a different application field and with different statistical properties. Although many data normalization methods are available, this work focuses on the min-max method because it directly eliminates the effect of inconsistent ranges in the datasets. Other factors that complicate min-max normalization, such as including or excluding outliers or the least significant feature, are also considered. The findings show that each ML technique responds differently to min-max normalization. The performance of SVM models improved, while ANN models showed no significant improvement. The performance of E-KNN models may improve or degrade with min-max normalization, depending on the statistical properties of the dataset.
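The description's point about inconsistent ranges and Euclidean-based methods can be illustrated with a minimal sketch. It assumes the usual min-max formula, x' = (x - min) / (max - min), applied per feature. The sample values and the helper names (`min_max_normalize`, `euclidean`, `nearest_index`) are hypothetical and are not taken from the article's datasets or code.

```python
import math


def min_max_normalize(rows):
    """Rescale each feature (column) to [0, 1] using that column's min and max."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, lo, hi in zip(row, mins, maxs):
            # A constant feature has zero range; map it to 0.0 to avoid dividing by zero.
            scaled.append((value - lo) / (hi - lo) if hi > lo else 0.0)
        normalized.append(scaled)
    return normalized


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_index(points, query_index):
    """Index of the point closest to points[query_index], excluding itself."""
    query = points[query_index]
    others = [i for i in range(len(points)) if i != query_index]
    return min(others, key=lambda i: euclidean(points[i], query))


# Hypothetical samples: feature 0 spans thousands, feature 1 spans less than one unit.
samples = [
    [1000.0, 0.10],  # query point
    [1100.0, 0.90],
    [1300.0, 0.15],
    [2000.0, 0.50],
]

scaled = min_max_normalize(samples)

raw_neighbor = nearest_index(samples, 0)     # dominated by the wide-range feature 0
scaled_neighbor = nearest_index(scaled, 0)   # both features now contribute comparably

print("nearest neighbor before scaling:", raw_neighbor)
print("nearest neighbor after scaling: ", scaled_neighbor)
```

In this toy data the nearest neighbor of the first sample changes once both features share the [0, 1] range, which is the kind of dataset-dependent effect the description reports for E-KNN. Whether such a change helps or hurts accuracy depends on the data, as the description notes.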
first_indexed 2024-03-12T19:13:08Z
format Article
id doaj.art-684ae4a99c1a439aad37bd92caaa5dce
institution Directory Open Access Journal
issn 2410-9355
2307-549X
language English
last_indexed 2024-03-12T19:13:08Z
publishDate 2022-09-01
publisher Koya University
record_format Article
series ARO-The Scientific Journal of Koya University
spelling doaj.art-684ae4a99c1a439aad37bd92caaa5dce
2023-08-02T05:44:11Z
eng
Koya University
ARO-The Scientific Journal of Koya University
2410-9355
2307-549X
2022-09-01
10
2
10.14500/aro.10970
An Investigation on Disparity Responds of Machine Learning Algorithms to Data Normalization Method
Haval A. Ahmed (0): Department of Software Engineering, Faculty of Engineering, Koya University, Koya KOY45, Kurdistan Region, F.R. Iraq
Peshawa J. Muhammad Ali (1): Department of Software Engineering, Faculty of Engineering, Koya University, Koya KOY45, Kurdistan Region, F.R. Iraq
Abdulbasit K. Faeq (2): Department of Software Engineering, Faculty of Engineering, Koya University, Koya KOY45, Kurdistan Region, F.R. Iraq
Saman M. Abdullah (3): (1) Department of Software Engineering, Faculty of Engineering, Koya University, Koya KOY45, Kurdistan Region, F.R. Iraq. (2) Department of Computer Engineering, Faculty of Engineering, Tishk International University, Erbil, Kurdistan Region, F.R. Iraq
title An Investigation on Disparity Responds of Machine Learning Algorithms to Data Normalization Method
topic Min-max normalization
Support vector machine
Artificial neural network
Euclidean-based K-nearest neighbor
Mean squared error
url https://aro.koyauniversity.org/index.php/aro/article/view/970