Software Defect Prediction Using Non-Dominated Sorting Genetic Algorithm and k-Nearest Neighbour Classifier

Background: Software Defect Prediction (SDP) is a vital step in software development. SDP aims to identify the most likely defect-prone modules before starting the testing phase, and it helps assign resources and reduces the cost of testing. Aim: Although many machine learning algorithms have been...

Bibliographic Details
Main Authors:	Mohammad Azzeh, Ali Bou Nassif, Manar Abu Talib, Hajra Iqbal
Format:	Article
Language:	English
Published:	Wroclaw University of Science and Technology 2023-10-01
Series:	e-Informatica Software Engineering Journal
Subjects:	software defect prediction; genetic algorithm; multi-objective optimization; k-nearest neighbor
Online Access:	https://www.e-informatyka.pl/index.php/einformatica/volumes/volume-2024/issue-1/article-3/

author	Mohammad Azzeh; Ali Bou Nassif; Manar Abu Talib; Hajra Iqbal
author_sort	Mohammad Azzeh
collection	DOAJ
description	Background: Software Defect Prediction (SDP) is a vital step in software development. SDP aims to identify the most likely defect-prone modules before the testing phase begins, which helps allocate resources and reduce the cost of testing. Aim: Although many machine learning algorithms have been used to classify software modules based on static code metrics, the k-Nearest Neighbors (kNN) method does not greatly improve defect prediction because it requires careful set-up of multiple configuration parameters before it can be used. To address this issue, we used the Non-dominated Sorting Genetic Algorithm (NSGA-II) to optimize the parameters of the kNN classifier with the aim of improving SDP accuracy. We used NSGA-II because existing accuracy metrics often behave differently and can give opposite judgments when evaluating SDP models. This means that changing one parameter might improve one accuracy measure while degrading others. Method: The proposed NSGAII-kNN model was evaluated against the classical kNN model and state-of-the-art machine learning algorithms such as Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF) classifiers. Results: Results indicate that the GA-optimized kNN model yields a higher Matthews Correlation Coefficient and higher balanced accuracy across ten datasets. Conclusion: The paper concludes that integrating GA with kNN improved defect prediction when applied to small or large datasets.
first_indexed	2024-03-09T08:12:21Z
format	Article
id	doaj.art-34030c3b45dc4d3badc0269693ce483c
institution	Directory of Open Access Journals
issn	1897-7979 2084-4840
language	English
last_indexed	2024-03-09T08:12:21Z
publishDate	2023-10-01
publisher	Wroclaw University of Science and Technology
record_format	Article
series	e-Informatica Software Engineering Journal
spelling	doaj.art-34030c3b45dc4d3badc0269693ce483c; 2023-12-02T23:10:17Z; eng; Wroclaw University of Science and Technology; e-Informatica Software Engineering Journal; 1897-7979; 2084-4840; 2023-10-01; 18; 1; 10.37190/e-Inf240103
title	Software Defect Prediction Using Non-Dominated Sorting Genetic Algorithm and k-Nearest Neighbour Classifier
topic	software defect prediction; genetic algorithm; multi-objective optimization; k-nearest neighbor
url	https://www.e-informatyka.pl/index.php/einformatica/volumes/volume-2024/issue-1/article-3/

Similar Items