The Effect of the Dataset Size on the Accuracy of Software Defect Prediction Models: An Empirical Study

The ongoing development of computer systems requires massive software projects. Running the components of these huge projects for testing purposes might be a costly process; therefore, parameter estimation can be used instead. Software defect prediction models are crucial for software quality assura...

Bibliographic Details
Main Authors:	Mohammad Alshayeb, Mashaan A. Alshammari
Format:	Article
Language:	English
Published:	Asociación Española para la Inteligencia Artificial 2021-10-01
Series:	Inteligencia Artificial
Subjects:	Software Defect Prediction, Support Vector Machine, Feature Selection
Online Access:	https://journal.iberamia.org/index.php/intartif/article/view/638

author	Mohammad Alshayeb, Mashaan A. Alshammari
collection	DOAJ
description	The ongoing development of computer systems requires massive software projects. Running the components of these huge projects for testing purposes might be a costly process; therefore, parameter estimation can be used instead. Software defect prediction models are crucial for software quality assurance. This study investigates the impact of dataset size and feature selection algorithms on software defect prediction models. We use two approaches to build software defect prediction models: a statistical approach and a machine learning approach with support vector machines (SVMs). The fault prediction model was built based on four datasets of different sizes. Additionally, four feature selection algorithms were used. We found that applying the SVM defect prediction model on datasets with a reduced number of measures as features may enhance the accuracy of the fault prediction model. Also, it directs the test effort to maintain the most influential set of metrics. We also found that the running time of the SVM fault prediction model is not consistent with dataset size. Therefore, having fewer metrics does not guarantee a shorter execution time. From the experiments, we found that dataset size has a direct influence on the SVM fault prediction model. However, reduced datasets performed the same or slightly lower than the original datasets.
first_indexed	2024-12-19T20:34:55Z
format	Article
id	doaj.art-8437f3a053d44ca8bd13389c3929b4b3
institution	Directory Open Access Journal
issn	1137-3601 1988-3064
language	English
last_indexed	2024-12-19T20:34:55Z
publishDate	2021-10-01
publisher	Asociación Española para la Inteligencia Artificial
record_format	Article
series	Inteligencia Artificial
spelling	doaj.art-8437f3a053d44ca8bd13389c3929b4b3 | 2022-12-21T20:06:34Z | eng | Asociación Española para la Inteligencia Artificial | Inteligencia Artificial | ISSN 1137-3601, 1988-3064 | 2021-10-01 | vol. 24, no. 68, pp. 72-88 | DOI 10.4114/intartif.vol24iss68pp72-88 | The Effect of the Dataset Size on the Accuracy of Software Defect Prediction Models: An Empirical Study | Mohammad Alshayeb, Mashaan A. Alshammari | Affiliations as listed: University of Ha'il, Ha'il, Saudi Arabia; King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia | https://journal.iberamia.org/index.php/intartif/article/view/638 | Software Defect Prediction, Support Vector Machine, Feature Selection
title	The Effect of the Dataset Size on the Accuracy of Software Defect Prediction Models: An Empirical Study
topic	Software Defect Prediction, Support Vector Machine, Feature Selection
url	https://journal.iberamia.org/index.php/intartif/article/view/638

Similar Items