The Effect of the Dataset Size on the Accuracy of Software Defect Prediction Models: An Empirical Study


Bibliographic Details
Main Authors: Mohammad Alshayeb, Mashaan A. Alshammari
Format: Article
Language: English
Published: Asociación Española para la Inteligencia Artificial 2021-10-01
Series: Inteligencia Artificial
Subjects: Software Defect Prediction; Support Vector Machine; Feature Selection
Online Access: https://journal.iberamia.org/index.php/intartif/article/view/638
collection DOAJ
description The ongoing development of computer systems requires massive software projects. Running the components of these large projects for testing purposes can be costly; therefore, parameter estimation can be used instead. Software defect prediction models are crucial for software quality assurance. This study investigates the impact of dataset size and feature selection algorithms on software defect prediction models. We use two approaches to build software defect prediction models: a statistical approach and a machine learning approach with support vector machines (SVMs). The fault prediction models were built on four datasets of different sizes, and four feature selection algorithms were applied. We found that applying the SVM defect prediction model to datasets with a reduced number of measures as features may enhance the accuracy of the fault prediction model. It also directs the testing effort toward maintaining the most influential set of metrics. We also found that the running time of the SVM fault prediction model is not consistent with dataset size; having fewer metrics therefore does not guarantee a shorter execution time. From the experiments, we found that dataset size has a direct influence on the SVM fault prediction model. However, the reduced datasets performed the same as or slightly worse than the original datasets.
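The kind of comparison the abstract describes (an SVM trained on all metrics versus a reduced metric set, with accuracy and running time recorded at several dataset sizes) might be sketched as follows. This is a minimal illustration, not the authors' setup: the data are synthetic, the feature selection is a simple correlation filter chosen for illustration, the SVM is a linear one trained with a Pegasos-style stochastic subgradient method, and every function name and parameter value is hypothetical.

```python
import random
import time


def make_dataset(n_rows, n_features, n_informative, seed=0):
    """Synthetic stand-in for a defect dataset (assumption, not real project data).

    Each row is a list of metric values; the label is +1 (faulty) or -1 (clean).
    Only the first n_informative columns depend on the label.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        label = rng.choice([-1, 1])
        row = [rng.gauss(1.5 * label, 1.0) if j < n_informative else rng.gauss(0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def top_k_by_correlation(X, y, k):
    """Filter-style selection: indices of the k columns with the largest |Pearson r| to the label."""
    n = len(X)
    mean_y = sum(y) / n
    var_y = sum((b - mean_y) ** 2 for b in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        var_x = sum((a - mean_x) ** 2 for a in col)
        scores.append(abs(cov) / ((var_x * var_y) ** 0.5 or 1.0))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def dot(w, row):
    return sum(wj * xj for wj, xj in zip(w, row))


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM without a bias term, fitted by stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            margin = y[i] * dot(w, X[i])     # computed before the update
            w = [wj * (1.0 - eta * lam) for wj in w]
            if margin < 1.0:                 # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def accuracy(w, X, y):
    hits = sum(1 for row, label in zip(X, y) if (1 if dot(w, row) >= 0 else -1) == label)
    return hits / len(y)


def run_experiment(n_rows, k, n_features=20, n_informative=5, seed=0):
    """Train on all features and on the k selected ones; report accuracy and training time."""
    X, y = make_dataset(n_rows, n_features, n_informative, seed)
    split = int(0.7 * n_rows)                # 70/30 train/test split
    X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]
    subsets = {"all": list(range(n_features)),
               "selected": top_k_by_correlation(X_tr, y_tr, k)}
    results = {}
    for name, cols in subsets.items():
        tr = [[row[j] for j in cols] for row in X_tr]
        te = [[row[j] for j in cols] for row in X_te]
        start = time.perf_counter()
        w = train_linear_svm(tr, y_tr)
        elapsed = time.perf_counter() - start
        results[name] = {"n_features": len(cols),
                         "accuracy": accuracy(w, te, y_te),
                         "seconds": elapsed}
    return results


if __name__ == "__main__":
    for n_rows in (100, 400, 1600):          # three illustrative dataset sizes
        print(n_rows, run_experiment(n_rows, k=5))
```

Running the script prints one result dictionary per dataset size, so the accuracy and timing of the full and reduced feature sets can be compared side by side. On real defect data the outcome would depend on the datasets and selection algorithms used, which this sketch does not reproduce.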
first_indexed 2024-12-19T20:34:55Z
id doaj.art-8437f3a053d44ca8bd13389c3929b4b3
institution Directory Open Access Journal
issn 1137-3601
1988-3064
last_indexed 2024-12-19T20:34:55Z
spelling doaj.art-8437f3a053d44ca8bd13389c3929b4b3; 2022-12-21T20:06:34Z; eng; Asociación Española para la Inteligencia Artificial; Inteligencia Artificial; ISSN 1137-3601, 1988-3064; 2021-10-01; vol. 24, no. 68; DOI 10.4114/intartif.vol24iss68pp72-88; The Effect of the Dataset Size on the Accuracy of Software Defect Prediction Models: An Empirical Study; Mohammad Alshayeb, Mashaan A. Alshammari; University of Ha'il, Ha'il, Saudi Arabia; King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia; https://journal.iberamia.org/index.php/intartif/article/view/638; Software Defect Prediction; Support Vector Machine; Feature Selection
topic Software Defect Prediction
Support Vector Machine
Feature Selection