Cross-Project Defect Prediction with Metrics Selection and Balancing Approach

In software development, defects influence the quality and cost in an undesirable way. Software defect prediction (SDP) is one of the techniques which improves the software quality and testing efficiency by early identification of defects (bugs/faults/errors). Thus, several experiments have been suggest...

Bibliographic Details
Main Authors: Nevendra Meetesh, Singh Pradeep
Format: Article
Language: English
Published: Sciendo 2022-12-01
Series: Applied Computer Systems
Subjects: adaboost, ensemble, random forest, smote
Online Access: https://doi.org/10.2478/acss-2022-0015
collection DOAJ
description In software development, defects influence quality and cost in an undesirable way. Software defect prediction (SDP) is one of the techniques that improve software quality and testing efficiency through early identification of defects (bugs/faults/errors). Thus, several defect prediction (DP) techniques have been suggested. DP methods mainly utilise historical project data to construct prediction models. SDP performs well within a project as long as an adequate amount of data is accessible to train the models. However, if the data for the same project are inadequate or limited, researchers mainly use Cross-Project Defect Prediction (CPDP). CPDP is an alternative that anticipates defects using prediction models built on historical data from other projects. CPDP is challenging because of differences in data distribution and domain between projects. The proposed framework is a two-stage approach for CPDP, consisting of model generation and a prediction process. In the model generation phase, a combination of pre-processing steps, including feature selection and class reweighting techniques, is used to improve the initial data quality. Finally, a fine-tuned, efficient hybrid ensemble model based on bagging and boosting is developed, which avoids model over-fitting/under-fitting and helps enhance the prediction performance. In the prediction process phase, the generated model classifies the data from other projects as defective or clean. The framework is evaluated using 25 software projects obtained from public repositories. The result analysis shows that the proposed model achieves an F1-score of 0.71±0.03, which significantly improves on the state-of-the-art approaches by 23 % to 60 %.
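Two quantities named in the description, class reweighting and the F1-score, can be illustrated with a minimal sketch. This is not the authors' implementation: the weighting rule shown is the common inverse-frequency ("balanced") heuristic, n_samples / (n_classes * class_count), chosen here as an assumption, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this is one common reweighting heuristic, not necessarily
    the scheme used in the paper.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (defective) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels from a source (training) project: 1 = defective, 0 = clean.
source_labels = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(source_labels)

# Hypothetical ground truth and predictions for a different (target) project.
target_true = [1, 0, 1, 1, 0, 0]
target_pred = [1, 0, 0, 1, 1, 0]
score = f1_score(target_true, target_pred)

print(weights)
print(round(score, 4))
```

In this toy data the minority (defective) class receives the larger weight, so a learner that honours sample weights would penalise missed defects more heavily; the F1-score is then computed on a project other than the one the labels for training came from, which mirrors the cross-project setting.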
first_indexed 2024-04-10T17:14:34Z
format Article
id doaj.art-f07662aa3faf40fcbd1c33bb7db7ad37
institution Directory Open Access Journal
issn 2255-8691
language English
last_indexed 2024-04-10T17:14:34Z
publishDate 2022-12-01
publisher Sciendo
record_format Article
series Applied Computer Systems
spelling doaj.art-f07662aa3faf40fcbd1c33bb7db7ad37; 2023-02-05T18:30:18Z; eng; Sciendo; Applied Computer Systems; ISSN 2255-8691; 2022-12-01; vol. 27, no. 2, pp. 137-148; 10.2478/acss-2022-0015; Cross-Project Defect Prediction with Metrics Selection and Balancing Approach; Nevendra Meetesh (Department of Computer Science & Engineering, National Institute of Technology, Raipur, India); Singh Pradeep (Department of Computer Science & Engineering, National Institute of Technology, Raipur, India); https://doi.org/10.2478/acss-2022-0015; adaboost; ensemble; random forest; smote
title Cross-Project Defect Prediction with Metrics Selection and Balancing Approach
topic adaboost
ensemble
random forest
smote