Ensemble MultiBoost Based on RIPPER Classifier for Prediction of Imbalanced Software Defect Data

Identifying defective software entities is essential to ensure software quality during software development. However, the high dimensionality and class distribution imbalance of software defect data seriously affect software defect prediction performance. In order to solve this problem, this paper p...

Bibliographic Details
Main Authors: Haitao He, Xu Zhang, Qian Wang, Jiadong Ren, Jiaxin Liu, Xiaolin Zhao, Yongqiang Cheng
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Software defect prediction; class imbalance; combined sampling; rule learning; MultiBoost
Online Access: https://ieeexplore.ieee.org/document/8793088/
collection DOAJ
description Identifying defective software entities is essential to ensuring software quality during development. However, the high dimensionality and imbalanced class distribution of software defect data seriously degrade defect prediction performance. To address this problem, this paper proposes an Ensemble MultiBoost based on the RIPPER classifier for prediction of imbalanced Software Defect data, called EMR_SD. First, the algorithm uses principal component analysis (PCA) to identify the most effective features among the original features of the data set, reducing dimensionality and removing redundancy. Next, a combined sampling method, adaptive synthetic sampling (ADASYN) together with random sampling without replacement, is applied to address class imbalance. The classifier then establishes association rules between attributes and classes, and MultiBoost is used to reduce bias and variance and thereby the classification error. The proposed prediction model is evaluated experimentally on the public NASA MDP datasets and compared with existing similar algorithms. The results show that EMR_SD outperforms DNC, CEL, and other defect prediction techniques on most evaluation indicators, demonstrating the effectiveness of the algorithm.
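The "random sampling without replacement" step named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `undersample_majority`, the `ratio` and `seed` parameters, and the two-class assumption are all hypothetical choices made here for illustration, and the ADASYN oversampling half of the combined method is not shown.

```python
import random
from collections import Counter


def undersample_majority(rows, labels, ratio=1.0, seed=0):
    """Randomly drop majority-class rows by sampling without replacement.

    ratio is the desired majority/minority size ratio after sampling.
    It is an illustrative parameter, not a value taken from the paper.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("this sketch assumes exactly two classes")

    # most_common orders classes from largest to smallest.
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    keep = min(n_major, int(round(ratio * n_minor)))

    major_idx = [i for i, y in enumerate(labels) if y == majority]
    minor_idx = [i for i, y in enumerate(labels) if y == minority]

    # random.sample draws distinct indices, i.e. without replacement.
    rng = random.Random(seed)
    chosen = rng.sample(major_idx, keep)

    # Keep every minority row plus the sampled majority rows,
    # preserving the original row order.
    idx = sorted(chosen + minor_idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


if __name__ == "__main__":
    rows = [[i] for i in range(10)]
    labels = [0] * 8 + [1] * 2  # 8 non-defective, 2 defective modules
    X, y = undersample_majority(rows, labels)
    print(Counter(y))
```

Fixing the seed makes the reduced data set reproducible, which matters when comparing classifiers on the same sample.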
first_indexed 2024-12-14T01:54:01Z
format Article
id doaj.art-3eb2a36e8fab4a718f3da253b6300d3a
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-14T01:54:01Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-3eb2a36e8fab4a718f3da253b6300d3a
2022-12-21T23:21:16Z
eng
IEEE
IEEE Access
2169-3536
2019-01-01
Volume 7, pages 110333-110343
DOI: 10.1109/ACCESS.2019.2934128
IEEE document number: 8793088
Ensemble MultiBoost Based on RIPPER Classifier for Prediction of Imbalanced Software Defect Data
Haitao He (Computer Virtual Technology and System Integration Laboratory of Hebei Province, College of Information Science and Engineering, Yanshan University, Qinhuangdao, China)
Xu Zhang (Computer Virtual Technology and System Integration Laboratory of Hebei Province, College of Information Science and Engineering, Yanshan University, Qinhuangdao, China)
Qian Wang, https://orcid.org/0000-0001-7159-1424 (Computer Virtual Technology and System Integration Laboratory of Hebei Province, College of Information Science and Engineering, Yanshan University, Qinhuangdao, China)
Jiadong Ren (Computer Virtual Technology and System Integration Laboratory of Hebei Province, College of Information Science and Engineering, Yanshan University, Qinhuangdao, China)
Jiaxin Liu (Computer Virtual Technology and System Integration Laboratory of Hebei Province, College of Information Science and Engineering, Yanshan University, Qinhuangdao, China)
Xiaolin Zhao, https://orcid.org/0000-0002-9741-2954 (Beijing Key Laboratory of Software Security Engineering Technology, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China)
Yongqiang Cheng, https://orcid.org/0000-0001-7282-7638 (Department of Computer Science, University of Hull, Hull, U.K.)
title Ensemble MultiBoost Based on RIPPER Classifier for Prediction of Imbalanced Software Defect Data
topic Software defect prediction
class imbalance
combined sampling
rule learning
MultiBoost