Ensemble MultiBoost Based on RIPPER Classifier for Prediction of Imbalanced Software Defect Data
Identifying defective software entities is essential to ensure software quality during software development. However, the high dimensionality and class distribution imbalance of software defect data seriously affect software defect prediction performance. In order to solve this problem, this paper p...
Main Authors: | Haitao He, Xu Zhang, Qian Wang, Jiadong Ren, Jiaxin Liu, Xiaolin Zhao, Yongqiang Cheng |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | Software defect prediction, class imbalance, combined sampling, rule learning, MultiBoost |
Online Access: | https://ieeexplore.ieee.org/document/8793088/ |
_version_ | 1818566465245151232 |
---|---|
author | Haitao He, Xu Zhang, Qian Wang, Jiadong Ren, Jiaxin Liu, Xiaolin Zhao, Yongqiang Cheng |
author_facet | Haitao He, Xu Zhang, Qian Wang, Jiadong Ren, Jiaxin Liu, Xiaolin Zhao, Yongqiang Cheng |
author_sort | Haitao He |
collection | DOAJ |
description | Identifying defective software entities is essential to ensuring software quality during development. However, the high dimensionality and class imbalance of software defect data seriously degrade defect prediction performance. To address this problem, this paper proposes an Ensemble MultiBoost based on the RIPPER classifier for prediction of imbalanced Software Defect data, called EMR_SD. First, the algorithm applies principal component analysis (PCA) to derive the most effective features from the original features of the data set, reducing dimensionality and removing redundancy. Next, a combined sampling method, adaptive synthetic sampling (ADASYN) together with random sampling without replacement, is applied to address class imbalance. The RIPPER classifier then establishes association rules based on attributes and classes, and MultiBoost is used to reduce bias and variance and thereby the classification error. The proposed prediction model is evaluated experimentally on the NASA MDP public datasets and compared with existing similar algorithms. The results show that EMR_SD outperforms DNC, CEL, and other defect prediction techniques on most evaluation metrics, demonstrating the effectiveness of the algorithm. |
first_indexed | 2024-12-14T01:54:01Z |
format | Article |
id | doaj.art-3eb2a36e8fab4a718f3da253b6300d3a |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-14T01:54:01Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-3eb2a36e8fab4a718f3da253b6300d3a; 2022-12-21T23:21:16Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; vol. 7, pp. 110333-110343; 10.1109/ACCESS.2019.2934128; 8793088; Ensemble MultiBoost Based on RIPPER Classifier for Prediction of Imbalanced Software Defect Data; Haitao He (0); Xu Zhang (1); Qian Wang (2; https://orcid.org/0000-0001-7159-1424); Jiadong Ren (3); Jiaxin Liu (4); Xiaolin Zhao (5; https://orcid.org/0000-0002-9741-2954); Yongqiang Cheng (6; https://orcid.org/0000-0001-7282-7638); (0-4) Computer Virtual Technology and System Integration Laboratory of Hebei Province, College of Information Science and Engineering, Yanshan University, Qinhuangdao, China; (5) Beijing Key Laboratory of Software Security Engineering Technology, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; (6) Department of Computer Science, University of Hull, Hull, U.K.; https://ieeexplore.ieee.org/document/8793088/; Software defect prediction; class imbalance; combined sampling; rule learning; MultiBoost |
spellingShingle | Haitao He, Xu Zhang, Qian Wang, Jiadong Ren, Jiaxin Liu, Xiaolin Zhao, Yongqiang Cheng; Ensemble MultiBoost Based on RIPPER Classifier for Prediction of Imbalanced Software Defect Data; IEEE Access; Software defect prediction; class imbalance; combined sampling; rule learning; MultiBoost |
title | Ensemble MultiBoost Based on RIPPER Classifier for Prediction of Imbalanced Software Defect Data |
topic | Software defect prediction; class imbalance; combined sampling; rule learning; MultiBoost |
url | https://ieeexplore.ieee.org/document/8793088/ |
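
The combined sampling step named in the description (ADASYN oversampling of the minority class plus random sampling of the majority class without replacement) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `beta` and `keep_majority` parameters, and their default values are assumptions made for illustration, and the PCA and MultiBoost/RIPPER stages are omitted entirely.

```python
import math
import random


def _nearest(X, i, candidates, k):
    """Indices of the k candidates closest to X[i], excluding i itself."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def adasyn_synthesize(X, y, minority=1, k=5, beta=0.5, rng=None):
    """Generate synthetic minority samples in the style of ADASYN.

    beta (assumed value) is the fraction of the class gap to fill.
    """
    rng = rng or random.Random(0)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(y) - n_min
    total_new = int((n_maj - n_min) * beta)
    if total_new <= 0 or n_min < 2:
        return []

    # For each minority sample: share of majority points among its k neighbours.
    ratios = []
    for i in min_idx:
        neigh = _nearest(X, i, range(len(y)), k)
        ratios.append(sum(y[j] != minority for j in neigh) / len(neigh))
    norm = sum(ratios)
    if norm == 0:  # no minority point borders the majority: spread evenly
        ratios, norm = [1.0] * n_min, float(n_min)

    # Harder-to-learn samples (higher ratio) receive more synthetic neighbours.
    synthetic = []
    for i, r in zip(min_idx, ratios):
        n_new = round(r / norm * total_new)
        min_neigh = _nearest(X, i, min_idx, k)
        for _ in range(n_new):
            j = rng.choice(min_neigh)
            lam = rng.random()
            # Interpolate between the sample and one of its minority neighbours.
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


def combined_sample(X, y, minority=1, k=5, beta=0.5, keep_majority=0.75, seed=0):
    """Oversample the minority class, then undersample the majority class.

    keep_majority (assumed value) is the fraction of majority rows retained.
    """
    rng = random.Random(seed)
    synthetic = adasyn_synthesize(X, y, minority, k, beta, rng)
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    min_idx = [i for i, label in enumerate(y) if label == minority]
    # random.Random.sample draws without replacement.
    kept = rng.sample(maj_idx, int(len(maj_idx) * keep_majority))
    rows = kept + min_idx
    X_new = [X[i] for i in rows] + synthetic
    y_new = [y[i] for i in rows] + [minority] * len(synthetic)
    return X_new, y_new


if __name__ == "__main__":
    gen = random.Random(1)
    X_demo = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(40)]
    X_demo += [[gen.gauss(2, 0.5), gen.gauss(2, 0.5)] for _ in range(8)]
    y_demo = [0] * 40 + [1] * 8
    _, y_bal = combined_sample(X_demo, y_demo)
    print("majority:", y_bal.count(0), "minority:", y_bal.count(1))
```

Filling only part of the class gap while also trimming the majority class is one plausible way to combine the two samplers; the ratio actually used by EMR_SD is not stated in this record.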