Ensemble Data Reduction Techniques and Multi-RSMOTE via Fuzzy Integral for Bug Report Classification

Because bugs are unavoidable in most software systems, bug resolution has become one of the most important activities in software maintenance. To reduce the time spent on manual work, text classification techniques are applied to automatically identify the severity of bug reports. I...

Bibliographic Details
Main Authors: Shikai Guo, Rong Chen, Miaomiao Wei, Hui Li, Yaqing Liu
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Subjects: Mining software repositories; data reduction; imbalance distribution; fuzzy integral
Online Access: https://ieeexplore.ieee.org/document/8438448/
author Shikai Guo
Rong Chen
Miaomiao Wei
Hui Li
Yaqing Liu
collection DOAJ
description Because bugs are unavoidable in most software systems, bug resolution has become one of the most important activities in software maintenance. To reduce the time spent on manual work, text classification techniques are applied to automatically identify the severity of bug reports. In this paper, we address the problems of low-quality data and class imbalance in identifying the severity of bug reports. First, we combine feature selection with instance selection to simultaneously reduce the bug report dimension and the word dimension, which yields a small-scale, high-quality reduced data set. Then, an improved random oversampling technique, named RSMOTE, is presented to reduce the degree of imbalance in the class distribution. Finally, to avoid the uncertainty introduced by the random oversampling in RSMOTE, we develop an ensemble learning algorithm, based on the Choquet fuzzy integral, to combine multiple RSMOTEs. We empirically investigate the performance of data reduction on ten data sets from three large open source projects, namely Eclipse, Mozilla, and GNOME. The results show that our approach can effectively reduce the data scale and improve the performance of identifying the severity of bug reports.
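The description names the Choquet fuzzy integral as the rule for combining several RSMOTE-based classifiers. The following is a minimal sketch of the discrete Choquet integral only, not the authors' implementation: the function names, the example scores, and the three fuzzy measures are hypothetical, and the record does not say which fuzzy measure the paper uses.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of per-classifier scores.

    scores:  dict mapping a classifier name to its score (e.g. a class
             probability in [0, 1]).
    measure: callable taking a frozenset of classifier names and returning
             a fuzzy measure value; assumed monotone, with measure(empty) = 0
             and measure(all classifiers) = 1.
    """
    ordered = sorted(scores, key=scores.get)  # ascending by score
    total = 0.0
    previous = 0.0
    for i, name in enumerate(ordered):
        # Coalition of all classifiers whose score is at least scores[name].
        coalition = frozenset(ordered[i:])
        total += (scores[name] - previous) * measure(coalition)
        previous = scores[name]
    return total


def additive_measure(weights):
    """Hypothetical additive measure; the integral becomes a weighted mean."""
    return lambda subset: sum(weights[s] for s in subset)


# Hypothetical scores from three classifiers for one severity class.
scores = {"a": 0.2, "b": 0.6, "c": 0.9}
weights = {"a": 0.5, "b": 0.3, "c": 0.2}

weighted = choquet_integral(scores, additive_measure(weights))
# Measure that is 1 only for the full set: the integral reduces to the minimum.
lowest = choquet_integral(scores, lambda s: 1.0 if len(s) == len(scores) else 0.0)
# Measure that is 1 for any non-empty set: the integral reduces to the maximum.
highest = choquet_integral(scores, lambda s: 1.0 if s else 0.0)
```

With a non-additive measure the same function can reward or penalize particular coalitions of classifiers, which is what distinguishes this kind of fusion from a fixed weighted vote.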
first_indexed 2024-12-22T19:16:06Z
format Article
id doaj.art-f4832e42738e48e481f12b13affa0f16
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-22T19:16:06Z
publishDate 2018-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-f4832e42738e48e481f12b13affa0f16
2022-12-21T18:15:31Z
eng
IEEE
IEEE Access
2169-3536
2018-01-01
Volume 6, pages 45934-45950
DOI 10.1109/ACCESS.2018.2865780
IEEE Xplore document 8438448
Shikai Guo (https://orcid.org/0000-0002-8554-6365)
Rong Chen
Miaomiao Wei
Hui Li
Yaqing Liu
Affiliation (all authors): College of Information Science and Technology, Dalian Maritime University, Dalian, China
title Ensemble Data Reduction Techniques and Multi-RSMOTE via Fuzzy Integral for Bug Report Classification
topic Mining software repositories
data reduction
imbalance distribution
fuzzy integral
url https://ieeexplore.ieee.org/document/8438448/