Ensemble Data Reduction Techniques and Multi-RSMOTE via Fuzzy Integral for Bug Report Classification
Because bugs are unavoidable in most software systems, bug resolution has become one of the most important activities in software maintenance. To reduce the time spent on manual work, text classification techniques are applied to automatically identify the severity of bug reports. I...
Main Authors: | Shikai Guo, Rong Chen, Miaomiao Wei, Hui Li, Yaqing Liu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2018-01-01 |
Series: | IEEE Access |
Subjects: | Mining software repositories; data reduction; imbalance distribution; fuzzy integral |
Online Access: | https://ieeexplore.ieee.org/document/8438448/ |
_version_ | 1819169224417869824 |
---|---|
author | Shikai Guo; Rong Chen; Miaomiao Wei; Hui Li; Yaqing Liu |
author_sort | Shikai Guo |
collection | DOAJ |
description | Because bugs are unavoidable in most software systems, bug resolution has become one of the most important activities in software maintenance. To reduce the time spent on manual work, text classification techniques are applied to automatically identify the severity of bug reports. In this paper, we address the problems of low-quality data and class imbalance in identifying the severity of bug reports. First, we combine feature selection with instance selection to simultaneously reduce the bug report dimension and the word dimension, which yields a small-scale, high-quality reduced data set. Then, an improved random oversampling technique, named RSMOTE, is presented to reduce the degree of imbalance in the class distribution. Finally, to avoid the uncertainty of random oversampling in RSMOTE, we develop an ensemble learning algorithm, based on the Choquet fuzzy integral, to combine multiple RSMOTE. We empirically investigate the performance of data reduction on ten data sets from three large open source projects, namely Eclipse, Mozilla, and GNOME. The results show that our approach can effectively reduce the data scale and improve the performance of identifying the severity of bug reports. |
first_indexed | 2024-12-22T19:16:06Z |
format | Article |
id | doaj.art-f4832e42738e48e481f12b13affa0f16 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-22T19:16:06Z |
publishDate | 2018-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-f4832e42738e48e481f12b13affa0f16; 2022-12-21T18:15:31Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2018-01-01; vol. 6, pp. 45934-45950; DOI 10.1109/ACCESS.2018.2865780; IEEE document 8438448; Ensemble Data Reduction Techniques and Multi-RSMOTE via Fuzzy Integral for Bug Report Classification; Shikai Guo (https://orcid.org/0000-0002-8554-6365), Rong Chen, Miaomiao Wei, Hui Li, Yaqing Liu (all: College of Information Science and Technology, Dalian Maritime University, Dalian, China); https://ieeexplore.ieee.org/document/8438448/; Mining software repositories; data reduction; imbalance distribution; fuzzy integral |
title | Ensemble Data Reduction Techniques and Multi-RSMOTE via Fuzzy Integral for Bug Report Classification |
topic | Mining software repositories; data reduction; imbalance distribution; fuzzy integral |
url | https://ieeexplore.ieee.org/document/8438448/ |
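
The description in this record says that several RSMOTE-based results are combined with a Choquet fuzzy integral, but it does not give the fuzzy measure or the classifier setup. As a rough illustration only, the following Python sketch computes a discrete Choquet integral over per-classifier confidence scores. The names `choquet_integral` and `additive_measure`, the example weights, and the example scores are all hypothetical and are not taken from the paper.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of `scores` with respect to `measure`.

    scores:  dict mapping a source name to a value in [0, 1]
    measure: callable taking a frozenset of source names and returning a
             value in [0, 1]; assumed monotone, with measure(all sources) == 1
    """
    ordered = sorted(scores, key=scores.get)  # ascending by score
    total = 0.0
    previous = 0.0
    for i, source in enumerate(ordered):
        # Coalition of all sources whose score is at least the current one.
        coalition = frozenset(ordered[i:])
        total += (scores[source] - previous) * measure(coalition)
        previous = scores[source]
    return total


def additive_measure(weights):
    """Build an additive measure from per-source weights (assumed to sum to 1).

    With an additive measure the Choquet integral reduces to a weighted mean;
    a non-additive measure would let coalitions of sources count for more or
    less than the sum of their parts.
    """
    def measure(coalition):
        return sum(weights[name] for name in coalition)
    return measure


# Hypothetical confidence scores from three classifiers for one severity class.
example_scores = {"clf_a": 0.2, "clf_b": 0.6, "clf_c": 0.9}
example_weights = {"clf_a": 0.5, "clf_b": 0.3, "clf_c": 0.2}
fused = choquet_integral(example_scores, additive_measure(example_weights))
```

In a multi-class setting one would compute such a fused score per class and pick the class with the highest value; how the paper actually defines its measure is not stated in this record.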