CaPBug-A Framework for Automatic Bug Categorization and Prioritization Using NLP and Machine Learning Algorithms

Bug reports facilitate software development teams in improving the quality of software. These reports include significant information related to problems encountered within a software, possible enhancement suggestions, and other potential issues. Bug reports are typically complex and are too detaile...

Full description

Bibliographic Details
Main Authors: Hafiza Anisa Ahmed, Narmeen Zakaria Bawany, Jawwad Ahmed Shamsi
Format: Article
Language:English
Published: IEEE 2021-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9388660/
Description
Summary:Bug reports facilitate software development teams in improving the quality of software. These reports include significant information related to problems encountered within a software, possible enhancement suggestions, and other potential issues. Bug reports are typically complex and are too detailed; hence a lot of resources are required to analyze and process them manually. Moreover, it leads to delays in the resolution of high priority bugs. Accurate and timely processing of bug reports based on their category and priority plays a significant role in improving the quality of software maintenance. Therefore, an automated process of categorization and prioritization of bug reports is needed to address the aforementioned issues. Automated categorization and prioritization of bug reports have been explored recently by many researchers; however, limited progress has been made in this regard. In this research, we present a novel framework, titled CaPBug, for automated categorization and prioritization of bug reports. The framework is implemented using Natural Language Processing (NLP) and supervised machine learning algorithms. A baseline corpus is built with six categories and five prioritization levels by analyzing more than 2000 bug reports of Mozilla and Eclipse repository. Four classification algorithms i.e., Naive Bayes, Random Forest, Decision Tree, and Logistic Regression have been used to categorize and prioritize bug reports. We demonstrate that the CaPBug framework achieved an accuracy of 88.78% by using a Random Forest classifier with a textual feature for predicting the category. Similarly, using the CaPBug framework, an accuracy of 90.43% was achieved in predicting the priority of bug reports. Synthetic Minority Over-Sampling Technique (SMOTE) has been applied to address the class imbalance issue in priority classes.
ISSN:2169-3536