Continual Prediction of Bug-Fix Time Using Deep Learning-Based Activity Stream Embedding

Bibliographic Details
Main Authors: Youngseok Lee, Suin Lee, Chan-Gun Lee, Ikjun Yeom, Honguk Woo
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8955829/
Description
Summary: Predicting the fix time of a bug is important for managing the resources and release milestones of a software development project. However, achieving high accuracy in bug-fix time prediction is non-trivial. We argue that this difficulty stems from the lack of continuous, or posterior, estimation based on the activities developers perform after a bug is initially reported. In this paper, we formulate bug-fix time prediction as the continual update of estimates as more activities accumulate. Log data of bug-related activities streamed to a bug tracking system change the bug reports over time, enabling us to recalculate predictions. To this end, we propose DASENet, a deep learning-based two-stage activity stream embedding model that employs (i) a merged network for extracting contextual features across different types of logs and (ii) a sequence network for exploring temporal relations among the logs. Through experiments with bug tracking system datasets from open source projects including Firefox, Chromium, and Eclipse, we show that DASENet achieves stable performance; for example, on the Firefox dataset its top-1 accuracy is 4.6% to 8.5% higher than that of other state-of-the-art works. Our approach also provides a transferable structure that yields robust performance on different tasks even with a small dataset: a DASENet model trained on about 900 samples (2% of the full dataset) performs competitively with other models trained on the full dataset. To the best of our knowledge, we are the first to employ deep learning on log streams in the context of bug-fix time prediction.
ISSN: 2169-3536
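The summary describes the two-stage design only at a high level. As a rough, illustrative sketch (not the authors' DASENet code), the following pure-Python snippet shows how per-type log features could be merged into one embedding per activity and then fed to a recurrent sequence network that emits an updated fix-time class distribution after every new activity. The log types, dimensions, fix-time bins, and the names `encode_merged` and `continual_predictions` are hypothetical, and the weights are sampled at random rather than trained.

```python
import math
import random

# Hypothetical log types in a bug-tracking activity stream (assumption, not from the paper).
LOG_TYPES = ("comment", "status_change", "attachment")
EMBED_DIM = 4    # per-type feature size (assumed)
HIDDEN_DIM = 6   # sequence-network state size (assumed)
NUM_BINS = 3     # hypothetical fix-time classes, e.g. <1 week, 1-4 weeks, >4 weeks

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


# Stage 1 (stand-in for a "merged network"): one linear encoder per log type;
# the encoded parts are concatenated and projected into a single activity embedding.
type_encoders = {t: rand_matrix(EMBED_DIM, EMBED_DIM) for t in LOG_TYPES}
merge_proj = rand_matrix(HIDDEN_DIM, EMBED_DIM * len(LOG_TYPES))


def encode_merged(features_by_type):
    parts = []
    for t in LOG_TYPES:
        x = features_by_type.get(t, [0.0] * EMBED_DIM)  # absent log type -> zero vector
        parts.extend(math.tanh(v) for v in matvec(type_encoders[t], x))
    return [math.tanh(v) for v in matvec(merge_proj, parts)]


# Stage 2 (stand-in for a "sequence network"): a vanilla RNN cell over activity embeddings.
W_in = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)
W_rec = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)
W_out = rand_matrix(NUM_BINS, HIDDEN_DIM)


def rnn_step(h, x):
    return [math.tanh(a + b) for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]


def continual_predictions(activity_stream):
    """Yield a fix-time class distribution after each new activity arrives."""
    h = [0.0] * HIDDEN_DIM
    for activity in activity_stream:
        h = rnn_step(h, encode_merged(activity))
        yield softmax(matvec(W_out, h))


# Toy stream: each activity maps a log type to a hypothetical feature vector.
stream = [
    {"comment": [1.0, 0.0, 0.5, 0.2]},
    {"status_change": [0.0, 1.0, 0.0, 0.0]},
    {"comment": [0.3, 0.1, 0.9, 0.0], "attachment": [1.0, 1.0, 0.0, 0.0]},
]
for step, probs in enumerate(continual_predictions(stream), start=1):
    top1 = max(range(NUM_BINS), key=probs.__getitem__)  # top-1 predicted bin
    print(step, top1, [round(p, 3) for p in probs])
```

Because the hidden state is carried forward, each new activity refines the estimate without reprocessing the whole history. A trained model would learn these weights from labeled bug histories instead of sampling them at random.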