Graph-based machine learning improves just-in-time defect prediction

The increasing complexity of today’s software requires the contribution of thousands of developers. This complex collaboration structure makes developers more likely to introduce defect-prone changes that lead to software faults. Determining when these defect-prone changes are introduced has proven...


Bibliographic Details
Main Authors:	Jonathan Bryan, Pablo Moriano
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2023-01-01
Series:	PLoS ONE
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10101485/?tool=EBI

collection	DOAJ
description	The increasing complexity of today’s software requires the contribution of thousands of developers. This complex collaboration structure makes developers more likely to introduce defect-prone changes that lead to software faults. Determining when these defect-prone changes are introduced has proven challenging, and using traditional machine learning (ML) methods to make these determinations seems to have reached a plateau. In this work, we build contribution graphs consisting of developers and source files to capture the nuanced complexity of changes required to build software. By leveraging these contribution graphs, our research shows the potential of using graph-based ML to improve Just-In-Time (JIT) defect prediction. We hypothesize that features extracted from the contribution graphs may be better predictors of defect-prone changes than intrinsic features derived from software characteristics. We corroborate our hypothesis using graph-based ML for classifying edges that represent defect-prone changes. This new framing of the JIT defect prediction problem leads to remarkably better results. We test our approach on 14 open-source projects and show that our best model can predict whether or not a code change will lead to a defect with an F1 score as high as 77.55% and a Matthews correlation coefficient (MCC) as high as 53.16%. This represents a 152% higher F1 score and a 3% higher MCC over the state-of-the-art JIT defect prediction. We describe limitations, open challenges, and how this method can be used for operational JIT defect prediction.
first_indexed	2024-04-09T17:51:04Z
format	Article
id	doaj.art-0de803158bab474fade4f2e0fbbc207e
institution	Directory Open Access Journal
issn	1932-6203
language	English
last_indexed	2024-04-09T17:51:04Z
publishDate	2023-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
title	Graph-based machine learning improves just-in-time defect prediction
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10101485/?tool=EBI
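
The description in this record frames JIT defect prediction as classifying edges of a developer-to-file contribution graph and reports results as F1 and MCC. The following is a minimal sketch of that framing, not the authors' implementation: the toy change list, the two degree features, and the threshold rule standing in for a trained classifier are all illustrative assumptions, and none of the names come from the paper.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: (developer, file, led_to_defect). Not from the paper.
changes = [
    ("alice", "core.c", True),
    ("alice", "util.c", False),
    ("bob", "core.c", True),
    ("bob", "net.c", False),
    ("carol", "net.c", False),
    ("carol", "core.c", False),
]


def build_graph(changes):
    """Build adjacency sets for a bipartite developer-file graph."""
    dev_files = defaultdict(set)
    file_devs = defaultdict(set)
    for dev, path, _ in changes:
        dev_files[dev].add(path)
        file_devs[path].add(dev)
    return dev_files, file_devs


def edge_features(dev, path, dev_files, file_devs):
    """Two illustrative structural features: the degrees of both endpoints."""
    return len(dev_files[dev]), len(file_devs[path])


def f1_and_mcc(y_true, y_pred):
    """Compute F1 and the Matthews correlation coefficient from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    mcc_denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return f1, mcc


dev_files, file_devs = build_graph(changes)
y_true = [label for _, _, label in changes]

# Placeholder rule (an assumption): flag an edge as defect-prone when its
# file has been touched by at least three developers.
y_pred = [
    edge_features(dev, path, dev_files, file_devs)[1] >= 3
    for dev, path, _ in changes
]

f1, mcc = f1_and_mcc(y_true, y_pred)
print(f"F1 = {f1:.4f}, MCC = {mcc:.4f}")
```

In practice the threshold rule would be replaced by a classifier trained on graph-derived edge features. MCC is reported alongside F1 because it also accounts for true negatives, which matters when defect-prone changes are a minority class.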
