An Improved CNN Model for Within-Project Software Defect Prediction

To improve software reliability, software defect prediction is used to find software bugs and prioritize testing efforts. Recently, some researchers introduced deep learning models, such as the deep belief network (DBN) and the state-of-the-art convolutional neural network (CNN), and used automatica...


Bibliographic Details
Main Authors: Cong Pan, Minyan Lu, Biao Xu, Houleng Gao
Format: Article
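Language: English
Published: MDPI AG 2019-05-01
Series: Applied Sciences
Subjects: CNN model; within-project defect prediction; abstract syntax tree; deep learning; hyperparameter instability
Online Access: https://www.mdpi.com/2076-3417/9/10/2138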
author Cong Pan
Minyan Lu
Biao Xu
Houleng Gao
collection DOAJ
description To improve software reliability, software defect prediction is used to find software bugs and prioritize testing efforts. Recently, some researchers introduced deep learning models, such as the deep belief network (DBN) and the state-of-the-art convolutional neural network (CNN), and used automatically generated features extracted from abstract syntax trees (ASTs) and deep learning models to improve defect prediction performance. However, the research on the CNN model failed to reveal clear conclusions due to its limited dataset size, insufficiently repeated experiments, and outdated baseline selection. To solve these problems, we built the PROMISE Source Code (PSC) dataset to enlarge the original dataset in the CNN research, which we named the Simplified PROMISE Source Code (SPSC) dataset. Then, we proposed an improved CNN model for within-project defect prediction (WPDP) and compared our results to existing CNN results and an empirical study. Our experiment was based on a 30-repetition holdout validation and a 10 * 10 cross-validation. Experimental results showed that our improved CNN model was comparable to the existing CNN model, and it outperformed the state-of-the-art machine learning models significantly for WPDP. Furthermore, we defined hyperparameter instability and examined the threat and opportunity it presents for deep learning models on defect prediction.
first_indexed 2024-04-14T01:03:32Z
format Article
id doaj.art-9e5cb88b7b37415890b379171c59d9f2
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-04-14T01:03:32Z
publishDate 2019-05-01
publisher MDPI AG
record_format Article
series Applied Sciences
doi 10.3390/app9102138
citation Applied Sciences, vol. 9, no. 10, article 2138
affiliation The Key Laboratory on Reliability and Environmental Engineering Technology, Beihang University, Beijing 100191, China (all four authors)
title An Improved CNN Model for Within-Project Software Defect Prediction
topic CNN model
within-project defect prediction
abstract syntax tree
deep learning
hyperparameter instability
url https://www.mdpi.com/2076-3417/9/10/2138