An Improved CNN Model for Within-Project Software Defect Prediction
To improve software reliability, software defect prediction is used to find software bugs and prioritize testing efforts. Recently, some researchers introduced deep learning models, such as the deep belief network (DBN) and the state-of-the-art convolutional neural network (CNN), and used automatica...
Main Authors: | Cong Pan, Minyan Lu, Biao Xu, Houleng Gao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2019-05-01 |
Series: | Applied Sciences |
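Subjects: | CNN model; within-project defect prediction; abstract syntax tree; deep learning; hyperparameter instability |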
Online Access: | https://www.mdpi.com/2076-3417/9/10/2138 |
_version_ | 1817990737771364352 |
---|---|
author | Cong Pan; Minyan Lu; Biao Xu; Houleng Gao |
author_sort | Cong Pan |
collection | DOAJ |
description | To improve software reliability, software defect prediction is used to find software bugs and prioritize testing efforts. Recently, some researchers introduced deep learning models, such as the deep belief network (DBN) and the state-of-the-art convolutional neural network (CNN), and used automatically generated features extracted from abstract syntax trees (ASTs) and deep learning models to improve defect prediction performance. However, the research on the CNN model failed to reach clear conclusions because of its limited dataset size, insufficiently repeated experiments, and outdated baseline selection. To address these problems, we built the PROMISE Source Code (PSC) dataset, which enlarges the original dataset used in the CNN research; we refer to that original dataset as the Simplified PROMISE Source Code (SPSC) dataset. We then proposed an improved CNN model for within-project defect prediction (WPDP) and compared our results with the existing CNN results and with an empirical study. Our experiments were based on a 30-repetition holdout validation and a 10 × 10 cross-validation. The results showed that our improved CNN model was comparable to the existing CNN model and significantly outperformed state-of-the-art machine learning models for WPDP. Furthermore, we defined hyperparameter instability and examined the threat and opportunity it presents for deep learning models in defect prediction. |
first_indexed | 2024-04-14T01:03:32Z |
format | Article |
id | doaj.art-9e5cb88b7b37415890b379171c59d9f2 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-04-14T01:03:32Z |
publishDate | 2019-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
title | An Improved CNN Model for Within-Project Software Defect Prediction |
title_sort | improved cnn model for within project software defect prediction |
topic | CNN model; within-project defect prediction; abstract syntax tree; deep learning; hyperparameter instability |
url | https://www.mdpi.com/2076-3417/9/10/2138 |
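
The description names two evaluation protocols, a 30-repetition holdout validation and a 10 × 10 cross-validation, without giving details. The sketch below only illustrates how such index splits might be generated in plain Python; it is not the authors' code. The function names, the 20% test fraction, the fixed seed, and the unstratified shuffling are all assumptions made for illustration, since the record does not state them.

```python
import random


def repeated_holdout(n_samples, repetitions=30, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) for each random holdout repetition.

    test_fraction=0.2 is an assumed value; the record gives no split ratio.
    """
    rng = random.Random(seed)
    n_test = max(1, int(round(n_samples * test_fraction)))
    for _ in range(repetitions):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # fresh random split on every repetition
        yield indices[n_test:], indices[:n_test]


def repeated_kfold(n_samples, folds=10, repetitions=10, seed=0):
    """Yield (train_idx, test_idx) for every fold of every repetition.

    With folds=10 and repetitions=10 this produces 100 splits,
    one reading of "10 x 10 cross-validation".
    """
    rng = random.Random(seed)
    for _ in range(repetitions):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # reshuffle before each round of k folds
        for k in range(folds):
            test = indices[k::folds]  # every folds-th index, starting at k
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield train, test


if __name__ == "__main__":
    holdout_splits = list(repeated_holdout(100))
    cv_splits = list(repeated_kfold(100))
    print(len(holdout_splits), "holdout splits;", len(cv_splits), "CV splits")
```

Defect datasets are usually imbalanced, so a real experiment would probably stratify each split by the defective/clean label; that step is left out here to keep the sketch short.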