An analytical code quality methodology using Latent Dirichlet Allocation and Convolutional Neural Networks

Recently, Code Quality (CQ) has become critical in a wide range of organizations and in many areas from academia to industry. CQ, in terms of readability, security, and testability, is a major goal throughout the software development process because it affects overall Software Quality (SQ) in terms...

Bibliographic Details
Main Authors:	Shaymaa E. Sorour, Hanan E. Abdelkader, Karam M. Sallam, Ripon K. Chakrabortty, Michael J. Ryan, Amr Abohany
Format:	Article
Language:	English
Published:	Elsevier 2022-09-01
Series:	Journal of King Saud University: Computer and Information Sciences
Subjects:	Code Quality (CQ), Latent Dirichlet Allocation (LDA), Convolutional Neural Networks (CNN), Deep Learning (DL), Classification
Online Access:	http://www.sciencedirect.com/science/article/pii/S131915782200026X

author	Shaymaa E. Sorour, Hanan E. Abdelkader, Karam M. Sallam, Ripon K. Chakrabortty, Michael J. Ryan, Amr Abohany
collection	DOAJ
description	Recently, Code Quality (CQ) has become critical in a wide range of organizations and in many areas from academia to industry. CQ, in terms of readability, security, and testability, is a major goal throughout the software development process because it affects overall Software Quality (SQ) in terms of subsequent releases, maintenance, and updates. It is particularly important for the development of safety-critical systems. Existing studies on CQ have several shortcomings: they are based on incomplete information about the source code and tend to focus on only one feature, which is likely to determine the performance of the model. Moreover, these shortcomings often limit the achievable accuracy because there is no strong relationship between the input data and the output data. Thus, it is necessary to design an effective and efficient SQ measurement system for measuring multiple quality factors. To that end, we propose a deep learning framework that employs Latent Dirichlet Allocation (LDA) with Convolutional Neural Networks (CNN), called CNN-LDA, to classify input data into topics that are related to CQ features and to identify hidden patterns and correlations in programming data. Three SQ metrics (i.e., readability, security, and testability) and machine learning techniques (e.g., random forest (RF) and support vector machine (SVM)) are taken into account to validate the proposed model. The proposed CNN-LDA outperformed its peers across the vast majority of datasets examined. The average overall F-measures for readability, security, and testability are 94%, 94%, and 93%, respectively. The average overall accuracies for readability, security, and testability are 93%, 93%, and 92%, respectively. The superiority of CNN-LDA over the other classifiers was very clear based on Wilcoxon’s non-parametric statistical test (α=0.05).
first_indexed	2024-04-14T03:08:00Z
format	Article
id	doaj.art-3a57f73f2aca4d828a309dc129728a7f
institution	Directory Open Access Journal
issn	1319-1578
language	English
last_indexed	2024-04-14T03:08:00Z
publishDate	2022-09-01
publisher	Elsevier
record_format	Article
series	Journal of King Saud University: Computer and Information Sciences
spelling	doaj.art-3a57f73f2aca4d828a309dc129728a7f; 2022-12-22T02:15:40Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; ISSN 1319-1578; 2022-09-01; volume 34, issue 8, pages 5979-5997
	Title: An analytical code quality methodology using Latent Dirichlet Allocation and Convolutional Neural Networks
	Authors: Shaymaa E. Sorour [0], Hanan E. Abdelkader [1], Karam M. Sallam [2], Ripon K. Chakrabortty [3], Michael J. Ryan [4], Amr Abohany [5]
	[0] Faculty of Specific Education, Kafrelsheikh University, Egypt
	[1] Faculty of Specific Education, Mansoura University, Egypt
	[2] Faculty of Science and Information Technology, University of Canberra, Australia; Faculty of Computers and Informatics, Zagazig University, Egypt; Corresponding author at: Faculty of Computers and Informatics, Zagazig University, Egypt.
	[3] School of Engineering and IT, University of New South Wales, Canberra, Australia
	[4] School of Engineering and IT, University of New South Wales, Canberra, Australia
	[5] Faculty of Computing and Information, Kafrelsheikh University, Egypt
	URL: http://www.sciencedirect.com/science/article/pii/S131915782200026X
	Subjects: Code Quality (CQ), Latent Dirichlet Allocation (LDA), Convolutional Neural Networks (CNN), Deep Learning (DL), Classification
title	An analytical code quality methodology using Latent Dirichlet Allocation and Convolutional Neural Networks
topic	Code Quality (CQ), Latent Dirichlet Allocation (LDA), Convolutional Neural Networks (CNN), Deep Learning (DL), Classification
url	http://www.sciencedirect.com/science/article/pii/S131915782200026X

Similar Items