An analytical code quality methodology using Latent Dirichlet Allocation and Convolutional Neural Networks

Recently, Code Quality (CQ) has become critical in a wide range of organizations and in many areas from academia to industry. CQ, in terms of readability, security, and testability, is a major goal throughout the software development process because it affects overall Software Quality (SQ) in terms...

Bibliographic Details
Main Authors: Shaymaa E. Sorour, Hanan E. Abdelkader, Karam M. Sallam, Ripon K. Chakrabortty, Michael J. Ryan, Amr Abohany
Format: Article
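Language: English
Published: Elsevier 2022-09-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects: Code Quality (CQ), Latent Dirichlet Allocation (LDA), Convolutional Neural Networks (CNN), Deep Learning (DL), Classification
Online Access: http://www.sciencedirect.com/science/article/pii/S131915782200026X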
_version_ 1828356985092833280
author Shaymaa E. Sorour
Hanan E. Abdelkader
Karam M. Sallam
Ripon K. Chakrabortty
Michael J. Ryan
Amr Abohany
author_sort Shaymaa E. Sorour
collection DOAJ
description Recently, Code Quality (CQ) has become critical in a wide range of organizations and in many areas, from academia to industry. CQ, in terms of readability, security, and testability, is a major goal throughout the software development process because it affects overall Software Quality (SQ) in terms of subsequent releases, maintenance, and updates. It is particularly important for the development of safety-critical systems. Existing studies on CQ have several shortcomings: they are based on incomplete information about the source code and tend to focus on only one feature, which is likely to determine the performance of the model. Moreover, these limitations often prevent high accuracy because there is no strong relationship between the input data and the output data. Thus, it is necessary to design an effective and efficient SQ measurement system for measuring multiple quality factors. To that end, we propose a deep learning framework that combines Latent Dirichlet Allocation (LDA) with Convolutional Neural Networks (CNN), called CNN-LDA, to classify input data into topics that are related to CQ features and to identify hidden patterns and correlations in programming data. Three SQ metrics (i.e., readability, security, and testability) and machine learning techniques (e.g., random forest (RF) and support vector machine (SVM)) are used to validate the proposed model. The proposed CNN-LDA outperformed its peers across the vast majority of the datasets examined. The average overall F-measures for readability, security, and testability are 94%, 94%, and 93%, respectively, and the average overall accuracies are 93%, 93%, and 92%. The superiority of CNN-LDA over the other classifiers was clear based on Wilcoxon’s non-parametric statistical test (α=0.05).
first_indexed 2024-04-14T03:08:00Z
format Article
id doaj.art-3a57f73f2aca4d828a309dc129728a7f
institution Directory Open Access Journal
issn 1319-1578
language English
last_indexed 2024-04-14T03:08:00Z
publishDate 2022-09-01
publisher Elsevier
record_format Article
series Journal of King Saud University: Computer and Information Sciences
spelling doaj.art-3a57f73f2aca4d828a309dc129728a7f
2022-12-22T02:15:40Z
eng
Elsevier
Journal of King Saud University: Computer and Information Sciences
1319-1578
2022-09-01
Volume 34, Issue 8, pp. 5979-5997
An analytical code quality methodology using Latent Dirichlet Allocation and Convolutional Neural Networks
Shaymaa E. Sorour (Faculty of Specific Education, Kafrelsheikh University, Egypt)
Hanan E. Abdelkader (Faculty of Specific Education, Mansoura University, Egypt)
Karam M. Sallam (Faculty of Science and Information Technology, University of Canberra, Australia; Faculty of Computers and Informatics, Zagazig University, Egypt; corresponding author at: Faculty of Computers and Informatics, Zagazig University, Egypt)
Ripon K. Chakrabortty (School of Engineering and IT, University of New South Wales, Canberra, Australia)
Michael J. Ryan (School of Engineering and IT, University of New South Wales, Canberra, Australia)
Amr Abohany (Faculty of Computing and Information, Kafrelsheikh University, Egypt)
http://www.sciencedirect.com/science/article/pii/S131915782200026X
Code Quality (CQ)
Latent Dirichlet Allocation (LDA)
Convolutional Neural Networks (CNN)
Deep Learning (DL)
Classification
title An analytical code quality methodology using Latent Dirichlet Allocation and Convolutional Neural Networks
topic Code Quality (CQ)
Latent Dirichlet Allocation (LDA)
Convolutional Neural Networks (CNN)
Deep Learning (DL)
Classification
url http://www.sciencedirect.com/science/article/pii/S131915782200026X