An analytical code quality methodology using Latent Dirichlet Allocation and Convolutional Neural Networks
Recently, Code Quality (CQ) has become critical in a wide range of organizations and in many areas from academia to industry. CQ, in terms of readability, security, and testability, is a major goal throughout the software development process because it affects overall Software Quality (SQ) in terms...
Main Authors: | Shaymaa E. Sorour, Hanan E. Abdelkader, Karam M. Sallam, Ripon K. Chakrabortty, Michael J. Ryan, Amr Abohany |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2022-09-01 |
Series: | Journal of King Saud University: Computer and Information Sciences |
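Subjects: | Code Quality (CQ), Latent Dirichlet Allocation (LDA), Convolutional Neural Networks (CNN), Deep Learning (DL), Classification |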
Online Access: | http://www.sciencedirect.com/science/article/pii/S131915782200026X |
_version_ | 1828356985092833280 |
---|---|
author | Shaymaa E. Sorour; Hanan E. Abdelkader; Karam M. Sallam; Ripon K. Chakrabortty; Michael J. Ryan; Amr Abohany |
author_sort | Shaymaa E. Sorour |
collection | DOAJ |
description | Code Quality (CQ) has recently become critical in a wide range of organizations and in many areas, from academia to industry. CQ, in terms of readability, security, and testability, is a major goal throughout the software development process because it affects overall Software Quality (SQ) in terms of subsequent releases, maintenance, and updates. It is particularly important for the development of safety-critical systems. Existing studies on CQ have several shortcomings: they are based on incomplete information about the source code and tend to focus on only one feature, which is likely to determine the performance of the model. Moreover, these shortcomings often limit the achievable accuracy because there is no strong relationship between the input data and the output data. Thus, it is necessary to design an effective and efficient SQ measurement system that measures multiple quality factors. To that end, we propose a deep learning framework that combines Latent Dirichlet Allocation (LDA) with Convolutional Neural Networks (CNN), called CNN-LDA, to classify input data into topics that are related to CQ features and to identify hidden patterns and correlations in programming data. Three SQ metrics (i.e., readability, security, and testability) and machine learning techniques (e.g., random forest (RF) and support vector machine (SVM)) are taken into account to validate the proposed model. The proposed CNN-LDA outperformed its peers across the vast majority of datasets examined. The average overall F-measures for readability, security, and testability are 94%, 94%, and 93%, respectively. The average overall accuracies for readability, security, and testability are 93%, 93%, and 92%, respectively. The superiority of CNN-LDA over the other classifiers was very clear based on Wilcoxon's non-parametric statistical test (α = 0.05). |
first_indexed | 2024-04-14T03:08:00Z |
format | Article |
id | doaj.art-3a57f73f2aca4d828a309dc129728a7f |
institution | Directory Open Access Journal |
issn | 1319-1578 |
language | English |
last_indexed | 2024-04-14T03:08:00Z |
publishDate | 2022-09-01 |
publisher | Elsevier |
record_format | Article |
series | Journal of King Saud University: Computer and Information Sciences |
spelling | doaj.art-3a57f73f2aca4d828a309dc129728a7f; 2022-12-22T02:15:40Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2022-09-01; 34(8), 5979-5997; An analytical code quality methodology using Latent Dirichlet Allocation and Convolutional Neural Networks; Shaymaa E. Sorour (Faculty of Specific Education, Kafrelsheikh University, Egypt); Hanan E. Abdelkader (Faculty of Specific Education, Mansoura University, Egypt); Karam M. Sallam (Faculty of Science and Information Technology, University of Canberra, Australia; Faculty of Computers and Informatics, Zagazig University, Egypt; Corresponding author at: Faculty of Computers and Informatics, Zagazig University, Egypt); Ripon K. Chakrabortty (School of Engineering and IT, University of New South Wales, Canberra, Australia); Michael J. Ryan (School of Engineering and IT, University of New South Wales, Canberra, Australia); Amr Abohany (Faculty of Computing and Information, Kafrelsheikh University, Egypt); http://www.sciencedirect.com/science/article/pii/S131915782200026X; Code Quality (CQ); Latent Dirichlet Allocation (LDA); Convolutional Neural Networks (CNN); Deep Learning (DL); Classification |
title | An analytical code quality methodology using Latent Dirichlet Allocation and Convolutional Neural Networks |
title_sort | analytical code quality methodology using latent dirichlet allocation and convolutional neural networks |
topic | Code Quality (CQ); Latent Dirichlet Allocation (LDA); Convolutional Neural Networks (CNN); Deep Learning (DL); Classification |
url | http://www.sciencedirect.com/science/article/pii/S131915782200026X |