Deep Code-Comment Understanding and Assessment

Bibliographic Details
Main Authors: Deze Wang, Yong Guo, Wei Dong, Zhiming Wang, Haoran Liu, Shanshan Li
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
Subjects: Code comment; source code; multi-input neural network; text classification
Online Access: https://ieeexplore.ieee.org/document/8920024/
ISSN: 2169-3536
Volume: 7
Pages: 174200-174209
DOI: 10.1109/ACCESS.2019.2957424
Collection: DOAJ (Directory of Open Access Journals)
Affiliations:
Deze Wang (https://orcid.org/0000-0001-7935-6840), College of Computer Science, National University of Defense Technology, Changsha, China
Yong Guo (https://orcid.org/0000-0001-5903-5302), College of Systems Engineering, National University of Defense Technology, Changsha, China
Wei Dong (https://orcid.org/0000-0002-8033-7943), College of Computer Science, National University of Defense Technology, Changsha, China
Zhiming Wang (https://orcid.org/0000-0002-4933-3303), College of Computer Science, National University of Defense Technology, Changsha, China
Haoran Liu (https://orcid.org/0000-0002-4493-4265), College of Computer Science, National University of Defense Technology, Changsha, China
Shanshan Li (https://orcid.org/0000-0003-0798-974X), College of Computer Science, National University of Defense Technology, Changsha, China

Abstract

Code comments are a key software component for program comprehension and software maintainability. Data-driven models, which are widely used in tasks such as code summarization, urgently need high-quality code and comments. Many existing approaches for assessing comment quality are machine-learning-based classification algorithms or rely on heuristic rules. These approaches struggle to capture the complex features of text data and are often limited in accuracy, efficiency, and generalization ability. In this paper, we convert the quality assessment of code comments into a classification problem based on a multi-input neural network. We summarize the two inputs, the code and the comments, into vectors using an attention-based Bi-LSTM model and a weighted GloVe model, respectively, and concatenate the code vectors and the comment vectors as the input of a multilayer perceptron (MLP) classifier for comment quality assessment. Experimental results show that our approach generally outperforms the previous technique on both our labeled dataset and the public dataset, with F1-scores of 96.91% and 91.90%, respectively. When the training set and the test set come from distinct sources, our approach still achieves reasonable performance, which demonstrates its generalization ability.
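The abstract describes a multi-input architecture: the code is summarized by an attention-based Bi-LSTM, the comment by a weighted GloVe model, and the two vectors are concatenated and fed to an MLP classifier. The Python sketch below only illustrates that data flow under stated assumptions; it is not the authors' implementation. The Bi-LSTM is not implemented (its per-token hidden states are taken as given input), and the IDF-style comment weights, the dot-product attention with a context vector, the single ReLU hidden layer, and the softmax output are all assumptions. All function names are hypothetical, and training is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row of weights per output unit
Layer = Tuple[Matrix, Vector]  # (weights, bias)


def weighted_comment_vector(tokens: Sequence[str],
                            embeddings: Dict[str, Vector],
                            weights: Dict[str, float]) -> Vector:
    """Weighted mean of pre-trained word vectors for one comment.

    Assumed scheme: per-token weights (e.g. IDF) come from `weights`;
    a token missing from `weights` gets weight 1.0, and a token missing
    from `embeddings` (out of vocabulary) is skipped.
    `embeddings` must be non-empty; its vectors fix the dimension.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue
        w = weights.get(tok, 1.0)
        for i in range(dim):
            total[i] += w * vec[i]
        weight_sum += w
    if weight_sum == 0.0:
        return total  # every token was out of vocabulary: zero vector
    return [x / weight_sum for x in total]


def softmax(scores: Sequence[float]) -> Vector:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: Sequence[Vector], query: Vector) -> Vector:
    """Dot-product attention pooling over per-token hidden states.

    `states` stands in for Bi-LSTM outputs (forward and backward states
    already concatenated per token); `query` stands in for a learned
    context vector. The Bi-LSTM itself is not implemented here.
    """
    alphas = softmax([sum(h * q for h, q in zip(state, query)) for state in states])
    dim = len(states[0])
    return [sum(a * state[i] for a, state in zip(alphas, states)) for i in range(dim)]


def dense(x: Vector, layer: Layer) -> Vector:
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def classify(code_vec: Vector, comment_vec: Vector,
             hidden: Layer, output: Layer) -> Vector:
    """Concatenate both input vectors and run a one-hidden-layer MLP.

    Returns one probability per comment-quality class.
    """
    x = code_vec + comment_vec  # feature concatenation
    h = [max(0.0, v) for v in dense(x, hidden)]  # ReLU hidden layer
    return softmax(dense(h, output))


if __name__ == "__main__":
    emb = {"returns": [1.0, 0.0], "sum": [0.0, 1.0]}
    # "the" has no vector, so it is skipped: (1*[1,0] + 3*[0,1]) / 4
    comment = weighted_comment_vector(["returns", "the", "sum"], emb,
                                      {"returns": 1.0, "sum": 3.0})
    print(comment)  # [0.25, 0.75]
    # A zero query gives uniform attention, i.e. the mean of the states.
    code = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[0.0, 0.0])
    print(code)  # [0.5, 0.5]
    hidden = ([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]], [0.0, 0.0])
    output = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(classify(code, comment, hidden, output))  # two probabilities summing to 1
```

Keeping the two encoders separate and joining them only at the concatenation step is what makes the classifier "multi-input": each modality gets its own representation before the MLP sees the combined feature vector.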