Deep Code-Comment Understanding and Assessment

Code comments are a key software component for program comprehension and software maintainability. High-quality code and comments are urgently needed by data-driven models widely used in tasks like code summarization. Many existing approaches for assessing the quality of comments are machine learnin...

Bibliographic Details
Main Authors:	Deze Wang, Yong Guo, Wei Dong, Zhiming Wang, Haoran Liu, Shanshan Li
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Code comment; source code; multi-input neural network; text classification
Online Access:	https://ieeexplore.ieee.org/document/8920024/

_version_	1830345119392333824
author	Deze Wang; Yong Guo; Wei Dong; Zhiming Wang; Haoran Liu; Shanshan Li
author_facet	Deze Wang; Yong Guo; Wei Dong; Zhiming Wang; Haoran Liu; Shanshan Li
author_sort	Deze Wang
collection	DOAJ
description	Code comments are a key software component for program comprehension and software maintainability. High-quality code and comments are urgently needed by data-driven models widely used in tasks like code summarization. Many existing approaches for assessing the quality of comments are machine learning based classification algorithms or rely on heuristic rules. These approaches are difficult to capture the complicated features of text data and are often limited in accuracy, efficiency, and generalization ability. In this paper, we convert the quality assessment of code comments into a classification problem based on the multi-input neural network. We summarize the input, the code and comments, into vectors using the attention-based Bi-LSTM model and the weighted GloVe model, respectively, and concatenate the code vectors and the comment vectors as the input of the Multiple-Layer Perceptron classifier for the comment quality assessment. Experimental results show that our approach, in general, outperforms the previous technique, on both our labeled dataset and the public dataset, with the F1-score of 96.91% and 91.90%, respectively. Using the training set and the testing set from distinct sources, our approach can still achieve reasonable performance, which demonstrates its generalization ability.
first_indexed	2024-12-19T22:37:41Z
format	Article
id	doaj.art-60dfcd0949bd40128ab48bf6d9d0cc8d
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-12-19T22:37:41Z
publishDate	2019-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-60dfcd0949bd40128ab48bf6d9d0cc8d; 2022-12-21T20:03:09Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; 7; 174200; 174209; 10.1109/ACCESS.2019.2957424; 8920024; Deep Code-Comment Understanding and Assessment; Deze Wang (0) https://orcid.org/0000-0001-7935-6840; Yong Guo (1) https://orcid.org/0000-0001-5903-5302; Wei Dong (2) https://orcid.org/0000-0002-8033-7943; Zhiming Wang (3) https://orcid.org/0000-0002-4933-3303; Haoran Liu (4) https://orcid.org/0000-0002-4493-4265; Shanshan Li (5) https://orcid.org/0000-0003-0798-974X; (0) College of Computer Science, National University of Defense Technology, Changsha, China; (1) College of Systems Engineering, National University of Defense Technology, Changsha, China; (2) College of Computer Science, National University of Defense Technology, Changsha, China; (3) College of Computer Science, National University of Defense Technology, Changsha, China; (4) College of Computer Science, National University of Defense Technology, Changsha, China; (5) College of Computer Science, National University of Defense Technology, Changsha, China; Code comments are a key software component for program comprehension and software maintainability. High-quality code and comments are urgently needed by data-driven models widely used in tasks like code summarization. Many existing approaches for assessing the quality of comments are machine learning based classification algorithms or rely on heuristic rules. These approaches are difficult to capture the complicated features of text data and are often limited in accuracy, efficiency, and generalization ability. In this paper, we convert the quality assessment of code comments into a classification problem based on the multi-input neural network. We summarize the input, the code and comments, into vectors using the attention-based Bi-LSTM model and the weighted GloVe model, respectively, and concatenate the code vectors and the comment vectors as the input of the Multiple-Layer Perceptron classifier for the comment quality assessment. Experimental results show that our approach, in general, outperforms the previous technique, on both our labeled dataset and the public dataset, with the F1-score of 96.91% and 91.90%, respectively. Using the training set and the testing set from distinct sources, our approach can still achieve reasonable performance, which demonstrates its generalization ability.; https://ieeexplore.ieee.org/document/8920024/; Code comment; source code; multi-input neural network; text classification
spellingShingle	Deze Wang; Yong Guo; Wei Dong; Zhiming Wang; Haoran Liu; Shanshan Li; Deep Code-Comment Understanding and Assessment; IEEE Access; Code comment; source code; multi-input neural network; text classification
title	Deep Code-Comment Understanding and Assessment
title_full	Deep Code-Comment Understanding and Assessment
title_fullStr	Deep Code-Comment Understanding and Assessment
title_full_unstemmed	Deep Code-Comment Understanding and Assessment
title_short	Deep Code-Comment Understanding and Assessment
title_sort	deep code comment understanding and assessment
topic	Code comment; source code; multi-input neural network; text classification
url	https://ieeexplore.ieee.org/document/8920024/
work_keys_str_mv	AT dezewang deepcodecommentunderstandingandassessment AT yongguo deepcodecommentunderstandingandassessment AT weidong deepcodecommentunderstandingandassessment AT zhimingwang deepcodecommentunderstandingandassessment AT haoranliu deepcodecommentunderstandingandassessment AT shanshanli deepcodecommentunderstandingandassessment
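
The description in this record outlines a multi-input pipeline: one vector summarizing the code, one summarizing the comment, the two concatenated and passed to a multi-layer perceptron. The sketch below illustrates only that data flow in plain Python and is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the dimensions, the random untrained parameters, the per-word weights, the two output classes, and the use of random per-token states in place of real Bi-LSTM outputs and GloVe vectors.

```python
import math
import random


def weighted_average(vectors, weights):
    """Weighted mean of equal-length vectors (stand-in for a weighted word-embedding average)."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dim)]


def softmax(values):
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    norm = sum(exps)
    return [e / norm for e in exps]


def attention_pool(states, query):
    """Pool per-token states with softmax attention scores against a query vector."""
    scores = [sum(s_i * q_i for s_i, q_i in zip(state, query)) for state in states]
    return weighted_average(states, softmax(scores))


def dense(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b for row, b in zip(weights, bias)]


def mlp_classify(x, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a softmax over classes."""
    hidden = [max(0.0, h) for h in dense(x, w1, b1)]
    return softmax(dense(hidden, w2, b2))


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random placeholder parameters; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical inputs: 5 code tokens and 3 comment words, each a 4-dim vector.
code_states = rand_matrix(5, 4)      # placeholder for Bi-LSTM hidden states
comment_vecs = rand_matrix(3, 4)     # placeholder for pretrained word vectors
comment_weights = [0.2, 0.5, 0.3]    # assumed per-word weights
query = rand_matrix(1, 4)[0]         # assumed attention query vector

code_vec = attention_pool(code_states, query)
comment_vec = weighted_average(comment_vecs, comment_weights)
features = code_vec + comment_vec    # concatenation of the two input branches

# Two output classes are an assumption; the record does not state the label set.
probs = mlp_classify(features, rand_matrix(6, 8), [0.0] * 6, rand_matrix(2, 6), [0.0] * 2)
print(probs)
```

Keeping the two branches separate until the concatenation step lets each input use an encoder suited to its form, which is the design the description attributes to the multi-input network.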

Similar Items