Deep Code-Comment Understanding and Assessment
Code comments are a key software component for program comprehension and software maintainability. High-quality code and comments are urgently needed by data-driven models widely used in tasks like code summarization. Many existing approaches for assessing the quality of comments are machine learnin...
Main Authors: | Deze Wang, Yong Guo, Wei Dong, Zhiming Wang, Haoran Liu, Shanshan Li |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | Code comment; source code; multi-input neural network; text classification |
Online Access: | https://ieeexplore.ieee.org/document/8920024/ |
_version_ | 1818910115565142016 |
---|---|
author | Deze Wang, Yong Guo, Wei Dong, Zhiming Wang, Haoran Liu, Shanshan Li |
author_sort | Deze Wang |
collection | DOAJ |
description | Code comments are a key software component for program comprehension and software maintainability. High-quality code and comments are urgently needed by data-driven models widely used in tasks like code summarization. Many existing approaches for assessing the quality of comments are machine learning based classification algorithms or rely on heuristic rules. These approaches struggle to capture the complex features of text data and are often limited in accuracy, efficiency, and generalization ability. In this paper, we convert the quality assessment of code comments into a classification problem based on a multi-input neural network. We encode the two inputs, the code and the comments, as vectors using an attention-based Bi-LSTM model and a weighted GloVe model, respectively, and concatenate the code vectors and the comment vectors as the input of a Multi-Layer Perceptron classifier for comment quality assessment. Experimental results show that our approach generally outperforms the previous technique on both our labeled dataset and the public dataset, with F1-scores of 96.91% and 91.90%, respectively. When the training set and the testing set come from distinct sources, our approach still achieves reasonable performance, which demonstrates its generalization ability. |
first_indexed | 2024-12-19T22:37:41Z |
format | Article |
id | doaj.art-60dfcd0949bd40128ab48bf6d9d0cc8d |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-19T22:37:41Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-60dfcd0949bd40128ab48bf6d9d0cc8d; 2022-12-21T20:03:09Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; 7; 174200; 174209; 10.1109/ACCESS.2019.2957424; 8920024; Deep Code-Comment Understanding and Assessment; Deze Wang (0) https://orcid.org/0000-0001-7935-6840; Yong Guo (1) https://orcid.org/0000-0001-5903-5302; Wei Dong (2) https://orcid.org/0000-0002-8033-7943; Zhiming Wang (3) https://orcid.org/0000-0002-4933-3303; Haoran Liu (4) https://orcid.org/0000-0002-4493-4265; Shanshan Li (5) https://orcid.org/0000-0003-0798-974X; College of Computer Science, National University of Defense Technology, Changsha, China; College of Systems Engineering, National University of Defense Technology, Changsha, China; College of Computer Science, National University of Defense Technology, Changsha, China; College of Computer Science, National University of Defense Technology, Changsha, China; College of Computer Science, National University of Defense Technology, Changsha, China; College of Computer Science, National University of Defense Technology, Changsha, China; https://ieeexplore.ieee.org/document/8920024/; Code comment; source code; multi-input neural network; text classification |
title | Deep Code-Comment Understanding and Assessment |
topic | Code comment; source code; multi-input neural network; text classification |
url | https://ieeexplore.ieee.org/document/8920024/ |
work_keys_str_mv | AT dezewang deepcodecommentunderstandingandassessment AT yongguo deepcodecommentunderstandingandassessment AT weidong deepcodecommentunderstandingandassessment AT zhimingwang deepcodecommentunderstandingandassessment AT haoranliu deepcodecommentunderstandingandassessment AT shanshanli deepcodecommentunderstandingandassessment |
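
The description field above outlines a two-branch pipeline: one vector per code snippet, one vector per comment, concatenation, then a perceptron classifier. The following is a minimal, untrained sketch of that data flow only, not the authors' implementation. It assumes the Bi-LSTM hidden states and the GloVe word vectors are already available and replaces them with small random stand-ins; the dimensions, the dot-product attention scoring, the per-word weights, and every function name are hypothetical choices made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Collapse a sequence of hidden-state vectors into one vector.

    Each state is scored by its dot product with a query vector (learned in
    a real model, random here); the softmax of the scores weights the sum.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]


def weighted_average(word_vectors, word_weights):
    """Weighted mean of word vectors (the weighting scheme is an assumption)."""
    total = sum(word_weights)
    dim = len(word_vectors[0])
    return [
        sum(w * v[d] for w, v in zip(word_weights, word_vectors)) / total
        for d in range(dim)
    ]


def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * x_j for w, x_j in zip(row, x)) + b for row, b in zip(weights, bias)]


def mlp_classify(x, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a softmax over the classes."""
    hidden = [max(0.0, z) for z in dense(x, w1, b1)]
    return softmax(dense(hidden, w2, b2))


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
CODE_DIM, COMMENT_DIM, HIDDEN, CLASSES = 8, 6, 5, 2  # hypothetical sizes

# Stand-ins for Bi-LSTM outputs over 4 code tokens and GloVe vectors of 3 words.
code_states = random_matrix(rng, 4, CODE_DIM)
query = [rng.uniform(-0.5, 0.5) for _ in range(CODE_DIM)]
comment_vectors = random_matrix(rng, 3, COMMENT_DIM)
comment_weights = [0.2, 1.0, 0.7]  # hypothetical per-word weights

code_vec = attention_pool(code_states, query)
comment_vec = weighted_average(comment_vectors, comment_weights)
joint = code_vec + comment_vec  # list concatenation of the two branches

w1, b1 = random_matrix(rng, HIDDEN, len(joint)), [0.0] * HIDDEN
w2, b2 = random_matrix(rng, CLASSES, HIDDEN), [0.0] * CLASSES
probs = mlp_classify(joint, w1, b1, w2, b2)
print(probs)  # class probabilities from untrained weights, so not meaningful
```

Because the weights are random, the printed probabilities carry no information about comment quality; the sketch only shows how the two inputs are reduced to fixed-length vectors and joined before classification.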