Bi-LSTM-Based Neural Source Code Summarization


Bibliographic Details
Main Authors: Sarah Aljumah, Lamia Berriche
Format: Article
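Language: English
Published: MDPI AG 2022-12-01
Series: Applied Sciences
Subjects: software engineering; neural network; code summarization; software development; software maintenance; deep learning
Online Access: https://www.mdpi.com/2076-3417/12/24/12587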
author Sarah Aljumah
Lamia Berriche
collection DOAJ
description Software developers often rely on code summarization when fixing or reusing code, and software documentation is essential to software maintenance. Maintenance accounts for the highest cost in software development because code is difficult to modify. To help reduce the cost and time spent on software development and maintenance, we introduce an automated comment summarization and commenting technique that uses state-of-the-art summarization techniques. We use deep neural networks, specifically bidirectional long short-term memory (Bi-LSTM), combined with an attention model to enhance performance. In this study, we propose two scenarios: one that uses both the code text and the structure of the code, represented as an abstract syntax tree (AST), and another that uses only the code text. For the first scenario, we propose two encoder-based models that encode the code text and the AST independently. Previous works have used different deep neural network techniques to generate comments. The methodologies proposed in this study scored higher than previous works based on the gated recurrent unit encoder. We conducted our experiment on a dataset of 2.1 million pairs of Java methods and comments. Additionally, we showed that the code structure is beneficial for method signatures featuring unclear words.
first_indexed 2024-03-09T17:22:55Z
format Article
id doaj.art-4ef46a1bfeba483dab3fd7bafc576b73
institution Directory of Open Access Journals
issn 2076-3417
language English
last_indexed 2024-03-09T17:22:55Z
publishDate 2022-12-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-4ef46a1bfeba483dab3fd7bafc576b73; 2023-11-24T13:01:12Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-12-01; 12; 24; 12587; 10.3390/app122412587; Bi-LSTM-Based Neural Source Code Summarization; Sarah Aljumah (0); Lamia Berriche (1); (0) College of Computer & Information Sciences, Prince Sultan University, Riyadh 12435, Saudi Arabia; (1) College of Computer & Information Sciences, Prince Sultan University, Riyadh 12435, Saudi Arabia; https://www.mdpi.com/2076-3417/12/24/12587; software engineering; neural network; code summarization; software development; software maintenance; deep learning
title Bi-LSTM-Based Neural Source Code Summarization
topic software engineering
neural network
code summarization
software development
software maintenance
deep learning
url https://www.mdpi.com/2076-3417/12/24/12587