Pseudocode Generation from Source Code Using the BART Model

In the software development process, more than one developer may work on the same program, and bugs may be fixed by a developer other than the original author; understanding the source code is therefore an important issue. Pseudocode plays an important role in addressing this problem because it helps developers understand the source code. Recently, transformer-based pre-trained models have achieved remarkable results in machine translation, a task similar to pseudocode generation. In this paper, we propose a novel approach for automatic pseudocode generation from source code based on the pre-trained Bidirectional and Auto-Regressive Transformer (BART) model. We fine-tuned two pre-trained BART models (i.e., large and base) on a dataset containing source code and its equivalent pseudocode. In addition, two benchmark datasets (i.e., Django and SPoC) were used to evaluate the proposed model. The proposed model based on BART large outperforms other state-of-the-art models in terms of BLEU score by 15% and 27% on the Django and SPoC datasets, respectively.
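As context for the abstract, the sketch below illustrates how fine-tuning a pre-trained BART model on (source code, pseudocode) pairs could look in practice. It is a minimal example, not the authors' actual pipeline: it assumes the Hugging Face transformers library, the public facebook/bart-base checkpoint, hypothetical JSON-lines data files with "code" and "pseudocode" fields, and arbitrary hyperparameters that the paper does not specify here.

# Minimal sketch (assumptions noted above), not the paper's exact implementation.
from transformers import (BartTokenizerFast, BartForConditionalGeneration,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)
from datasets import load_dataset

model_name = "facebook/bart-base"  # the paper also fine-tunes BART large
tokenizer = BartTokenizerFast.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Hypothetical files: one JSON object per line with "code" and "pseudocode"
# fields, e.g. statement-level pairs such as those in Django or SPoC.
data = load_dataset("json", data_files={"train": "train.jsonl",
                                        "validation": "valid.jsonl"})

def preprocess(batch):
    # Source code is the encoder input; pseudocode is the decoder target.
    model_inputs = tokenizer(batch["code"], max_length=128,
                             truncation=True, padding="max_length")
    labels = tokenizer(text_target=batch["pseudocode"], max_length=128,
                       truncation=True, padding="max_length")
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = data.map(preprocess, batched=True,
                     remove_columns=data["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="bart-pseudocode",
    num_train_epochs=3,                  # assumed; not taken from the paper
    per_device_train_batch_size=8,       # assumed; not taken from the paper
    evaluation_strategy="epoch",
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(model=model, args=args,
                         train_dataset=tokenized["train"],
                         eval_dataset=tokenized["validation"],
                         tokenizer=tokenizer)
trainer.train()

# Generate pseudocode for a single line of source code.
code = "for i in range(10):"
ids = tokenizer(code, return_tensors="pt").input_ids
out = model.generate(ids, max_length=64, num_beams=4)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Generated pseudocode would then be compared against the reference pseudocode with BLEU, the metric reported in the abstract.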

Bibliographic Details
Main Authors: Anas Alokla, Walaa Gad, Waleed Nazih, Mustafa Aref, Abdel-badeeh Salem
Author Affiliations: Faculty of Computers and Information Sciences, Ain Shams University, Abassia, Cairo 11566, Egypt (Alokla, Gad, Aref, Salem); College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al Kharj 11942, Saudi Arabia (Nazih)
Format: Article
Language: English
Published: MDPI AG, 2022-10-01
Series: Mathematics, Vol. 10, Issue 21, Article 3967
ISSN: 2227-7390
DOI: 10.3390/math10213967
Subjects: pseudocode generation; BERT; GPT; BART; natural language processing; neural machine translation
Online Access: https://www.mdpi.com/2227-7390/10/21/3967