Retrieval-Based Transformer Pseudocode Generation

Bibliographic Details
Main Authors: Anas Alokla, Walaa Gad, Waleed Nazih, Mustafa Aref, Abdel-Badeeh Salem
Format: Article
Language: English
Published: MDPI AG, 2022-02-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/10/4/604
Volume: 10
Issue: 4
Article Number: 604
ISSN: 2227-7390
DOI: 10.3390/math10040604
Collection: DOAJ
Subjects: natural language processing; retrieval-based; neural machine translation; pseudocode generation; deep learning-based transformer
Affiliations: Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt (Anas Alokla, Walaa Gad, Mustafa Aref, Abdel-Badeeh Salem); College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia (Waleed Nazih)

Description
Comprehending source code is very difficult, especially when the programmer is unfamiliar with the programming language. Pseudocode explains and describes the contents of code based on semantic analysis and understanding of the source code. In this paper, a novel retrieval-based transformer pseudocode generation model is proposed. The model combines several retrieval similarity methods with neural machine translation to generate pseudocode, and it handles low-frequency words as well as words that do not appear in the training dataset. It consists of three steps. First, sentences similar to the input sentence are retrieved using different similarity methods. Second, the retrieved source code (the retrieved input) is passed to a transformer-based deep learning model, which generates the retrieved pseudocode. Third, a replacement process is applied to obtain the target pseudocode. The model is evaluated on the Django and SPoC datasets, where it shows promising results compared with other machine translation language models, reaching BLEU scores of 61.96 on Django and 50.28 on SPoC.
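The three-step pipeline described in the abstract (retrieve, translate, replace) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: token-set Jaccard similarity stands in for the paper's retrieval similarity methods, a dictionary lookup over a hypothetical toy corpus (TRAINING_PAIRS) stands in for the trained transformer, and the replacement step is assumed to substitute tokens that differ between the input and the retrieved code, aligned with Python's difflib.SequenceMatcher. All function and variable names are hypothetical.

```python
import difflib
import re
from typing import Callable, Dict, List, Sequence

# Hypothetical toy parallel corpus: source-code line -> pseudocode line.
TRAINING_PAIRS: Dict[str, str] = {
    "total = total + price": "add price to total",
    "for item in items :": "for every item in items",
    "return None": "return None",
}

TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split text into word/number tokens and single punctuation tokens."""
    return TOKEN_RE.findall(text)


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Token-set Jaccard similarity (an assumed stand-in for the paper's measures)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def retrieve(query: str, corpus: Sequence[str]) -> str:
    """Step 1: return the corpus source line most similar to the query."""
    query_tokens = tokenize(query)
    return max(corpus, key=lambda src: jaccard(query_tokens, tokenize(src)))


def build_replacements(input_tokens: List[str], retrieved_tokens: List[str]) -> Dict[str, str]:
    """Map each token of the retrieved code to the input token aligned with it."""
    mapping: Dict[str, str] = {}
    matcher = difflib.SequenceMatcher(a=retrieved_tokens, b=input_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only same-length substitutions are mapped one-to-one; insertions and
        # deletions are ignored. A later alignment overwrites an earlier one.
        if tag == "replace" and i2 - i1 == j2 - j1:
            for old, new in zip(retrieved_tokens[i1:i2], input_tokens[j1:j2]):
                mapping[old] = new
    return mapping


def generate(query: str, pairs: Dict[str, str], translate: Callable[[str], str]) -> str:
    """Retrieve, translate, then replace; returns space-joined pseudocode tokens."""
    retrieved_src = retrieve(query, list(pairs))          # step 1: retrieval
    retrieved_pseudo = translate(retrieved_src)           # step 2: a transformer in the paper
    mapping = build_replacements(tokenize(query), tokenize(retrieved_src))
    # Step 3: swap retrieved identifiers for the ones that appear in the input.
    return " ".join(mapping.get(tok, tok) for tok in tokenize(retrieved_pseudo))


if __name__ == "__main__":
    # A dictionary lookup stands in for the trained transformer here.
    print(generate("count = count + step", TRAINING_PAIRS, TRAINING_PAIRS.__getitem__))
    # -> add step to count
    print(generate("for row in rows :", TRAINING_PAIRS, TRAINING_PAIRS.__getitem__))
    # -> for every row in rows
```

Because the replacement step copies identifiers such as "step" straight from the input, a word never seen during training can still appear in the generated pseudocode; this is one plausible way a retrieval-plus-replacement design can cope with low-frequency and out-of-vocabulary words.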