Retrieval-Based Transformer Pseudocode Generation
Comprehending source code is difficult, especially for a programmer who is not familiar with the programming language. Pseudocode explains and describes the content of code, based on semantic analysis and understanding of the source code. In this paper, a novel retrieval-based transfo...
Main Authors: | Anas Alokla, Walaa Gad, Waleed Nazih, Mustafa Aref, Abdel-Badeeh Salem |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-02-01 |
Series: | Mathematics |
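Subjects: | natural language processing; retrieval-based; neural machine translation; pseudocode generation; deep learning-based transformer |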
Online Access: | https://www.mdpi.com/2227-7390/10/4/604 |
_version_ | 1797478361605341184 |
---|---|
author | Anas Alokla; Walaa Gad; Waleed Nazih; Mustafa Aref; Abdel-Badeeh Salem |
author_facet | Anas Alokla; Walaa Gad; Waleed Nazih; Mustafa Aref; Abdel-Badeeh Salem |
author_sort | Anas Alokla |
collection | DOAJ |
description | Comprehending source code is difficult, especially for a programmer who is not familiar with the programming language. Pseudocode explains and describes the content of code, based on semantic analysis and understanding of the source code. In this paper, a novel retrieval-based transformer pseudocode generation model is proposed. The proposed model combines different retrieval similarity methods with neural machine translation to generate pseudocode. It handles low-frequency words and words that do not occur in the training dataset. The model consists of three steps. First, the sentences most similar to the input sentence are retrieved using different similarity methods. Second, the retrieved source code (input retrieved) is passed to a transformer-based deep learning model to generate the retrieved pseudocode. Third, a replacement process is performed to obtain the target pseudocode. The proposed model is evaluated on the Django and SPoC datasets. The experiments show promising results compared to other machine translation language models: the model reaches BLEU scores of 61.96 and 50.28 on Django and SPoC, respectively. |
first_indexed | 2024-03-09T21:30:50Z |
format | Article |
id | doaj.art-bbcf3cdf0c6f4e2db61568aab23f4034 |
institution | Directory Open Access Journal |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-09T21:30:50Z |
publishDate | 2022-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-bbcf3cdf0c6f4e2db61568aab23f4034; 2023-11-23T20:57:20Z; eng; MDPI AG; Mathematics; 2227-7390; 2022-02-01; 10; 4; 604; 10.3390/math10040604; Retrieval-Based Transformer Pseudocode Generation; Anas Alokla (0), Walaa Gad (1), Waleed Nazih (2), Mustafa Aref (3), Abdel-Badeeh Salem (4); (0) Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt; (1) Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt; (2) College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia; (3) Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt; (4) Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt; Comprehending source code is difficult, especially for a programmer who is not familiar with the programming language. Pseudocode explains and describes the content of code, based on semantic analysis and understanding of the source code. In this paper, a novel retrieval-based transformer pseudocode generation model is proposed. The proposed model combines different retrieval similarity methods with neural machine translation to generate pseudocode. It handles low-frequency words and words that do not occur in the training dataset. The model consists of three steps. First, the sentences most similar to the input sentence are retrieved using different similarity methods. Second, the retrieved source code (input retrieved) is passed to a transformer-based deep learning model to generate the retrieved pseudocode. Third, a replacement process is performed to obtain the target pseudocode. The proposed model is evaluated on the Django and SPoC datasets. The experiments show promising results compared to other machine translation language models: the model reaches BLEU scores of 61.96 and 50.28 on Django and SPoC, respectively.; https://www.mdpi.com/2227-7390/10/4/604; natural language processing; retrieval-based; neural machine translation; pseudocode generation; deep learning-based transformer |
spellingShingle | Anas Alokla; Walaa Gad; Waleed Nazih; Mustafa Aref; Abdel-Badeeh Salem; Retrieval-Based Transformer Pseudocode Generation; Mathematics; natural language processing; retrieval-based; neural machine translation; pseudocode generation; deep learning-based transformer |
title | Retrieval-Based Transformer Pseudocode Generation |
title_full | Retrieval-Based Transformer Pseudocode Generation |
title_fullStr | Retrieval-Based Transformer Pseudocode Generation |
title_full_unstemmed | Retrieval-Based Transformer Pseudocode Generation |
title_short | Retrieval-Based Transformer Pseudocode Generation |
title_sort | retrieval based transformer pseudocode generation |
topic | natural language processing; retrieval-based; neural machine translation; pseudocode generation; deep learning-based transformer |
url | https://www.mdpi.com/2227-7390/10/4/604 |
work_keys_str_mv | AT anasalokla retrievalbasedtransformerpseudocodegeneration AT walaagad retrievalbasedtransformerpseudocodegeneration AT waleednazih retrievalbasedtransformerpseudocodegeneration AT mustafaaref retrievalbasedtransformerpseudocodegeneration AT abdelbadeehsalem retrievalbasedtransformerpseudocodegeneration |
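
The three steps named in the description (retrieve, translate, replace) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not say which similarity methods or replacement rules are used, so token-level `difflib.SequenceMatcher` similarity, the toy `TRAIN` table, the lookup-based `translate` stand-in for the transformer, and the position-wise `replace_tokens` rule are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy "training set": source lines paired with pseudocode.
TRAIN = {
    "x = 0": "assign 0 to x",
    "return x": "return x",
    "count = count + 1": "increment count by 1",
}

def tokenize(line):
    """Split a line on whitespace (a deliberately simple tokenizer)."""
    return line.split()

def retrieve(query, corpus):
    """Step 1: return the corpus line most similar to the query.
    Similarity here is SequenceMatcher's ratio over token lists (an assumption)."""
    q = tokenize(query)
    return max(corpus, key=lambda s: SequenceMatcher(None, q, tokenize(s)).ratio())

def translate(source_line):
    """Step 2: stand-in for the transformer model; here just a table lookup."""
    return TRAIN[source_line]

def replace_tokens(query, retrieved, pseudo):
    """Step 3: map tokens that differ position-wise between the retrieved line
    and the query, then apply that mapping to the retrieved pseudocode."""
    q, r = tokenize(query), tokenize(retrieved)
    if len(q) != len(r):
        return pseudo  # no position-wise alignment; leave pseudocode unchanged
    mapping = {rt: qt for rt, qt in zip(r, q) if rt != qt}
    return " ".join(mapping.get(tok, tok) for tok in pseudo.split())

def generate(query):
    """Run the three steps in order for one input line."""
    retrieved = retrieve(query, TRAIN)
    return replace_tokens(query, retrieved, translate(retrieved))

print(generate("total = 0"))  # prints "assign 0 to total"
```

In this sketch the identifier `total` never appears in the toy training table, yet it reaches the output through the replacement step, which is the kind of out-of-vocabulary handling the description attributes to the model. A real system would need a more careful alignment than position-wise token matching, for example when a variable name coincides with a pseudocode word.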