Improving Text-to-Code Generation with Features of Code Graph on GPT-2

Bibliographic Details
Main Authors: Incheon Paik, Jun-Wei Wang
Format: Article
Language: English
Published: MDPI AG 2021-11-01
Series: Electronics
Subjects: code generation; data flow; BERT; AST; GPT-2
Online Access: https://www.mdpi.com/2079-9292/10/21/2706
_version_ 1797512613324652544
author Incheon Paik
Jun-Wei Wang
author_facet Incheon Paik
Jun-Wei Wang
author_sort Incheon Paik
collection DOAJ
description Code generation, an increasingly popular application of deep learning models for text, consists of two fields: code-to-code and text-to-code generation. A recent approach, GraphCodeBERT, uses a code graph called data flow and showed a clear performance improvement. Its base architecture is bidirectional encoder representations from transformers (BERT), which uses the encoder part of the transformer. The generative pre-trained transformer (GPT), another transformer-based architecture, instead uses the decoder part and shows strong performance on generation tasks. In this study, we investigate the improvement that code graphs, in several variants, bring to GPT-2, referring to the abstract syntax tree (AST) to collect the features of the variables in the code. We mainly focus on GPT-2 with additional code-graph features that allow the model to learn the effect of the data flow. The experiments are divided into two parts: fine-tuning the existing GPT-2 model, and pre-training a new model from scratch on code data. When a new model is pre-trained from scratch using the code graph and enough data, it produces the best results.
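As an illustration of the idea summarized above, the sketch below extracts a variable-level data-flow graph from source code by walking its abstract syntax tree and serializes the edges so they could be appended to a GPT-2 input sequence. It is a minimal Python sketch under assumptions: the extract_data_flow helper, the edge format, and the serialization scheme are illustrative choices, not the authors' implementation.

import ast

def extract_data_flow(source: str):
    """Collect simple variable-level data-flow edges from Python source:
    each use of a variable is linked to its most recent definition.
    Simplified: ignores control flow and within-statement evaluation order."""
    tree = ast.parse(source)
    # Visit identifier occurrences in textual order.
    names = sorted(
        (n for n in ast.walk(tree) if isinstance(n, ast.Name)),
        key=lambda n: (n.lineno, n.col_offset),
    )
    nodes, edges, last_def = [], [], {}
    for idx, name in enumerate(names):
        nodes.append((idx, name.id))
        if isinstance(name.ctx, ast.Store):
            last_def[name.id] = idx                   # new definition of the variable
        elif isinstance(name.ctx, ast.Load) and name.id in last_def:
            edges.append((idx, last_def[name.id]))    # use <- definition
    return nodes, edges

nodes, edges = extract_data_flow("x = 1\ny = x + 2\nz = y * x\n")
print(edges)  # [(2, 0), (4, 1), (5, 0)]

# One possible (assumed) encoding: serialize the edges as extra tokens that are
# appended to the model input so GPT-2 can condition on the data flow.
flow = " ".join(f"{nodes[u][1]}{u}<-{nodes[d][1]}{d}" for u, d in edges)
print(flow)   # x2<-x0 y4<-y1 x5<-x0

The two experimental settings mentioned above, fine-tuning the released GPT-2 weights and pre-training the same architecture from scratch on code data, could be set up roughly as follows with the Hugging Face transformers library; this is an assumed configuration, not the one used in the paper.

from transformers import GPT2Config, GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Setting 1: fine-tune the released GPT-2 checkpoint on (text, data flow) -> code pairs.
finetuned = GPT2LMHeadModel.from_pretrained("gpt2")

# Setting 2: pre-train the same architecture from scratch on code data.
from_scratch = GPT2LMHeadModel(GPT2Config(vocab_size=tokenizer.vocab_size))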
first_indexed 2024-03-10T06:04:15Z
format Article
id doaj.art-0dfd2a59b1ae40249d53cb816a316f1d
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-10T06:04:15Z
publishDate 2021-11-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-0dfd2a59b1ae40249d53cb816a316f1d (2023-11-22T20:39:39Z)
Improving Text-to-Code Generation with Features of Code Graph on GPT-2
Incheon Paik (School of Computer Science and Engineering, The University of Aizu, Fukushima 965-8580, Japan); Jun-Wei Wang (Department of Computer Science and Information Engineering, ChaoYang University of Technology, Taichung 413310, Taiwan)
Electronics (MDPI AG), 2021-11-01, vol. 10, no. 21, article 2706. ISSN 2079-9292. DOI: 10.3390/electronics10212706. Language: English.
Online access: https://www.mdpi.com/2079-9292/10/21/2706
Keywords: code generation; data flow; BERT; AST; GPT-2
spellingShingle Incheon Paik
Jun-Wei Wang
Improving Text-to-Code Generation with Features of Code Graph on GPT-2
Electronics
code generation
data flow
BERT
AST
GPT-2
title Improving Text-to-Code Generation with Features of Code Graph on GPT-2
title_full Improving Text-to-Code Generation with Features of Code Graph on GPT-2
title_fullStr Improving Text-to-Code Generation with Features of Code Graph on GPT-2
title_full_unstemmed Improving Text-to-Code Generation with Features of Code Graph on GPT-2
title_short Improving Text-to-Code Generation with Features of Code Graph on GPT-2
title_sort improving text to code generation with features of code graph on gpt 2
topic code generation
data flow
BERT
AST
GPT-2
url https://www.mdpi.com/2079-9292/10/21/2706
work_keys_str_mv AT incheonpaik improvingtexttocodegenerationwithfeaturesofcodegraphongpt2
AT junweiwang improvingtexttocodegenerationwithfeaturesofcodegraphongpt2