DiffuD2T: Empowering Data-to-Text Generation with Diffusion
Main Authors: | Heng Gong, Xiaocheng Feng, Bing Qin |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-05-01 |
Series: | Electronics |
Subjects: | diffusion; data-to-text generation; natural language processing; artificial intelligence |
Online Access: | https://www.mdpi.com/2079-9292/12/9/2136 |
_version_ | 1797602791812759552 |
---|---|
author | Heng Gong; Xiaocheng Feng; Bing Qin |
author_facet | Heng Gong; Xiaocheng Feng; Bing Qin |
author_sort | Heng Gong |
collection | DOAJ |
description | Surrounded by structured data, such as medical data, financial data, and knowledge bases, data-to-text generation has become an important natural language processing task that can help people better understand the meaning of those data by providing them with user-friendly text. Existing methods for data-to-text generation show promising results in tackling two major challenges: content planning and surface realization, which transform structured data into fluent text. However, they lack an iterative refinement process for generating text, which can enable the model to perfect the text step by step while accepting control over the process. In this paper, we explore enhancing data-to-text generation with an iterative refinement process via diffusion. We make four main contributions: (1) we use the diffusion model to improve prefix tuning for data-to-text generation; (2) we propose a look-ahead guiding loss to supervise the iterative refinement process for better text generation; (3) we extract content plans from reference text and propose a planning-then-writing pipeline to give the model content planning ability; and (4) we conducted experiments on three data-to-text generation datasets, and both automatic evaluation criteria (BLEU, NIST, METEOR, ROUGE-L, CIDEr, TER, MoverScore, BLEURT, and BERTScore) and human evaluation criteria (Quality and Naturalness) show the effectiveness of our model. Our model improves the competitive prefix tuning method by 2.19% in terms of the widely used automatic evaluation criterion BLEU (BiLingual Evaluation Understudy) on the WebNLG dataset with GPT-2 Large as the pretrained language model backbone. Human evaluation also shows that our model improves the quality and naturalness of the generated text across all three datasets. |
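The abstract describes iteratively refining prefix representations via a diffusion model. The record does not include the authors' implementation, so the following is only a loose, hypothetical illustration of the idea: a reverse (denoising) loop that refines a noisy vector step by step. The denoiser, target, and step schedule here are all invented stand-ins, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x, t):
    # Hypothetical stand-in for a learned denoising network: it pulls
    # the noisy vector toward a fixed "clean" target, more strongly
    # as the normalized timestep t approaches 0.
    target = np.ones_like(x)
    return x + (target - x) * (1.0 - t)

def refine_prefix(x, steps=10):
    # Iteratively refine the vector over `steps` reverse passes,
    # mimicking the denoising process of a diffusion model: each pass
    # produces a slightly cleaner estimate than the last.
    for i in reversed(range(steps)):
        t = i / steps  # timestep decreases from ~1 toward 0
        x = toy_denoiser(x, t)
    return x

noisy = rng.normal(size=4)   # random init, like the final noise step
clean = refine_prefix(noisy)
```

Because each pass is a separate function call, intermediate estimates could be inspected or steered, which is the kind of step-by-step control over generation the abstract refers to.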
first_indexed | 2024-03-11T04:20:29Z |
format | Article |
id | doaj.art-c100428eef0f4a068c6e7d7b34236ac1 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-11T04:20:29Z |
publishDate | 2023-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-c100428eef0f4a068c6e7d7b34236ac1. Electronics, vol. 12, no. 9, article 2136, 2023-05-01. DOI: 10.3390/electronics12092136. Affiliations: Heng Gong, Xiaocheng Feng, and Bing Qin, School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China. |
title | DiffuD2T: Empowering Data-to-Text Generation with Diffusion |
topic | diffusion; data-to-text generation; natural language processing; artificial intelligence |
url | https://www.mdpi.com/2079-9292/12/9/2136 |
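The record reports a 2.19% improvement in BLEU on WebNLG. As a reminder of what that metric computes, here is a simplified BLEU sketch: the geometric mean of clipped n-gram precisions times a brevity penalty. This toy version uses a single reference and n-grams up to bigrams with no smoothing, whereas standard BLEU uses 4-grams with smoothing at the corpus level.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list, as hashable tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, hypothesis, max_n=2):
    # Simplified BLEU: geometric mean of clipped n-gram precisions
    # multiplied by a brevity penalty (single reference, no smoothing).
    precisions = []
    for n in range(1, max_n + 1):
        ref_counts = Counter(ngrams(reference, n))
        hyp_counts = Counter(ngrams(hypothesis, n))
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # geometric mean collapses if any precision is zero
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
hyp = "the cat sat on the mat".split()
print(bleu(ref, hyp))  # a perfect match scores 1.0
```

Small absolute BLEU gains like the reported 2.19% are conventionally considered meaningful on generation benchmarks, since the metric aggregates n-gram overlap over an entire test set.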