Quality Control for Distantly-Supervised Data-to-Text Generation via Meta Learning


Bibliographic Details
Main Authors: Heng Gong, Xiaocheng Feng, Bing Qin
Format: Article
Language: English
Published: MDPI AG 2023-04-01
Series: Applied Sciences
Subjects: data-to-text generation; Natural Language Generation; natural language processing; deep learning; meta learning; Artificial Intelligence
Online Access: https://www.mdpi.com/2076-3417/13/9/5573
_version_ 1797602966405906432
author Heng Gong
Xiaocheng Feng
Bing Qin
author_facet Heng Gong
Xiaocheng Feng
Bing Qin
author_sort Heng Gong
collection DOAJ
description Data-to-text generation plays an important role in natural language processing: it takes structured data and helps people understand those data by generating user-friendly descriptive text. It can be applied to news generation, financial report generation, customer service, etc. In practice, however, it must adapt to different domains that may lack an annotated training corpus. To alleviate this dataset scarcity problem, distantly-supervised data-to-text generation has emerged; it constructs a training corpus automatically and is more practical to apply to new domains where well-aligned data is expensive to obtain. However, training with distant supervision induces an over-generation problem, since the automatically aligned text includes hallucinations: expressions that cannot be inferred from the data and that misguide the model into producing unfaithful text. To exploit the noisy dataset while maintaining faithfulness, we empower the neural data-to-text model by dynamically increasing the weights of well-aligned training instances and reducing the weights of low-quality ones via meta learning. To the best of our knowledge, we are the first to alleviate the noise in distantly-supervised data-to-text generation via meta learning. In addition, we rewrite the low-quality texts to provide better training instances. Finally, we construct a new distantly-supervised dataset, DIST-ToTTo (abbreviation for Distantly-supervised Table-To-Text), and conduct experiments on both the benchmark WITA (abbreviation for the data sources Wikipedia and Wikidata) and DIST-ToTTo datasets. The evaluation results show that our model improves on the state-of-the-art DSG (abbreviation for Distant Supervision Generation) model across all automatic evaluation metrics, with an improvement of 3.72% on the WITA dataset and 3.82% on the DIST-ToTTo dataset in terms of the widely used metric BLEU (abbreviation for BiLingual Evaluation Understudy). Furthermore, based on human evaluation, our model generates more grammatically correct and more faithful text than the state-of-the-art DSG model.
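The core technique the description names, up-weighting well-aligned instances and down-weighting noisy ones via a meta step on a small trusted set, can be sketched as follows. This is a minimal illustration in the style of learning-to-reweight on a linear model, not the paper's actual implementation; the function names, model, learning rate, and data layout are all assumptions for the sketch:

```python
import numpy as np

def per_example_grads(theta, X, y):
    # Per-example gradient of squared error: g_i = 2 * (x_i . theta - y_i) * x_i
    resid = X @ theta - y
    return 2.0 * resid[:, None] * X

def meta_reweight_step(theta, X_noisy, y_noisy, X_clean, y_clean, lr=0.01):
    """One instance-reweighting step (learning-to-reweight style sketch).

    With instance weights initialized to zero, the virtual SGD step leaves
    theta unchanged, so the meta gradient can be evaluated at theta directly.
    Returns the normalized per-instance weights and the updated parameters.
    """
    g = per_example_grads(theta, X_noisy, y_noisy)                 # (n, d)
    # Gradient of the meta (clean-set) loss at theta
    grad_meta = per_example_grads(theta, X_clean, y_clean).mean(axis=0)
    # d(meta loss)/d w_i = -lr * g_i . grad_meta; keep only helpful instances
    w = np.maximum(0.0, lr * (g @ grad_meta))
    if w.sum() > 0:
        w = w / w.sum()                                            # normalize
    # Weighted SGD step: noisy instances (w_i = 0) contribute nothing
    theta_new = theta - lr * (w[:, None] * g).sum(axis=0)
    return w, theta_new
```

In this sketch, an instance whose gradient points away from the clean-set gradient (e.g. one with a corrupted label) receives weight zero, while well-aligned instances keep positive weight, which is the dynamic up/down-weighting behaviour the description refers to.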
first_indexed 2024-03-11T04:24:08Z
format Article
id doaj.art-5309fe3f58b4447a91fc30058277cdd2
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-11T04:24:08Z
publishDate 2023-04-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-5309fe3f58b4447a91fc30058277cdd2 2023-11-17T22:35:52Z eng MDPI AG
Quality Control for Distantly-Supervised Data-to-Text Generation via Meta Learning
Heng Gong; Xiaocheng Feng; Bing Qin (Harbin Institute of Technology, Harbin 150001, China)
Applied Sciences, ISSN 2076-3417, Vol. 13, Iss. 9, Art. 5573, 2023-04-01
DOI: 10.3390/app13095573
https://www.mdpi.com/2076-3417/13/9/5573
spellingShingle Heng Gong
Xiaocheng Feng
Bing Qin
Quality Control for Distantly-Supervised Data-to-Text Generation via Meta Learning
Applied Sciences
data-to-text generation
Natural Language Generation
natural language processing
deep learning
meta learning
Artificial Intelligence
title Quality Control for Distantly-Supervised Data-to-Text Generation via Meta Learning
title_full Quality Control for Distantly-Supervised Data-to-Text Generation via Meta Learning
title_fullStr Quality Control for Distantly-Supervised Data-to-Text Generation via Meta Learning
title_full_unstemmed Quality Control for Distantly-Supervised Data-to-Text Generation via Meta Learning
title_short Quality Control for Distantly-Supervised Data-to-Text Generation via Meta Learning
title_sort quality control for distantly supervised data to text generation via meta learning
topic data-to-text generation
Natural Language Generation
natural language processing
deep learning
meta learning
Artificial Intelligence
url https://www.mdpi.com/2076-3417/13/9/5573
work_keys_str_mv AT henggong qualitycontrolfordistantlysuperviseddatatotextgenerationviametalearning
AT xiaochengfeng qualitycontrolfordistantlysuperviseddatatotextgenerationviametalearning
AT bingqin qualitycontrolfordistantlysuperviseddatatotextgenerationviametalearning