Enhancing SPARQL Query Generation for Knowledge Base Question Answering Systems by Learning to Correct Triplets

Generating SPARQL queries from natural language questions is challenging in Knowledge Base Question Answering (KBQA) systems. The current state-of-the-art models heavily rely on fine-tuning pretrained models such as T5. However, these methods still encounter critical issues such as triple-flip error...


Bibliographic Details
Main Authors:	Jiexing Qi, Chang Su, Zhixin Guo, Lyuwen Wu, Zanwei Shen, Luoyi Fu, Xinbing Wang, Chenghu Zhou
Format:	Article
Language:	English
Published:	MDPI AG 2024-02-01
Series:	Applied Sciences
Subjects:	Knowledge Base Question Answering; Text-to-SPARQL; semantic parsing; further pretraining; Triplet Structure
Online Access:	https://www.mdpi.com/2076-3417/14/4/1521

_version_	1827344240088186880
author	Jiexing Qi, Chang Su, Zhixin Guo, Lyuwen Wu, Zanwei Shen, Luoyi Fu, Xinbing Wang, Chenghu Zhou
collection	DOAJ
description	Generating SPARQL queries from natural language questions is challenging in Knowledge Base Question Answering (KBQA) systems. The current state-of-the-art models heavily rely on fine-tuning pretrained models such as T5. However, these methods still encounter critical issues such as triple-flip errors (e.g., (subject, relation, object) is predicted as (object, relation, subject)). To address this limitation, we introduce TSET (Triplet Structure Enhanced T5), a model with a novel pretraining stage positioned between the initial T5 pretraining and the fine-tuning for the Text-to-SPARQL task. In this intermediary stage, we introduce a new objective called Triplet Structure Correction (TSC) to train the model on a SPARQL corpus derived from Wikidata. This objective aims to deepen the model’s understanding of the order of triplets. After this specialized pretraining, the model undergoes fine-tuning for SPARQL query generation, augmenting its query-generation capabilities. We also propose a method named “semantic transformation” to fortify the model’s grasp of SPARQL syntax and semantics without compromising the pre-trained weights of T5. Experimental results demonstrate that our proposed TSET outperforms existing methods on three well-established KBQA datasets: LC-QuAD 2.0, QALD-9 plus, and QALD-10, establishing a new state-of-the-art performance (95.0% F1 and 93.1% QM on LC-QuAD 2.0, 75.85% F1 and 61.76% QM on QALD-9 plus, 51.37% F1 and 40.05% QM on QALD-10).
first_indexed	2024-03-07T22:43:11Z
format	Article
id	doaj.art-00155dfcffba4e7aaa57ae5120f73e40
institution	Directory Open Access Journal
issn	2076-3417
language	English
last_indexed	2024-03-07T22:43:11Z
publishDate	2024-02-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
doi	10.3390/app14041521
container	Applied Sciences, vol. 14, no. 4, article 1521
affiliation	School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (all eight authors)
title	Enhancing SPARQL Query Generation for Knowledge Base Question Answering Systems by Learning to Correct Triplets
topic	Knowledge Base Question Answering; Text-to-SPARQL; semantic parsing; further pretraining; Triplet Structure
url	https://www.mdpi.com/2076-3417/14/4/1521
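
The triple-flip error named in the description, where (subject, relation, object) comes out as (object, relation, subject), can be illustrated with a short sketch. The Python below is not the authors' implementation: the helper names (`flip`, `render`, `make_correction_pair`), the `flip_prob` parameter, and the fixed `SELECT ?x` query shape are all assumptions made for illustration. It shows one plausible way to build a (corrupted query, correct query) pair of the kind a correction-style objective such as TSC might train on.

```python
import random
from typing import List, Tuple

# A triple pattern as (subject, relation, object); terms are plain strings.
Triple = Tuple[str, str, str]


def flip(triple: Triple) -> Triple:
    """Swap subject and object, reproducing a triple-flip error."""
    subject, relation, obj = triple
    return (obj, relation, subject)


def render(triples: List[Triple]) -> str:
    """Render triple patterns as a simple SELECT query (hypothetical fixed shape)."""
    body = " . ".join(" ".join(t) for t in triples)
    return f"SELECT ?x WHERE {{ {body} }}"


def make_correction_pair(
    triples: List[Triple], flip_prob: float, rng: random.Random
) -> Tuple[str, str]:
    """Return (corrupted_query, correct_query).

    Each triple is flipped independently with probability flip_prob
    (an assumed corruption scheme, not taken from the paper).
    """
    corrupted = [flip(t) if rng.random() < flip_prob else t for t in triples]
    return render(corrupted), render(triples)


if __name__ == "__main__":
    # Wikidata: Q42 is Douglas Adams, P19 is "place of birth".
    patterns = [("wd:Q42", "wdt:P19", "?x")]
    source, target = make_correction_pair(patterns, 1.0, random.Random(0))
    print("input :", source)
    print("target:", target)
```

A sequence-to-sequence model trained to map the corrupted input back to the target has to attend to which term sits in the subject position, which is the ordering knowledge the description says the TSC objective is meant to strengthen.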


Similar Items