SE-HCL: Schema Enhanced Hybrid Curriculum Learning for Multi-Turn Text-to-SQL

Existing multi-turn Text-to-SQL approaches mainly use data in a randomized order when training the model, ignoring the rich structural information contained in the dialog and schema. In this paper, we propose to use curriculum learning (CL) to better leverage the curriculum structure of schema, que...
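
The abstract describes ranking training data by a difficulty that blends structural signals (question turn, schema complexity, SQL complexity) with a model-based score, and then pacing training from easy to hard. The following is a minimal Python sketch of that general idea only, not the authors' implementation: the `Example` fields, the equal-weight average, the `alpha` weight, and the `competence` pacing fraction are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # All fields are illustrative assumptions, not taken from the paper.
    turn_index: int      # position of the question within its dialog (1-based)
    schema_columns: int  # number of columns in the database schema
    sql_keywords: int    # number of SQL keywords in the gold query
    model_loss: float    # current per-example loss of the model being trained


def _normalize(values):
    """Min-max scale a list of numbers to [0, 1]; a constant list maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_difficulty(examples, alpha=0.5):
    """Blend structural and model-based difficulty (alpha is a hypothetical weight)."""
    turns = _normalize([e.turn_index for e in examples])
    schema = _normalize([e.schema_columns for e in examples])
    sql = _normalize([e.sql_keywords for e in examples])
    loss = _normalize([e.model_loss for e in examples])
    scores = []
    for t, s, q, m in zip(turns, schema, sql, loss):
        structural = (t + s + q) / 3.0  # equal-weight average, an assumption
        scores.append(alpha * structural + (1.0 - alpha) * m)
    return scores


def curriculum_subset(examples, competence, alpha=0.5):
    """Return the easiest fraction of examples, easiest first (competence in (0, 1])."""
    scores = hybrid_difficulty(examples, alpha)
    order = sorted(range(len(examples)), key=lambda i: scores[i])
    keep = max(1, int(round(competence * len(examples))))
    return [examples[i] for i in order[:keep]]
```

In a training loop, `competence` would be raised as the model converges so that harder, later-turn examples enter the pool gradually; recomputing `model_loss` between stages keeps the model-side score current.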


Bibliographic Details
Main Authors: Yiyun Zhang, Sheng'an Zhou, Gengsheng Huang
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10433573/
_version_ 1797243279704588288
author Yiyun Zhang
Sheng'an Zhou
Gengsheng Huang
author_facet Yiyun Zhang
Sheng'an Zhou
Gengsheng Huang
author_sort Yiyun Zhang
collection DOAJ
description Existing multi-turn Text-to-SQL approaches mainly use data in a randomized order when training the model, ignoring the rich structural information contained in the dialog and schema. In this paper, we propose to use curriculum learning (CL) to better leverage the curriculum structure of schema, query, and dialog for multi-turn question-query pairs. We design a model-agnostic framework named Schema Enhanced Hybrid Curriculum Learning (SE-HCL) for multi-turn Text-to-SQL to help models gain a full contextual semantic understanding. Concretely, we measure the difficulty of the data from both a structural and a model perspective. In terms of data structure, we mainly consider the number of question turns and the complexity of the schema and SQL query. Accordingly, we design a data course module that dynamically adjusts the difficulty of the data based on the convergence of the model and on our schema enhancement method. In terms of the model, we propose a scoring module that judges the difficulty of a question based on whether the model can solve it effectively. Finally, we combine both aspects into a hybrid curriculum that determines the flow of model training. Our experiments show that the proposed method improves SQL generation performance over previous state-of-the-art models on SparC and CoSQL, especially for hard and long-turn questions.
first_indexed 2024-04-24T18:52:36Z
format Article
id doaj.art-47a6f795963b4862b5300ec1b9f11ca2
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-24T18:52:36Z
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-47a6f795963b4862b5300ec1b9f11ca2
2024-03-26T17:47:43Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
12
39902
39912
10.1109/ACCESS.2024.3365522
10433573
SE-HCL: Schema Enhanced Hybrid Curriculum Learning for Multi-Turn Text-to-SQL
Yiyun Zhang (https://orcid.org/0009-0008-5885-1111), Institute of Electronic Information, Guangdong Vocational College, Guangzhou, China
Sheng'an Zhou, Institute of Electronic Information, Guangdong Vocational College, Guangzhou, China
Gengsheng Huang, Institute of Electronic Information, Guangdong Vocational College, Guangzhou, China
https://ieeexplore.ieee.org/document/10433573/
Natural language processing
semantic parsing
multi-turn text-to-SQL
curriculum learning
spellingShingle Yiyun Zhang
Sheng'an Zhou
Gengsheng Huang
SE-HCL: Schema Enhanced Hybrid Curriculum Learning for Multi-Turn Text-to-SQL
IEEE Access
Natural language processing
semantic parsing
multi-turn text-to-SQL
curriculum learning
title SE-HCL: Schema Enhanced Hybrid Curriculum Learning for Multi-Turn Text-to-SQL
title_sort se hcl schema enhanced hybrid curriculum learning for multi turn text to sql
topic Natural language processing
semantic parsing
multi-turn text-to-SQL
curriculum learning
url https://ieeexplore.ieee.org/document/10433573/
work_keys_str_mv AT yiyunzhang sehclschemaenhancedhybridcurriculumlearningformultiturntexttosql
AT shenganzhou sehclschemaenhancedhybridcurriculumlearningformultiturntexttosql
AT gengshenghuang sehclschemaenhancedhybridcurriculumlearningformultiturntexttosql