Mallet: SQL Dialect Translation with LLM Rule Generation

aiDM ’24, June 14, 2024, Santiago, AA, Chile

Bibliographic Details
Main Authors: Ngom, Amadou Latyr, Kraska, Tim
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: English
Published: ACM 2024
Online Access: https://hdl.handle.net/1721.1/155537
Summary: Translating between the SQL dialects of different systems is important for migration and federated query processing. Existing approaches rely on hand-crafted translation rules, which tend to be incomplete and hard to maintain, especially as the number of dialects to translate increases. Thus, dialect translation remains a largely unsolved problem. To address this issue, we introduce Mallet, a system that leverages Large Language Models (LLMs) to automate the generation of SQL-to-SQL translation rules, namely schema conversion, automated UDF generation, extension selection, and expression composition. Once the rules are generated, they are infinitely reusable on new workloads without putting the LLM on the critical path of query execution. Mallet enhances the accuracy of the LLMs by (1) performing retrieval augmented generation (RAG) over system documentation and human expertise, (2) subjecting the rules to empirical validation using the actual SQL systems to detect hallucinations, and (3) automatically creating accurate few-shot learning instances. Contributors, without knowing the system's code, can improve Mallet by providing natural-language expertise for RAG.
Date Issued: 2024-06-09
Date Available: 2024-07-09
Type: Conference paper
ISBN: 979-8-4007-0680-6
DOI: 10.1145/3663742.3663973
Citation: Ngom, Amadou Latyr and Kraska, Tim. 2024. "Mallet: SQL Dialect Translation with LLM Rule Generation."
Rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Rights Holder: The author(s)
File Format: application/pdf