Mallet: SQL Dialect Translation with LLM Rule Generation
aiDM ’24, June 14, 2024, Santiago, AA, Chile
Main Authors: | Ngom, Amadou Latyr; Kraska, Tim |
---|---|
Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
Format: | Article |
Language: | English |
Published: | ACM 2024 |
Online Access: | https://hdl.handle.net/1721.1/155537 |
_version_ | 1811094156398821376 |
---|---|
author | Ngom, Amadou Latyr Kraska, Tim |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
author_facet | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Ngom, Amadou Latyr Kraska, Tim |
author_sort | Ngom, Amadou Latyr |
collection | MIT |
description | aiDM ’24, June 14, 2024, Santiago, AA, Chile |
first_indexed | 2024-09-23T15:55:48Z |
format | Article |
id | mit-1721.1/155537 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-24T16:06:21Z |
publishDate | 2024 |
publisher | ACM |
record_format | dspace |
spelling | mit-1721.1/1555372024-09-23T04:10:56Z Mallet: SQL Dialect Translation with LLM Rule Generation Ngom, Amadou Latyr Kraska, Tim Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory aiDM ’24, June 14, 2024, Santiago, AA, Chile Translating between the SQL dialects of different systems is important for migration and federated query processing. Existing approaches rely on hand-crafted translation rules, which tend to be incomplete and hard to maintain, especially as the number of dialects to translate increases. Thus, dialect translation remains a largely unsolved problem. To address this issue, we introduce Mallet, a system that leverages Large Language Models (LLMs) to automate the generation of SQL-to-SQL translation rules, namely schema conversion, automated UDF generation, extension selection, and expression composition. Once the rules are generated, they are infinitely reusable on new workloads without putting the LLM on the critical path of query execution. Mallet enhances the accuracy of the LLMs by (1) performing retrieval augmented generation (RAG) over system documentation and human expertise, (2) subjecting the rules to empirical validation using the actual SQL systems to detect hallucinations, and (3) automatically creating accurate few-shot learning instances. Contributors, without knowing the system's code, can improve Mallet by providing natural-language expertise for RAG. 2024-07-09T16:27:07Z 2024-07-09T16:27:07Z 2024-06-09 2024-07-01T08:00:43Z Article http://purl.org/eprint/type/ConferencePaper 979-8-4007-0680-6 https://hdl.handle.net/1721.1/155537 Ngom, Amadou Latyr and Kraska, Tim. 2024. "Mallet: SQL Dialect Translation with LLM Rule Generation." PUBLISHER_CC en 10.1145/3663742.3663973 Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. 
The author(s) application/pdf ACM Association for Computing Machinery |
spellingShingle | Ngom, Amadou Latyr Kraska, Tim Mallet: SQL Dialect Translation with LLM Rule Generation |
title | Mallet: SQL Dialect Translation with LLM Rule Generation |
title_sort | mallet sql dialect translation with llm rule generation |
url | https://hdl.handle.net/1721.1/155537 |
work_keys_str_mv | AT ngomamadoulatyr malletsqldialecttranslationwithllmrulegeneration AT kraskatim malletsqldialecttranslationwithllmrulegeneration |
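
The abstract in this record describes generating translation rules once, validating them empirically against the actual SQL system to catch hallucinations, and then reusing the accepted rules without the LLM on the query path. The following is a minimal sketch of that idea only, not Mallet's implementation: the `Rule` class, the regex-based rewriting, the hard-coded candidate rules, and the use of SQLite as the target system are all assumptions made for illustration, and no LLM is actually called.

```python
import re
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rewrite rule: a regex pattern and its replacement text."""
    pattern: str
    replacement: str

    def apply(self, sql: str) -> str:
        return re.sub(self.pattern, self.replacement, sql, flags=re.IGNORECASE)


def validate(rule: Rule, sample_sql: str) -> bool:
    """Execute the rewritten query on the target system (SQLite here, as an
    assumption) and reject the rule if the target raises an error."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(rule.apply(sample_sql))
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


def translate(sql: str, rules: list[Rule]) -> str:
    """Apply already-validated rules; no model call is needed at this stage."""
    for rule in rules:
        sql = rule.apply(sql)
    return sql


# Candidate rules standing in for LLM output (hard-coded for this sketch).
candidates = [
    Rule(r"\bNOW\(\)", "CURRENT_TIMESTAMP"),  # accepted: valid in SQLite
    Rule(r"\bNOW\(\)", "GETDATE()"),          # rejected: SQLite has no GETDATE
]

# Keep only the rules whose output actually runs on the target system.
accepted = [r for r in candidates if validate(r, "SELECT NOW()")]

translated = translate("SELECT NOW(), 1", accepted)
```

In this sketch the second candidate plays the role of a hallucinated rule: it looks plausible but fails when executed, so it never reaches the reusable rule set. Checking only that a query executes is a weaker test than checking that it returns equivalent results, which a real validator would also need to consider.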