Staged Multi-Strategy Framework With Open-Source Large Language Models for Natural Language to SQL Generation

Cited by: 0

Authors:

Liu, Chuanlong [1]

Liao, Wei [1]

Xu, Zhen [2]

Affiliations:

[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai 201620, Peoples R China

[2] Shanghai Univ Engn Sci, Sch Mech & Automot Engn, Shanghai 201620, Peoples R China

Source:

IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING | 2025

Keywords:

open-source large language models; pre-trained language models; natural language to SQL; prompt learning

DOI:

10.1002/tee.24268

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

In natural language to SQL (NL2SQL), large pre-trained language models have driven significant progress. However, these models still generalize poorly, particularly open-source large language models (LLMs). In addition, most research overlooks how key column information and data table content affect query accuracy during SQL statement generation. In this paper, we propose a staged, multi-strategy framework called Key Columns and Table Contents (KCTC). The framework has two stages. First, it uses fixed prompt content to extract SQL key column information (selected columns and conditioned columns) from the natural language question, and it formats the extracted column information. Second, it uses variable prompt content to guide the model in generating SQL statements, with the content of the data table serving as a constraint that reduces the impact of erroneous condition values on the SQL statements. We conducted experiments on the Chinese dataset TableQA using several open-source LLMs. The results show that our method significantly improved the execution accuracy of SQL statements, with an average increase of 60.29% and a maximum accuracy of 91.22%, which validates the effectiveness of our approach. (c) 2025 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
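
The two stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prompt wording, the function names (`extract_key_columns`, `build_stage2_prompt`, `snap_to_table_value`), the JSON output format, and the use of fuzzy string matching to constrain condition values are all assumptions made for illustration. The `llm` argument stands for any callable that maps a prompt string to a response string.

```python
import difflib
import json
from typing import Callable, Dict, List

# Hypothetical fixed prompt for stage 1; the paper's actual wording is not given in the abstract.
STAGE1_PROMPT = (
    "Table columns: {columns}\n"
    "Question: {question}\n"
    'Return JSON: {{"select": [...], "where": [...]}}'
)


def extract_key_columns(
    llm: Callable[[str], str], question: str, columns: List[str]
) -> Dict[str, List[str]]:
    """Stage 1 (sketch): ask for selected/conditioned columns, keep only columns that exist."""
    raw = llm(STAGE1_PROMPT.format(columns=", ".join(columns), question=question))
    parsed = json.loads(raw)  # assumes the model returned valid JSON
    return {
        key: [c for c in parsed.get(key, []) if c in columns]
        for key in ("select", "where")
    }


def build_stage2_prompt(
    question: str, table: str, key_cols: Dict[str, List[str]], rows: List[dict]
) -> str:
    """Stage 2 (sketch): variable prompt listing real cell values for each conditioned column."""
    lines = [f"Table: {table}", f"Selected columns: {', '.join(key_cols['select'])}"]
    for col in key_cols["where"]:
        values = sorted({str(row[col]) for row in rows})
        lines.append(f"Values in {col}: {', '.join(values)}")
    lines.append(f"Question: {question}")
    lines.append("Write one SQL statement.")
    return "\n".join(lines)


def snap_to_table_value(value: str, candidates: List[str]) -> str:
    """Replace a condition value with the closest actual cell value, if one is similar enough."""
    match = difflib.get_close_matches(value, candidates, n=1, cutoff=0.6)
    return match[0] if match else value
```

In this sketch, the stage 1 output is filtered against the real schema before it reaches stage 2, and `snap_to_table_value` shows one possible way to use table content so that a misspelled condition value does not produce an empty result set.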

Pages: 10
