FI-NL2PY2SQL: Financial Industry NL2SQL Innovation Model Based on Python and Large Language Models

Cited by: 0
Authors
Du, Xiaozheng [1 ]
Hu, Shijing [1 ]
Zhou, Feng [2 ]
Wang, Cheng [3 ]
Nguyen, Binh Minh [4 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200438, Peoples R China
[2] Shanghai Normal Univ, Tianhua Coll, Sch Artificial Intelligence, 1661 Shengxin North Rd, Shanghai 201815, Peoples R China
[3] GienTech Technol Co Ltd, Business Anal BU, Shanghai 200232, Peoples R China
[4] Hanoi Univ Sci & Technol, Sch Informat & Commun Technol, 1 Dai Co Viet, Hai Ba Trung 100000, Hanoi, Vietnam
Funding
National Natural Science Foundation of China;
Keywords
LLM; NL2SQL; pre-training; prompt; Python;
DOI
10.3390/fi17010012
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
With the rapid development of large language models, NL2SQL has made many breakthroughs, but customers still expect the accuracy of NL2SQL to be continuously improved through optimization. LLM-based methods have brought revolutionary changes to NL2SQL. This paper proposes a new NL2SQL method based on a large language model (LLM) that can be adapted to an edge-cloud computing platform. First, natural language is converted into Python code, and then SQL is generated from the Python code. In addition, considering the traceability requirements of financial industry regulation, this paper uses the open-source large model DeepSeek. In tests on the BIRD dataset, compared with most LLM-based NL2SQL models, EX is at least 2.73% higher, F1 is at least 3.72 higher, and VES is 6.34% higher than the original methods. Through this innovative algorithm, the accuracy of NL2SQL in the financial industry is greatly improved, providing business personnel with a robust database access mode.
Pages: 24
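The abstract describes a two-stage pipeline (natural language -> Python -> SQL) driven by an LLM. Below is a minimal, hypothetical sketch of such a prompt chain; the endpoint URL, model name, prompt wording, and helper functions (call_llm, nl_to_python, python_to_sql) are illustrative assumptions, not the authors' FI-NL2PY2SQL implementation or DeepSeek's actual API.

```python
# Hypothetical two-stage NL2SQL sketch: question -> Python (pandas plan) -> SQL.
# All names, prompts, and the local endpoint below are assumptions for illustration.
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed local LLM endpoint
MODEL = "deepseek-coder"                                # assumed open-source model id


def call_llm(prompt: str) -> str:
    """Send a single-turn chat request and return the model's text reply."""
    payload = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    }).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


def nl_to_python(question: str, schema: str) -> str:
    """Stage 1: translate the business question into an intermediate Python (pandas) plan."""
    prompt = (
        f"Database schema:\n{schema}\n\n"
        f"Write Python pandas code that answers: {question}\n"
        "Return only the code."
    )
    return call_llm(prompt)


def python_to_sql(python_code: str, schema: str) -> str:
    """Stage 2: translate the intermediate Python plan into a single executable SQL query."""
    prompt = (
        f"Database schema:\n{schema}\n\n"
        "Rewrite the following pandas code as one SQL query:\n"
        f"{python_code}\n"
        "Return only the SQL."
    )
    return call_llm(prompt)


if __name__ == "__main__":
    schema = "loans(loan_id, customer_id, amount, status)"
    question = "What is the total amount of active loans?"
    plan = nl_to_python(question, schema)
    print(python_to_sql(plan, schema))
```

The intermediate Python step gives an inspectable, traceable artifact between the question and the final SQL, which matches the regulatory traceability motivation stated in the abstract.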