DB-GPT: Large Language Model Meets Database

Authors
Xuanhe Zhou
Zhaoyan Sun
Guoliang Li
Affiliations
[1] Tsinghua University, Department of Computer Science
Source
Data Science and Engineering | 2024, Vol. 9
Keywords
Large language model; Database;
Abstract
Large language models (LLMs) have shown superior performance in various areas, and they have the potential to revolutionize data management by serving as the "brain" of next-generation database systems. However, there are several challenges in utilizing LLMs to optimize databases. First, it is challenging to provide appropriate prompts (e.g., instructions and demonstration examples) that enable LLMs to understand database optimization problems. Second, LLMs capture only the logical characteristics of a database (e.g., SQL semantics) but are not aware of its physical characteristics (e.g., data distributions), so LLMs must be fine-tuned to capture both physical and logical information. Third, LLMs are not well trained for databases with strict constraints (e.g., query plan equivalence) and privacy-preserving requirements, and it is challenging to train database-specific LLMs while ensuring database privacy. To overcome these challenges, this vision paper proposes an LLM-based database framework (DB-GPT), including automatic prompt generation, DB-specific model fine-tuning, and DB-specific model design and pre-training. Preliminary experiments show that DB-GPT achieves relatively good performance on database tasks such as query rewrite and index tuning. The source code and datasets are available at github.com/TsinghuaDatabaseGroup/DB-GPT.
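The abstract's first component, automatic prompt generation, can be illustrated with a minimal sketch: assemble an instruction, a few demonstration examples, and the target SQL query into one prompt for an LLM-based query rewriter. This is a hypothetical illustration of the general idea only; the function name, prompt wording, and demonstration pair below are assumptions, not taken from the DB-GPT implementation.

```python
# Hypothetical sketch of prompt assembly for LLM-based query rewriting.
# All names and strings here are illustrative, not from DB-GPT itself.

def build_rewrite_prompt(query, demonstrations):
    """Compose a query-rewrite prompt from an instruction and demo pairs."""
    instruction = (
        "Rewrite the following SQL query into an equivalent but more "
        "efficient form. Preserve the query's result set exactly."
    )
    # Each demonstration is an (original, rewritten) pair of SQL strings.
    demo_text = "\n\n".join(
        f"Original: {orig}\nRewritten: {rewritten}"
        for orig, rewritten in demonstrations
    )
    return f"{instruction}\n\n{demo_text}\n\nOriginal: {query}\nRewritten:"

# Example usage with a toy demonstration pair (subquery unnested to a join).
demos = [
    (
        "SELECT * FROM t WHERE id IN (SELECT id FROM s)",
        "SELECT t.* FROM t JOIN (SELECT DISTINCT id FROM s) s2 ON t.id = s2.id",
    )
]
prompt = build_rewrite_prompt(
    "SELECT * FROM orders WHERE cid IN (SELECT cid FROM vip)", demos
)
print(prompt)
```

The paper's point is that such prompts must carry enough context (instructions plus well-chosen demonstrations) for the model to respect strict constraints like result-set equivalence; selecting which demonstrations to include is itself an optimization problem.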
Pages: 102-111 (9 pages)