Demystifying Data Management for Large Language Models

Cited by: 0
Authors
Miao, Xupeng [1]
Jia, Zhihao [1]
Cui, Bin [2]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Peking Univ, Beijing, Peoples R China
Source
COMPANION OF THE 2024 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, SIGMOD-COMPANION 2024 | 2024
Keywords
Large Language Model; Pre-training; Fine-tuning; Inference; Data Management; Database; Distributed Computing; Knowledge Data; SYSTEM;
DOI
10.1145/3626246.3654683
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Navigating the intricacies of data management in the era of Large Language Models (LLMs) presents both challenges and opportunities for database and data management communities. In this tutorial, we offer a comprehensive exploration into the vital role of data management across the development and deployment phases of advanced LLMs. We provide an in-depth survey of existing techniques of managing knowledge and parameter data during the whole LLM lifecycle, emphasizing the balance between efficiency and effectiveness. This tutorial stands to offer participants valuable insights into the best practices and contemporary challenges in data management for LLMs, equipping them with the knowledge to navigate and contribute to this rapidly evolving field.
Pages: 547-555
Page count: 9
Related papers
50 in total
  • [1] Exploring the Potential of Large Language Models in Supply Chain Management: A Study Using Big Data
    Srivastava, Santosh Kumar
    Routray, Susmi
    Bag, Surajit
    Gupta, Shivam
    Zhang, Justin Zuopeng
    JOURNAL OF GLOBAL INFORMATION MANAGEMENT, 2024, 32 (01) : 1 - 29
  • [2] Incorporating Citizen-Generated Data into Large Language Models
    Vadapalli, Jagadeesh
    Gupta, Srishti
    Karki, Bishwa
    Tsai, Chun-Hua
    PROCEEDINGS OF THE 25TH ANNUAL INTERNATIONAL CONFERENCE ON DIGITAL GOVERNMENT RESEARCH, DGO 2024, 2024, : 1023 - 1025
  • [3] A Method for Efficient Structured Data Generation with Large Language Models
    Hou, Zongzhi
    Zhao, Ruohan
    Li, Zhongyang
    Wang, Zheng
    Wu, Yizhen
    Gou, Junwei
    Zhu, Zhifeng
    PROCEEDINGS OF THE 2ND WORKSHOP ON LARGE GENERATIVE MODELS MEET MULTIMODAL APPLICATIONS, LGM(CUBE)A 2024, 2024, : 36 - 44
  • [4] Enhancing Network Management Using Code Generated by Large Language Models
    Mani, Sathiya Kumaran
    Zhou, Yajie
    Hsieh, Kevin
    Segarra, Santiago
    Eberl, Trevor
    Azulai, Eliran
    Frizler, Ido
    Chandra, Ranveer
    Kandula, Srikanth
    PROCEEDINGS OF THE 22ND ACM WORKSHOP ON HOT TOPICS IN NETWORKS, HOTNETS 2023, 2023, : 196 - 204
  • [5] Comparison of Large Language Models in Diagnosis and Management of Challenging Clinical Cases
    Shanmugam, Sujeeth Krishna
    Browning, David J.
    CLINICAL OPHTHALMOLOGY, 2024, 18 : 3239 - 3247
  • [6] Data augmentation based on large language models for radiological report classification
    Collado-Montanez, Jaime
    Martin-Valdivia, Maria-Teresa
    Martinez-Camara, Eugenio
    KNOWLEDGE-BASED SYSTEMS, 2025, 308
  • [7] MediGPT: Exploring Potentials of Conventional and Large Language Models on Medical Data
    Rony, Mohammad Abu Tareq
    Islam, Mohammad Shariful
    Sultan, Tipu
    Alshathri, Samah
    El-Shafai, Walid
    IEEE ACCESS, 2024, 12 : 103473 - 103487
  • [8] From text to insight: large language models for chemical data extraction
    Schilling-Wilhelmi, Mara
    Rios-Garcia, Martino
    Shabih, Sherjeel
    Gil, Maria Victoria
    Miret, Santiago
    Koch, Christoph T.
    Marquez, Jose A.
    Jablonka, Kevin Maik
    CHEMICAL SOCIETY REVIEWS, 2025, 54 (03) : 1125 - 1150
  • [9] Understanding Sarcoidosis Using Large Language Models and Social Media Data
    Xi, Nan Miles
    Ji, Hong-Long
    Wang, Lin
    JOURNAL OF HEALTHCARE INFORMATICS RESEARCH, 2024,
  • [10] Detecting Data Races in OpenMP with Deep Learning and Large Language Models
    Alsofyani, May
    Wang, Liqiang
    53RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2024, 2024, : 96 - 103