TITANIC: Towards Production Federated Learning with Large Language Models

Times Cited: 0
Authors
Su, Ningxin [1 ]
Hu, Chenghao [1 ]
Li, Baochun [1 ]
Li, Bo [2 ]
Affiliations
[1] University of Toronto, Department of Electrical and Computer Engineering, Toronto, ON, Canada
[2] Hong Kong University of Science and Technology, Department of Computer Science and Engineering, Hong Kong, People's Republic of China
Source
IEEE INFOCOM 2024 - IEEE Conference on Computer Communications | 2024
DOI
10.1109/INFOCOM52122.2024.10621164
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
With the recent surge of research interest in Large Language Models (LLMs), a natural question is how pre-trained LLMs can be fine-tuned to the specific needs of enterprises and individual users while preserving the privacy of the data used in the fine-tuning process. On the one hand, sending private data to cloud datacenters for fine-tuning is, without a doubt, unacceptable from a privacy perspective. On the other hand, conventional federated learning requires each client to perform local training, which is not feasible for LLMs with respect to both computation cost and communication overhead, since they involve billions of model parameters. In this paper, we present TITANIC, a new distributed training paradigm that allows LLMs to be fine-tuned in a privacy-preserving fashion directly on the client devices where private data is produced, while operating within the clients' constraints on computation and communication bandwidth. TITANIC first selects an optimal subset of clients by efficiently solving an integer optimization problem, then partitions an LLM across multiple client devices, and finally fine-tunes the model with no or minimal loss in training performance. A primary focus in the design of TITANIC is its feasibility in real-world systems: it is designed first and foremost for production-quality systems, featuring a fully automated, model-agnostic partitioning mechanism. Our experimental results show that TITANIC achieves superior training performance compared to conventional federated learning, while preserving data privacy and satisfying all constraints on local computation and bandwidth resources.
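The abstract outlines a two-step pipeline: clients are first selected by solving an integer optimization problem under resource constraints, and the LLM is then partitioned across the selected devices for fine-tuning. The sketch below is a minimal, illustrative rendering of that pipeline only; the objective, constraints, client and capacity names, and the proportional layer-splitting heuristic are assumptions for demonstration, not the paper's actual formulation.

```python
# Illustrative sketch only: the selection objective, constraints, and the
# proportional layer split below are assumptions, not TITANIC's actual method.
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple

@dataclass
class Client:
    name: str
    compute: float      # relative compute budget (hypothetical units)
    bandwidth: float    # link bandwidth to the next pipeline stage (Mbit/s)

def select_clients(clients: List[Client], k: int,
                   min_bandwidth: float) -> Tuple[Client, ...]:
    """Toy integer-program stand-in: choose k clients maximizing total
    compute, subject to a per-link bandwidth floor (solved by enumeration)."""
    feasible = [c for c in clients if c.bandwidth >= min_bandwidth]
    return max(combinations(feasible, k),
               key=lambda subset: sum(c.compute for c in subset))

def partition_layers(num_layers: int,
                     selected: Tuple[Client, ...]) -> List[Tuple[str, range]]:
    """Assign contiguous blocks of transformer layers to the selected clients
    in proportion to their compute capacity (pipeline-style partitioning)."""
    total = sum(c.compute for c in selected)
    assignment, start = [], 0
    for i, c in enumerate(selected):
        # the last client takes the remainder so every layer is covered
        share = (num_layers - start if i == len(selected) - 1
                 else round(num_layers * c.compute / total))
        assignment.append((c.name, range(start, start + share)))
        start += share
    return assignment

if __name__ == "__main__":
    pool = [Client("phone-a", 1.0, 40.0), Client("laptop-b", 4.0, 120.0),
            Client("desktop-c", 8.0, 300.0), Client("tablet-d", 2.0, 15.0)]
    chosen = select_clients(pool, k=3, min_bandwidth=30.0)
    for name, layers in partition_layers(num_layers=32, selected=chosen):
        print(f"{name}: layers {layers.start}-{layers.stop - 1}")
```

In this toy run, the bandwidth floor excludes the tablet, and the 32 transformer layers are split roughly in proportion to each remaining client's compute budget; the paper's automated, model-agnostic partitioner is considerably more general than this fixed heuristic.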
Pages: 611-620
Page count: 10