Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Cited by: 0

Authors:

Chen, Zixiang [1]

Deng, Yihe [1]

Yuan, Huizhuo [1]

Ji, Kaixuan [1]

Gu, Quanquan [1]

Affiliation:

[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING | 2024 / Vol. 235

Keywords:

GAME;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data. We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN), which starts from a supervised fine-tuned model. At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. More specifically, the LLM generates its own training data from its previous iterations, refining its policy by discerning these self-generated responses from those obtained from human-annotated data. Our method progressively elevates the LLM from a nascent model to a formidable one, unlocking the full potential of human-annotated demonstration data for SFT. Theoretically, we prove that the global optimum to the training objective function of our method is achieved only when the LLM policy aligns with the target data distribution. Empirically, we evaluate our method on several benchmark datasets including the HuggingFace Open LLM Leaderboard, MT-Bench, and datasets from Big-Bench. Our results show that SPIN can significantly improve the LLM's performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents. Code is available at https://github.com/uclaml/SPIN.
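The self-play step described in the abstract (the model learning to tell human-annotated responses apart from its own previous-iteration outputs) can be sketched as a pairwise logistic loss. This is a minimal sketch, assuming the objective takes the form we understand from the full paper: for a prompt x, a human-annotated response y, and a response y' sampled from the previous-iteration model p_θt, the new model p_θ minimizes log(1 + exp(-t)), where t = λ[log(p_θ(y|x)/p_θt(y|x)) - log(p_θ(y'|x)/p_θt(y'|x))]. The function names, the argument layout (precomputed sequence log-probabilities), and the default λ = 0.1 are illustrative assumptions, not the API of the SPIN repository.

```python
import math
from typing import Sequence


def logistic_loss(t: float) -> float:
    """Numerically stable logistic loss l(t) = log(1 + exp(-t))."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    # For negative t, rewrite as -t + log(1 + exp(t)) to avoid overflow.
    return -t + math.log1p(math.exp(t))


def spin_loss(
    logp_real_new: Sequence[float],  # log p_theta(y | x), y from the human-annotated SFT data
    logp_real_old: Sequence[float],  # log p_theta_t(y | x), frozen previous-iteration model
    logp_syn_new: Sequence[float],   # log p_theta(y' | x), y' sampled from p_theta_t
    logp_syn_old: Sequence[float],   # log p_theta_t(y' | x)
    lam: float = 0.1,                # weight lambda; this default is an assumption
) -> float:
    """Mean SPIN-style loss over a batch of (x, y, y') triples (hypothetical helper)."""
    n = len(logp_real_new)
    if n == 0 or not (n == len(logp_real_old) == len(logp_syn_new) == len(logp_syn_old)):
        raise ValueError("all inputs must be non-empty and of equal length")
    total = 0.0
    for rn, ro, sn, so in zip(logp_real_new, logp_real_old, logp_syn_new, logp_syn_old):
        # Margin: how much more the new model up-weights the human response
        # (relative to the old model) than the self-generated response.
        margin = lam * ((rn - ro) - (sn - so))
        total += logistic_loss(margin)
    return total / n
```

At the start of an iteration the new model equals the old one, so every margin is zero and the loss is log 2; training lowers it by raising the likelihood of human responses relative to self-generated ones. After each iteration, the trained model would become the new opponent p_θt and fresh responses y' would be sampled from it.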


Pages: 22
