Enhancing Task Performance in Continual Instruction Fine-tuning Through Format Uniformity

Cited by: 0
Authors
Tan, Xiaoyu [1 ]
Cheng, Leijun [2 ]
Qiu, Xihe [2 ]
Shi, Shaojie [2 ]
Cheng, Yuan [3 ]
Chu, Wei [1 ]
Xu, Yinghui [3 ]
Qi, Yuan [3 ]
Affiliations
[1] INF Technol Shanghai Co Ltd, Shanghai, Peoples R China
[2] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai, Peoples R China
[3] Fudan Univ, AI3 Inst, Shanghai, Peoples R China
Source
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Large Language Models; Continual Instruction Fine-tuning; Format Uniformity; Catastrophic Forgetting;
DOI
10.1145/3626772.3657920
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
In recent advancements, large language models (LLMs) have demonstrated remarkable capabilities in diverse tasks, primarily through interactive question-answering with humans. This development marks significant progress towards artificial general intelligence (AGI). Despite their superior performance, LLMs often exhibit limitations when adapted to domain-specific tasks through instruction fine-tuning (IF). The primary challenge lies in the discrepancy between data distributions in general and domain-specific contexts, leading to suboptimal accuracy on specialized tasks. To address this, continual instruction fine-tuning (CIF), in particular supervised fine-tuning (SFT) on targeted domain-specific instruction datasets, is necessary. Our ablation study reveals that the structure of these instruction datasets critically influences CIF performance, with substantial distributional shifts in the data resulting in notable performance degradation. In this paper, we introduce a novel framework that enhances CIF by promoting format uniformity. We assess our approach using the Llama2 chat model across various domain-specific instruction datasets. The results demonstrate not only an improvement in task-specific performance under CIF but also a reduction in catastrophic forgetting (CF). This study contributes to the optimization of LLMs for domain-specific applications, highlighting the significance of data structure and distribution in CIF.
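The sketch below is an illustration of what format uniformity can mean in practice, not the authors' released code: heterogeneous domain-specific instruction records with varying field names are mapped onto one prompt template before continual supervised fine-tuning. The template here is assumed to follow the Llama 2 chat [INST] ... [/INST] style, and the field names and helper functions are hypothetical.

# Illustrative sketch only (not the paper's framework): enforce a single,
# uniform prompt format across heterogeneous domain-specific instruction
# records before continual supervised fine-tuning (SFT).

from typing import Dict, List


def normalize_record(record: Dict[str, str]) -> str:
    """Map a raw record with varying field names onto one uniform template.

    The template follows the Llama 2 chat [INST] ... [/INST] style; the field
    names handled here ("instruction"/"question", "input"/"context",
    "output"/"answer") are hypothetical examples of format heterogeneity.
    """
    instruction = (record.get("instruction") or record.get("question") or "").strip()
    context = (record.get("input") or record.get("context") or "").strip()
    output = (record.get("output") or record.get("answer") or "").strip()

    # Append optional context to the instruction, then wrap in the chat template.
    prompt = instruction if not context else f"{instruction}\n\n{context}"
    return f"<s>[INST] {prompt} [/INST] {output} </s>"


def build_uniform_dataset(records: List[Dict[str, str]]) -> List[str]:
    """Apply the same template to every record, yielding a format-uniform SFT set."""
    return [normalize_record(r) for r in records]


if __name__ == "__main__":
    raw = [
        {"question": "What is the capital of France?", "answer": "Paris."},
        {"instruction": "Summarize the note.",
         "input": "Patient reports a mild headache lasting two days.",
         "output": "Two-day history of mild headache."},
    ]
    for text in build_uniform_dataset(raw):
        print(text)

Keeping every task's records in the prompt format the chat model was aligned with is one plausible way to reduce the distributional shift that the abstract identifies as the driver of performance degradation and catastrophic forgetting.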
Pages: 2384-2389
Number of pages: 6