Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

Cited by: 0
Authors
Kang, Minki [1 ,2 ,5 ]
Lee, Seanie [2 ]
Baek, Jinheon [2 ]
Kawaguchi, Kenji [3 ]
Hwang, Sung Ju [2 ,4 ]
Affiliations
[1] KRAFTON, Seongnam, South Korea
[2] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[3] Natl Univ Singapore, Singapore, Singapore
[4] DeepAuto Ai, Seoul, South Korea
[5] AITRICS, Seoul, South Korea
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Funding
National Research Foundation of Singapore
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) have shown promising performance on knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deploying LLMs in real-world applications can be challenging because of their high computational requirements and concerns about data privacy. Previous studies have focused on building task-specific small Language Models (LMs) by fine-tuning them with labeled data or by distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks because small LMs have limited capacity to memorize the knowledge required. Motivated by our theoretical analysis of memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales obtained from LLMs, augmented with knowledge retrieved from an external knowledge base. We further propose a neural reranker to obtain documents relevant to rationale generation. We empirically show that KARD significantly improves the performance of small T5 and GPT models on the challenging knowledge-intensive reasoning datasets MedQA-USMLE, StrategyQA, and OpenbookQA. Notably, with KARD, 250M-parameter T5 models outperform fine-tuned 3B models, which have 12 times more parameters, on both the MedQA-USMLE and StrategyQA benchmarks.
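The sketch below illustrates the training objective the abstract describes: the student LM is fine-tuned to generate a teacher-provided rationale and answer conditioned on the question plus retrieved knowledge. It is a minimal, hedged example, not the authors' released implementation; the model name "t5-small", the toy word-overlap retriever (standing in for BM25 and the neural reranker), and the `teacher_rationale` string (standing in for an actual LLM-generated rationale) are all illustrative assumptions.

```python
# Minimal sketch of KARD-style knowledge-augmented rationale distillation.
# Assumptions (not the paper's released code): "t5-small" stands in for the
# 250M-parameter student, a toy word-overlap scorer replaces BM25 and the
# neural reranker, and `teacher_rationale` is a hypothetical LLM output.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

corpus = [
    "Aspirin irreversibly inhibits cyclooxygenase, reducing thromboxane A2.",
    "Beta blockers reduce myocardial oxygen demand by lowering heart rate.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Toy lexical retriever: rank passages by word overlap with the question."""
    q = set(question.lower().split())
    return sorted(corpus, key=lambda p: len(q & set(p.lower().split())), reverse=True)[:k]

question = "Which drug irreversibly inhibits cyclooxygenase?"
answer = "Aspirin"
teacher_rationale = "Aspirin acetylates COX irreversibly, so it is the answer."  # hypothetical teacher output

tokenizer = AutoTokenizer.from_pretrained("t5-small")
student = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(student.parameters(), lr=3e-4)

# Student input pairs the question with retrieved knowledge; the target is the
# teacher's rationale followed by the answer (rationale distillation).
source = f"question: {question} knowledge: {' '.join(retrieve(question))}"
target = f"rationale: {teacher_rationale} answer: {answer}"

inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
labels = tokenizer(target, return_tensors="pt", truncation=True, max_length=128).input_ids

student.train()
loss = student(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.3f}")
```

At inference time, the same student would first retrieve passages for the test question and then generate a rationale and answer, so the external knowledge base, rather than the small model's parameters, carries most of the factual load.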
Pages: 30