Commonsense Knowledge Transfer for Pre-trained Language Models

Cited by: 0
Authors
Zhou, Wangchunshu [1]
Le Bras, Ronan [2]
Choi, Yejin [2,3]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Allen Inst AI, Seattle, WA USA
[3] Univ Washington, Paul G Allen Sch Comp Sci, Seattle, WA USA
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023 | 2023
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Despite serving as the foundation models for a wide range of NLP benchmarks, pre-trained language models have shown limited capability to acquire implicit commonsense knowledge from self-supervision alone, compared to learning linguistic and factual knowledge that appears more explicitly in the surface patterns of text. In this work, we introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model. It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model and then refines the language model with two self-supervised objectives, commonsense mask in-filling and commonsense relation prediction, which align human language with the underlying commonsense knowledge. Empirical results show that our approach consistently improves the model's performance on downstream tasks that require commonsense reasoning. Moreover, we find that the improvement is more significant in the few-shot setting. This suggests that, by injecting commonsense knowledge into their parameters, our approach helps language models transfer better to downstream tasks without extensive supervision.
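
The two objectives named in the abstract can be illustrated with a minimal sketch of how training pairs might be built from a single commonsense triple. This is an assumption-laden illustration, not the authors' implementation: the `Triple` class, the `TEMPLATES` wording, the `<mask>` token, and the helper names are all hypothetical, and the example triple is invented for demonstration.

```python
from dataclasses import dataclass

MASK = "<mask>"  # hypothetical mask token; real tokenizers define their own

# Illustrative relation templates (assumed wording, not taken from the paper).
TEMPLATES = {
    "xIntent": "because PersonX wanted",
    "xEffect": "as a result, PersonX",
}


@dataclass(frozen=True)
class Triple:
    head: str      # event formed from general text (the query)
    relation: str  # commonsense relation name
    tail: str      # inference returned by the knowledge model


def infilling_example(t: Triple) -> tuple[str, str]:
    """Mask in-filling: hide the tail of the verbalized triple; target is the tail."""
    source = f"{t.head} {TEMPLATES[t.relation]} {MASK}"
    return source, t.tail


def relation_example(t: Triple) -> tuple[str, str]:
    """Relation prediction: show head and tail; target is the relation label."""
    source = f"{t.head} {MASK} {t.tail}"
    return source, t.relation


# Invented triple, for demonstration only.
triple = Triple("PersonX buys an umbrella", "xIntent", "to stay dry")
print(infilling_example(triple))
print(relation_example(triple))
```

In this sketch both objectives reuse the same triple, so one query to the knowledge model yields two training pairs; how the paper actually verbalizes relations and selects spans to mask may differ.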
Pages: 5946-5960
Number of pages: 15