How to Fine-Tune BERT for Text Classification?

Cited by: 805
Authors
Sun, Chi [1 ]
Qiu, Xipeng [1 ]
Xu, Yige [1 ]
Huang, Xuanjing [1 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, 825 Zhangheng Rd, Shanghai, Peoples R China
Source
CHINESE COMPUTATIONAL LINGUISTICS, CCL 2019 | 2019 / Volume 11856
Keywords
Transfer learning; BERT; Text classification
DOI
10.1007/978-3-030-32381-3_16
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Language model pre-training has proven to be useful for learning universal language representations. As a state-of-the-art pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers) has achieved impressive results on many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on the text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely studied text classification datasets.
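The abstract describes fine-tuning BERT for text classification, and one of the strategies the paper examines is assigning lower layers a smaller learning rate than higher layers. Below is a minimal illustrative sketch of that idea using the Hugging Face transformers library; this is not the authors' released code, and the checkpoint name, decay factor of 0.95, learning rate of 2e-5, and toy batch are assumptions for illustration only.

```python
# Sketch: fine-tuning BERT for binary text classification with layer-wise
# learning-rate decay (lower layers learn more slowly than upper layers).
# All hyperparameter values here are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

model_name = "bert-base-uncased"  # assumed checkpoint
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

base_lr, decay = 2e-5, 0.95  # assumed base learning rate and decay factor
num_layers = model.config.num_hidden_layers

# Build parameter groups: embeddings get the most-decayed rate,
# each encoder layer a progressively larger rate, the head the full rate.
param_groups = [{"params": model.bert.embeddings.parameters(),
                 "lr": base_lr * decay ** (num_layers + 1)}]
for i, layer in enumerate(model.bert.encoder.layer):
    param_groups.append({"params": layer.parameters(),
                         "lr": base_lr * decay ** (num_layers - i)})
param_groups.append({"params": list(model.bert.pooler.parameters())
                               + list(model.classifier.parameters()),
                     "lr": base_lr})

optimizer = torch.optim.AdamW(param_groups, lr=base_lr)

# One illustrative training step on a toy batch.
batch = tokenizer(["a great movie", "a dull movie"], return_tensors="pt",
                  padding=True, truncation=True, max_length=128)
labels = torch.tensor([1, 0])
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In this sketch the decay factor controls how quickly the learning rate shrinks toward the bottom of the network, so the embeddings and lowest encoder layers are updated least aggressively during fine-tuning.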
Pages: 194-206
Number of pages: 13