Conditional BERT Contextual Augmentation

Cited by: 193
Authors
Wu, Xing [1 ,2 ]
Lv, Shangwen [1 ,2 ]
Zang, Liangjun [1 ]
Han, Jizhong [1 ,2 ]
Hu, Songlin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Source
COMPUTATIONAL SCIENCE - ICCS 2019, PT IV | 2019 / Vol. 11539
Funding
National Natural Science Foundation of China
DOI
10.1007/978-3-030-22747-0_7
Chinese Library Classification
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Data augmentation methods are often applied to prevent overfitting and improve the generalization of deep neural network models. The recently proposed contextual augmentation technique augments labeled sentences by randomly replacing words with more varied substitutions predicted by a language model. Bidirectional Encoder Representations from Transformers (BERT) demonstrates that a deep bidirectional language model is more powerful than either a unidirectional language model or the shallow concatenation of a forward and a backward model. We propose a novel data augmentation method for labeled sentences called conditional BERT contextual augmentation. We retrofit BERT into a conditional BERT by introducing a new conditional masked language model task. (The term "conditional masked language model" appeared once in the original BERT paper, where it indicates context-conditioning and is equivalent to the term "masked language model"; in our paper, "conditional masked language model" means we apply an extra label-conditional constraint to the masked language model.) The well-trained conditional BERT can then be applied to enhance contextual augmentation. Experiments on six text classification tasks show that our method can easily be applied to both convolutional and recurrent neural network classifiers to obtain improvements.
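The augmentation procedure the abstract describes can be sketched as follows. This is an illustrative toy, not the paper's code: a real system would query a fine-tuned label-conditional BERT for each masked position, whereas here a hypothetical `toy_conditional_predictor` stands in so the replace-with-label-compatible-words loop is runnable end to end.

```python
import random

def toy_conditional_predictor(tokens, mask_index, label):
    """Stand-in for conditional BERT: return substitute candidates for the
    masked position that are compatible with the sentence's label.
    (In the paper this is a label-conditioned masked language model.)"""
    candidates = {
        "positive": ["great", "wonderful", "fine"],
        "negative": ["terrible", "awful", "poor"],
    }
    return candidates[label]

def augment(sentence, label, mask_prob=0.3, seed=0):
    """Randomly mask words and replace each masked word with a prediction
    conditioned on the label, yielding a new sentence with the same label."""
    rng = random.Random(seed)
    tokens = sentence.split()
    for i in range(len(tokens)):
        if rng.random() < mask_prob:  # mask this position
            subs = toy_conditional_predictor(tokens, i, label)
            tokens[i] = rng.choice(subs)
    return " ".join(tokens)

print(augment("the movie was good", "positive"))
```

Because the predictor is conditioned on the label, substitutions preserve label compatibility (a "positive" sentence never receives "terrible"), which is the point of the label-conditional constraint over plain contextual augmentation.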
Pages: 84-95
Page count: 12