ADAPTING GPT, GPT-2 AND BERT LANGUAGE MODELS FOR SPEECH RECOGNITION

Cited by: 30
Authors
Zheng, Xianrui [1 ]
Zhang, Chao [1 ]
Woodland, Philip C. [1 ]
Affiliations
[1] Univ Cambridge, Engn Dept, Trumpington St, Cambridge CB2 1PZ, England
Source
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021
Keywords
Bidirectional LM; GPT; GPT-2; BERT;
DOI
10.1109/ASRU51503.2021.9688232
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Language models (LMs) pre-trained on massive amounts of text, in particular bidirectional encoder representations from Transformers (BERT), generative pre-training (GPT), and GPT-2, have become a key technology for many natural language processing tasks. In this paper, we present results using fine-tuned GPT, GPT-2, and their combination for automatic speech recognition (ASR). Unlike the unidirectional LMs GPT and GPT-2, BERT is bidirectional, so the direct product of its output probabilities is no longer a valid language prior probability. A conversion method is proposed to compute the correct language prior probability from bidirectional LM outputs in a mathematically exact way. Experimental results on the widely used AMI and Switchboard ASR tasks showed that the combination of the fine-tuned GPT and GPT-2 outperformed the combination of three neural LMs with different architectures trained from scratch on the in-domain text by up to a 12% relative word error rate reduction (WERR). Furthermore, on the AMI corpus, the proposed conversion for language prior probabilities enables BERT to obtain an extra 3% relative WERR, and the combination of BERT, GPT and GPT-2 results in further improvements.
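The combination of LM scores described in the abstract is typically applied when rescoring an N-best list of ASR hypotheses: each hypothesis receives a combined score from the acoustic model and a weighted sum of log-probabilities from several LMs, and the highest-scoring hypothesis is selected. The sketch below illustrates this log-linear combination; the function name, dictionary layout, score values, and interpolation weights are all hypothetical and not taken from the paper.

```python
# Illustrative sketch of N-best rescoring with log-linearly combined LM
# scores (e.g. fine-tuned GPT and GPT-2). All names, scores, and weights
# below are made-up placeholders, not the paper's actual configuration.

def rescore_nbest(hypotheses, lm_weights):
    """Return the hypothesis with the highest combined score.

    hypotheses: list of dicts, each with an acoustic log-score and one
                log-probability per language model.
    lm_weights: interpolation weight for each LM by name.
    """
    def combined(hyp):
        score = hyp["acoustic"]
        for name, weight in lm_weights.items():
            score += weight * hyp["lm_scores"][name]
        return score

    return max(hypotheses, key=combined)

# Toy two-hypothesis N-best list: the second hypothesis has a slightly
# worse acoustic score but much better LM scores, so rescoring picks it.
nbest = [
    {"text": "hello word",  "acoustic": -12.0,
     "lm_scores": {"gpt": -9.5, "gpt2": -9.8}},
    {"text": "hello world", "acoustic": -12.3,
     "lm_scores": {"gpt": -6.1, "gpt2": -5.9}},
]
best = rescore_nbest(nbest, {"gpt": 0.5, "gpt2": 0.5})
```

In practice the interpolation weights would be tuned on a development set, and for BERT the per-token scores would first need to be converted into a valid language prior, which is the conversion the paper proposes.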
Pages: 162-168
Page count: 7