ADAPTING GPT, GPT-2 AND BERT LANGUAGE MODELS FOR SPEECH RECOGNITION

Cited by: 30
Authors
Zheng, Xianrui [1 ]
Zhang, Chao [1 ]
Woodland, Philip C. [1 ]
Affiliations
[1] Univ Cambridge, Engn Dept, Trumpington St, Cambridge CB2 1PZ, England
Source
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021
Keywords
Bidirectional LM; GPT; GPT-2; BERT;
DOI
10.1109/ASRU51503.2021.9688232
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Language models (LMs) pre-trained on massive amounts of text, in particular bidirectional encoder representations from Transformers (BERT), generative pre-training (GPT), and GPT-2, have become a key technology for many natural language processing tasks. In this paper, we present results using fine-tuned GPT, GPT-2, and their combination for automatic speech recognition (ASR). Unlike the unidirectional LMs GPT and GPT-2, BERT is bidirectional, so the direct product of its output probabilities is no longer a valid language prior probability. A conversion method is proposed to compute the correct language prior probability from bidirectional LM outputs in a mathematically exact way. Experimental results on the widely used AMI and Switchboard ASR tasks showed that the combination of the fine-tuned GPT and GPT-2 outperformed the combination of three neural LMs with different architectures trained from scratch on the in-domain text by up to a 12% relative word error rate reduction (WERR). Furthermore, on the AMI corpus, the proposed conversion for language prior probabilities enables BERT to obtain an extra 3% relative WERR, and the combination of BERT, GPT and GPT-2 results in further improvements.
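The combination of LM scores described in the abstract is typically applied when rescoring an N-best list of ASR hypotheses: each hypothesis receives a combined score from the acoustic model and a weighted sum of log-probabilities from several LMs, and the highest-scoring hypothesis is selected. The sketch below illustrates this log-linear combination; the function name, dictionary layout, score values, and interpolation weights are all hypothetical and not taken from the paper.

```python
# Illustrative sketch of N-best rescoring with log-linearly combined LM
# scores (e.g. fine-tuned GPT and GPT-2). All names, scores, and weights
# below are made-up placeholders, not the paper's actual configuration.

def rescore_nbest(hypotheses, lm_weights):
    """Return the hypothesis with the highest combined score.

    hypotheses: list of dicts, each with an acoustic log-score and one
                log-probability per language model.
    lm_weights: interpolation weight for each LM by name.
    """
    def combined(hyp):
        score = hyp["acoustic"]
        for name, weight in lm_weights.items():
            score += weight * hyp["lm_scores"][name]
        return score

    return max(hypotheses, key=combined)

# Toy two-hypothesis N-best list: the second hypothesis has a slightly
# worse acoustic score but much better LM scores, so rescoring picks it.
nbest = [
    {"text": "hello word",  "acoustic": -12.0,
     "lm_scores": {"gpt": -9.5, "gpt2": -9.8}},
    {"text": "hello world", "acoustic": -12.3,
     "lm_scores": {"gpt": -6.1, "gpt2": -5.9}},
]
best = rescore_nbest(nbest, {"gpt": 0.5, "gpt2": 0.5})
```

In practice the interpolation weights would be tuned on a development set, and for BERT the per-token scores would first need to be converted into a valid language prior, which is the conversion the paper proposes.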
Pages: 162-168
Page count: 7