INNOVATIVE BERT-BASED RERANKING LANGUAGE MODELS FOR SPEECH RECOGNITION

Cited: 15
Authors
Chiu, Shih-Hsuan [1 ]
Chen, Berlin [1 ]
Affiliations
[1] Natl Taiwan Normal Univ, Taipei, Taiwan
Source
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021
Keywords
automatic speech recognition; language models; BERT; N-best hypotheses reranking;
DOI
10.1109/SLT48900.2021.9383557
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, Bidirectional Encoder Representations from Transformers (BERT) was proposed and has achieved impressive success on many natural language processing (NLP) tasks such as question answering and language understanding, due mainly to its effective pre-training then fine-tuning paradigm as well as its strong contextual modeling ability. In view of the above, this paper presents a novel instantiation of BERT-based contextualized language models (LMs) for reranking the N-best hypotheses produced by automatic speech recognition (ASR). To this end, we frame N-best hypothesis reranking with BERT as a prediction problem, which aims to predict the oracle hypothesis, i.e., the one with the lowest word error rate (WER), among the N-best hypotheses (a method denoted by PBERT). In addition, we explore capitalizing on task-specific global topic information in an unsupervised manner to assist PBERT in N-best hypothesis reranking (denoted by TPBERT). Extensive experiments conducted on the AMI benchmark corpus demonstrate the effectiveness and feasibility of our methods in comparison to conventional autoregressive models such as recurrent neural network (RNN) LMs, and to a recently proposed method that employs BERT to compute pseudo-log-likelihood (PLL) scores for N-best hypothesis reranking.
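As a rough illustration of the prediction-style reranking idea described above, the sketch below selects the top-scoring hypothesis from an N-best list. The scoring function here is a hypothetical placeholder standing in for a fine-tuned BERT model with a prediction head; it is not the authors' implementation, only a minimal sketch of the reranking step itself.

```python
from typing import Callable, List


def rerank_nbest(hypotheses: List[str],
                 score_fn: Callable[[str], float]) -> str:
    """Return the hypothesis with the highest model score.

    In a PBERT-style setup, score_fn would be a BERT encoder with a
    small head fine-tuned to identify the oracle (lowest-WER)
    hypothesis; here it is any callable mapping text -> score.
    """
    return max(hypotheses, key=score_fn)


# Toy stand-in scorer (hypothetical): favors longer hypotheses.
def toy_score(hypothesis: str) -> float:
    return float(len(hypothesis.split()))


nbest = ["i scream today", "ice cream today", "i see cream cones today"]
best = rerank_nbest(nbest, toy_score)
```

In practice the scorer would batch-encode all N hypotheses and apply a softmax over the N scores, but the selection step reduces to the argmax shown here.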
Pages: 266-271 (6 pages)