Bayesian Learning of a Language Model from Continuous Speech

Cited by: 22
Authors
Neubig, Graham [1]
Mimura, Masato [1]
Mori, Shinsuke [1]
Kawahara, Tatsuya [1]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan
Keywords
language modeling; automatic speech recognition; Bayesian learning; weighted finite state transducers; segmentation
DOI
10.1587/transinf.E95.D.614
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
We propose a novel scheme to learn a language model (LM) for automatic speech recognition (ASR) directly from continuous speech. In the proposed method, we first generate phoneme lattices using an acoustic model with no linguistic constraints, then perform training over these phoneme lattices, simultaneously learning both lexical units and an LM. As a statistical framework for this learning problem, we use non-parametric Bayesian statistics, which make it possible to balance the learned model's complexity (such as the size of the learned vocabulary) and expressive power, and provide a principled learning algorithm through the use of Gibbs sampling. Implementation is performed using weighted finite state transducers (WFSTs), which allow for the simple handling of lattice input. Experimental results on natural, adult-directed speech demonstrate that LMs built using only continuous speech are able to significantly reduce ASR phoneme error rates. The proposed technique of joint Bayesian learning of lexical units and an LM over lattices is shown to significantly contribute to this improvement.
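For intuition only, the sketch below illustrates the kind of Gibbs-sampled joint learning of lexical units and an LM that the abstract describes. It is not the authors' implementation: it segments 1-best phoneme strings rather than phoneme lattices, uses a Dirichlet-process unigram model rather than the hierarchical Pitman-Yor LM, and omits the WFST machinery; the constants ALPHA, N_PHONES, and P_END and all helper names are illustrative assumptions. The concentration parameter ALPHA plays the role the abstract attributes to the non-parametric prior, trading off the size of the learned vocabulary against how well the model fits the data.

```python
import random
from collections import Counter

# A minimal sketch of Bayesian lexicon + unigram LM learning by Gibbs sampling.
# NOT the authors' implementation: 1-best phoneme strings instead of lattices,
# a Dirichlet-process unigram model instead of a hierarchical Pitman-Yor LM,
# and no WFSTs. ALPHA, N_PHONES, and P_END are illustrative values.

ALPHA = 1.0      # DP concentration: larger values tolerate a bigger vocabulary
N_PHONES = 40    # assumed phoneme inventory size for the base measure
P_END = 0.5      # stop probability of the geometric word-length prior


def g0(word):
    """Base measure: uniform phonemes with a geometric length prior."""
    return (1.0 / N_PHONES) ** len(word) * (1 - P_END) ** (len(word) - 1) * P_END


def pred(word, counts, total):
    """Chinese-restaurant-process predictive probability of one word."""
    return (counts[word] + ALPHA * g0(word)) / (total + ALPHA)


def split(u, bounds):
    """Cut utterance u (a phoneme string) at the given boundary positions."""
    cuts = [0] + sorted(bounds) + [len(u)]
    return [u[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]


def gibbs_segment(utterances, iters=200):
    """Sample word boundaries for each utterance under the DP unigram LM."""
    bounds = [set(random.sample(range(1, len(u)), len(u) // 3)) if len(u) > 1 else set()
              for u in utterances]
    counts, total = Counter(), 0
    for u, b in zip(utterances, bounds):
        for w in split(u, b):
            counts[w] += 1
            total += 1

    for _ in range(iters):
        for ui, u in enumerate(utterances):
            for pos in range(1, len(u)):
                b = bounds[ui]
                left = max([0] + [x for x in b if x < pos])
                right = min([len(u)] + [x for x in b if x > pos])
                w1, w2, w12 = u[left:pos], u[pos:right], u[left:right]

                # Remove the words touching this position before resampling it.
                if pos in b:
                    counts[w1] -= 1; counts[w2] -= 1; total -= 2
                else:
                    counts[w12] -= 1; total -= 1

                # Probability of a boundary here (two words) vs. one merged word.
                p_split = pred(w1, counts, total)
                counts[w1] += 1; total += 1
                p_split *= pred(w2, counts, total)
                counts[w1] -= 1; total -= 1
                p_merge = pred(w12, counts, total)

                if random.random() < p_split / (p_split + p_merge):
                    b.add(pos)
                    counts[w1] += 1; counts[w2] += 1; total += 2
                else:
                    b.discard(pos)
                    counts[w12] += 1; total += 1

    return [split(u, bounds[i]) for i, u in enumerate(utterances)]


if __name__ == "__main__":
    # Toy phoneme strings; the real method would take ASR phoneme lattices.
    data = ["watashihanihonjindesu", "nihonjindesu", "watashiha"] * 5
    for words in gibbs_segment(data)[:3]:
        print(" ".join(words))
```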
Pages: 614-625
Page count: 12