Bayesian Learning of a Language Model from Continuous Speech

Cited by: 22
Authors
Neubig, Graham [1]
Mimura, Masato [1]
Mori, Shinsuke [1]
Kawahara, Tatsuya [1]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan
Keywords
language modeling; automatic speech recognition; Bayesian learning; weighted finite state transducers; segmentation
DOI
10.1587/transinf.E95.D.614
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
We propose a novel scheme to learn a language model (LM) for automatic speech recognition (ASR) directly from continuous speech. In the proposed method, we first generate phoneme lattices using an acoustic model with no linguistic constraints, then perform training over these phoneme lattices, simultaneously learning both lexical units and an LM. As a statistical framework for this learning problem, we use non-parametric Bayesian statistics, which make it possible to balance the learned model's complexity (such as the size of the learned vocabulary) and expressive power, and provide a principled learning algorithm through the use of Gibbs sampling. Implementation is performed using weighted finite state transducers (WFSTs), which allow for the simple handling of lattice input. Experimental results on natural, adult-directed speech demonstrate that LMs built using only continuous speech are able to significantly reduce ASR phoneme error rates. The proposed technique of joint Bayesian learning of lexical units and an LM over lattices is shown to significantly contribute to this improvement.
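For intuition only, the sketch below illustrates the kind of Gibbs-sampled joint learning of lexical units and an LM that the abstract describes. It is not the authors' implementation: it segments 1-best phoneme strings rather than phoneme lattices, uses a Dirichlet-process unigram model rather than the hierarchical Pitman-Yor LM, and omits the WFST machinery; the constants ALPHA, N_PHONES, and P_END and all helper names are illustrative assumptions. The concentration parameter ALPHA plays the role the abstract attributes to the non-parametric prior, trading off the size of the learned vocabulary against how well the model fits the data.

```python
import random
from collections import Counter

# A minimal sketch of Bayesian lexicon + unigram LM learning by Gibbs sampling.
# NOT the authors' implementation: 1-best phoneme strings instead of lattices,
# a Dirichlet-process unigram model instead of a hierarchical Pitman-Yor LM,
# and no WFSTs. ALPHA, N_PHONES, and P_END are illustrative values.

ALPHA = 1.0      # DP concentration: larger values tolerate a bigger vocabulary
N_PHONES = 40    # assumed phoneme inventory size for the base measure
P_END = 0.5      # stop probability of the geometric word-length prior


def g0(word):
    """Base measure: uniform phonemes with a geometric length prior."""
    return (1.0 / N_PHONES) ** len(word) * (1 - P_END) ** (len(word) - 1) * P_END


def pred(word, counts, total):
    """Chinese-restaurant-process predictive probability of one word."""
    return (counts[word] + ALPHA * g0(word)) / (total + ALPHA)


def split(u, bounds):
    """Cut utterance u (a phoneme string) at the given boundary positions."""
    cuts = [0] + sorted(bounds) + [len(u)]
    return [u[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]


def gibbs_segment(utterances, iters=200):
    """Sample word boundaries for each utterance under the DP unigram LM."""
    bounds = [set(random.sample(range(1, len(u)), len(u) // 3)) if len(u) > 1 else set()
              for u in utterances]
    counts, total = Counter(), 0
    for u, b in zip(utterances, bounds):
        for w in split(u, b):
            counts[w] += 1
            total += 1

    for _ in range(iters):
        for ui, u in enumerate(utterances):
            for pos in range(1, len(u)):
                b = bounds[ui]
                left = max([0] + [x for x in b if x < pos])
                right = min([len(u)] + [x for x in b if x > pos])
                w1, w2, w12 = u[left:pos], u[pos:right], u[left:right]

                # Remove the words touching this position before resampling it.
                if pos in b:
                    counts[w1] -= 1; counts[w2] -= 1; total -= 2
                else:
                    counts[w12] -= 1; total -= 1

                # Probability of a boundary here (two words) vs. one merged word.
                p_split = pred(w1, counts, total)
                counts[w1] += 1; total += 1
                p_split *= pred(w2, counts, total)
                counts[w1] -= 1; total -= 1
                p_merge = pred(w12, counts, total)

                if random.random() < p_split / (p_split + p_merge):
                    b.add(pos)
                    counts[w1] += 1; counts[w2] += 1; total += 2
                else:
                    b.discard(pos)
                    counts[w12] += 1; total += 1

    return [split(u, bounds[i]) for i, u in enumerate(utterances)]


if __name__ == "__main__":
    # Toy phoneme strings; the real method would take ASR phoneme lattices.
    data = ["watashihanihonjindesu", "nihonjindesu", "watashiha"] * 5
    for words in gibbs_segment(data)[:3]:
        print(" ".join(words))
```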
Pages: 614-625
Page count: 12