The IBM 2016 English Conversational Telephone Speech Recognition System

Times cited: 45

Authors:

Saon, George [1]

Sercu, Tom [1]

Rennie, Steven [1]

Kuo, Hong-Kwang J. [1]

Affiliation:

[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

recurrent neural networks; convolutional neural networks; conversational speech recognition

DOI:

10.21437/Interspeech.2016-1460

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

We describe a collection of acoustic and language modeling techniques that lowered the word error rate of our English conversational telephone LVCSR system to a record 6.6% on the Switchboard subset of the Hub5 2000 evaluation test set. On the acoustic side, we use a score fusion of three strong models: recurrent nets with maxout activations, very deep convolutional nets with 3x3 kernels, and bidirectional long short-term memory nets which operate on FMLLR and i-vector features. On the language modeling side, we use an updated model "M" and hierarchical neural network LMs.
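The abstract does not specify how the scores of the three acoustic models are combined. The sketch below assumes one common scheme, frame-level fusion as a weighted average of each model's per-frame log-probabilities over a shared set of output states. The functions `fuse_scores` and `best_states`, the weights, and the toy scores are hypothetical illustrations, not the authors' implementation or values.

```python
from typing import List, Sequence

# Per-model scores: one row per frame, one log-probability per output state.
Scores = Sequence[Sequence[float]]


def fuse_scores(model_scores: Sequence[Scores],
                weights: Sequence[float]) -> List[List[float]]:
    """Hypothetical frame-level fusion: weighted sum of log-probabilities.

    Assumes every model scores the same frames and the same output states.
    Weights are normalized to sum to 1, so equal weights give a plain average.
    """
    if not model_scores or len(model_scores) != len(weights):
        raise ValueError("need one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]

    num_frames = len(model_scores[0])
    num_states = len(model_scores[0][0])
    for scores in model_scores:
        if len(scores) != num_frames or any(len(f) != num_states for f in scores):
            raise ValueError("all models must share the frame/state layout")

    fused = []
    for t in range(num_frames):
        # For each state, combine the models' scores for frame t.
        fused.append([
            sum(w * scores[t][s] for w, scores in zip(norm, model_scores))
            for s in range(num_states)
        ])
    return fused


def best_states(fused: Sequence[Sequence[float]]) -> List[int]:
    """Index of the highest fused score in each frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in fused]


if __name__ == "__main__":
    # Toy log-probabilities (2 frames x 3 states) from three hypothetical models.
    rnn = [[-0.2, -2.0, -3.0], [-2.5, -0.3, -2.0]]
    cnn = [[-0.4, -1.5, -2.5], [-1.0, -0.8, -2.2]]
    lstm = [[-1.8, -0.3, -2.9], [-2.0, -0.4, -1.9]]
    fused = fuse_scores([rnn, cnn, lstm], weights=[1.0, 1.0, 1.0])
    print(best_states(fused))  # prints [0, 1]
```

In the toy run, the LSTM alone would prefer state 1 in the first frame, but the other two models outvote it after fusion. With this kind of scheme, the weights would typically be tuned on held-out data rather than fixed to equal values.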


Pages: 7-11

Page count: 5
