On the Use of MLP Features for Broadcast News Transcription

Cited by: 0

Authors:

Fousek, Petr [1]

Lamel, Lori [1]

Gauvain, Jean-Luc [1]

Affiliation:

[1] LIMSI, CNRS, Spoken Language Proc Grp, Paris, France

Source:

TEXT, SPEECH AND DIALOGUE, PROCEEDINGS | 2008 / Vol. 5246

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multi-Layer Perceptron (MLP) features have recently been attracting growing interest for automatic speech recognition due to their complementarity with cepstral features. In this paper the use of MLP features is evaluated in a large vocabulary continuous speech recognition task, exploring different types of MLP features and their combination. Cepstral features and three types of Bottle-Neck MLP features were first evaluated with and without unsupervised model adaptation, using models with the same number of parameters. When used with MLLR adaptation on an Arabic broadcast news transcription task, Bottle-Neck MLP features perform as well as or even slightly better than a standard 39-dimensional PLP-based front-end. This paper also explores different combination schemes (feature concatenation, cross adaptation, and hypothesis combination). Extending the feature vector by combining various feature sets led to a 9% relative word error rate reduction over the PLP baseline. Significant gains are also reported with both ROVER hypothesis combination and cross-model adaptation. Feature concatenation appears to be the most efficient combination method, providing the best gain with the lowest decoding cost.
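As a rough illustration of two notions used in the abstract, the sketch below shows frame-level feature concatenation and the computation of a relative word error rate reduction. It is not the authors' implementation: the function names, the toy feature values, and the vector sizes are all assumptions made for the example, and no number in it is taken from the paper.

```python
from typing import List, Sequence


def concatenate_features(
    plp_frames: Sequence[Sequence[float]],
    mlp_frames: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Append the MLP feature vector to the PLP vector of the same frame."""
    if len(plp_frames) != len(mlp_frames):
        raise ValueError("both streams must cover the same number of frames")
    return [list(p) + list(m) for p, m in zip(plp_frames, mlp_frames)]


def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative reduction (in percent) of the word error rate over a baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Toy input (hypothetical values): 2 frames, 3 PLP-like and 2 MLP-like values each.
plp = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
mlp = [[1.0, 2.0], [3.0, 4.0]]
combined = concatenate_features(plp, mlp)
print(len(combined), len(combined[0]))  # number of frames, values per frame
```

In this sketch the combined vector simply grows by the size of the appended stream, so a single acoustic model can be trained and decoded on it, in contrast to ROVER or cross adaptation, which need more than one system or decoding pass.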


Pages: 303-310

Page count: 8
