Optimizing bottle-neck features for LVCSR

Cited by: 79
Authors
Grezl, Frantisek [1]
Fousek, Petr [2]
Affiliations
[1] Brno Univ Technol, Speech FIT, Brno, Czech Republic
[2] LIMSI, CNRS, Spoken Language Proc Grp, F-91403 Orsay, France
Source
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008
Keywords
bottle-neck; MLP structure; features; LVCSR;
DOI
10.1109/ICASSP.2008.4518713
CLC number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This work continues the development of the recently proposed Bottle-Neck features for ASR. The five-layer MLP used for bottle-neck feature extraction makes it possible to obtain features of arbitrary size without dimensionality reduction by transforms, independently of the MLP training targets. The MLP topology (number and sizes of layers), suitable training targets, the impact of output feature transforms, the need for delta features, and the dimensionality of the final feature vector are studied with respect to the best ASR result. Optimized features are employed in three LVCSR tasks: Arabic broadcast news, English conversational telephone speech, and English meetings. Improvements over standard cepstral features and probabilistic MLP features are shown for different tasks and different neural net input representations. A significant improvement is observed when phoneme MLP training targets are replaced by phoneme states and when delta features are added.
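The extraction idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the random untrained weights, the choice to read the bottle-neck layer before its nonlinearity, and the helper names `init_mlp`, `bottleneck_features`, and `add_deltas` are all assumptions made for illustration. The delta computation uses a plain central difference (`np.gradient`) as a stand-in for whatever delta formula the paper uses.

```python
import numpy as np

def init_mlp(sizes, seed=0):
    """Random (untrained) weights for an MLP with the given layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def bottleneck_features(x, params, bottleneck_index=1):
    """Run the forward pass only up to the bottle-neck layer.

    bottleneck_index is the index of the weight matrix whose output is the
    bottle-neck layer. Assumption: the linear (pre-sigmoid) activations of
    that layer are used as features.
    """
    h = x
    for i, (W, b) in enumerate(params):
        a = h @ W + b
        if i == bottleneck_index:
            return a
        h = 1.0 / (1.0 + np.exp(-a))  # sigmoid hidden units (assumed)
    raise ValueError("bottleneck_index is beyond the last layer")

def add_deltas(feats):
    """Append simple time derivatives (central differences along time)."""
    deltas = np.gradient(feats, axis=0)
    return np.concatenate([feats, deltas], axis=1)

# Hypothetical five-layer topology: input, hidden, bottle-neck, hidden, output.
sizes = [39, 500, 30, 500, 120]
params = init_mlp(sizes)

# 100 frames of made-up 39-dimensional input features.
frames = np.random.default_rng(1).standard_normal((100, 39))

bn = bottleneck_features(frames, params)
feats = add_deltas(bn)
print(bn.shape, feats.shape)  # (100, 30) (100, 60)
```

In this sketch the feature dimension is set by the bottle-neck layer size (30 here) and does not change if the number of output targets (120 here) is changed, which mirrors the abstract's point that the feature size is independent of the MLP training targets.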
Pages: 4729 / +
Page count: 2