Hierarchical multi-stream posterior based speech recognition system

Cited by: 0
Authors
Ketabdar, H [1]
Bourlard, H
Bengio, S
Affiliations
[1] IDIAP Res Inst, Martigny, Switzerland
[2] EPFL, Lausanne, Switzerland
Source
MACHINE LEARNING FOR MULTIMODAL INTERACTION | 2005 / Vol. 3869
Keywords
DOI
Not available
CLC classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present initial results towards boosting posterior-based speech recognition systems by estimating more informative posteriors using multiple streams of features and taking into account acoustic context (e.g., as available in the whole utterance), as well as possible prior information (such as topological constraints). These posteriors are estimated based on the "state gamma posterior" definition (typically used in standard HMM training) extended to the case of multi-stream HMMs. This approach provides a new, principled, theoretical framework for hierarchical estimation/use of posteriors, multi-stream feature combination, and integrating appropriate context and prior knowledge in posterior estimates. In the present work, we used the resulting gamma posteriors as features for a standard HMM/GMM layer. On the OGI Digits database and on a reduced vocabulary version (1000 words) of the DARPA Conversational Telephone Speech-to-text (CTS) task, this resulted in a significant performance improvement compared to the state-of-the-art Tandem systems.
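As an illustration of the "state gamma posterior" idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It computes the textbook forward-backward quantity gamma[t][i] = P(state i at frame t | whole utterance). The multi-stream extension shown here is an assumption: per-stream likelihoods are combined by a weighted product, and the names `state_gammas`, `stream_likes`, and `weights` are hypothetical. Zeros in the transition matrix play the role of topological constraints.

```python
def state_gammas(init, trans, stream_likes, weights=None):
    """Return gamma[t][i] = P(state i at frame t | all frames of all streams).

    init[i]               initial state probabilities
    trans[i][j]           transition probabilities (zeros encode topology)
    stream_likes[s][t][i] likelihood of stream s, frame t, under state i
    weights[s]            hypothetical stream exponents (default 1.0 each)
    """
    n_streams = len(stream_likes)
    n_frames = len(stream_likes[0])
    n_states = len(init)
    if weights is None:
        weights = [1.0] * n_streams

    # ASSUMPTION: streams are combined by a weighted product of likelihoods.
    emit = [[1.0] * n_states for _ in range(n_frames)]
    for s in range(n_streams):
        for t in range(n_frames):
            for i in range(n_states):
                emit[t][i] *= stream_likes[s][t][i] ** weights[s]

    # Forward pass, rescaled at every frame to avoid underflow.
    alpha = []
    for t in range(n_frames):
        if t == 0:
            row = [init[i] * emit[0][i] for i in range(n_states)]
        else:
            prev = alpha[-1]
            row = [emit[t][j] * sum(prev[i] * trans[i][j] for i in range(n_states))
                   for j in range(n_states)]
        norm = sum(row)  # assumed non-zero: the utterance is possible under the model
        alpha.append([a / norm for a in row])

    # Backward pass, also rescaled per frame.
    beta = [[1.0] * n_states for _ in range(n_frames)]
    for t in range(n_frames - 2, -1, -1):
        row = [sum(trans[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(n_states))
               for i in range(n_states)]
        norm = sum(row)
        beta[t] = [b / norm for b in row]

    # Gamma is proportional to alpha * beta; renormalise each frame.
    gamma = []
    for t in range(n_frames):
        row = [alpha[t][i] * beta[t][i] for i in range(n_states)]
        norm = sum(row)
        gamma.append([g / norm for g in row])
    return gamma


# Toy example: 2-state left-to-right model, 2 feature streams, 3 frames.
init = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
stream_a = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
stream_b = [[0.8, 0.2], [0.4, 0.6], [0.2, 0.8]]
gammas = state_gammas(init, trans, [stream_a, stream_b])
```

Because each gamma depends on the forward and backward passes over the full sequence, every frame's posterior reflects the whole utterance and the model topology, which is the property the abstract highlights. In the setup the abstract describes, such per-frame vectors would then serve as input features to a second recognition layer.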
Pages: 294-306
Number of pages: 13