Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding

Cited by: 5
Authors
Lecouteux, Benjamin [1]
Linares, Georges [2]
Esteve, Yannick [3]
Gravier, Guillaume [4]
Affiliations
[1] LIG Univ Grenoble Alpes, GETALP Team, F-38041 Grenoble 9, France
[2] LIA Univ Avignon, Speech Proc Grp, F-84911 Avignon 9, France
[3] Univ Le Mans LIUM, Lab Informat, F-72085 Le Mans 9, France
[4] CNRS IRISA, F-35042 Rennes, France
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / No. 06
Keywords
Automatic speech recognition; speech processing; system combination
DOI
10.1109/TASL.2013.2248716
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of the outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach in which the outputs of secondary systems are integrated into the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that should be evaluated and combined with the others by a primary search algorithm. DDA is evaluated on a subset of the ESTER I corpus consisting of 4 hours of French radio broadcast news. Results demonstrate that DDA significantly outperforms vote-based approaches: we obtain a relative word error rate improvement of 14.5% over the best single system, as opposed to 6.7% with a ROVER combination. An in-depth analysis of DDA shows its ability to improve robustness (gains are greater in adverse conditions) and a relatively low dependency on the search algorithm: applying DDA to both A* and beam-search-based decoders yields similar performance.
Pages: 1251-1260
Page count: 10