Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding

Cited by: 5
Authors
Lecouteux, Benjamin [1]
Linares, Georges [2]
Esteve, Yannick [3]
Gravier, Guillaume [4]
Affiliations
[1] LIG Univ Grenoble Alpes, GETALP Team, F-38041 Grenoble 9, France
[2] LIA Univ Avignon, Speech Proc Grp, F-84911 Avignon 9, France
[3] Univ Le Mans LIUM, Lab Informat, F-72085 Le Mans 9, France
[4] CNRS IRISA, F-35042 Rennes, France
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013, Vol. 21, No. 6
Keywords
Automatic speech recognition; speech processing; system combination
DOI
10.1109/TASL.2013.2248716
Chinese Library Classification
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of their outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach in which the outputs of secondary systems are integrated into the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that should be evaluated and combined with the others by the primary search algorithm. DDA is evaluated on a subset of the ESTER I corpus consisting of 4 hours of French radio broadcast news. Results demonstrate that DDA significantly outperforms vote-based approaches: we obtain a 14.5% relative improvement in word error rate over the best single system, as opposed to 6.7% with a ROVER combination. An in-depth analysis of DDA shows its ability to improve robustness (gains are greater in adverse conditions) and a relatively low dependency on the search algorithm: applying DDA to both A* and beam-search-based decoders yields similar performance.
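The abstract reports its gains as relative word error rate (WER) improvements, which differ from absolute percentage-point differences. The sketch below shows the arithmetic only. The function name `relative_wer_reduction` and the 20.0% baseline WER are illustrative assumptions; the record does not give absolute WER figures, so the numbers below are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, combined_wer: float) -> float:
    """Return the relative WER reduction of a combined system over a baseline.

    Both arguments are WERs in the same unit (e.g. percent).
    The result is a fraction: 0.145 means a 14.5% relative improvement.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - combined_wer) / baseline_wer


# HYPOTHETICAL baseline, chosen only to illustrate the arithmetic;
# it is not taken from the paper.
baseline = 20.0

# WERs that would correspond to the relative gains quoted in the abstract
# if the best single system had the hypothetical baseline above.
wer_for_14_5 = baseline * (1 - 0.145)  # 14.5% relative gain (DDA figure)
wer_for_6_7 = baseline * (1 - 0.067)   # 6.7% relative gain (ROVER figure)

print(f"{relative_wer_reduction(baseline, wer_for_14_5):.1%}")
print(f"{relative_wer_reduction(baseline, wer_for_6_7):.1%}")
```

Under this assumed baseline, a 14.5% relative gain is a drop of under three absolute points, which is why the relative and absolute figures should not be conflated when comparing combination methods.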
Pages: 1251-1260
Page count: 10