Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding

Cited by: 5

Authors:

Lecouteux, Benjamin [1]

Linares, Georges [2]

Esteve, Yannick [3]

Gravier, Guillaume [4]

Affiliations:

[1] LIG Univ Grenoble Alpes, GETALP Team, F-38041 Grenoble 9, France

[2] LIA Univ Avignon, Speech Proc Grp, F-84911 Avignon 9, France

[3] Univ Le Mans LIUM, Lab Informat, F-72085 Le Mans 9, France

[4] CNRS IRISA, F-35042 Rennes, France

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / Issue 06

Keywords:

Automatic speech recognition; speech processing; system combination;

DOI:

10.1109/TASL.2013.2248716

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of the outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach in which the outputs of secondary systems are integrated into the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that should be evaluated and combined with the others by a primary search algorithm. DDA is evaluated on a subset of the ESTER I corpus consisting of 4 hours of French radio broadcast news. Results demonstrate that DDA significantly outperforms vote-based approaches: we obtain a relative word error rate improvement of 14.5% over the best single system, as opposed to 6.7% with a ROVER combination. An in-depth analysis of DDA shows its ability to improve robustness (gains are greater in adverse conditions) and a relatively low dependency on the search algorithm: applying DDA to both A* and beam-search-based decoders yields similar performance.
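
The abstract reports gains as relative word error rate (WER) improvements. As a minimal sketch of how such a figure is usually computed, the snippet below measures WER with a word-level edit distance and then takes the relative reduction against a baseline. The function names and the numeric values in the usage note are illustrative assumptions and are not taken from the paper, which does not state its absolute WERs in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, combined_wer: float) -> float:
    """Fraction of the baseline WER removed by the combined system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - combined_wer) / baseline_wer
```

For example, with a hypothetical baseline WER of 0.20 and a hypothetical combined WER of 0.171, `relative_wer_reduction(0.20, 0.171)` gives roughly 0.145, that is, a 14.5% relative improvement.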

Pages: 1251-1260

Number of pages: 10

31 references in total

[1] Barrault, Loic; Servan, Christophe; Matrouf, Driss; Linares, Georges; De Mori, Renato. Frame-based acoustic feature integration for speech understanding [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008: 4997-5000
[2] Bonastre J.-F., 2005, P ICASSP 05 PHIL PA
[3] Breslin C, 2007, INT CONF ACOUST SPEE, P337
[4] Burget L., 2004, THESIS VUT BRNO
[5] Chen I.-F., 2006, P INT 06 ICSLP PITTS
[6] Deleglise P., 2005, P INT 05 EUR LISB PO
[7] Evermann G., 2000, NIST SPEECH TRANSCR, P78
[8] Fiscus, JG. A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER) [J]. 1997 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, PROCEEDINGS, 1997: 347-354
[9] Galliano S., 2005, P EUR C SPEECH COMM
[10] Goel V., 2000, P ICSLP