Composite decision by Bayesian inference in distant-talking speech recognition

Cited by: 0
Authors
Ji, Mikyong [1]
Kim, Sungtak [1]
Kim, Hoirin [1]
Affiliations
[1] Informat & Commun Univ, SRT Lab, Taejon 305732, South Korea
Source
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS | 2006 / Vol. 4188
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes an integrated system that produces a composite recognition output for distant-talking speech when recognition results from multiple microphone inputs are available. In many cases, the composite recognition result has a lower error rate than any individual output. In this work, the composite recognition result is obtained by applying Bayesian inference. The log-likelihood score is assumed to follow a Gaussian distribution, at least approximately. First, the distribution of the likelihood score is estimated on the development set. Then, the confidence interval for the likelihood score is used to remove unreliable microphone channels. Finally, the area under the distribution between the likelihood score of a hypothesis and that of the (N+1)st hypothesis is obtained for every channel and integrated over all channels by Bayesian inference. The proposed system shows considerable performance improvement over both an ordinary method that sums the likelihoods and the recognition result of any individual channel.
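The three steps named in the abstract (fit a Gaussian to development-set scores, drop channels whose score falls outside a confidence interval, then combine per-channel areas) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the combination rule, so the sketch assumes a naive-Bayes style product of per-channel areas with a uniform prior, a 1.96-sigma interval, and a fallback that keeps all channels if none pass the check. All function names and the toy scores are hypothetical.

```python
import math
from statistics import NormalDist


def fit_score_distribution(dev_scores):
    """Fit a Gaussian to log-likelihood scores from a development set."""
    return NormalDist.from_samples(dev_scores)


def is_reliable(dist, top_score, z=1.96):
    """Assumed reliability test: best score lies within z standard deviations."""
    return abs(top_score - dist.mean) <= z * dist.stdev


def area_between(dist, score, floor_score):
    """Gaussian mass between the (N+1)st score and a hypothesis score."""
    return max(dist.cdf(score) - dist.cdf(floor_score), 0.0)


def combine_channels(channels, z=1.96, eps=1e-12):
    """Return normalized hypothesis weights from a non-empty channel list.

    channels: list of (dist, nbest, floor_score), where nbest maps a
    hypothesis to its log-likelihood score on that channel and
    floor_score is the score of the (N+1)st hypothesis.
    """
    kept = [c for c in channels if is_reliable(c[0], max(c[1].values()), z)]
    if not kept:
        kept = list(channels)  # assumed fallback, not from the paper
    hypotheses = set().union(*(nbest for _, nbest, _ in kept))
    log_weight = {}
    for hyp in hypotheses:
        total = 0.0
        for dist, nbest, floor_score in kept:
            # A hypothesis missing from a channel's N-best list gets zero area.
            score = nbest.get(hyp, floor_score)
            total += math.log(area_between(dist, score, floor_score) + eps)
        log_weight[hyp] = total
    # Assumed combination: product of areas (sum of logs), uniform prior.
    peak = max(log_weight.values())
    unnorm = {h: math.exp(v - peak) for h, v in log_weight.items()}
    norm = sum(unnorm.values())
    return {h: w / norm for h, w in unnorm.items()}


# Toy usage with made-up scores; the third channel is a deliberate outlier.
dist = fit_score_distribution([-50.0, -48.0, -52.0, -49.0, -51.0])
channels = [
    (dist, {"yes": -48.5, "no": -50.5}, -52.0),
    (dist, {"yes": -49.0, "no": -49.5}, -52.5),
    (dist, {"no": -20.0, "yes": -30.0}, -40.0),
]
result = combine_channels(channels)
best = max(result, key=result.get)
print(best, result)
```

Working in the log domain avoids underflow when many channels are multiplied together; a real system would presumably fit one distribution per channel rather than sharing one as the toy data does.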
Pages: 463-470
Number of pages: 8