Composite decision by Bayesian inference in distant-talking speech recognition

Cited by: 0
Authors
Ji, Mikyong [1]
Kim, Sungtak [1]
Kim, Hoirin [1]
Affiliation
[1] Informat & Commun Univ, SRT Lab, Taejon 305732, South Korea
Source
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS | 2006 / Vol. 4188
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes an integrated system that produces a composite recognition output for distant-talking speech when recognition results from multiple microphone inputs are available. In many cases, the composite recognition result has a lower error rate than any individual output. In this work, the composite recognition result is obtained by applying Bayesian inference. The log-likelihood score is assumed to follow a Gaussian distribution, at least approximately. First, the distribution of the likelihood score is estimated on the development set. Then, the confidence interval for the likelihood score is used to remove unreliable microphone channels. Finally, the area under the distribution between the likelihood score of a hypothesis and that of the (N+1)st hypothesis is obtained for every channel and integrated over all channels by Bayesian inference. The proposed system shows a considerable performance improvement over both an ordinary method based on the summation of likelihoods and the recognition result of any individual channel.
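The three steps in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption rather than the authors' method: the function names, the 1.96 standard-deviation interval, testing each channel's best score against that interval, the uniform prior over hypotheses, and the treatment of channels as independent (so per-channel areas are multiplied) are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, stdev

EPS = 1e-12  # floor that keeps logarithms finite (illustrative choice)


def fit_gaussian(dev_scores):
    """Estimate mean and standard deviation of log-likelihood scores on a development set."""
    return mean(dev_scores), stdev(dev_scores)


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian with parameters mu and sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def is_reliable(score, mu, sigma, z=1.96):
    """Assumed reliability test: the score lies within mu +/- z * sigma."""
    return abs(score - mu) <= z * sigma


def channel_areas(nbest, floor_score, mu, sigma):
    """Area under the Gaussian between the (N+1)st score and each hypothesis score."""
    base = normal_cdf(floor_score, mu, sigma)
    return {h: max(normal_cdf(s, mu, sigma) - base, EPS) for h, s in nbest.items()}


def composite_decision(channels, z=1.96):
    """Combine per-channel areas under an assumed independence model and uniform prior."""
    kept = [c for c in channels
            if is_reliable(max(c["nbest"].values()), c["mu"], c["sigma"], z)]
    if not kept:  # assumed fallback: keep every channel if all were rejected
        kept = list(channels)
    hyps = set()
    for c in kept:
        hyps.update(c["nbest"])
    log_score = {h: 0.0 for h in hyps}
    for c in kept:
        areas = channel_areas(c["nbest"], c["floor"], c["mu"], c["sigma"])
        for h in hyps:
            # A hypothesis missing from a channel's N-best list gets the floor value.
            log_score[h] += math.log(areas.get(h, EPS))
    peak = max(log_score.values())
    total = sum(math.exp(v - peak) for v in log_score.values())
    posterior = {h: math.exp(v - peak) / total for h, v in log_score.items()}
    return max(posterior, key=posterior.get), posterior


# Toy data (invented for illustration): three channels, two hypotheses each.
channels = [
    {"mu": -100.0, "sigma": 10.0, "floor": -120.0, "nbest": {"yes": -95.0, "no": -105.0}},
    {"mu": -100.0, "sigma": 10.0, "floor": -115.0, "nbest": {"yes": -98.0, "no": -97.0}},
    {"mu": -100.0, "sigma": 10.0, "floor": -180.0, "nbest": {"no": -160.0, "yes": -170.0}},
]
best, posterior = composite_decision(channels)
print(best, round(posterior[best], 3))
```

In this toy input the third channel's best score falls far outside the assumed interval and is dropped, so the decision rests on the first two channels. Working in the log domain avoids underflow when many channels are multiplied together.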
Pages: 463-470
Page count: 8