Composite decision by Bayesian inference in distant-talking speech recognition

Cited by: 0
Authors
Ji, Mikyong [1]
Kim, Sungtak [1]
Kim, Hoirin [1]
Affiliations
[1] Informat & Commun Univ, SRT Lab, Taejon 305732, South Korea
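Source
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS | 2006 / Vol. 4188
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract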
This paper describes an integrated system that produces a composite recognition output for distant-talking speech when recognition results from multiple microphone inputs are available. In many cases, the composite recognition result has a lower error rate than any individual output. In this work, the composite recognition result is obtained by applying Bayesian inference. The log-likelihood score is assumed to follow a Gaussian distribution, at least approximately. First, the distribution of the likelihood score is estimated on the development set. Then, the confidence interval for the likelihood score is used to remove unreliable microphone channels. Finally, the area under the distribution between the likelihood score of a hypothesis and that of the (N+1)st hypothesis is obtained for every channel and integrated over all channels by Bayesian inference. The proposed system shows a considerable performance improvement over both an ordinary method that sums the likelihoods and the recognition result of any individual channel.
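The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the 95% interval (`z=1.96`), the uniform prior, the naive-Bayes style product across channels, the `eps` floor, and every function and field name (`fit_score_distribution`, `is_reliable`, `area_between`, `composite_decision`, `dist`, `nbest`, `floor`) are assumptions made here for illustration only.

```python
from statistics import NormalDist


def fit_score_distribution(dev_scores):
    """Fit a Gaussian to log-likelihood scores from a development set."""
    return NormalDist.from_samples(dev_scores)


def is_reliable(dist, best_score, z=1.96):
    """Assumed reliability test: the channel's best score must lie inside
    the two-sided confidence interval mean +/- z * stdev."""
    return abs(best_score - dist.mean) <= z * dist.stdev


def area_between(dist, score, floor_score):
    """Area under the Gaussian between the (N+1)st hypothesis score
    (floor_score) and the score of a given hypothesis."""
    return max(dist.cdf(score) - dist.cdf(floor_score), 0.0)


def composite_decision(channels, z=1.96, eps=1e-6):
    """Combine per-channel N-best lists into one normalised score per hypothesis.

    Each channel is a dict with hypothetical keys:
      'dist'  : NormalDist fitted on development data for that channel
      'nbest' : {hypothesis: log-likelihood} for the N best hypotheses
      'floor' : log-likelihood of the (N+1)st hypothesis
    Assumption: uniform prior and conditionally independent channels, so the
    per-channel areas are multiplied and then normalised.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    kept = [ch for ch in channels
            if is_reliable(ch["dist"], max(ch["nbest"].values()), z)]
    if not kept:
        kept = channels  # assumed fallback: keep everything rather than nothing
    hyps = set().union(*(ch["nbest"] for ch in kept))
    posterior = {h: 1.0 for h in hyps}
    for ch in kept:
        for h in hyps:
            score = ch["nbest"].get(h)
            area = 0.0 if score is None else area_between(ch["dist"], score, ch["floor"])
            posterior[h] *= area + eps  # eps keeps missing hypotheses from zeroing out
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}
```

Calling `composite_decision` on a list of channel dicts returns a mapping from hypothesis to normalised score, and the composite output is the hypothesis with the highest value. Multiplying areas rather than summing raw likelihoods is one plausible reading of "integrated for all channels by Bayesian inference"; the paper itself should be consulted for the exact rule.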
Pages: 463-470
Number of pages: 8