Detection and Separation of Speech Events in Meeting Recordings Using a Microphone Array

Cited by: 0
Authors
Futoshi Asano
Kiyoshi Yamamoto
Jun Ogata
Miichi Yamada
Masami Nakamura
Affiliations
[1] National Institute of Advanced Industrial Science and Technology, Information Technology Research Institute
[2] Advanced Media, Inc.
Source
EURASIP Journal on Audio, Speech, and Music Processing, Volume 2007
Keywords
Acoustics; Speech Recognition; Automatic Speech Recognition; Spontaneous Speech; Microphone Array
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
When applying automatic speech recognition (ASR) to meeting recordings including spontaneous speech, the performance of ASR is greatly reduced by the overlap of speech events. In this paper, a method of separating the overlapping speech events by using an adaptive beamforming (ABF) framework is proposed. The main feature of this method is that all the information necessary for the adaptation of ABF, including microphone calibration, is obtained from meeting recordings based on the results of speech-event detection. The performance of the separation is evaluated via ASR using real meeting recordings.