Acoustic beamforming for speaker diarization of meetings

Cited by: 287
Authors
Anguera, Xavier [1]
Wooters, Chuck
Hernando, Javier
Affiliations
[1] Telefon ID, Madrid 28043, Spain
[2] Univ Politecn Cataluna, E-08028 Barcelona, Spain
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / No. 7
Keywords
acoustic beamforming; meeting processing; speaker diarization; speaker segmentation and clustering
DOI
10.1109/TASL.2007.902460
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
When performing speaker diarization on recordings from meetings, multiple microphones of different qualities are usually available and distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable or they cannot outperform the simpler case of using the best single microphone. In this paper, the use of classic acoustic beamforming techniques is proposed together with several novel algorithms to create a complete frontend for speaker diarization in the meeting room domain. New techniques we are presenting include blind reference-channel selection, two-step time delay of arrival (TDOA) Viterbi postprocessing, and a dynamic output signal weighting algorithm, together with using such TDOA values in the diarization to complement the acoustic information. Tests on speaker diarization show a 25% relative improvement on the test set compared to using a single most centrally located microphone. Additional experimental results show improvements using these techniques in a speech recognition task.
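As a rough illustration of the "classic acoustic beamforming" the abstract refers to, the following is a minimal delay-and-sum sketch in Python. It is not the authors' frontend: it assumes GCC-PHAT for the TDOA estimate (the abstract does not name the estimator), uses a fixed reference channel instead of blind reference-channel selection, applies equal channel weights instead of dynamic weighting, and omits the Viterbi postprocessing. The function names `gcc_phat_delay` and `delay_and_sum` and the parameter `max_lag` are hypothetical.

```python
import numpy as np

def gcc_phat_delay(sig, ref, max_lag):
    """Estimate the delay (in samples) of `sig` relative to `ref`.

    Uses GCC-PHAT, an assumed estimator. `max_lag` must be >= 1.
    """
    n = len(sig) + len(ref)              # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so the lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag

def delay_and_sum(channels, ref_index=0, max_lag=32):
    """Align every channel to a fixed reference channel and average them."""
    channels = np.asarray(channels, dtype=float)
    ref = channels[ref_index]
    out = np.zeros_like(ref)
    delays = []
    for ch in channels:
        d = gcc_phat_delay(ch, ref, max_lag)
        delays.append(d)
        # Advance the channel by d samples. np.roll is circular, so a few
        # samples at the edges wrap around; acceptable for a sketch.
        out += np.roll(ch, -d)
    return out / len(channels), delays

# Synthetic demo: one noise source seen by three "microphones" with
# known integer delays (made-up values, for illustration only).
rng = np.random.default_rng(0)
base = rng.standard_normal(4096 + 64)
true_delays = [0, 5, -3]
channels = [base[32 - d: 32 - d + 4096] for d in true_delays]

enhanced, est_delays = delay_and_sum(channels, ref_index=0, max_lag=32)
print(est_delays)
```

In this toy setup the estimated delays can be compared directly against `true_delays`. Real meeting recordings would normally be processed in short windows, with the per-window delays smoothed over time.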
Pages: 2011-2022
Page count: 12