Speaker diarization for multiple-distant-microphone meetings using several sources of information

Cited by: 46
Authors
Pardo, Jose M.
Anguera, Xavier
Wooters, Charles
Affiliations
[1] Univ Politecn Madrid, ETSI Telecomunicac, E-28040 Madrid, Spain
[2] Telefonica I&D, Barcelona 08021, Spain
[3] Int Comp Sci Inst, Berkeley, CA 94704 USA
Keywords
speech source separation; speaker diarization; speaker segmentation; meetings recognition; rich transcription
DOI
10.1109/TC.2007.1077
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Human-machine interaction in meetings requires the localization and identification of the speakers interacting with the system, as well as the recognition of the words spoken. A seminal step toward this goal is the field of rich transcription research, which includes speaker diarization together with the annotation of sentence boundaries and the elimination of speaker disfluencies. The subarea of speaker diarization attempts to identify the number of participants in a meeting and create a list of speech time intervals for each such participant. In this paper, we analyze the correlation between signals coming from multiple microphones and propose an improved method for carrying out speaker diarization for meetings with multiple distant microphones. The proposed algorithm makes use of acoustic information and information from the delays between signals coming from the different sources. Using this procedure, we were able to achieve state-of-the-art performance in the NIST Spring 2006 Rich Transcription evaluation, improving the Diarization Error Rate (DER) by 15 percent to 28 percent relative to previous systems.
Pages: 1212-1224
Page count: 13