Speaker Clustering with Penalty Distance for Speaker Verification with Multi-Speaker Speech

Times cited: 0
Authors
Das, Rohan Kumar [1]
Yang, Jichen [1]
Li, Haizhou [1]
Affiliation
[1] National University of Singapore, Singapore
Source
2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019
Keywords
DIARIZATION; RECOGNITION; SYSTEM
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Speaker verification in a multi-speaker environment is an emerging research topic. Speaker clustering, which separates the speakers in a recording, can be effective if either a predetermined threshold or the number of speakers in a multi-speaker utterance is given. In practice, however, neither is known in advance. This work addresses the problem by introducing a penalty distance factor into the pipeline of traditional clustering techniques. The proposed framework first uses traditional clustering techniques to form speaker clusters for a given number of speakers. We then compute a penalty distance based on the Bayesian information criterion (BIC) and use it to merge similar clusters within a multi-speaker utterance. Experiments are conducted on the Speakers in the Wild (SITW) and the recent NIST SRE 2018 databases, which contain multi-speaker conversational speech in noisy environments. The results show the effectiveness of the proposed penalty-distance-based refinement in this scenario.
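The record does not give the exact form of the paper's penalty distance, so the following is only a minimal sketch of the general idea of BIC-based cluster merging, not the authors' method. It assumes the classic Delta-BIC criterion of Chen and Gopalakrishnan (1998) with one full-covariance Gaussian per cluster: Delta-BIC = (N/2) log|S| - (N1/2) log|S1| - (N2/2) log|S2| - lam * P, where P = 0.5 * (d + d(d+1)/2) * log N and lam is a tuning weight. A negative value suggests the two clusters come from one speaker. Starting from an initial clustering produced with a fixed number of clusters, the sketch greedily merges the closest pair while Delta-BIC < 0. The names `delta_bic`, `merge_clusters`, `lam`, and the synthetic 2-D "features" are hypothetical; a real system would operate on acoustic features or speaker embeddings.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(xs: Sequence[Sequence[float]], reg: float = 1e-6) -> Matrix:
    """Maximum-likelihood covariance with a small diagonal regulariser."""
    n, d = len(xs), len(xs[0])
    mu = [sum(x[k] for x in xs) / n for k in range(d)]
    cov = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in xs) / n
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += reg
    return cov


def log_det_spd(m: Matrix) -> float:
    """log|m| for a symmetric positive-definite matrix via Cholesky."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def delta_bic(a, b, lam: float = 1.0) -> float:
    """Classic Delta-BIC (assumed stand-in for the paper's penalty distance).

    Negative values favour modelling a and b with a single Gaussian (merge).
    """
    d = len(a[0])
    n1, n2 = len(a), len(b)
    n = n1 + n2
    penalty = 0.5 * (d + d * (d + 1) / 2) * math.log(n)
    return (0.5 * n * log_det_spd(covariance(list(a) + list(b)))
            - 0.5 * n1 * log_det_spd(covariance(a))
            - 0.5 * n2 * log_det_spd(covariance(b))
            - lam * penalty)


def merge_clusters(clusters, lam: float = 1.0):
    """Greedily merge the pair with the lowest Delta-BIC while it is negative."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0.0:
            break  # every remaining pair looks like two different speakers
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    rng = random.Random(0)

    def blob(cx: float, cy: float, n: int):
        return [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)] for _ in range(n)]

    # Hypothetical over-clustered input (e.g. clustering run with 3 clusters):
    # two clusters drawn from one "speaker", one from a different "speaker".
    initial = [blob(0.0, 0.0, 200), blob(0.0, 0.0, 200), blob(5.0, 5.0, 200)]
    refined = merge_clusters(initial)
    print(sorted(len(c) for c in refined))
```

In this toy setup, the two clusters drawn from the same distribution should be merged and the distant one kept apart. The weight `lam` plays the role that a tuned threshold would otherwise play, which is why a BIC-style penalty can reduce dependence on knowing the speaker count in advance.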
Pages: 1630-1635
Number of pages: 6