Bayesian HMM based x-vector clustering for Speaker Diarization

Cited by: 34
Authors
Diez, Mireia [1]
Burget, Lukas [1]
Wang, Shuai [1, 2]
Rohdin, Johan [1]
Cernocky, Jan [1]
Affiliations
[1] Brno Univ Technol, Fac Informat Technol, IT4I Ctr Excellence, Brno, Czech Republic
[2] Shanghai Jiao Tong Univ, Speechlab, Dept Comp Sci & Engn, Shanghai, Peoples R China
Source
INTERSPEECH 2019 | 2019
Funding
European Union Horizon 2020; US National Science Foundation;
Keywords
Speaker Diarization; Variational Bayes; HMM; x-vector; DIHARD;
DOI
10.21437/Interspeech.2019-2813
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
This paper presents a simplified version of the previously proposed diarization algorithm based on Bayesian Hidden Markov Models, which uses Variational Bayesian (VB) inference for very fast and robust clustering of x-vectors (neural-network-based speaker embeddings). The presented results show that this clustering algorithm provides significant improvements in diarization performance compared to the previously used Agglomerative Hierarchical Clustering. The output of this system can be further employed as an initialization for a second-stage VB diarization system, using frame-wise MFCC features as input, to obtain optimal results.
Pages: 346-350
Number of pages: 5