Convolutional Neural Network in the Task of Speaker Change Detection

Cited by: 9
Authors
Hruz, Marek [1]
Kunesova, Marie [1, 2]
Affiliations
[1] Univ West Bohemia Pilsen, NTIS New Technol Informat Soc, Fac Sci Appl, Univ 8, Plzen 30614, Czech Republic
[2] Univ West Bohemia Pilsen, Fac Sci Appl, Dept Cybernet, Univ 8, Plzen 30614, Czech Republic
Source
SPEECH AND COMPUTER | 2016 / Volume 9811
Keywords
Convolutional neural network; Speaker change detection; Spectrogram
DOI
10.1007/978-3-319-43958-7_22
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
This paper presents an approach to detecting speaker changes in telephone conversations. The speaker change problem is framed as a classification problem. We use a Convolutional Neural Network to analyze short audio segments. The network acts as a regressor: it outputs higher values for segments that are more likely to contain a speaker change. The decision about each segment is then made by thresholding the regressed value. The experiment shows that the Convolutional Neural Network outperforms a baseline system based on the Bayesian Information Criterion. It also behaves very well on previously unseen data produced by previously unheard speakers.
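The thresholding step described in the abstract can be illustrated with a minimal sketch. It assumes the network has already produced one score per short audio segment. The function name `detect_changes`, the example scores, and the threshold of 0.5 are all hypothetical and are not taken from the paper; the network itself is not shown.

```python
def detect_changes(scores, threshold=0.5):
    """Return the indices of segments flagged as containing a speaker change.

    scores: per-segment regressor outputs (higher = more likely a change).
    threshold: hypothetical decision threshold, not a value from the paper.
    """
    return [i for i, score in enumerate(scores) if score > threshold]


# Made-up scores for six consecutive segments, for illustration only.
example_scores = [0.05, 0.10, 0.82, 0.30, 0.07, 0.91]
flagged = detect_changes(example_scores)
print(flagged)  # segments 2 and 5 exceed the threshold
```

Raising the threshold flags fewer segments, and lowering it flags more, so in practice the value would be tuned on held-out data.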
Pages: 191-198
Number of pages: 8