Improved Deep Speaker Feature Learning for Text-Dependent Speaker Recognition

Cited by: 0
Authors
Li, Lantian [1]
Lin, Yiye
Zhang, Zhiyong
Wang, Dong
Affiliations
[1] Tsinghua Univ, Ctr Speech & Language Technol, Div Tech Innovat & Dev, Tsinghua Natl Lab Informat Sci & Technol, Beijing, Peoples R China
Source
2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2015
Keywords
d-vector; dynamic time warping; speaker recognition; verification
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract
A deep learning approach has recently been proposed to derive speaker identities (d-vectors) with a deep neural network (DNN). This approach has been applied to text-dependent speaker recognition tasks and shows reasonable performance gains when combined with the conventional i-vector approach. Although promising, the existing d-vector implementation still cannot compete with the i-vector baseline. This paper presents two improvements for the deep learning approach: a phone-dependent DNN structure to normalize phone variation, and a new scoring approach based on dynamic time warping (DTW). Experiments on a text-dependent speaker recognition task demonstrate that the proposed methods provide considerable performance improvement over the existing d-vector implementation.
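To illustrate the general idea behind DTW-based scoring mentioned in the abstract, the following is a minimal sketch of dynamic time warping between two sequences of frame-level feature vectors. It is not the authors' implementation: the abstract does not specify the local distance or the normalization, so the cosine distance, the division by the combined sequence length, and the function names `cosine_distance` and `dtw_distance` are all assumptions made for illustration.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance between two vectors (close to 0 when they point the same way)."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def dtw_distance(enroll, test):
    """Length-normalized DTW cost between two feature sequences.

    enroll: array of shape (n, d); test: array of shape (m, d).
    Assumption: cosine local distance and normalization by (n + m).
    """
    n, m = len(enroll), len(test)
    # acc[i, j] = cheapest accumulated cost of aligning enroll[:i] with test[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_distance(enroll[i - 1], test[j - 1])
            # Allowed moves: advance in enroll only, in test only, or in both
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)


# Toy example: the second sequence repeats the first frame, as if spoken more slowly
enroll = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
slower = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
reversed_seq = enroll[::-1]

same_phrase_cost = dtw_distance(enroll, slower)
different_order_cost = dtw_distance(enroll, reversed_seq)
```

In this toy case the warping absorbs the repeated frame, so `same_phrase_cost` is essentially zero, while the reversed sequence cannot be aligned without cost. A lower cost would be read as a closer match between the enrollment and test utterances.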
Pages: 426-429
Page count: 4