Improved Deep Speaker Feature Learning for Text-Dependent Speaker Recognition

Times cited: 0

Authors:

Li, Lantian [1]

Lin, Yiye

Zhang, Zhiyong

Wang, Dong

Affiliation:

[1] Tsinghua Univ, Ctr Speech & Language Technol, Div Tech Innovat & Dev, Tsinghua Natl Lab Informat Sci & Technol, Beijing, Peoples R China

Source:

2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2015

Keywords:

d-vector; dynamic time warping; speaker recognition; verification;

DOI:

Not available

Chinese Library Classification (CLC) numbers:

TM [Electrical engineering]; TN [Electronics and communication technology];

Subject classification codes:

0808 ; 0809 ;

Abstract:

A deep learning approach has recently been proposed to derive speaker identities (d-vectors) with a deep neural network (DNN). This approach has been applied to text-dependent speaker recognition tasks and shows reasonable performance gains when combined with the conventional i-vector approach. Although promising, the existing d-vector implementation still cannot compete with the i-vector baseline on its own. This paper presents two improvements to the deep learning approach: a phone-dependent DNN structure that normalizes phone variation, and a new scoring approach based on dynamic time warping (DTW). Experiments on a text-dependent speaker recognition task demonstrate that the proposed methods provide considerable performance improvement over the existing d-vector implementation.
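The difference between conventional d-vector scoring and DTW-based scoring can be illustrated with a minimal sketch. It assumes frame-level speaker features have already been extracted (in the usual d-vector recipe these are last-hidden-layer activations of the speaker DNN, averaged over the utterance); here tiny hand-made 2-D vectors stand in for them. The baseline averages the frames into one d-vector and compares utterances by cosine similarity; the DTW variant aligns the two frame sequences with cosine distance as the local cost and reports the average frame similarity along the best path. The function names (`d_vector`, `average_score`, `dtw_score`), the cosine local cost, and the path-length normalization are illustrative assumptions, not the authors' exact implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def d_vector(frames: Sequence[Vector]) -> List[float]:
    """Baseline d-vector (assumed recipe): mean of the frame-level features."""
    dim = len(frames[0])
    return [sum(f[k] for f in frames) / len(frames) for k in range(dim)]


def average_score(enroll: Sequence[Vector], test: Sequence[Vector]) -> float:
    """Baseline scoring: cosine similarity of the two averaged d-vectors."""
    return cosine_similarity(d_vector(enroll), d_vector(test))


def dtw_score(enroll: Sequence[Vector], test: Sequence[Vector]) -> float:
    """DTW scoring sketch: align frame sequences, local cost = 1 - cosine.

    Returns the average frame-level cosine similarity along the
    minimum-cost alignment path (higher means more similar).
    """
    n, m = len(enroll), len(test)
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning enroll[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    # steps[i][j]: number of aligned frame pairs on that best path
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 1.0 - cosine_similarity(enroll[i - 1], test[j - 1])
            # Best predecessor by accumulated cost (ties -> shorter path):
            # diagonal match, advance enroll only, advance test only.
            prev_cost, prev_steps = min(
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
            )
            cost[i][j] = prev_cost + local
            steps[i][j] = prev_steps + 1
    # Path-length normalization, then convert mean distance back to similarity
    return 1.0 - cost[n][m] / steps[n][m]


if __name__ == "__main__":
    enroll = [[1.0, 0.0], [0.0, 1.0]]          # feature pattern A, then B
    same_order = [[1.0, 0.0], [0.0, 1.0]]      # same frames, same order
    reversed_order = [[0.0, 1.0], [1.0, 0.0]]  # same frames, reversed order
    # Both close to 1.0: averaging discards frame order
    print(average_score(enroll, same_order), average_score(enroll, reversed_order))
    print(dtw_score(enroll, same_order), dtw_score(enroll, reversed_order))  # 1.0 0.0
```

In this toy case the averaged d-vectors of the two test utterances are identical, so the baseline cannot separate them, while the DTW score drops for the temporally mismatched one. This hints at why alignment-based scoring may suit the text-dependent setting, where enrollment and test utterances share the same phrase.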


Pages: 426-429

Page count: 4

References (11 total):

[1] [Anonymous], IEEE INT C AC SPEECH

[2] [Anonymous], P KDD WORKSH SEATTL

[3] [Anonymous], 2014, ODYSSEY

[4] Campbell, WM; Sturim, DE; Reynolds, DA. Support vector machines using GMM supervectors for speaker verification [J]. IEEE SIGNAL PROCESSING LETTERS, 2006, 13(05): 308-311

[5] Ioffe S, 2006, LECT NOTES COMPUT SC, V3954, P531

[6] Kenny, Patrick; Boulianne, Gilles; Ouellet, Pierre; Dumouchel, Pierre. Joint factor analysis versus eigenchannels in speaker recognition [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15(04): 1435-1447

[7] Kenny, Patrick; Boulianne, Gilles; Ouellet, Pierre; Dumouchel, Pierre. Speaker and session variability in GMM-based speaker verification [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15(04): 1448-1460

[8] Kinnunen, Tomi; Li, Haizhou. An overview of text-independent speaker recognition: From features to supervectors [J]. SPEECH COMMUNICATION, 2010, 52(01): 12-40

[9] Li JY, 2012, IEEE W SP LANG TECH, P131, DOI 10.1109/SLT.2012.6424210

[10] Reynolds, DA; Quatieri, TF; Dunn, RB. Speaker verification using adapted Gaussian mixture models [J]. DIGITAL SIGNAL PROCESSING, 2000, 10(1-3): 19-41
