DNN Senone MAP Multinomial i-vectors for Phonotactic Language Recognition

Cited by: 0

Authors:

McCree, Alan [1]

Garcia-Romero, Daniel [1]

Affiliation:

[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

language recognition; i-vector; phonotactic; DNN

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Deep neural networks have recently shown great promise for language recognition. In particular, the expected counts of clustered context-dependent phone states (senones) can serve as a simple but effective phonotactic system. This paper introduces multinomial i-vectors applied to senone counts and shows that they work better than current PCA approaches. In addition, we show that a new approach using a standard normal prior and MAP multinomial i-vector estimation further improves performance, particularly for shorter test durations. Finally, we present a reduced-complexity version of Newton's method to greatly accelerate multinomial i-vector extraction. Experimental results on the NIST LRE11 task show that this approach performs significantly better than top-performing acoustic and phonotactic systems from that evaluation.
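
To make the MAP estimation step in the abstract concrete, here is a minimal sketch of fitting a multinomial i-vector to one utterance's senone counts with Newton's method. It assumes the common subspace multinomial form, in which senone probabilities are a softmax of `m + T @ w` and `w` has a standard normal prior. The names `m`, `T`, `w`, `log_softmax`, `map_objective`, and `map_multinomial_ivector` are illustrative and do not come from the paper. The sketch uses a plain full-Hessian Newton update, not the reduced-complexity variant the paper proposes, whose details are not given in the abstract.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log of the softmax of z."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def map_objective(w, counts, m, T):
    """Multinomial log-likelihood plus the log of a standard normal prior on w
    (additive constants dropped)."""
    return counts @ log_softmax(m + T @ w) - 0.5 * (w @ w)


def map_multinomial_ivector(counts, m, T, n_iter=50, tol=1e-8):
    """MAP point estimate of w for one utterance (assumed model, see lead-in).

    counts : (K,) expected senone counts for the utterance
    m      : (K,) bias vector (log of background senone probabilities)
    T      : (K, R) subspace matrix
    """
    total = counts.sum()
    w = np.zeros(T.shape[1])
    for _ in range(n_iter):
        p = np.exp(log_softmax(m + T @ w))
        # Gradient of the objective: T^T (c - N p) - w
        grad = T.T @ (counts - total * p) - w
        if np.linalg.norm(grad) < tol:
            break
        # Negative Hessian: N * T^T (diag(p) - p p^T) T + I, positive definite
        Tp = T.T @ p
        neg_hess = total * ((T.T * p) @ T - np.outer(Tp, Tp)) + np.eye(w.size)
        step = np.linalg.solve(neg_hess, grad)
        # Halve the step until the objective does not decrease
        # (small slack absorbs floating-point rounding near convergence).
        f0 = map_objective(w, counts, m, T)
        slack = 1e-10 * (1.0 + abs(f0))
        t = 1.0
        while map_objective(w + t * step, counts, m, T) < f0 - slack and t > 1e-6:
            t *= 0.5
        w = w + t * step
    return w
```

Because the prior term does not scale with the counts, an utterance with fewer total counts is pulled more strongly toward `w = 0`. This is one way to read the abstract's remark that the MAP estimate helps most for shorter test durations.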

Pages: 394-397

Number of pages: 4
