DNN Senone MAP Multinomial i-vectors for Phonotactic Language Recognition

Cited by: 0
Authors
McCree, Alan [1]
Garcia-Romero, Daniel [1]
Affiliations
[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
language recognition; i-vector; phonotactic; DNN
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep neural networks have recently shown great promise for language recognition. In particular, the expected counts of clustered context-dependent phone states (senones) can serve as a simple but effective phonotactic system. This paper introduces multinomial i-vectors applied to senone counts and shows that they work better than current PCA approaches. In addition, we show that a new approach using a standard normal prior and MAP multinomial i-vector estimation further improves performance, particularly for shorter test durations. Finally, we present a reduced-complexity version of Newton's method to greatly accelerate multinomial i-vector extraction. Experimental results on the NIST LRE11 task show that this approach performs significantly better than top-performing acoustic and phonotactic systems from that evaluation.
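The abstract does not give the model equations, so the following is only a minimal sketch of what MAP estimation of a multinomial i-vector under a standard normal prior could look like. It assumes a common multinomial subspace parametrization in which the senone probabilities are softmax(m + T w); the names `m`, `T`, `w`, and `map_multinomial_ivector` are hypothetical. It uses a plain full Newton iteration with backtracking, not the reduced-complexity variant the paper proposes.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(z - z.max())
    return e / e.sum()

def map_objective(w, counts, m, T):
    # Multinomial log-likelihood of the counts plus the log of a
    # standard normal prior on w (constants dropped).
    logits = m + T @ w
    lse = logits.max() + np.log(np.exp(logits - logits.max()).sum())
    return counts @ (logits - lse) - 0.5 * (w @ w)

def map_multinomial_ivector(counts, m, T, n_iter=50, tol=1e-10):
    # counts: (K,) expected senone counts for one utterance (assumed input)
    # m: (K,) bias vector, T: (K, D) subspace matrix (both assumed pre-trained)
    D = T.shape[1]
    w = np.zeros(D)
    N = counts.sum()
    for _ in range(n_iter):
        phi = softmax(m + T @ w)
        grad = T.T @ (counts - N * phi) - w
        Tp = T.T @ phi
        # Negative Hessian: N * T^T (diag(phi) - phi phi^T) T + I
        neg_hess = N * ((T.T * phi) @ T - np.outer(Tp, Tp)) + np.eye(D)
        step = np.linalg.solve(neg_hess, grad)
        if grad @ step < tol:  # squared Newton decrement
            break
        # Backtracking so the concave objective never decreases.
        f0, t = map_objective(w, counts, m, T), 1.0
        while map_objective(w + t * step, counts, m, T) < f0 and t > 1e-4:
            t *= 0.5
        w = w + t * step
    return w

# Toy usage with random (not real) parameters and counts.
rng = np.random.default_rng(0)
K, D = 20, 3
T = 0.3 * rng.standard_normal((K, D))
m = np.log(np.full(K, 1.0 / K))
counts = rng.integers(0, 10, size=K).astype(float)
w = map_multinomial_ivector(counts, m, T)
```

In this sketch the identity term contributed by the prior keeps the negative Hessian well conditioned even when the total count is small, which is consistent with the abstract's remark that the MAP approach helps most for shorter test durations.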
Pages: 394-397
Number of pages: 4