Unleashing the Unused Potential of I-Vectors Enabled by GPU Acceleration

Citations: 1
Authors
Vestman, Ville [1, 2]
Lee, Kong Aik [1]
Kinnunen, Tomi H. [2]
Koshinaka, Takafumi [1]
Affiliations
[1] NEC Corp Ltd, Biometr Res Labs, Tokyo, Japan
[2] Univ Eastern Finland, Computat Speech Grp, Kuopio, Finland
Source
INTERSPEECH 2019 | 2019
Funding
Academy of Finland
Keywords
speaker recognition; PyTorch; factor analysis; total variability model;
DOI
10.21437/Interspeech.2019-1955
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Speaker embeddings are continuous-valued vector representations that allow easy comparison between the voices of speakers using simple geometric operations. Among others, i-vectors and x-vectors have emerged as the mainstream methods for speaker embedding. In this paper, we illustrate the use of a modern computation platform to harness the benefit of GPU acceleration for i-vector extraction. In particular, we achieve an acceleration of 3000 times in frame posterior computation compared to real time and of 25 times in training the i-vector extractor compared to the CPU baseline from the Kaldi toolkit. This significant speed-up allows the exploration of ideas that were hitherto impossible. In particular, we show that it is beneficial to update the universal background model (UBM) and re-compute frame alignments while training the i-vector extractor. Additionally, we are able to study different variations of i-vector extractors more rigorously than before. In this process, we reveal some undocumented details of Kaldi's i-vector extractor and show that it outperforms the standard formulation by a margin of 1 to 2% when tested with the VoxCeleb speaker verification protocol. All of our findings are supported by ensemble averaging the results from multiple runs with random starts.
Pages: 351-355
Number of pages: 5