Speaker Recognition System Based on Identity Vector Using t-SNE Visualization and Mean-shift Algorithm

Cited by: 2
Authors
Kiani, Kourosh [1]
Baniasadi, Atefeh [1]
Affiliations
[1] Semnan Univ, Elect & Comp Engn Dept, Semnan, Iran
Source
2019 5TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS 2019) | 2019
Keywords
speaker recognition; i-vector; t-SNE; Mean-shift clustering; MACHINES;
DOI
10.1109/icspis48872.2019.9066007
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Manually labeling data is expensive, and the lack of labeled data has led to a large performance gap between baseline scoring techniques in speaker recognition. This paper proposes two separate systems to close this gap. The first system uses the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm to project the unlabeled development i-vectors into a two-dimensional space and then clusters them with the mean-shift algorithm. Within-Class Covariance Normalization (WCCN) and test normalization are then applied to remove unwanted variability from the i-vectors. The second system applies zero normalization to the baseline system released with the NIST 2014 i-vector challenge. Both systems use cosine similarity for scoring. Evaluated on the NIST 2014 i-vector challenge dataset, the two systems achieve relative improvements of 23% and 8%, respectively, in the minimum detection cost function (minDCF). Fusing the two systems yields a 25% improvement.
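The steps named in the abstract can be sketched in Python with NumPy and scikit-learn. This is a minimal illustration, not the authors' implementation: the abstract gives no hyperparameters, cohort selection, or exact ordering of the steps, so the perplexity, the bandwidth quantile, the ridge term `eps`, and every function name below are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.manifold import TSNE


def length_normalize(x):
    """Scale each row to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def pseudo_label(dev_ivectors, perplexity=30.0, quantile=0.1, seed=0):
    """Embed i-vectors in 2-D with t-SNE, then cluster the embedding with mean-shift.

    perplexity and quantile are illustrative defaults, not values from the paper.
    """
    embedding = TSNE(n_components=2, perplexity=perplexity,
                     init="pca", random_state=seed).fit_transform(dev_ivectors)
    bandwidth = estimate_bandwidth(embedding, quantile=quantile)
    labels = MeanShift(bandwidth=bandwidth).fit_predict(embedding)
    return embedding, labels


def wccn_projection(ivectors, labels, eps=1e-6):
    """Return B with B @ B.T = inv(W), where W is the mean within-cluster covariance."""
    dim = ivectors.shape[1]
    classes = np.unique(labels)
    W = np.zeros((dim, dim))
    for c in classes:
        members = ivectors[labels == c]
        centered = members - members.mean(axis=0)
        W += centered.T @ centered / len(members)
    # Assumed small ridge term so that W stays invertible with few samples.
    W = W / len(classes) + eps * np.eye(dim)
    return np.linalg.cholesky(np.linalg.inv(W))


def cosine_scores(models, tests):
    """Cosine similarity between every model row and every test row."""
    return length_normalize(models) @ length_normalize(tests).T


def znorm(scores, model_vs_cohort):
    """Zero normalization: standardize each model's scores (rows) using
    that model's scores against an impostor cohort."""
    mu = model_vs_cohort.mean(axis=1, keepdims=True)
    sd = model_vs_cohort.std(axis=1, keepdims=True)
    return (scores - mu) / sd


def tnorm(scores, cohort_vs_test):
    """Test normalization: standardize each test segment's scores (columns)
    using cohort models scored against that segment."""
    mu = cohort_vs_test.mean(axis=0, keepdims=True)
    sd = cohort_vs_test.std(axis=0, keepdims=True)
    return (scores - mu) / sd
```

In this sketch, the cluster indices returned by `pseudo_label` stand in for speaker labels when estimating the WCCN matrix. I-vectors are projected as `ivectors @ B` before `cosine_scores` is called, and `znorm` or `tnorm` is applied to the resulting score matrix of shape (models, tests).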
Pages: 4