Discriminative Scoring for Speaker Recognition Based on I-vectors

Authors
Wang, Jun [1]
Wang, Dong [1]
Zhu, Ziwei [1]
Zheng, Thomas Fang [1]
Soong, Frank [2]
Affiliations
[1] Tsinghua Univ, Ctr Speaker & Language Technol CSLT, Beijing 100084, Peoples R China
[2] Microsoft Res Asia, Beijing 100084, Peoples R China
Source
2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2014
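Funding
US National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code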
081203 ; 0835 ;
Abstract
The popular i-vector approach to speaker recognition represents a speech segment as an i-vector in a low dimensional space. It is well known that i-vectors involve both speaker and session variances, and therefore additional discriminative approaches are required to extract speaker information from the 'total variance' space. Among various methods, the probabilistic linear discriminant analysis (PLDA) achieves state-of-the-art performance, partly due to its generative framework that represents speaker and session variances in a hierarchical way. A disadvantage of PLDA, however, lies in its Gaussian assumption of the prior/conditional distributions on the speaker and session variables, which is not necessarily true in reality. This paper presents a discriminative scoring approach which models i-vector pairs using a neural network (NN) so that the posterior probability that an i-vector pair belongs to the same person is read off from the NN output directly. This discriminative approach does not rely on any artificial assumptions on data distributions and can learn speaker-related information with sufficient accuracy provided that the network is large enough and the training data are abundant. Our experiments on the NIST SRE08 interview speech data demonstrated that the NN based approach outperforms PLDA in the core test condition, and combining the NN and PLDA scores leads to further gains.
Pages: 5
Related papers
50 in total
  • [1] Intersession compensation and scoring methods in the i-vectors space for speaker recognition
    Bousquet, Pierre-Michel
    Matrouf, Driss
    Bonastre, Jean-Francois
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 492 - 495
  • [2] ROBUST SPEAKER RECOGNITION BASED ON DNN/I-VECTORS AND SPEECH SEPARATION
    Chang, Jorge
    Wang, DeLiang
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5415 - 5419
  • [3] Emotional Speaker Verification Based on I-vectors
    Mackova, Lenka
    Cizmar, Anton
    2014 5TH IEEE CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM), 2014, : 533 - 536
  • [4] Linguistically-constrained formant-based i-vectors for automatic speaker recognition
    Franco-Pedroso, Javier
    Gonzalez-Rodriguez, Joaquin
    SPEECH COMMUNICATION, 2016, 76 : 61 - 81
  • [5] Senone I-Vectors for Robust Speaker Verification
    Tan, Zhili
    Zhu, Yingke
    Mak, Man-Wai
    Mak, Brian Kan-Wing
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016
  • [6] Speaker age estimation using i-vectors
    Bahari, Mohamad Hasan
    McLaren, Mitchell
    Van Hamme, Hugo
    van Leeuwen, David A.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2014, 34 : 99 - 108
  • [7] Robust Speaker Verification Using GFCC Based i-Vectors
    Jeevan, Medikonda
    Dhingra, Atul
    Hanmandlu, M.
    Panigrahi, B. K.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL, NETWORKS, COMPUTING, AND SYSTEMS (ICSNCS 2016), VOL 1, 2017, 395 : 85 - 91
  • [8] Speaker recognition in duration-mismatched condition using bootstrapped i-vectors
    Ando, Atsushi
    Asami, Taichi
    Yamaguchi, Yoshikazu
    Aono, Yushi
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016
  • [9] ON COMBINING I-VECTORS AND DISCRIMINATIVE ADAPTATION METHODS FOR UNSUPERVISED SPEAKER NORMALIZATION IN DNN ACOUSTIC MODELS
    Samarakoon, Lahiru
    Sim, Khe Chai
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5275 - 5279
  • [10] Speaker Recognition With Random Digit Strings Using Uncertainty Normalized HMM-Based i-Vectors
    Maghsoodi, Nooshin
    Sameti, Hossein
    Zeinali, Hossein
    Stafylakis, Themos
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1815 - 1825