Discriminative Scoring for Speaker Recognition Based on I-vectors

Authors
Wang, Jun [1]
Wang, Dong [1]
Zhu, Ziwei [1]
Zheng, Thomas Fang [1]
Soong, Frank [2]
Affiliations
[1] Tsinghua Univ, Ctr Speaker & Language Technol CSLT, Beijing 100084, Peoples R China
[2] Microsoft Res Asia, Beijing 100084, Peoples R China
Source
2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014
Funding
National Science Foundation (USA)
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The popular i-vector approach to speaker recognition represents a speech segment as an i-vector in a low dimensional space. It is well known that i-vectors involve both speaker and session variances, and therefore additional discriminative approaches are required to extract speaker information from the 'total variance' space. Among various methods, the probabilistic linear discriminant analysis (PLDA) achieves state-of-the-art performance, partly due to its generative framework that represents speaker and session variances in a hierarchical way. A disadvantage of PLDA, however, lies in its Gaussian assumption of the prior/conditional distributions on the speaker and session variables, which is not necessarily true in reality. This paper presents a discriminative scoring approach which models i-vector pairs using a neural network (NN) so that the posterior probability that an i-vector pair belongs to the same person is read off from the NN output directly. This discriminative approach does not rely on any artificial assumptions on data distributions and can learn speaker-related information with sufficient accuracy provided that the network is large enough and the training data are abundant. Our experiments on the NIST SRE08 interview speech data demonstrated that the NN based approach outperforms PLDA in the core test condition, and combining the NN and PLDA scores leads to further gains.
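The scoring idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' published configuration: the pair is fed to the network by concatenating the two i-vectors, the network has a single tanh hidden layer with untrained random weights, and the NN and PLDA scores are combined by linear interpolation. The names `init_params`, `same_speaker_posterior`, and `fuse_scores` are hypothetical.

```python
import math
import random


def init_params(dim, hidden, seed=0):
    """Randomly initialise a one-hidden-layer network (untrained, illustration only)."""
    rng = random.Random(seed)
    return {
        # The input is an i-vector pair, assumed here to be concatenated: 2 * dim values.
        "W1": [[rng.gauss(0.0, 0.1) for _ in range(2 * dim)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.gauss(0.0, 0.1) for _ in range(hidden)],
        "b2": 0.0,
    }


def same_speaker_posterior(params, ivec_a, ivec_b):
    """Return the network's estimate of P(same speaker | i-vector pair)."""
    pair = list(ivec_a) + list(ivec_b)
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, pair)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    logit = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    # The sigmoid output is read directly as the posterior probability.
    return 1.0 / (1.0 + math.exp(-logit))


def fuse_scores(nn_score, plda_score, weight=0.5):
    """Linear interpolation of two scores (assumes both are on comparable scales)."""
    return weight * nn_score + (1.0 - weight) * plda_score


# Toy 4-dimensional "i-vectors"; real i-vectors typically have hundreds of dimensions.
params = init_params(dim=4, hidden=8)
enrol = [0.1, -0.3, 0.5, 0.2]
test = [0.0, -0.2, 0.4, 0.3]
posterior = same_speaker_posterior(params, enrol, test)
print(posterior)
```

In practice the weights would be trained on labelled same-speaker and different-speaker pairs, and a PLDA log-likelihood ratio would need calibration before being interpolated with a posterior.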
Pages: 5