Discriminative Scoring for Speaker Recognition Based on I-vectors

Cited by: 0
Authors
Wang, Jun [1 ]
Wang, Dong [1 ]
Zhu, Ziwei [1 ]
Zheng, Thomas Fang [1 ]
Soong, Frank [2 ]
Affiliations
[1] Tsinghua Univ, Ctr Speaker & Language Technol CSLT, Beijing 100084, Peoples R China
[2] Microsoft Res Asia, Beijing 100084, Peoples R China
Source
2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2014
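Funding
National Science Foundation (USA);
Keywords
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract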
The popular i-vector approach to speaker recognition represents a speech segment as an i-vector in a low-dimensional space. It is well known that i-vectors involve both speaker and session variances, and therefore additional discriminative approaches are required to extract speaker information from the 'total variance' space. Among various methods, probabilistic linear discriminant analysis (PLDA) achieves state-of-the-art performance, partly due to its generative framework that represents speaker and session variances in a hierarchical way. A disadvantage of PLDA, however, lies in its Gaussian assumption on the prior/conditional distributions of the speaker and session variables, which is not necessarily true in reality. This paper presents a discriminative scoring approach that models i-vector pairs using a neural network (NN), so that the posterior probability that an i-vector pair belongs to the same person is read off from the NN output directly. This discriminative approach does not rely on any artificial assumptions on data distributions and can learn speaker-related information with sufficient accuracy, provided that the network is large enough and the training data are abundant. Our experiments on the NIST SRE08 interview speech data demonstrated that the NN-based approach outperforms PLDA in the core test condition, and that combining the NN and PLDA scores leads to further gains.
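The pair-scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network architecture, the input encoding of the pair, or how NN and PLDA scores are combined. The one-hidden-layer network, the concatenated input, the function names (`init_pair_scorer`, `same_speaker_posterior`, `fuse_scores`) and the linear fusion weight are all assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def init_pair_scorer(ivector_dim, hidden_dim, seed=0):
    """Randomly initialise a one-hidden-layer network (assumed architecture)."""
    rng = random.Random(seed)
    in_dim = 2 * ivector_dim  # assumption: the two i-vectors are concatenated
    scale1 = 1.0 / math.sqrt(in_dim)
    scale2 = 1.0 / math.sqrt(hidden_dim)
    return {
        "W1": [[rng.gauss(0.0, scale1) for _ in range(in_dim)]
               for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "w2": [rng.gauss(0.0, scale2) for _ in range(hidden_dim)],
        "b2": 0.0,
    }


def same_speaker_posterior(params, ivec_a, ivec_b):
    """Return the sigmoid output, read as P(same speaker | i-vector pair)."""
    x = list(ivec_a) + list(ivec_b)
    hidden = [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    logit = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return 1.0 / (1.0 + math.exp(-logit))


def fuse_scores(nn_posterior, plda_score, weight=0.5):
    """Hypothetical linear fusion of the NN log-odds with a PLDA score."""
    p = min(max(nn_posterior, 1e-12), 1.0 - 1e-12)  # keep the log finite
    nn_logit = math.log(p / (1.0 - p))
    return weight * nn_logit + (1.0 - weight) * plda_score


# Usage with untrained weights and made-up 4-dimensional "i-vectors".
params = init_pair_scorer(ivector_dim=4, hidden_dim=8)
posterior = same_speaker_posterior(
    params, [0.1, -0.3, 0.5, 0.2], [0.0, -0.2, 0.4, 0.3]
)
fused = fuse_scores(posterior, plda_score=1.7)  # 1.7 is a placeholder score
```

In a real system the weights would be trained on labelled same-speaker and different-speaker pairs, for example with a cross-entropy loss, so that the output approximates the posterior; the values produced here carry no meaning beyond showing the data flow.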
Pages: 5
Related papers
50 in total
  • [21] DISCRIMINATIVELY TRAINED BAYESIAN SPEAKER COMPARISON OF I-VECTORS
    Borgstroem, Bengt J.
    McCree, Alan
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7659 - 7662
  • [22] Evaluation of the Standard i-Vectors Based Speaker Verification Systems on Limited Data
    Curelaru, Florin
    2018 12TH INTERNATIONAL CONFERENCE ON COMMUNICATIONS (COMM), 2018, : 101 - 106
  • [23] SPEAKER DIARIZATION OF BROADCAST STREAMS USING TWO-STAGE CLUSTERING BASED ON I-VECTORS AND COSINE DISTANCE SCORING
    Silovsky, Jan
    Prazak, Jan
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4193 - 4196
  • [24] Privacy-preserving speaker verification system based on binary I-vectors
    Mtibaa, Aymen
    Petrovska-Delacretaz, Dijana
    Boudy, Jerome
    Ben Hamida, Ahmed
    IET BIOMETRICS, 2021, 10 (03) : 233 - 245
  • [25] An Investigation of Non-linear i-vectors for speaker verification
    Chen, Nanxin
    Villalba, Jesus
    Dehak, Najim
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 87 - 91
  • [26] Speaker Diarization with I-Vectors from DNN Senone Posteriors
    Sell, Gregory
    Garcia-Romero, Daniel
    McCree, Alan
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3096 - 3099
  • [27] Duration compensation of i-vectors for short duration speaker verification
    Ma, Jianbo
    Sethu, Vidhyasaharan
    Ambikairajah, Eliathamby
    Lee, Kong Aik
    ELECTRONICS LETTERS, 2017, 53 (06) : 405 - 407
  • [28] Autonomous selection of i-vectors for PLDA modelling in speaker verification
    Biswas, Sangeeta
    Rohdin, Johan
    Shinoda, Koichi
    SPEECH COMMUNICATION, 2015, 72 : 32 - 46
  • [29] Exemplar-Based Sparse Representation for Language Recognition on I-Vectors
    Jiang, Bing
    Song, Yan
    Guo, Wu
    Dai, LiRong
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2055 - 2058
  • [30] Development of Speaker Recognizer Using I-vectors in Two Programming Environments
    Jakubec, Maros
    Lieskovska, Eva
    Jarina, Roman
    PROCEEDINGS OF THE 2020 CONFERENCE ON NEW TRENDS IN SIGNAL PROCESSING (NTSP), 2020, : 34 - 38