Discriminative Scoring for Speaker Recognition Based on I-vectors

Times cited: 0
Authors
Wang, Jun [1]
Wang, Dong [1]
Zhu, Ziwei [1]
Zheng, Thomas Fang [1]
Soong, Frank [2]
Affiliations
[1] Tsinghua Univ, Ctr Speaker & Language Technol CSLT, Beijing 100084, Peoples R China
[2] Microsoft Res Asia, Beijing 100084, Peoples R China
Source
2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2014
Funding
US National Science Foundation
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
The popular i-vector approach to speaker recognition represents a speech segment as an i-vector in a low-dimensional space. It is well known that i-vectors contain both speaker and session variability, so additional discriminative approaches are required to extract speaker information from the 'total variability' space. Among various methods, probabilistic linear discriminant analysis (PLDA) achieves state-of-the-art performance, partly owing to its generative framework, which models speaker and session variability hierarchically. A disadvantage of PLDA, however, is its Gaussian assumption on the prior and conditional distributions of the speaker and session variables, an assumption that does not necessarily hold for real data. This paper presents a discriminative scoring approach that models i-vector pairs with a neural network (NN), so that the posterior probability that the two i-vectors in a pair belong to the same speaker is read directly from the NN output. This discriminative approach does not rely on artificial assumptions about the data distributions and can learn speaker-related information accurately, provided that the network is large enough and the training data are abundant. Our experiments on the NIST SRE08 interview speech data demonstrate that the NN-based approach outperforms PLDA in the core test condition, and that combining the NN and PLDA scores yields further gains.
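The abstract does not specify the network topology, the pair input representation, or the training recipe, so the following is only a minimal NumPy sketch of the general idea of discriminative pair scoring under assumed choices: toy synthetic i-vectors (speaker mean plus session noise), a symmetric pair feature built from the element-wise product and absolute difference of the two i-vectors, one tanh hidden layer, and a sigmoid output trained with binary cross-entropy so that it can be read as P(same speaker | pair). All names (`make_pairs`, `pair_features`, `PairScorer`) and hyperparameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def make_pairs(rng, n_speakers=50, n_sessions=4, dim=10, n_pairs=2000):
    """Toy i-vectors (speaker mean + session noise) grouped into labelled pairs.

    Purely synthetic stand-in for real i-vectors; label 1 = same speaker.
    """
    spk = rng.normal(size=(n_speakers, dim))
    ivecs = spk[:, None, :] + 0.5 * rng.normal(size=(n_speakers, n_sessions, dim))
    x1, x2, y = [], [], []
    for k in range(n_pairs):
        a = int(rng.integers(n_speakers))
        # Alternate target / non-target pairs so the two classes are balanced.
        b = a if k % 2 == 0 else (a + int(rng.integers(1, n_speakers))) % n_speakers
        i, j = rng.choice(n_sessions, size=2, replace=False)
        x1.append(ivecs[a, i])
        x2.append(ivecs[b, j])
        y.append(float(a == b))
    return np.array(x1), np.array(x2), np.array(y)


def pair_features(x1, x2):
    """Assumed symmetric pair representation: swapping x1 and x2 gives the same input."""
    return np.concatenate([x1 * x2, np.abs(x1 - x2)], axis=1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class PairScorer:
    """One-hidden-layer MLP whose sigmoid output is read as P(same speaker | pair)."""

    def __init__(self, in_dim, hidden, rng):
        self.W1 = rng.normal(scale=in_dim ** -0.5, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=hidden ** -0.5, size=hidden)
        self.b2 = 0.0

    def _forward(self, f):
        h = np.tanh(f @ self.W1 + self.b1)
        return h, h @ self.w2 + self.b2  # hidden activations, output logit

    def posterior(self, x1, x2):
        return sigmoid(self._forward(pair_features(x1, x2))[1])

    def fit(self, x1, x2, y, lr=0.5, epochs=300):
        """Full-batch gradient descent on the mean binary cross-entropy."""
        f = pair_features(x1, x2)
        for _ in range(epochs):
            h, logit = self._forward(f)
            g = (sigmoid(logit) - y) / len(y)           # dLoss/dlogit per pair
            gh = np.outer(g, self.w2) * (1.0 - h ** 2)  # backprop through tanh
            self.w2 -= lr * (h.T @ g)
            self.b2 -= lr * g.sum()
            self.W1 -= lr * (f.T @ gh)
            self.b1 -= lr * gh.sum(axis=0)
        return self


rng = np.random.default_rng(0)
tr1, tr2, ytr = make_pairs(rng)
te1, te2, yte = make_pairs(rng)  # fresh draw: speakers unseen in training
model = PairScorer(in_dim=2 * tr1.shape[1], hidden=32, rng=rng).fit(tr1, tr2, ytr)
p = model.posterior(te1, te2)
accuracy = np.mean((p > 0.5) == (yte == 1.0))
print(f"mean P(same): target={p[yte == 1].mean():.2f}, non-target={p[yte == 0].mean():.2f}")
print(f"accuracy at 0.5: {accuracy:.2f}")
```

Because the network outputs a posterior directly, no Gaussian model of speaker or session variables is needed; the posterior (or its logit) can serve as the verification score. The abstract also reports that combining NN and PLDA scores helps; a weighted sum of the NN logit and a PLDA log-likelihood ratio is one common way to do such fusion, but the paper's fusion scheme is not stated here.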
Pages: 5