Discriminative Scoring for Speaker Recognition Based on I-vectors

Cited by: 0

Authors:

Wang, Jun [1]

Wang, Dong [1]

Zhu, Ziwei [1]

Zheng, Thomas Fang [1]

Soong, Frank [2]

Affiliations:

[1] Tsinghua Univ, Ctr Speaker & Language Technol CSLT, Beijing 100084, Peoples R China

[2] Microsoft Res Asia, Beijing 100084, Peoples R China

Source:

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2014

Funding:

US National Science Foundation;

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

The popular i-vector approach to speaker recognition represents a speech segment as an i-vector in a low dimensional space. It is well known that i-vectors involve both speaker and session variances, and therefore additional discriminative approaches are required to extract speaker information from the 'total variance' space. Among various methods, the probabilistic linear discriminant analysis (PLDA) achieves state-of-the-art performance, partly due to its generative framework that represents speaker and session variances in a hierarchical way. A disadvantage of PLDA, however, lies in its Gaussian assumption of the prior/conditional distributions on the speaker and session variables, which is not necessarily true in reality. This paper presents a discriminative scoring approach which models i-vector pairs using a neural network (NN) so that the posterior probability that an i-vector pair belongs to the same person is read off from the NN output directly. This discriminative approach does not rely on any artificial assumptions on data distributions and can learn speaker-related information with sufficient accuracy provided that the network is large enough and the training data are abundant. Our experiments on the NIST SRE08 interview speech data demonstrated that the NN based approach outperforms PLDA in the core test condition, and combining the NN and PLDA scores leads to further gains.
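The pair-scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single hidden layer, the tanh activation, the concatenation of the two i-vectors, the random untrained weights, and the linear interpolation used to combine NN and PLDA scores are all assumptions made here for illustration, and every function and parameter name is hypothetical. The abstract states only that a neural network takes an i-vector pair, that its output is read as the posterior probability that both i-vectors come from the same speaker, and that combining NN and PLDA scores helps.

```python
import math
import random


def init_pair_scorer(ivector_dim, hidden_dim, seed=0):
    """Create random, untrained weights for a one-hidden-layer scorer (assumed layout)."""
    rng = random.Random(seed)
    in_dim = 2 * ivector_dim  # assumption: the two i-vectors are concatenated
    return {
        "W1": [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "w2": [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)],
        "b2": 0.0,
    }


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def same_speaker_posterior(params, ivec_a, ivec_b):
    """Map an i-vector pair to a value in (0, 1), read as P(same speaker | pair)."""
    x = list(ivec_a) + list(ivec_b)
    hidden = [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    logit = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return sigmoid(logit)


def fuse_scores(nn_score, plda_score, alpha=0.5):
    """Linear interpolation of two scores (assumed fusion rule; alpha is hypothetical)."""
    return alpha * nn_score + (1.0 - alpha) * plda_score


# Usage with toy 4-dimensional vectors; real i-vectors typically have hundreds of dimensions.
params = init_pair_scorer(ivector_dim=4, hidden_dim=3)
posterior = same_speaker_posterior(params, [0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1])
print(posterior)
```

In practice the weights would be trained on labelled same-speaker and different-speaker pairs, and a PLDA score (usually a log-likelihood ratio) would need to be brought onto a scale comparable with the posterior before any interpolation.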


Pages: 5
