Multi-Scale Kernels for Short Utterance Speaker Recognition

Cited by: 0

Authors:

Zhang, Wei-Qiang ^{[1]}

Zhao, Junhong ^{[2,3]}

Zhang, Wen-Lin ^{[4]}

Liu, Jia ^{[1]}

Affiliations:

[1] Tsinghua Univ, Tsinghua Natl Lab Informat Sci & Technol, Dept Elect Engn, Beijing 100084, Peoples R China

[2] Chinese Acad Sci, Inst Elect, State Key Lab Transducer Technol, Beijing 100190, Peoples R China

[3] Univ Chinese Acad Sci, Beijing 100190, Peoples R China

[4] Zhengzhou Informat Sci & Technol Inst, Zhengzhou 450002, Peoples R China

Source:

2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2014

Funding:

National Natural Science Foundation of China;

Keywords:

speaker recognition; short utterance; multi-scale kernel;

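DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: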

Short utterances pose a great challenge for speaker recognition, because very limited data are available for training and testing. To obtain a robust estimate, a model for short utterances should have fewer parameters than a model for long utterances; however, this may impair the model's descriptive capability. In this paper, we propose a multi-scale kernel (MSK) approach to solve this problem. We construct a series of kernels with different scales and combine them through multiple kernel learning (MKL) optimization. In this way, both the robustness and the scalability of the model are enhanced. Experimental results on the NIST SRE 2010 10sec-10sec dataset show that the proposed MSK method outperforms the traditional Gaussian mixture model supervector (GSV) followed by support vector machine (SVM) method.
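The idea of combining kernels at several scales can be illustrated with a minimal sketch. The abstract does not specify the kernel form or the MKL solver, so everything here is an assumption: "scale" is taken to mean the bandwidth of an RBF kernel, the feature vectors are synthetic stand-ins for supervectors, and a simple kernel-target alignment heuristic replaces the MKL optimization. The function names (`rbf_kernel`, `alignment_weights`, `combine_kernels`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """RBF kernel matrix; gamma plays the role of the (assumed) scale."""
    sq_dist = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def alignment_weights(kernels, y):
    """Weight each kernel by its alignment with the ideal kernel y y^T.

    This is a simple heuristic stand-in for MKL, not the paper's optimizer.
    """
    target = np.outer(y, y)
    scores = np.array([
        max(np.sum(K * target) / (np.linalg.norm(K) * np.linalg.norm(target)), 0.0)
        for K in kernels
    ])
    return scores / scores.sum()

def combine_kernels(kernels, weights):
    """Convex combination of kernel matrices (remains positive semidefinite)."""
    return sum(w * K for w, K in zip(weights, kernels))

# Synthetic two-class data standing in for speaker supervectors (assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (20, 5)), rng.normal(1.0, 1.0, (20, 5))])
y = np.array([-1] * 20 + [1] * 20)

gammas = [0.01, 0.1, 1.0]                      # three assumed scales
kernels = [rbf_kernel(X, X, g) for g in gammas]
weights = alignment_weights(kernels, y)
K_msk = combine_kernels(kernels, weights)      # combined multi-scale kernel
```

The combined matrix `K_msk` could then be passed to any SVM implementation that accepts a precomputed kernel. Because the weights are non-negative and sum to one, the combination stays a valid kernel, which is the property a real MKL solver would also preserve.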

Citation

Pages: 414 / +

Number of pages: 2

50 records in total

[31] Scale-invariant MFCCs for speech/speaker recognition
Tufekci, Zekeriya
Disken, Gokay
TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (05) : 3758 - 3762
[32] SPEAKER RECOGNITION FOR MULTI-SPEAKER CONVERSATIONS USING X-VECTORS
Snyder, David
Garcia-Romero, Daniel
Sell, Gregory
McCree, Alan
Povey, Daniel
Khudanpur, Sanjeev
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5796 - 5800
[33] I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification
Zhang, Jiacen
Inoue, Nakamasa
Shinoda, Koichi
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3613 - 3617
[34] I-Vector Extraction Using Speaker Relevancy for Short Duration Speaker Recognition
Kang, Woo Hyun
Cho, Won Ik
Jang, Se Young
Lee, Hyeon Seung
Kim, Nam Soo
IT CONVERGENCE AND SECURITY 2017, VOL 1, 2018, 449 : 79 - 87
[35] Speaker recognition system in multi-channel environment
Sang, LF
Wu, ZH
Yang, YC
2003 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-5, CONFERENCE PROCEEDINGS, 2003, : 3116 - 3121
[36] A method of multi-models fusion for speaker recognition
Wu H.
Luo L.
Peng H.
Wen W.
International Journal of Speech Technology, 2022, 25 (2) : 493 - 498
[37] Adversarial Training for Multi-domain Speaker Recognition
Wang, Qing
Rao, Wei
Guo, Pengcheng
Xie, Lei
2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
[38] Utterance partitioning for speaker recognition: an experimental review and analysis with new findings under GMM-SVM framework
Nirmalya Sen
Md Sahidullah
Hemant A. Patil
Shyamal Kumar Das Mandal
Krothapalli Sreenivasa Rao
Tapan Kumar Basu
International Journal of Speech Technology, 2021, 24 : 1067 - 1088
[39] Utterance partitioning for speaker recognition: an experimental review and analysis with new findings under GMM-SVM framework
Sen, Nirmalya
Sahidullah, Md
Patil, Hemant A.
Das Mandal, Shyamal Kumar
Rao, Krothapalli Sreenivasa
Basu, Tapan Kumar
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (04) : 1067 - 1088
[40] i-vector Based Speaker Recognition on Short Utterances
Kanagasundaram, Ahilan
Vogt, Robbie
Dean, David
Sridharan, Sridha
Mason, Michael
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2352 - +
