Multi-Scale Kernels for Short Utterance Speaker Recognition

Cited by: 0
Authors
Zhang, Wei-Qiang [1]
Zhao, Junhong [2, 3]
Zhang, Wen-Lin [4]
Liu, Jia [1]
Affiliations
[1] Tsinghua Univ, Tsinghua Natl Lab Informat Sci & Technol, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Chinese Acad Sci, Inst Elect, State Key Lab Transducer Technol, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Beijing 100190, Peoples R China
[4] Zhengzhou Informat Sci & Technol Inst, Zhengzhou 450002, Peoples R China
Source
2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2014
Funding
National Natural Science Foundation of China;
Keywords
speaker recognition; short utterance; multi-scale kernel;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Short utterances pose a great challenge for speaker recognition because very limited data is available for training and testing. To obtain a robust estimate, a model for short utterances should have fewer parameters than a model for long utterances; however, this may limit the model's descriptive capability. In this paper, we propose a multi-scale kernel (MSK) approach to solve this problem. We construct a series of kernels at different scales and combine them through multiple kernel learning (MKL) optimization. In this way, both the robustness and the scalability of the model are enhanced. Experimental results on the NIST SRE 2010 10sec-10sec dataset show that the proposed MSK method outperforms the traditional approach of Gaussian mixture model supervectors (GSV) followed by a support vector machine (SVM).
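The idea in the abstract of building kernels at several scales and combining them can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not say which base kernels or which MKL solver the authors use, so this example substitutes Gaussian (RBF) kernels with different bandwidths for the "scales" and a simple kernel-target alignment heuristic for the MKL optimization. The function names `rbf_kernel`, `alignment_weights`, and `multi_scale_kernel` are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||x - y||^2)."""
    sq_dist = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def alignment_weights(X, y, gammas):
    """Heuristic weights from kernel-target alignment.

    Assumption: this stands in for MKL optimization; it is not the
    optimization used in the paper. Labels y must be +1 / -1.
    """
    target = np.outer(y, y).astype(float)
    scores = []
    for gamma in gammas:
        K = rbf_kernel(X, X, gamma)
        # Frobenius inner product normalized by the Frobenius norms.
        a = np.sum(K * target) / (np.linalg.norm(K) * np.linalg.norm(target))
        scores.append(max(a, 0.0))
    scores = np.array(scores)
    if scores.sum() == 0.0:
        # Fall back to uniform weights if no kernel aligns with the labels.
        return np.full(len(gammas), 1.0 / len(gammas))
    return scores / scores.sum()


def multi_scale_kernel(X, Y, gammas, weights):
    """Convex combination of one Gaussian kernel per scale."""
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (len(gammas),):
        raise ValueError("need exactly one weight per scale")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * rbf_kernel(X, Y, g) for w, g in zip(weights, gammas))


# Toy data: 6 random 4-dimensional vectors with two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
y = np.array([1, 1, 1, -1, -1, -1])
gammas = [0.01, 0.1, 1.0]  # coarse to fine scales (illustrative values)

w = alignment_weights(X, y, gammas)
K = multi_scale_kernel(X, X, gammas, w)
```

Because a convex combination of positive semidefinite kernels is itself positive semidefinite, the resulting matrix `K` could be passed to any SVM that accepts a precomputed kernel.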
Pages: 414 / +
Number of pages: 2
Related papers
50 in total
  • [41] Using Voice Quality Features to Improve Short-Utterance, Text-Independent Speaker Verification Systems
    Park, Soo Jin
    Yeung, Gary
    Kreiman, Jody
    Keating, Patricia A.
    Alwan, Abeer
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1522 - 1526
  • [42] HUMAN AND MACHINE SPEAKER RECOGNITION BASED ON SHORT TRIVIAL EVENTS
    Zhang, Miao
    Kang, Xiaofei
    Wang, Yanqing
    Li, Lantian
    Tang, Zhiyuan
    Dai, Haisheng
    Wang, Dong
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5009 - 5013
  • [43] Probabilistic approach using joint long and short session i-vectors modeling to deal with short utterances for speaker recognition
    Ben Kheder, Waad
    Matrouf, Driss
    Ajili, Moez
    Bonastre, Jean-Francois
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1830 - 1834
  • [44] A Study of X-vector Based Speaker Recognition on Short Utterances
    Kanagasundaram, A.
    Sridharan, S.
    Sriram, G.
    Prachi, S.
    Fookes, C.
    INTERSPEECH 2019, 2019, : 2943 - 2947
  • [45] CN-Celeb: Multi-genre speaker recognition
    Li, Lantian
    Liu, Ruiqi
    Kang, Jiawen
    Fan, Yue
    Cui, Hao
    Cai, Yunqi
    Vipperla, Ravichander
    Zheng, Thomas Fang
    Wang, Dong
    SPEECH COMMUNICATION, 2022, 137 : 77 - 91
  • [46] A multi-class MLLR kernel for SVM speaker recognition
    Karam, Zahi N.
    Campbell, William M.
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4117 - +
  • [47] Moving average multi directional local features for speaker recognition
    Mahmood, Awais
    Muhammad, Ghulam
    Alsulaiman, Mansour
    Dhahri, Habib
    Othman, Esam M. Asem
    Faisal, Mohammed
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 1): : 2145 - 2157
  • [48] Biomimetic multi-resolution analysis for robust speaker recognition
    Sridhar Krishna Nemala
    Dmitry N Zotkin
    Ramani Duraiswami
    Mounya Elhilali
    EURASIP Journal on Audio, Speech, and Music Processing, 2012
  • [49] Towards multi-task learning of speech and speaker recognition
    Vaessen, Nik
    van Leeuwen, David A.
    INTERSPEECH 2023, 2023, : 4898 - 4902
  • [50] Robust features for text-independent speaker recognition with short utterances
    Chakroun, Rania
    Frikha, Mondher
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (17) : 13863 - 13883