HUMAN AND MACHINE SPEAKER RECOGNITION BASED ON SHORT TRIVIAL EVENTS

Cited by: 0
Authors
Zhang, Miao [1, 2]
Kang, Xiaofei [1, 3]
Wang, Yanqing [1, 2]
Li, Lantian [1 ]
Tang, Zhiyuan [1 ]
Dai, Haisheng [4 ]
Wang, Dong [1 ]
Affiliations
[1] Tsinghua Univ, Ctr Speech & Language Technol, Beijing, Peoples R China
[2] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[3] Peking Univ, Beijing, Peoples R China
[4] JD AI Res, Beijing, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Funding
National Natural Science Foundation of China;
Keywords
speaker recognition; speech perception; deep neural network; speaker feature learning;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Human speech often contains what we call trivial events, e.g., coughs, laughs and sniffs. Compared to regular speech, these events are usually short and variable; they are therefore generally regarded as not speaker discriminative and are largely ignored by present speaker recognition research. However, trivial events are highly valuable in particular circumstances such as forensic examination: because they are less subject to intentional change, they can be used to discover the genuine speaker behind disguised speech. In this paper, we collect a trivial event speech database that involves 75 speakers and 6 types of events, and report preliminary speaker recognition results on this database by both human listeners and machines. In particular, the deep feature learning technique recently proposed by our group is used to analyze and recognize the trivial events, leading to acceptable equal error rates (EERs) ranging from 5% to 15% despite the extremely short durations (0.2-0.5 seconds) of these events. Among the different types of events, 'hmm' seems to be more speaker discriminative than the others.
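For readers unfamiliar with the metric, the equal error rate reported in the abstract is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `equal_error_rate` and the example scores are hypothetical and chosen only for illustration.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    target_scores: scores of genuine (same-speaker) trials.
    nontarget_scores: scores of impostor (different-speaker) trials.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores (not taken from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With only a handful of trials the FAR and FRR curves are step functions, so the sketch reports the midpoint of the two rates at the closest crossing; evaluation toolkits typically interpolate instead.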
Pages: 5009-5013
Number of pages: 5