Shouted and whispered speech compensation for speaker verification systems

Cited by: 3
Authors
Prieto, Santi [1]
Ortega, Alfonso [2]
Lopez-Espejo, Ivan [3]
Lleida, Eduardo [2]
Affiliations
[1] VeriDas Das Nano, Navarra, Spain
[2] Univ Zaragoza, Aragon Inst Engn Res I3A, ViVoLab, Zaragoza, Spain
[3] Aalborg Univ, Dept Elect Syst, Aalborg, Denmark
Keywords
Speaker verification; Vocal effort mismatch; Shouted speech; Whispered speech; Domain compensation; Deep learning; VOCAL EFFORT; RECOGNITION; FEATURES; FUSION; ROBUST;
DOI
10.1016/j.dsp.2022.103536
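Chinese Library Classification (CLC) number
TM [Electrical engineering technology]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract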
Speaker verification systems now perform very well under normal speech conditions, owing to the plethora of neutrally-phonated speech data available to train them. Nevertheless, the use of vocal effort modes other than normal severely degrades performance because of vocal effort mismatch. In this paper, which considers whispered, normal and shouted speech production modes, we first study how vocal effort mismatch negatively affects speaker verification performance. Then, to mitigate this issue, we describe a series of techniques for score calibration and speaker embedding compensation that rely on logistic regression-based vocal effort mode detection. To test the validity of these methods, we carry out speaker verification experiments using a modern x-vector-based speaker verification system. Experimental results show that, when combining score calibration and embedding compensation relying upon vocal effort mode detection, we can achieve up to 19% and 52% equal error rate (EER) relative improvements under the shouted-normal and whispered-normal scenarios, respectively, in comparison with a system applying neither calibration nor compensation. Compared to our previous work [1], we obtain a 7.3% relative improvement in terms of EER when adding score calibration in the shouted-normal All vs. All condition. © 2022 Elsevier Inc. All rights reserved.
Pages: 13
Related papers
50 in total
  • [21] NORMAL-TO-SHOUTED SPEECH SPECTRAL MAPPING FOR SPEAKER RECOGNITION UNDER VOCAL EFFORT MISMATCH
    Lopez, Ana Ramirez
    Saeidi, Rahim
    Juvela, Lauri
    Alku, Paavo
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4940 - 4944
  • [22] Acoustic analysis and feature transformation from neutral to whisper for speaker identification within whispered speech audio streams
    Fan, Xing
    Hansen, John H. L.
    SPEECH COMMUNICATION, 2013, 55 (01) : 119 - 134
  • [23] Comparative analysis of speech parameters for the design of speaker verification systems
    Souza, AF
    Souza, MN
    PROCEEDINGS OF THE 23RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-4: BUILDING NEW BRIDGES AT THE FRONTIERS OF ENGINEERING AND MEDICINE, 2001, 23 : 2178 - 2181
  • [24] A STUDY OF SPEAKER VERIFICATION PERFORMANCE WITH EXPRESSIVE SPEECH
    Parthasarathy, Srinivas
    Zhang, Chunlei
    Hansen, John H. L.
    Busso, Carlos
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5540 - 5544
  • [25] Effect of High-Energy Voiced Speech Segments and Speaker Gender on Shouted Speech Detection
    Baghel, Shikha
    Prasanna, S. R. M.
    Guha, Prithwijit
    2021 NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2021, : 53 - 58
  • [26] Speaker Identification with Whispered Speech Using Unvoiced-Consonant Phonemes
    Xu, Juan
    Zhao, Heming
    PROCEEDINGS OF 2012 INTERNATIONAL CONFERENCE ON IMAGE ANALYSIS AND SIGNAL PROCESSING, 2012, : 136 - 139
  • [27] Speaker Identification for Whispered Speech Using Modified Temporal Patterns and MFCCs
    Fan, Xing
    Hansen, John H. L.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 912 - 915
  • [28] ASSESSMENT OF AUTOMATIC SPEAKER VERIFICATION ON LOSSY TRANSCODED SPEECH
    Polacky, Jozef
    Jarina, Roman
    Chmulik, Michal
    2016 4TH INTERNATIONAL WORKSHOP ON BIOMETRICS AND FORENSICS (IWBF), 2016,
  • [29] AWLloss: Speaker Verification Based on the Quality and Difficulty of Speech
    Liu, Qian
    Zhang, Xia
    Liang, Xinyan
    Qian, Yuhua
    Yao, Shanshan
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1337 - 1341
  • [30] Minimizing the False Alarm Probability of Speaker Verification Systems for Mimicked Speech
    George, Kuruvachan K.
    Kumar, C. Santhosh
    Pandat, Ashish
    Ramachandran, K. I.
    Das, K. Arun
    Veni, S.
    2015 INTERNATIONAL CONFERENCE ON COMPUTING AND NETWORK COMMUNICATIONS (COCONET), 2015, : 703 - 709