Shouted and whispered speech compensation for speaker verification systems

Cited by: 3
Authors
Prieto, Santi [1 ]
Ortega, Alfonso [2 ]
Lopez-Espejo, Ivan [3 ]
Lleida, Eduardo [2 ]
Affiliations
[1] VeriDas Das Nano, Navarra, Spain
[2] Univ Zaragoza, Aragon Inst Engn Res I3A, ViVoLab, Zaragoza, Spain
[3] Aalborg Univ, Dept Elect Syst, Aalborg, Denmark
Keywords
Speaker verification; Vocal effort mismatch; Shouted speech; Whispered speech; Domain compensation; Deep learning; Vocal effort; Recognition; Features; Fusion; Robust
DOI
10.1016/j.dsp.2022.103536
CLC classification codes
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Subject classification codes
0808; 0809
Abstract
Nowadays, speaker verification systems have begun to perform very well under normal speech conditions due to the plethora of neutrally-phonated speech data available for training such systems. Nevertheless, using vocal effort modes other than normal severely degrades performance because of vocal effort mismatch. In this paper, in which we consider whispered, normal and shouted speech production modes, we first study how vocal effort mismatch negatively affects speaker verification performance. Then, to mitigate this issue, we describe a series of techniques for score calibration and speaker embedding compensation that rely on logistic regression-based vocal effort mode detection. To test the validity of these methodologies, speaker verification experiments are carried out using a modern x-vector-based speaker verification system. Experimental results show that, when combining score calibration and embedding compensation relying upon vocal effort mode detection, we can achieve up to 19% and 52% equal error rate (EER) relative improvements under the shouted-normal and whispered-normal scenarios, respectively, in comparison with a system applying neither calibration nor compensation. Compared to our previous work [1], we obtain a 7.3% relative EER improvement when adding score calibration in the shouted-normal All vs. All condition. © 2022 Elsevier Inc. All rights reserved.
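The abstract names three technical ingredients: a logistic regression-based vocal effort mode detector operating on speaker embeddings, an embedding compensation step toward the normal-speech domain, and evaluation in terms of EER. The following Python sketch is illustrative only and is not the authors' implementation: it assumes pre-extracted x-vector embeddings, uses a plain scikit-learn logistic regression as the mode detector, and stands in for the paper's compensation with a simple per-mode mean shift; all helper names are hypothetical.

```python
# Illustrative sketch (hypothetical helpers; not the paper's exact method).
# Assumes x-vector embeddings have already been extracted per utterance.
import numpy as np
from sklearn.linear_model import LogisticRegression

MODES = ("normal", "shouted", "whispered")


def train_mode_detector(embeddings, mode_labels):
    """Fit a logistic-regression classifier predicting the vocal effort mode
    (normal / shouted / whispered) of an x-vector embedding."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(embeddings, mode_labels)
    return clf


def mode_mean_shifts(embeddings, mode_labels):
    """Per-mode mean offsets toward the normal-speech domain: a deliberately
    simple stand-in for the embedding compensation described in the paper."""
    labels = np.asarray(mode_labels)
    means = {m: embeddings[labels == m].mean(axis=0) for m in MODES}
    return {m: means["normal"] - means[m] for m in MODES}


def compensate(embedding, detector, shifts):
    """Detect the vocal effort mode of a test embedding and shift it toward
    the normal-speech domain before scoring."""
    mode = detector.predict(embedding[None, :])[0]
    return embedding + shifts[mode]


def cosine_score(e1, e2):
    """Cosine similarity between enrollment and test embeddings."""
    return float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2) + 1e-12))


def equal_error_rate(scores, is_target):
    """EER from verification scores and binary target (1) / impostor (0) labels."""
    order = np.argsort(scores)[::-1]               # sort trials by decreasing score
    labels = np.asarray(is_target, dtype=float)[order]
    n_tar, n_non = labels.sum(), len(labels) - labels.sum()
    fr = 1.0 - np.cumsum(labels) / n_tar           # miss rate as the threshold is lowered
    fa = np.cumsum(1.0 - labels) / n_non           # false-alarm rate
    i = int(np.argmin(np.abs(fr - fa)))            # point where the two rates cross
    return float((fr[i] + fa[i]) / 2.0)
```

Compensating both the enrollment and test embeddings before cosine scoring and recomputing the EER on shouted-normal or whispered-normal trial lists would mimic the kind of mismatch evaluation summarized above; the paper's score-calibration stage is not reproduced in this sketch.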
Pages: 13
Related papers
50 in total
  • [1] Prieto, Santi; Ortega, Alfonso; Lopez-Espejo, Ivan; Lleida, Eduardo. Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions. Interspeech 2020, 2020, pp. 1511-1515.
  • [2] Hanilci, Cemal; Kinnunen, Tomi; Saeidi, Rahim; Pohjalainen, Jouni; Alku, Paavo; Ertas, Figen. Speaker Identification from Shouted Speech: Analysis and Compensation. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 8027-8031.
  • [3] Zhang, Chi; Hansen, John H. L. Analysis and Classification of Speech Mode: Whispered through Shouted. Interspeech 2007: 8th Annual Conference of the International Speech Communication Association, Vols 1-4, 2007, pp. 2396-2399.
  • [4] Naini, Abinay Reddy; Achuth Rao, M. V.; Ghosh, Prasanta Kumar. Formant-Gaps Features for Speaker Verification Using Whispered Speech. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6231-6235.
  • [5] Baghel, Shikha; Prasanna, S. R. Mahadeva; Guha, Prithwijit. Classification of Multi Speaker Shouted Speech and Single Speaker Normal Speech. TENCON 2017 - 2017 IEEE Region 10 Conference, 2017, pp. 2388-2392.
  • [6] Jawarkar, Naresh P.; Holambe, Raghunath S.; Basu, Tapan Kumar. Speaker Identification Using Whispered Speech. 2013 International Conference on Communication Systems and Network Technologies (CSNT 2013), 2013, pp. 778-781.
  • [7] Gong, Chenghui; Zhao, Heming; Wang, Yanlei; Wang, Min; Yan, Zongyue. Development of Chinese Whispered Database for Speaker Verification. 2009 Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia 2009), 2009, pp. 197+.
  • [8] Sardar, V. M.; Shrbahadurkar, S. D. Speaker Identification with Whispered Speech Mode Using MFCC: Challenges to Whispered Speech Identification. 2015 IEEE International Conference on Information Processing (ICIP), 2015, pp. 70-74.
  • [9] Sarria-Paja, Milton; Falk, Tiago H. Fusion of Bottleneck, Spectral and Modulation Spectral Features for Improved Speaker Verification of Neutral and Whispered Speech. Speech Communication, 2018, 102: 78-86.
  • [10] Fan, Xing; Hansen, John H. L. Acoustic Analysis for Speaker Identification of Whispered Speech. 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2010, pp. 5046-5049.