Lightweight speaker verification with integrated VAD and speech enhancement

Cited by: 0
Authors
Hoang, Kiet Anh [1, 2]
Le, Tung [1, 2]
Nguyen, Huy Tien [1, 2]
Affiliations
[1] Univ Sci Ho Chi Minh, Fac Informat Technol, Ho Chi Minh City, Vietnam
[2] Vietnam Natl Univ, Ho Chi Minh City, Vietnam
Keywords
Lightweight speaker verification; Speech enhancement; Voice activity detection; Mobile devices; Unified model; ROBUST;
DOI
10.1016/j.dsp.2024.104969
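Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code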
0808 ; 0809 ;
Abstract
Reducing noise and non-speech segments that degrade speaker verification (SV) performance requires voice activity detection (VAD) and speech enhancement (SE) processing. However, this signal preprocessing introduces additional latency, especially on low-resource mobile devices with limited computational capacity. To address this limitation, we propose a novel, lightweight, unified VAD and masking-based SE module (VAD-SE module) that enhances resistance to acoustic distortions with minimal computational overhead. By integrating the VAD-SE module into a MobileNetV2-based SV model through a feature fusion module, we achieve an end-to-end model trained comprehensively. Experimental results on the ZaloAI and VinBigData datasets indicate that our model consistently outperforms conventional SV systems across various adverse conditions, achieving approximately 10% better accuracy even in noisy and silent environments, with only a 3.5% increase in parameters. Notably, when deployed on mobile devices, the model achieves superior SV accuracy with a minimal latency increase of approximately 13 milliseconds, making it highly suitable for real-time applications.
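The abstract describes a masking-based SE step combined with VAD ahead of the speaker model. The following is a minimal NumPy sketch of that general idea only, not the authors' architecture: the function name `vad_se_pool`, the sigmoid mask, and the VAD-weighted mean pooling are all illustrative assumptions, and the logits would in practice come from a trained network.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def vad_se_pool(features, mask_logits, vad_logits, eps=1e-8):
    """Hypothetical illustration of masking-based enhancement with VAD weighting.

    features:    (T, F) non-negative spectral features (T frames, F bins)
    mask_logits: (T, F) unnormalized per-bin enhancement scores (assumed given)
    vad_logits:  (T,)   unnormalized per-frame speech scores (assumed given)
    """
    mask = sigmoid(mask_logits)        # per-bin gain in (0, 1)
    vad = sigmoid(vad_logits)          # per-frame speech probability
    enhanced = features * mask         # masking-based enhancement
    weights = vad / (vad.sum() + eps)  # normalize VAD scores over time
    # VAD-weighted temporal mean: likely non-speech frames contribute little.
    pooled = (enhanced * weights[:, None]).sum(axis=0)
    return enhanced, vad, pooled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = np.abs(rng.normal(size=(6, 4)))
    enhanced, vad, pooled = vad_se_pool(
        feats, rng.normal(size=(6, 4)), rng.normal(size=6)
    )
    print(enhanced.shape, vad.shape, pooled.shape)
```

In this sketch the pooled vector stands in for whatever the feature fusion module would pass to the MobileNetV2-based SV model; the paper's actual fusion mechanism is not specified in the abstract.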
Pages: 12
Related papers
50 in total
  • [31] LOG SPECTRA ENHANCEMENT USING SPEAKER DEPENDENT PRIORS FOR SPEAKER VERIFICATION
    Maina, Ciira Wa
    Walsh, John MacLaren
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4540 - 4543
  • [32] Structure of pauses in speech in the context of speaker verification and classification of speech type
    Igras-Cybulska, Magdalena
    Ziolko, Bartosz
    Zelasko, Piotr
    Witkowski, Marcin
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2016,
  • [34] Cross-Corpora Convolutional Deep Neural Network Dereverberation Preprocessing for Speaker Verification and Speech Enhancement
    Guzewich, Peter
    Zahorian, Stephen
    Chen, Xiao
    Zhang, Hao
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1329 - 1333
  • [35] A Lightweight CNN-Conformer Model for Automatic Speaker Verification
    Wang, Hao
    Lin, Xiaobing
    Zhang, Jiashu
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 56 - 60
  • [36] EDITnet: A Lightweight Network for Unsupervised Domain Adaptation in Speaker Verification
    Li, Jingyu
    Liu, Wei
    Lee, Tan
    INTERSPEECH 2022, 2022, : 3694 - 3698
  • [37] Speech Enhancement for Multimodal Speaker Diarization System
    Ahmad, Rehan
    Zubair, Syed
    Alquhayz, Hani
    IEEE ACCESS, 2020, 8 : 126671 - 126680
  • [38] First Investigation of Universal Speech Attributes for Speaker Verification
    Zhang, Sheng
    Guo, Wu
    Hu, Guoping
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [39] Making Confident Speaker Verification Decisions With Minimal Speech
    Vogt, Robert
    Sridharan, Sridha
    Mason, Michael
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06): : 1182 - 1192
  • [40] Relative Significance of Speech Sounds in Speaker Verification Systems
    Rafi, B. Shaik Mohammad
    Sankala, Sreekanth
    Murty, K. Sri Rama
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (09) : 5412 - 5427