Lightweight speaker verification with integrated VAD and speech enhancement

Cited by: 0
Authors
Hoang, Kiet Anh [1, 2]
Le, Tung [1, 2]
Nguyen, Huy Tien [1, 2]
Affiliations
[1] Univ Sci Ho Chi Minh, Fac Informat Technol, Ho Chi Minh City, Vietnam
[2] Vietnam Natl Univ, Ho Chi Minh City, Vietnam
Keywords
Lightweight speaker verification; Speech enhancement; Voice activity detection; Mobile devices; Unified model; ROBUST;
DOI
10.1016/j.dsp.2024.104969
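Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809
Abstract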
Reducing noise and non-speech segments that degrade speaker verification (SV) performance requires voice activity detection (VAD) and speech enhancement (SE) processing. However, this signal preprocessing introduces additional latency, especially on low-resource mobile devices with limited computational capacity. To address this limitation, we propose a novel, lightweight, unified VAD and masking-based SE module (VAD-SE module) that enhances resistance to acoustic distortions with minimal computational overhead. By integrating the VAD-SE module into a MobileNetV2-based SV model through a feature fusion module, we achieve an end-to-end model trained comprehensively. Experimental results on the ZaloAI and VinBigData datasets indicate that our model consistently outperforms conventional SV systems across various adverse conditions, achieving approximately 10% better accuracy even in noisy and silent environments, with only a 3.5% increase in parameters. Notably, when deployed on mobile devices, the model achieves superior SV accuracy with a minimal latency increase of approximately 13 milliseconds, making it highly suitable for real-time applications.
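The abstract does not give the internals of the VAD-SE module or the feature fusion module, so the following is only a minimal NumPy sketch of the general idea of masking-based enhancement gated by frame-level voice activity, followed by a simple fusion with the original features. The function name `vad_se_fuse`, the sigmoid mask, the multiplicative VAD gate, and fusion by concatenation are all assumptions made for illustration, not the authors' design.

```python
import numpy as np

def sigmoid(x):
    """Map real-valued logits to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def vad_se_fuse(features, mask_logits, vad_logits):
    """Hypothetical sketch of mask-based enhancement with a VAD gate.

    features:    (frames, bins) non-negative spectral features
    mask_logits: (frames, bins) per-bin enhancement logits (assumed input)
    vad_logits:  (frames,)      per-frame speech logits (assumed input)
    Returns a (frames, 2 * bins) array: original and enhanced features.
    """
    mask = sigmoid(mask_logits)              # soft time-frequency mask
    vad = sigmoid(vad_logits)[:, None]       # speech probability per frame
    enhanced = features * mask * vad         # attenuate noise and non-speech
    # Assumed fusion: concatenate along the feature axis.
    return np.concatenate([features, enhanced], axis=1)

# Random stand-ins for what a trained network would predict.
rng = np.random.default_rng(0)
frames, bins = 50, 40
features = rng.random((frames, bins))
mask_logits = rng.standard_normal((frames, bins))
vad_logits = rng.standard_normal(frames)

fused = vad_se_fuse(features, mask_logits, vad_logits)
print(fused.shape)
```

In a real system the mask and VAD logits would come from learned layers trained jointly with the speaker-embedding network, which is what the abstract means by an end-to-end model; here they are random placeholders.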
Pages: 12