Lightweight speaker verification with integrated VAD and speech enhancement

Cited by: 0
Authors
Hoang, Kiet Anh [1, 2]
Le, Tung [1, 2]
Nguyen, Huy Tien [1, 2]
Affiliations
[1] Univ Sci Ho Chi Minh, Fac Informat Technol, Ho Chi Minh City, Vietnam
[2] Vietnam Natl Univ, Ho Chi Minh City, Vietnam
Keywords
Lightweight speaker verification; Speech enhancement; Voice activity detection; Mobile devices; Unified model; ROBUST;
DOI
10.1016/j.dsp.2024.104969
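Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract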
Reducing noise and non-speech segments that degrade speaker verification (SV) performance requires voice activity detection (VAD) and speech enhancement (SE) processing. However, this signal preprocessing introduces additional latency, especially on low-resource mobile devices with limited computational capacity. To address this limitation, we propose a novel, lightweight, unified VAD and masking-based SE module (VAD-SE module) that enhances resistance to acoustic distortions with minimal computational overhead. By integrating the VAD-SE module into a MobileNetV2-based SV model through a feature fusion module, we achieve an end-to-end model trained comprehensively. Experimental results on the ZaloAI and VinBigData datasets indicate that our model consistently outperforms conventional SV systems across various adverse conditions, achieving approximately 10% better accuracy even in noisy and silent environments, with only a 3.5% increase in parameters. Notably, when deployed on mobile devices, the model achieves superior SV accuracy with a minimal latency increase of approximately 13 milliseconds, making it highly suitable for real-time applications.
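The abstract describes the VAD-SE idea only at a high level, so the following is a minimal sketch of the general technique, not the authors' architecture. It assumes non-negative magnitude-like frame features, a single linear layer with a sigmoid for both the enhancement mask and the per-frame speech probability, and VAD-weighted average pooling as a stand-in for the feature fusion step. All names (`vad_se_sketch`, `vad_weighted_pool`, `mask_weights`, `vad_weights`) and the random weights are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def vad_se_sketch(frames, mask_weights, vad_weights):
    """Apply a masking-based enhancement and a frame-level VAD score.

    frames:       T frames, each a list of F non-negative magnitudes.
    mask_weights: F rows of F weights (hypothetical linear mask estimator).
    vad_weights:  F weights (hypothetical linear VAD scorer).
    """
    enhanced, vad_probs = [], []
    for frame in frames:
        # One mask value in (0, 1) per frequency bin.
        mask = [sigmoid(dot(frame, row)) for row in mask_weights]
        # Masking-based SE: scale each bin by its mask value.
        enhanced.append([m * x for m, x in zip(mask, frame)])
        # Probability that this frame contains speech.
        vad_probs.append(sigmoid(dot(frame, vad_weights)))
    return enhanced, vad_probs


def vad_weighted_pool(enhanced, vad_probs, eps=1e-8):
    """Average enhanced frames over time, weighting each by its VAD score."""
    total = sum(vad_probs) + eps
    n_bins = len(enhanced[0])
    return [
        sum(p * frame[k] for p, frame in zip(vad_probs, enhanced)) / total
        for k in range(n_bins)
    ]


# Toy input: 50 frames of 40 bins with random (untrained) weights.
random.seed(0)
T, F = 50, 40
frames = [[abs(random.gauss(0, 1)) for _ in range(F)] for _ in range(T)]
mask_weights = [[random.gauss(0, 0.1) for _ in range(F)] for _ in range(F)]
vad_weights = [random.gauss(0, 0.1) for _ in range(F)]

enhanced, vad_probs = vad_se_sketch(frames, mask_weights, vad_weights)
pooled = vad_weighted_pool(enhanced, vad_probs)
print(len(enhanced), len(enhanced[0]), len(pooled))
```

In a trained system the mask and VAD weights would be learned jointly with the speaker model. Here they are random, so the sketch shows only the data flow: the mask attenuates each bin, and frames with low speech probability contribute less to the pooled vector.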
Pages: 12
Related papers
50 in total
  • [1] Speech Enhancement Regularized by a Speaker Verification Model
    Lay, Bunlong
    Gerkmann, Timo
    2022 IEEE 24TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2022
  • [2] VoiceID Loss: Speech Enhancement for Speaker Verification
    Shon, Suwon
    Tang, Hao
    Glass, James
    INTERSPEECH 2019, 2019, : 2888 - 2892
  • [3] A Fused Speech Enhancement Framework for Robust Speaker Verification
    Wu, Yanfeng
    Li, Taihao
    Zhao, Junan
    Wang, Qirui
    Xu, Jing
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 883 - 887
  • [4] Lightweight Embeddings for Speaker Verification
    Tkachenko, Maxim
    Yamshinin, Alexander
    Kotov, Mikhail
    Nastasenko, Marina
    SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 687 - 696
  • [5] Front-end speech enhancement for commercial speaker verification systems
    Eskimez, Sefik Emre
    Soufleris, Peter
    Duan, Zhiyao
    Heinzelman, Wendi
    SPEECH COMMUNICATION, 2018, 99 : 101 - 113
  • [6] Speaker Verification with Multi-Run ICA Based Speech Enhancement
    Al-Ali, Ahmed Kamil Hasan
    Dean, David
    Senadji, Bouchra
    Baktashmotlagh, Mahsa
    Chandran, Vinod
    2017 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ICSPCS), 2017
  • [7] Speaker-dependent Dictionary-based Speech Enhancement for Text-Dependent Speaker Verification
    Thomsen, Nicolai Baek
    Thomsen, Dennis Alexander Lehmann
    Tan, Zheng-Hua
    Lindberg, Borge
    Jensen, Soren Holdt
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1839 - 1843
  • [8] Speech Enhancement for Speaker Identification
    Mahesh, R.
    2018 9TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2018
  • [9] Speaker verification using coded speech
    Moreno-Daniel, A
    Juang, BH
    Nolazco-Flores, JA
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, 2004, 3287 : 366 - 373
  • [10] Research on Truncated Speech in Speaker Verification
    Bie, Fanhu
    Wang, Dong
    Zheng, Thomas Fang
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 425 - 425