Enhancing Voice Activity Detection in Noisy Environments Using Deep Neural Networks

Times cited: 0
Authors
Nagaraja, B. G. [1]
Yadava, G. Thimmaraja [2]
Affiliations
[1] Vidyavardhaka College of Engineering, Dept. of E&CE, Gokulam 3rd Stage, Mysuru 570002, Karnataka, India
[2] Nitte Meenakshi Institute of Technology, Dept. of E&CE, Bengaluru 560064, Karnataka, India
Keywords
VAD; DNN; SNR; VQ-VAD; NOIZEUS; phase features
DOI
10.1007/s00034-025-03055-3
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline codes
0808; 0809
Abstract
Voice activity detection (VAD) is a crucial component in numerous speech processing applications. One of the primary challenges in VAD is balancing false negatives (missed speech) against false positives (noise classified as speech) in noisy environments. Although past works have reported promising accuracy, VAD performance at negative signal-to-noise ratios (SNRs) remains an open question. In this work, we propose a deep neural network (DNN)-based speech enhancement pre-processing framework to improve VAD accuracy under severely noisy conditions. Additionally, we explore DNN-based techniques that leverage the phase component to enhance speech quality. Experimental results on the NOIZEUS database show that the proposed approach outperforms vector quantization-based VAD (VQ-VAD) on the VAD metrics evaluated.
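The abstract frames VAD as a trade-off between false negatives and false positives, evaluated at negative SNR. A minimal NumPy sketch of such an evaluation setup follows, assuming frame-level reference speech/non-speech labels are available. The helpers `mix_at_snr`, `frame_energy_vad`, and `vad_metrics` are hypothetical illustrations: the energy-threshold detector is a simple stand-in, not the paper's DNN pipeline and not VQ-VAD, and the miss-rate and false-alarm-rate names are common VAD conventions rather than necessarily the exact metrics reported in the paper.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale noise so that 10*log10(P_speech / P_noise) == snr_db, then add it to speech."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power is P_speech / 10^(snr_db / 10); a negative snr_db makes noise louder than speech.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise


def frame_energy_vad(x: np.ndarray, frame_len: int = 256, threshold_db: float = -30.0) -> np.ndarray:
    """Toy stand-in detector (hypothetical, NOT the paper's DNN or VQ-VAD).

    Flags a frame as speech when its energy lies within |threshold_db| dB of the loudest frame.
    """
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)  # epsilon avoids log10(0)
    return energy_db > energy_db.max() + threshold_db


def vad_metrics(pred, ref) -> dict:
    """Frame-level error rates for boolean speech (True) / non-speech (False) decisions."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    fn = np.sum(~pred & ref)   # false negatives: speech frames missed
    fp = np.sum(pred & ~ref)   # false positives: noise frames flagged as speech
    n_speech = max(np.sum(ref), 1)
    n_noise = max(np.sum(~ref), 1)
    return {
        "miss_rate": fn / n_speech,
        "false_alarm_rate": fp / n_noise,
        "accuracy": np.mean(pred == ref),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    frame_len = 256
    ref = rng.random(200) < 0.4  # hypothetical labels: roughly 40% speech frames
    t = np.arange(frame_len)
    # Synthetic "speech": tone bursts in speech frames, silence elsewhere.
    clean = np.concatenate([np.sin(2 * np.pi * 0.03 * t) if v else np.zeros(frame_len) for v in ref])
    noisy = mix_at_snr(clean, rng.standard_normal(clean.size), snr_db=-5.0)
    print(vad_metrics(frame_energy_vad(noisy, frame_len), ref))
```

In a setup like the one the abstract describes, a DNN-based enhancement stage would sit between `mix_at_snr` and the detector; it is omitted here. The noise, rather than the speech, is rescaled so that the speech level stays fixed as the target SNR is varied.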
Pages: 15