Enhancing Voice Activity Detection in Noisy Environments Using Deep Neural Networks

Times cited: 0
Authors
Nagaraja, B. G. [1]
Yadava, G. Thimmaraja [2]
Affiliations
[1] Vidyavardhaka Coll Engn, E&CE, Gokulam 3 Stage, Mysuru 570002, Karnataka, India
[2] Nitte Meenakshi Inst Technol, E&CE, Bengaluru 560064, Karnataka, India
Keywords
VAD; DNN; SNR; VQ-VAD; NOIZEUS; phase features
DOI
10.1007/s00034-025-03055-3
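Chinese Library Classification (CLC) number
TM [Electrotechnics]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract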
Voice activity detection (VAD) is a crucial component of numerous speech processing applications. One of the primary challenges in VAD is balancing the avoidance of false negatives against the minimization of false positives in noisy environments. Although past works have reported promising accuracy, VAD performance under negative signal-to-noise ratio (SNR) conditions remains an open question. In this work, we aim to establish a deep neural network (DNN)-based speech enhancement pre-processing framework that improves VAD accuracy under severely noisy conditions. Additionally, we explore DNN-based techniques that leverage the phase component to enhance speech quality. Experimental results on the NOIZEUS database show that the proposed approach outperforms vector quantization-based VAD (VQ-VAD) as measured by the VAD metrics.
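The abstract does not state which noise-mixing procedure or which VAD metrics the authors used. As an illustrative aid only, the sketch below shows two generic building blocks that such evaluations commonly rely on: scaling a noise signal so that a mixture reaches a target (possibly negative) SNR, and computing frame-level false-negative (miss) and false-positive (false-alarm) rates. The function names `mix_at_snr`, `measured_snr_db`, and `vad_error_rates` are hypothetical, the signals in the demo are synthetic stand-ins, and nothing here reproduces the authors' DNN enhancement front end or the VQ-VAD baseline.

```python
from __future__ import annotations

import math
import random
from typing import Sequence


def _mean_power(signal: Sequence[float]) -> float:
    """Average power (mean of squared samples)."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> list[float]:
    """Hypothetical helper: add `noise` to `clean`, scaled to the requested SNR in dB.

    SNR_dB = 10 * log10(P_clean / (g**2 * P_noise)), solved for the noise gain g.
    A negative `snr_db` means the scaled noise is more powerful than the speech.
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as clean")
    noise = noise[: len(clean)]  # trim the noise to the speech length
    p_clean, p_noise = _mean_power(clean), _mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """SNR of `noisy` relative to `clean`, treating their difference as the noise."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10.0 * math.log10(_mean_power(clean) / _mean_power(residual))


def vad_error_rates(reference: Sequence[int], decision: Sequence[int]) -> dict[str, float]:
    """Hypothetical frame-level VAD scoring; labels are 1 = speech, 0 = non-speech.

    miss_rate: speech frames labelled non-speech (false negatives) / speech frames.
    false_alarm_rate: non-speech frames labelled speech (false positives) / non-speech frames.
    """
    if len(reference) != len(decision):
        raise ValueError("label sequences must have equal length")
    n_speech = sum(1 for r in reference if r == 1)
    n_nonspeech = len(reference) - n_speech
    fn = sum(1 for r, d in zip(reference, decision) if r == 1 and d == 0)
    fp = sum(1 for r, d in zip(reference, decision) if r == 0 and d == 1)
    return {
        "miss_rate": fn / n_speech if n_speech else 0.0,
        "false_alarm_rate": fp / n_nonspeech if n_nonspeech else 0.0,
        "accuracy": 1.0 - (fn + fp) / len(reference),
    }


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-ins: a 440 Hz tone as "speech" and white Gaussian noise, 8 kHz.
    clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
    noise = [random.gauss(0.0, 1.0) for _ in range(8000)]
    noisy = mix_at_snr(clean, noise, snr_db=-5.0)
    print(round(measured_snr_db(clean, noisy), 6))  # -5.0

    ref = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]  # ground-truth frame labels
    hyp = [0, 1, 1, 1, 0, 1, 0, 0, 1, 0]  # VAD decisions
    print(vad_error_rates(ref, hyp))
```

A negative `snr_db` (here -5 dB) produces a mixture in which the noise power exceeds the speech power, the regime the abstract identifies as an open question. Reporting miss and false-alarm rates separately, rather than accuracy alone, makes the trade-off between false negatives and false positives explicit.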
Pages: 15