Adaptive regularization framework for robust voice activity detection

Authors
Lu, Xugang [1]
Unoki, Masashi
Isotani, Ryosuke [1]
Kawai, Hisashi [1]
Nakamura, Satoshi [1]
Affiliations
[1] Natl Inst Informat & Commun Technol, Tokyo, Japan
Source
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011
Keywords
Noise reduction; voice activity detection; reproducing kernel Hilbert space
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional voice activity detection (VAD) algorithms work well under clean conditions, but their performance decreases drastically in noisy environments. We investigated the tradeoff between false acceptance rate (FAR) and false rejection rate (FRR) in VAD, taking into account the noise reduction and speech distortion problem in speech enhancement, and proposed a regularization framework for noise reduction in designing VAD algorithms. In this framework, the balance between FAR and FRR is implicitly controlled by a regularization parameter. In addition, the regularization is done in a reproducing kernel Hilbert space (RKHS), which makes it easy to apply a non-linear transform function via the "kernel trick" for noise reduction. Under this framework, a better tradeoff between FAR and FRR was obtained in VAD. Considering the non-stationarity of speech and noise, in this study we further developed an adaptive regularization framework in which the regularization parameter changes adaptively according to local variations of the signal-to-noise ratio (SNR). We tested our algorithm in VAD experiments and compared it with several typical VAD algorithms. The results showed that the proposed algorithm could be used to improve the robustness of VAD.
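The abstract does not give the formulation itself, so the following is only a generic sketch of what SNR-dependent regularization in an RKHS can look like, not the authors' algorithm. It uses plain kernel ridge regression with a Gaussian kernel to smooth a noisy feature trajectory. The function names, the kernel width, the SNR range, and the rule that lower SNR maps to a larger regularization parameter are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, width):
    # Gaussian (RBF) kernel matrix between two 1-D point sets.
    d = x[:, None] - y[None, :]
    return np.exp(-(d ** 2) / (2.0 * width ** 2))

def kernel_ridge_smooth(t, y, lam, width=2.0):
    # Regularized least squares in the RKHS of the Gaussian kernel:
    # solve (K + lam * I) alpha = y, then evaluate the fit K @ alpha.
    K = gaussian_kernel(t, t, width)
    alpha = np.linalg.solve(K + lam * np.eye(len(t)), y)
    return K @ alpha

def lambda_from_snr(snr_db, lam_min=0.01, lam_max=10.0, lo_db=0.0, hi_db=20.0):
    # Hypothetical mapping (an assumption, not taken from the paper):
    # log-interpolate so that low SNR gives lam_max (strong smoothing)
    # and high SNR gives lam_min (little smoothing, less distortion).
    w = float(np.clip((snr_db - lo_db) / (hi_db - lo_db), 0.0, 1.0))
    return lam_max ** (1.0 - w) * lam_min ** w

def adaptive_smooth(t, y, snr_db, width=2.0):
    # Pick the regularization strength from a local SNR estimate,
    # then smooth the segment with it.
    return kernel_ridge_smooth(t, y, lambda_from_snr(snr_db), width)
```

In a setup like this, `adaptive_smooth` would be called per segment with a locally estimated SNR, so that noisy stretches are smoothed more aggressively than clean ones; how the smoothed output feeds a VAD decision is left open here.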
Pages: 2664-2667
Page count: 4