Adaptive regularization framework for robust voice activity detection

Citations: 0
Authors
Lu, Xugang [1]
Unoki, Masashi
Isotani, Ryosuke [1]
Kawai, Hisashi [1]
Nakamura, Satoshi [1]
Affiliations
[1] Natl Inst Informat & Commun Technol, Tokyo, Japan
Source
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011
Keywords
Noise reduction; voice activity detection; reproducing kernel Hilbert space
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional voice activity detection (VAD) algorithms work well under clean conditions; however, their performance degrades drastically in noisy environments. We investigated the tradeoff between false acceptance rate (FAR) and false rejection rate (FRR) in VAD, taking into account the noise reduction and speech distortion problem in speech enhancement, and proposed a regularization framework for noise reduction in designing VAD algorithms. In this framework, the balance between FAR and FRR was implicitly controlled by a regularization parameter. In addition, the regularization was done in a reproducing kernel Hilbert space (RKHS), which made it easy to apply a non-linear transform function via the "kernel trick" for noise reduction. Under this framework, a better tradeoff between FAR and FRR was obtained in VAD. Considering the non-stationarity of speech and noise, in this study we further developed an adaptive regularization framework in which the regularization parameter was changed adaptively according to local variations of the signal-to-noise ratio (SNR). We tested our algorithm in VAD experiments and compared it with several typical VAD algorithms. The results showed that the proposed algorithm could improve the robustness of VAD.
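The abstract gives no formulas, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a Gaussian-kernel ridge smoother as the RKHS regularizer and a made-up mapping from local SNR to the regularization parameter. The names `gaussian_kernel`, `kernel_ridge_smooth`, `lam_from_snr` and all default values are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel matrix between two 1-D sample sets."""
    d = x[:, None] - y[None, :]
    return np.exp(-(d ** 2) / (2.0 * width ** 2))

def kernel_ridge_smooth(t, y, lam, width=1.0):
    """Regularized fit in an RKHS (kernel ridge regression).

    Minimizes sum_i (y_i - f(t_i))^2 + lam * ||f||_H^2 and returns f(t).
    A larger lam suppresses more of the input (noise and speech alike).
    """
    K = gaussian_kernel(t, t, width)
    alpha = np.linalg.solve(K + lam * np.eye(len(t)), y)
    return K @ alpha

def lam_from_snr(snr_db, lam_min=1e-3, lam_max=10.0, lo=-5.0, hi=20.0):
    """Hypothetical mapping from local SNR (dB) to lam.

    Assumption: low SNR gets strong regularization, high SNR gets weak
    regularization, interpolated on a log scale between lam_max and lam_min.
    """
    frac = np.clip((snr_db - lo) / (hi - lo), 0.0, 1.0)
    return lam_max * (lam_min / lam_max) ** frac

# Usage: smooth a short synthetic feature track with an SNR-dependent lam.
rng = np.random.default_rng(0)
t = np.arange(40, dtype=float)
track = np.sin(t / 5.0) + 0.3 * rng.standard_normal(t.size)
smoothed = kernel_ridge_smooth(t, track, lam=lam_from_snr(5.0), width=2.0)
```

In this sketch the regularization parameter is the single knob that trades residual noise against signal distortion, which loosely mirrors how the abstract describes it controlling the balance between FAR and FRR.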
Pages: 2664-2667
Number of pages: 4
Related papers
50 in total
[31]   Robust speaker recognition based on level-building voice activity detection [J].
Xie, Yan-Lu ;
Zhang, Jing-Song ;
Liu, Ming-Hui ;
Huang, Zhong-Wei .
Shenzhen Daxue Xuebao (Ligong Ban)/Journal of Shenzhen University Science and Engineering, 2012, 29 (04) :328-334
[32]   ROBUST VOICE ACTIVITY DETECTION BASED ON PITCH AND SUB-BAND ENERGY [J].
Zhang, Zhihao ;
Lin, Jinlong .
SIGMAP 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA APPLICATIONS, 2009, :44-48
[33]   Multi-Task Joint-Learning for Robust Voice Activity Detection [J].
Zhuang, Yimeng ;
Tong, Sibo ;
Yin, Maofan ;
Qian, Yanmin ;
Yu, Kai .
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
[34]   Robust Statistical Voice Activity Detection Using a Likelihood Ratio Sign Test [J].
Deng, Shiwen ;
Han, Jiqing .
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, :3126-3129
[35]   Speaker-Dependent Voice Activity Detection Robust to Background Speech Noise [J].
Matsuda, Shigeki ;
Ito, Naoya ;
Tsujino, Kosuke ;
Kashioka, Hideki ;
Sagayama, Shigeki .
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, :2625-2628
[36]   ON USING SPECTRAL GRADIENT IN CONDITIONAL MAP CRITERION FOR ROBUST VOICE ACTIVITY DETECTION [J].
Choi, Jae-Hun ;
Chang, Joon-Hyuk .
PROCEEDINGS OF THE 3RD IEEE INTERNATIONAL CONFERENCE ON NETWORK INFRASTRUCTURE AND DIGITAL CONTENT (IEEE IC-NIDC 2012), 2012, :370-374
[37]   Source Enumeration and Robust Voice Activity Detection in Wireless Acoustic Sensor Networks [J].
Hasija, Tanuj ;
Goelz, Martin ;
Muma, Michael ;
Schreier, Peter J. ;
Zoubir, Abdelhak M. .
CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2019, :1257-1261
[38]   Applying the Bi-level HMM for Robust Voice-activity Detection [J].
Hwang, Yongwon ;
Jeong, Mun-Ho ;
Oh, Sang-Rok ;
Kim, Il-Hwan .
JOURNAL OF ELECTRICAL ENGINEERING & TECHNOLOGY, 2017, 12 (01) :373-377
[39]   Cluster-based Discriminative Weight Training Framework for Voice Activity Detection [J].
Park, Sangjun ;
Hahn, Minsoo .
2014 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND APPLICATIONS (ICISA), 2014,
[40]   Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection [J].
Mariotte, Theo ;
Larcher, Anthony ;
Montresor, Silvio ;
Thomas, Jean-Hugh .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 :1859-1872