Adaptive regularization framework for robust voice activity detection

Times cited: 0
Authors
Lu, Xugang [1]
Unoki, Masashi
Isotani, Ryosuke [1]
Kawai, Hisashi [1]
Nakamura, Satoshi [1]
Affiliation
[1] Natl Inst Informat & Commun Technol, Tokyo, Japan
Source
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011
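Keywords
Noise reduction; voice activity detection; reproducing kernel Hilbert space
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract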
Traditional voice activity detection (VAD) algorithms work well under clean conditions, but their performance degrades drastically in noisy environments. We have investigated the tradeoff between the false acceptance rate (FAR) and the false rejection rate (FRR) in VAD, taking into account the analogous tradeoff between noise reduction and speech distortion in speech enhancement, and proposed a regularization framework for noise reduction in the design of VAD algorithms. In that framework, the balance between FAR and FRR was controlled implicitly by a regularization parameter. In addition, the regularization was performed in a reproducing kernel Hilbert space (RKHS), which makes it easy to apply a non-linear transform for noise reduction via the "kernel trick". Under this framework, a better tradeoff between FAR and FRR was obtained in VAD. Because speech and noise are non-stationary, in this study we further developed an adaptive regularization framework in which the regularization parameter changes according to local variations of the signal-to-noise ratio (SNR). We tested the algorithm in VAD experiments and compared it with several typical VAD algorithms. The results show that the proposed algorithm can improve the robustness of VAD.
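The abstract does not give the paper's equations, so the following is only a minimal Python sketch, under stated assumptions, of the two ideas it names: a regularization parameter λ that trades noise reduction against speech distortion (and hence FAR against FRR), and a λ that adapts to the local SNR. It does not use an RKHS or the kernel trick. Instead it assumes a scalar per-frame gain that minimizes a speech-distortion term plus λ times a residual-noise term, a known noise power, a crude power-subtraction speech estimate, a hypothetical piecewise-linear λ(SNR) schedule, and a fixed decision threshold. The function names (`regularized_gain`, `adaptive_lambda`, `vad_decisions`, `far_frr`) and all parameter values are hypothetical. FAR is taken as the share of non-speech frames accepted as speech and FRR as the share of speech frames rejected.

```python
import math
from typing import Callable, List, Sequence, Tuple


def regularized_gain(speech_power: float, noise_power: float, lam: float) -> float:
    """Scalar gain g minimizing (1 - g)**2 * S + lam * g**2 * N.

    (1 - g)**2 * S measures speech distortion, g**2 * N residual noise.
    Setting the derivative to zero gives g = S / (S + lam * N); lam = 1 is
    the Wiener gain, and a larger lam removes more noise at the cost of
    more speech distortion. (Assumed cost, not the paper's RKHS form.)
    """
    return speech_power / (speech_power + lam * noise_power)


def adaptive_lambda(local_snr_db: float, lam_min: float = 0.5,
                    lam_max: float = 4.0, snr_low_db: float = 0.0,
                    snr_high_db: float = 20.0) -> float:
    """Hypothetical schedule: lam_max at low local SNR, lam_min at high
    local SNR, linear interpolation in between."""
    if local_snr_db <= snr_low_db:
        return lam_max
    if local_snr_db >= snr_high_db:
        return lam_min
    t = (local_snr_db - snr_low_db) / (snr_high_db - snr_low_db)
    return lam_max + t * (lam_min - lam_max)


def vad_decisions(frame_powers: Sequence[float], noise_power: float,
                  lam_fn: Callable[[float], float],
                  threshold_db: float = 0.0) -> List[bool]:
    """Frame-level speech (True) / non-speech (False) decisions after
    regularized enhancement; lam_fn maps local SNR in dB to lambda."""
    floor = 1e-12  # avoids log10(0)
    decisions = []
    for p in frame_powers:
        # Crude speech-power estimate by power subtraction (assumption).
        s = max(p - noise_power, floor)
        local_snr_db = 10.0 * math.log10(s / noise_power)
        g = regularized_gain(s, noise_power, lam_fn(local_snr_db))
        # An amplitude gain g scales the frame power by g**2.
        enhanced = max(g * g * p, floor)
        decisions.append(10.0 * math.log10(enhanced / noise_power) > threshold_db)
    return decisions


def far_frr(decisions: Sequence[bool], labels: Sequence[bool]) -> Tuple[float, float]:
    """FAR: share of non-speech frames accepted as speech.
    FRR: share of speech frames rejected as non-speech."""
    false_acc = sum(1 for d, y in zip(decisions, labels) if d and not y)
    false_rej = sum(1 for d, y in zip(decisions, labels) if y and not d)
    n_noise = sum(1 for y in labels if not y)
    n_speech = sum(1 for y in labels if y)
    far = false_acc / n_noise if n_noise else 0.0
    frr = false_rej / n_speech if n_speech else 0.0
    return far, frr


if __name__ == "__main__":
    # Toy frame powers (noise power 1.0) with ground-truth speech labels.
    powers = [1.2, 0.8, 1.5, 2.5, 10.0, 30.0, 3.0, 1.8]
    labels = [False, False, False, False, True, True, True, True]
    for lam in (0.25, 1.0, 4.0):
        dec = vad_decisions(powers, 1.0, lambda _snr, lam=lam: lam)
        far, frr = far_frr(dec, labels)
        print(f"fixed lam={lam}: FAR={far:.2f}, FRR={frr:.2f}")
    far, frr = far_frr(vad_decisions(powers, 1.0, adaptive_lambda), labels)
    print(f"adaptive lam: FAR={far:.2f}, FRR={frr:.2f}")
```

Under this toy rule, raising a fixed λ can only turn speech decisions into non-speech ones, so FAR never rises and FRR never falls; the adaptive variant instead picks λ per frame from the local SNR. The sketch illustrates the kind of tradeoff the abstract describes and says nothing about the paper's reported results.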
Pages: 2664-2667
Number of pages: 4
Related papers
50 in total
  • [21] Robust Voice Activity Detection Based on Complementary BLSTM Enhancement Stage
    Shahryary, Iman
    Seyedin, Sanaz
    Ahadi, Seyed Mohammad
    2020 28TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2020, pp. 1608-1612
  • [22] Robust Voice Activity Detection Feature Design Based on Spectral Kurtosis
    Zhang Shuyin
    Guo Ying
    Zhang Qun
    PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON EDUCATION TECHNOLOGY AND COMPUTER SCIENCE, VOL III, 2009, pp. 269-272
  • [23] Noise Robust Voice Activity Detection Based on Switching Kalman Filter
    Fujimoto, Masakiyo
    Ishizuka, Kentaro
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, pp. 965-968
  • [24] Noise robust voice activity detection based on switching Kalman filter
    Fujimoto, Masakiyo
    Ishizuka, Kentaro
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2008, E91D (03), pp. 467-477
  • [25] Adaptive Voice Activity Detection Based on Long-Term Information
    Yang X.-K.
    Qu D.
    Zhang W.-L.
    Yan H.-G.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2018, 46 (04), pp. 878-885
  • [26] VOICE ACTIVITY DETECTION BASED ON STATISTICAL LIKELIHOOD RATIO WITH ADAPTIVE THRESHOLDING
    Li, Xiaofei
    Horaud, Radu
    Girin, Laurent
    Gannot, Sharon
    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016
  • [27] Visual Voice Activity Detection and Adaptive Threshold Estimation for Speech Recognition
    Song, Taeyup
    Lee, Kyungsun
    Kim, Sung Soo
    Lee, Jae-Won
    Ko, Hanseok
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2015, 34 (04), pp. 321-327
  • [28] rVAD: An unsupervised segment-based robust voice activity detection method
    Tan, Zheng-Hua
    Sarkar, Achintya Kr
    Dehak, Najim
    COMPUTER SPEECH AND LANGUAGE, 2020, 59, pp. 1-21
  • [29] A NEW APPROACH FOR ROBUST REALTIME VOICE ACTIVITY DETECTION USING SPECTRAL PATTERN
    Moattar, M. H.
    Homayounpour, M. M.
    Kalantari, Nima Khademi
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, pp. 4478-4481
  • [30] Noise robust voice activity detection based on periodic to aperiodic component ratio
    Ishizuka, Kentaro
    Nakatani, Tomohiro
    Fujimoto, Masakiyo
    Miyazaki, Noboru
    SPEECH COMMUNICATION, 2010, 52 (01), pp. 41-60