Adaptive regularization framework for robust voice activity detection

Cited by: 0
Authors
Lu, Xugang [1]
Unoki, Masashi
Isotani, Ryosuke [1]
Kawai, Hisashi [1]
Nakamura, Satoshi [1]
Affiliations
[1] Natl Inst Informat & Commun Technol, Tokyo, Japan
Source
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011
Keywords
Noise reduction; voice activity detection; reproducing kernel Hilbert space
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional voice activity detection (VAD) algorithms work well under clean conditions, but their performance degrades drastically in noisy environments. We investigated the tradeoff between the false acceptance rate (FAR) and the false rejection rate (FRR) in VAD, taking into account the noise reduction and speech distortion problem in speech enhancement, and proposed a regularization framework for noise reduction in the design of VAD algorithms. In this framework, the balance between FAR and FRR is controlled implicitly by a regularization parameter. In addition, the regularization is carried out in a reproducing kernel Hilbert space (RKHS), which makes it easy to apply a nonlinear transform function for noise reduction via the "kernel trick". Under this framework, a better tradeoff between FAR and FRR was obtained in VAD. In this study, considering the non-stationarity of speech and noise, we further developed an adaptive regularization framework in which the regularization parameter changes adaptively according to local variations in the signal-to-noise ratio (SNR). We evaluated the algorithm in VAD experiments and compared it with several typical VAD algorithms. The results showed that the proposed algorithm can improve the robustness of VAD.
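
The abstract does not state the objective function, the kernel, or the rule that maps local SNR to the regularization parameter, so the following is only a toy sketch of the general idea and not the authors' method. It assumes kernel ridge regression with a Gaussian kernel as a stand-in for "regularization in an RKHS", applies it to a frame-energy trajectory, and picks the regularization parameter per block from a crude local SNR estimate. Every name (`rkhs_smooth`, `lambda_from_snr`, `adaptive_vad`), the log-linear SNR-to-lambda mapping, the block size, and the threshold are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, width):
    """Gaussian (RBF) kernel matrix between two 1-D arrays of frame indices."""
    d = x[:, None] - y[None, :]
    return np.exp(-(d ** 2) / (2.0 * width ** 2))

def rkhs_smooth(t, y, lam, width=3.0):
    """Kernel ridge regression fit: f = K (K + lam*I)^(-1) y.

    A larger lam shrinks the fit more strongly (more noise suppression,
    more distortion of the signal); a smaller lam follows the data closely.
    """
    K = gaussian_kernel(t, t, width)
    alpha = np.linalg.solve(K + lam * np.eye(len(t)), y)
    return K @ alpha

def lambda_from_snr(snr_db, lam_min=0.01, lam_max=10.0, lo=0.0, hi=20.0):
    """Hypothetical mapping: low local SNR -> large lam, high SNR -> small lam.

    Interpolates log-linearly between lam_max (at or below `lo` dB)
    and lam_min (at or above `hi` dB).
    """
    w = np.clip((snr_db - lo) / (hi - lo), 0.0, 1.0)
    return lam_max * (lam_min / lam_max) ** w

def adaptive_vad(energy, noise_floor, block=50, threshold=5.0):
    """Toy VAD: block-wise adaptive smoothing of frame energies, then a threshold.

    `noise_floor` is an assumed, externally supplied noise-energy estimate.
    Returns (decisions, smoothed), where decisions is a boolean array.
    """
    energy = np.asarray(energy, dtype=float)
    excess = energy - noise_floor            # energy above the noise floor
    t = np.arange(len(energy), dtype=float)
    smoothed = np.empty_like(excess)
    for start in range(0, len(energy), block):
        sl = slice(start, start + block)
        # Crude local SNR estimate for this block, in dB.
        snr_db = 10.0 * np.log10(max(excess[sl].mean(), 1e-10) / noise_floor)
        lam = lambda_from_snr(snr_db)
        smoothed[sl] = rkhs_smooth(t[sl], excess[sl], lam)
    return smoothed > threshold, smoothed
```

In this sketch, raising the regularization parameter in low-SNR blocks suppresses more of the energy trajectory, which tends to lower false acceptances at the cost of more false rejections; lowering it in high-SNR blocks does the opposite. That is the kind of FAR/FRR tradeoff the abstract attributes to the regularization parameter, though the paper's actual formulation may differ.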
Pages: 2664-2667
Number of pages: 4
Related papers
50 in total
  • [1] Regularization in a reproducing kernel Hilbert space for robust voice activity detection
    Lu, Xugang
    Unoki, Masashi
    Isotani, Ryosuke
    Kawai, Hisashi
    Nakamura, Satoshi
    2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 585 - 588
  • [2] On Noise Robust Voice Activity Detection
    Dekens, Tomas
    Verhelst, Werner
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2660 - 2663
  • [3] AN ADAPTIVE VOICE ACTIVITY DETECTION ALGORITHM
    Zhang Zhigang
    Huang Junqin
    INTERNATIONAL JOURNAL ON SMART SENSING AND INTELLIGENT SYSTEMS, 2015, 8 (04): : 2175 - 2194
  • [4] Speech Waveform Compression Using Robust Adaptive Voice Activity Detection for Nonstationary Noise
    Syed, Waheeduddin Q.
    Wu, Hsiao-Chun
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2008,
  • [5] A Fusion Model for Robust Voice Activity Detection
    Wang, Guan-Bo
    Zhang, Wei-Qiang
    2019 IEEE 19TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT 2019), 2019,
  • [6] Rayleigh providers Robust voice activity detection
    Wang Jingfang
    Xu Huiyan
    2011 INTERNATIONAL CONFERENCE ON ELECTRONICS, COMMUNICATIONS AND CONTROL (ICECC), 2011, : 2200 - 2203
  • [7] A robust voice activity detection technique based on combined framework of lacunarity and empirical mode decomposition
    Saxena, Ishan
    Mondal, Ashok
    2016 IEEE STUDENTS' CONFERENCE ON ELECTRICAL, ELECTRONICS AND COMPUTER SCIENCE (SCEECS), 2016,
  • [8] Robust Voice Activity Detection Based on Adaptive Sub-band Energy Sequence Analysis and Harmonic Detection
    Guo, Yanmeng
    Fu, Qiang
    Yan, Yonghong
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1637 - 1640
  • [9] A robust framework for tamper detection in digital recorded voice signals
    Bernal-Patino, Adriana
    Ponomaryov, Volodymyr, I
    Reyes-Reyes, Rogelio
    Cruz-Ramos, Clara
    REAL-TIME IMAGE PROCESSING AND DEEP LEARNING 2019, 2019, 10996
  • [10] Robust voice activity detection directed by noise classification
    Saeedi, Jamal
    Ahadi, Seyed Mohammad
    Faez, Karim
    SIGNAL IMAGE AND VIDEO PROCESSING, 2015, 9 (03) : 561 - 572