A Statistical Model-Based Voice Activity Detection Using Multiple DNNs and Noise Awareness

Cited by: 0
Authors
Hwang, Inyoung [1]
Sim, Jaeseong [1]
Kim, Sang-Hyeon [1]
Song, Kwang-Sub [1]
Chang, Joon-Hyuk [1]
Affiliations
[1] Hanyang Univ, Dept Elect & Comp Engn, Seoul, South Korea
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
voice activity detection; statistical model; acoustic environment classification; deep neural network; ensemble; speech enhancement
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we propose an ensemble of deep neural networks (DNNs) that uses acoustic environment classification for statistical model-based voice activity detection (VAD). Because conventional decision functions for statistical model-based VAD rely on shallow models, they cannot take advantage of the diversity of the feature-space distribution; we therefore propose using multiple DNNs, each trained separately on a different noise condition, as the decision function. Environmental noise classification is performed by a separate DNN, since acoustic environment classification makes it possible to achieve high detection performance across various types of noise environments by applying a different algorithm according to the current noise condition. In the training stage, a number of DNNs are trained independently, one per type of noise environment, and a separate DNN is trained to identify the environmental condition. In the online stage, the environmental knowledge obtained for each frame is used to combine the speech presence probabilities derived from the ensemble of environment-specific DNNs. Our VAD approach was evaluated in terms of objective measures and showed significant improvement over the conventional algorithm.
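The online combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact combination rule, so the posterior-weighted sum below is an assumption, as are the function names, the example values, and the 0.5 decision threshold; none of them come from the paper.

```python
from typing import Sequence


def combine_speech_probabilities(
    env_posteriors: Sequence[float],
    speech_probs: Sequence[float],
) -> float:
    """Combine per-environment speech presence probabilities for one frame.

    env_posteriors: output of the (hypothetical) environment classifier,
        one weight per noise environment.
    speech_probs: speech presence probability from each environment-specific
        DNN, in the same order.
    Assumption: a weighted sum with normalized weights; the paper may use a
    different rule.
    """
    if len(env_posteriors) != len(speech_probs):
        raise ValueError("one speech probability is needed per environment")
    total = sum(env_posteriors)
    if total <= 0:
        raise ValueError("environment posteriors must sum to a positive value")
    # Normalize the weights so the result stays within [0, 1].
    return sum(w / total * p for w, p in zip(env_posteriors, speech_probs))


def detect_speech(
    env_posteriors: Sequence[float],
    speech_probs: Sequence[float],
    threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> bool:
    """Declare the frame as speech if the combined probability is high enough."""
    return combine_speech_probabilities(env_posteriors, speech_probs) >= threshold


# Illustrative frame with three hypothetical noise environments.
frame_env_posteriors = [0.7, 0.2, 0.1]
frame_speech_probs = [0.9, 0.4, 0.2]
combined = combine_speech_probabilities(frame_env_posteriors, frame_speech_probs)
is_speech = detect_speech(frame_env_posteriors, frame_speech_probs)
```

With this weighting, the DNN trained on the most likely environment dominates the decision, while the other networks still contribute when the environment classifier is uncertain.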
Pages: 2277-2281
Page count: 5