Online Unsupervised Classification With Model Comparison in the Variational Bayes Framework for Voice Activity Detection

Cited by: 8
Authors
Cournapeau, David [1, 2]
Watanabe, Shinji [2 ]
Nakamura, Atsushi [2 ]
Kawahara, Tatsuya [1 ]
Affiliations
[1] Kyoto Univ, Sch Informat, Kyoto 6068501, Japan
[2] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
Keywords
Sequential estimation; speech analysis; variational Bayes (VB); voice activity detection (VAD); speech recognition; EM algorithm
DOI
10.1109/JSTSP.2010.2080821
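Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809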
Abstract
A new online, unsupervised method for Voice Activity Detection (VAD) is proposed. Conventional VAD methods often rely on heuristics to adapt the decision threshold to the estimated SNR. The proposed method is based on the Variational Bayes (VB) approach to online Expectation Maximization (EM), so that it can automatically adapt the decision level and the statistical model at the same time. We consider two parallel classifiers, one for the noise-only case and the other for the speech-and-noise case. Both models are trained concurrently and online using the VB framework. The VB framework also provides an explicit approximation of the log evidence, called the free energy, which is used to assess the reliability of the classifier in an online fashion and to decide which model is more appropriate at a given time frame. Experimental evaluations were conducted on the CENSREC-1-C database, which is designed for VAD evaluation. Owing to the model comparison, the proposed scheme outperforms conventional VAD algorithms, especially in the remote recording condition. It is also shown to be more robust to changes in the noise type.
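The role of the free energy in the model comparison can be made explicit with the standard variational identity. The notation below is an assumption for illustration and is not taken from the paper: X denotes the observed frames, Z the latent class labels, θ the model parameters, M a candidate model (noise-only or speech-and-noise), and q the variational posterior. The sign convention follows the abstract's usage, in which the free energy approximates the log evidence from below. The last line is a generic selection rule; the paper's actual online decision rule may differ.

```latex
\begin{align}
\log p(X \mid M)
  &= \mathcal{F}_M[q]
   + \mathrm{KL}\big(q(Z,\theta) \,\|\, p(Z,\theta \mid X, M)\big), \\
\mathcal{F}_M[q]
  &= \mathbb{E}_{q}\big[\log p(X, Z, \theta \mid M)\big]
   - \mathbb{E}_{q}\big[\log q(Z,\theta)\big]
   \le \log p(X \mid M), \\
\hat{M}
  &= \arg\max_{M \in \{M_{\text{noise}},\, M_{\text{speech+noise}}\}}
     \mathcal{F}_M[q_M].
\end{align}
```

Because the KL term is non-negative, the free energy never exceeds the log evidence, so the model with the larger free energy is taken as the one better supported by the data at that point in time.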
Pages: 1071-1083
Number of pages: 13
Related papers
50 items in total
  • [21] Comparison of Voice Activity Detection algorithms for VoIP
    Prasad, RV
    Sangwan, A
    Jamadagni, HS
    Chiranth, MC
    Sah, R
    Gaurav, V
    ISCC 2002: SEVENTH INTERNATIONAL SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS, PROCEEDINGS, 2002, : 530 - 535
  • [22] Robust voice activity detection directed by noise classification
    Saeedi, Jamal
    Ahadi, Seyed Mohammad
    Faez, Karim
    SIGNAL IMAGE AND VIDEO PROCESSING, 2015, 9 (03) : 561 - 572
  • [23] Adaptive regularization framework for robust voice activity detection
    Lu, Xugang
    Unoki, Masashi
    Isotani, Ryosuke
    Kawai, Hisashi
    Nakamura, Satoshi
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2664 - 2667
  • [24] Robust voice activity detection directed by noise classification
    Jamal Saeedi
    Seyed Mohammad Ahadi
    Karim Faez
    Signal, Image and Video Processing, 2015, 9 : 561 - 572
  • [25] Joint Learning using Denoising Variational Autoencoders for Voice Activity Detection
    Jung, Youngmoon
    Kim, Younggwan
    Choi, Yeunju
    Kim, Hoirin
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1210 - 1214
  • [26] Comparison among Voice Activity Detection Methods for Korean Elderly Voice
    Lee, JiYeoun
    PROCEEDINGS OF THE 10TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 4: BIOSIGNALS, 2017, : 231 - 235
  • [27] Enhanced Voice Activity Detection Using Acoustic Event Detection and Classification
    Cho, Namgook
    Kim, Eun-Kyoung
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2011, 57 (01) : 196 - 202
  • [28] Online Target Speaker Voice Activity Detection for Speaker Diarization
    Wang, Weiqing
    Lin, Qingjian
    Li, Ming
    INTERSPEECH 2022, 2022, : 1441 - 1445
  • [29] Unsupervised model-guided online transfer learning framework for multiple fault detection of satellite control system
    Xia, Huaitao
    Meng, Tao
    NEUROCOMPUTING, 2025, 618
  • [30] rVAD: An unsupervised segment-based robust voice activity detection method
    Tan, Zheng-Hua
    Sarkar, Achintya Kr
    Dehak, Najim
    COMPUTER SPEECH AND LANGUAGE, 2020, 59 : 1 - 21