Online Unsupervised Classification With Model Comparison in the Variational Bayes Framework for Voice Activity Detection

Times cited: 8

Authors:

Cournapeau, David [1,2]

Watanabe, Shinji [2]

Nakamura, Atsushi [2]

Kawahara, Tatsuya [1]

Affiliations:

[1] Kyoto Univ, Sch Informat, Kyoto 6068501, Japan

[2] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan

Source:

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING | 2010 / Vol. 4 / No. 6

Keywords:

Sequential estimation; speech analysis; variational Bayes (VB); voice activity detection (VAD); SPEECH RECOGNITION; EM ALGORITHM

DOI:

10.1109/JSTSP.2010.2080821

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification code:

0808; 0809

Abstract:

A new online, unsupervised method for Voice Activity Detection (VAD) is proposed. Conventional VAD methods often rely on heuristics to adapt the decision threshold to the estimated SNR. The proposed method is based on the Variational Bayes (VB) approach to online Expectation Maximization (EM), so it can adapt the decision level and the statistical model automatically and simultaneously. Two parallel classifiers are considered: one for the noise-only case and the other for the speech-and-noise case. Both models are trained concurrently and online within the VB framework. The VB framework also provides an explicit approximation of the log evidence, called the free energy, which is used to assess the reliability of the classifier online and to decide which model is more appropriate at a given time frame. Experimental evaluations were conducted on the CENSREC-1-C database, which was designed for VAD evaluation. Owing to the model comparison, the proposed scheme outperforms conventional VAD algorithms, especially in the remote recording condition. It is also shown to be more robust to changes in the noise type.
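The central idea of the abstract, comparing a noise-only model with a speech-and-noise model through their VB free energies, can be illustrated with a minimal batch sketch. This is not the authors' online algorithm. It assumes hypothetical 1-D features (for example frame log-energies in arbitrary units), a known component variance `sigma`, equal mixing weights, and Gaussian priors on the component means; the helper names `vb_gmm_free_energy`, `exact_log_evidence_k1`, `prefers_speech` and `quantiles` are all hypothetical. For K=1 the variational posterior is exact, so the free energy should equal the closed-form log evidence, which gives a check on the bound.

```python
import math
from statistics import NormalDist


def vb_gmm_free_energy(x, K, sigma=1.0, m0=0.0, s0=10.0, iters=100):
    """Free energy (ELBO) of a 1-D, K-component Gaussian mixture fitted by
    mean-field VB. Assumptions: known variance sigma**2, equal weights 1/K,
    priors mu_k ~ N(m0, s0**2). The result lower-bounds log p(x | K)."""
    var, pvar = sigma ** 2, s0 ** 2
    lo, hi = min(x), max(x)
    # q(mu_k) = N(m[k], s2[k]); initial means spread over the data range.
    m = [lo + (k + 0.5) / K * (hi - lo) for k in range(K)]
    s2 = [pvar] * K
    for _ in range(iters):
        # Update q(z_n): softmax of E[log N(x_n | mu_k, var)], dropping
        # terms that do not depend on k.
        r = []
        for xn in x:
            logits = [-((xn - m[k]) ** 2 + s2[k]) / (2 * var) for k in range(K)]
            top = max(logits)
            w = [math.exp(v - top) for v in logits]
            total = sum(w)
            r.append([wk / total for wk in w])
        # Update q(mu_k): conjugate Gaussian posterior using soft counts.
        for k in range(K):
            nk = sum(rn[k] for rn in r)
            sk = sum(rn[k] * xn for rn, xn in zip(r, x))
            s2[k] = 1.0 / (1.0 / pvar + nk / var)
            m[k] = s2[k] * (m0 / pvar + sk / var)
    # Free energy = E_q[log p(x, z, mu)] + entropy of q(z) and q(mu).
    f = 0.0
    for rn, xn in zip(r, x):
        for k in range(K):
            if rn[k] > 0.0:  # 0 * log 0 is taken as 0
                ell = -0.5 * math.log(2 * math.pi * var) - ((xn - m[k]) ** 2 + s2[k]) / (2 * var)
                f += rn[k] * (ell - math.log(K) - math.log(rn[k]))
    for k in range(K):
        f += -0.5 * math.log(2 * math.pi * pvar) - ((m[k] - m0) ** 2 + s2[k]) / (2 * pvar)
        f += 0.5 * math.log(2 * math.pi * math.e * s2[k])
    return f


def exact_log_evidence_k1(x, sigma=1.0, m0=0.0, s0=10.0):
    """Closed-form log p(x) for the K=1 model (Gaussian mean, Gaussian prior)."""
    n, var, pvar = len(x), sigma ** 2, s0 ** 2
    xbar = sum(x) / n
    ss = sum((xn - xbar) ** 2 for xn in x)
    return (-0.5 * n * math.log(2 * math.pi * var) - 0.5 * math.log(1 + n * pvar / var)
            - ss / (2 * var) - n * (xbar - m0) ** 2 / (2 * (var + n * pvar)))


def prefers_speech(x):
    """Hypothetical decision rule: choose the model with the larger free energy."""
    return vb_gmm_free_energy(x, 2) > vb_gmm_free_energy(x, 1)


def quantiles(mu, n):
    """Deterministic stand-in for random draws: evenly spaced normal quantiles."""
    return [NormalDist(mu, 1.0).inv_cdf((i + 0.5) / n) for i in range(n)]


noise_frames = quantiles(0.0, 200)                        # noise only
mixed_frames = quantiles(0.0, 100) + quantiles(8.0, 100)  # noise plus "speech"

for name, feats in (("noise", noise_frames), ("mixed", mixed_frames)):
    print(name, vb_gmm_free_energy(feats, 1), vb_gmm_free_energy(feats, 2),
          prefers_speech(feats))
```

A larger free energy for the two-component model is read as evidence that speech is present. The sketch deliberately leaves out what makes the published method online: sequential parameter updates, posteriors over variances and weights, and a per-frame decision.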


Pages: 1071-1083

Page count: 13
