Formant-Based Robust Voice Activity Detection

Cited by: 35

Authors:

Yoo, In-Chul [1]

Lim, Hyeontaek [1]

Yook, Dongsuk [1]

Affiliations:

[1] Korea Univ, Dept Comp Sci & Engn, Speech Informat Proc Lab, Seoul 136701, South Korea

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2015 / Vol. 23 / No. 12

Keywords:

Formants; spectral peaks; voice activity detection (VAD); SPECTRUM ESTIMATION; NOISE; ALGORITHM;

DOI:

10.1109/TASLP.2015.2476762

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Voice activity detection (VAD) distinguishes human speech from other sounds, and various applications, including speech coding and speech recognition, can benefit from it. To detect voice activity accurately, an algorithm must take into account the characteristic features of human speech and/or background noise. In many real-life applications, noise occurs unexpectedly, and in such situations it is difficult to determine the characteristics of the noise with sufficient accuracy. Robust VAD algorithms that depend less on correct noise estimates are therefore desirable for real-life applications. Formants are the major spectral peaks of the human voice and are highly useful for distinguishing vowel sounds. These spectral peaks are likely to survive in a signal even after severe corruption by noise, which makes formants attractive features for voice activity detection under low signal-to-noise ratio (SNR) conditions. However, it is difficult to extract formants accurately from noisy signals when background noise introduces unrelated spectral peaks. This paper therefore proposes a simple formant-based VAD algorithm that overcomes the problem of detecting formants under severe noise. The proposed method achieves a much faster processing time and outperforms standard VAD algorithms under various noise conditions. Because it is robust against various types of noise and imposes only a light computational load, it is suitable for use in various applications.

Pages: 2238-2245

Number of pages: 8
