Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold

Cited by: 117
Authors
Davis, A [1]
Nordholm, S
Togneri, R
Affiliations
[1] Univ Western Australia, Western Australian Telecommun Res Inst, Crawley, WA 6009, Australia
[2] Curtin Univ Technol, Western Australian Telecommun Res Inst, Crawley, WA 6009, Australia
[3] Univ Western Australia, Sch Elect Elect & Comp Engn, Crawley, WA 6009, Australia
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 02
Funding
Australian Research Council;
Keywords
adaptive voice activity detection; statistical decision; voice activity detection (VAD); voice activity detector;
DOI
10.1109/TSA.2005.855842
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Traditionally, voice activity detection algorithms are based on any combination of general speech properties such as temporal energy variations, periodicity, and spectrum. This paper describes a novel statistical method for voice activity detection using a signal-to-noise ratio measure. The method employs a low-variance spectrum estimate and determines an optimal threshold based on the estimated noise statistics. A possible implementation is presented and evaluated over a large test set and compared to current modern standardized algorithms. The evaluations indicate promising results with the proposed scheme being comparable or favorable over the whole test set.
Pages: 412-424
Page count: 13
Related papers
22 in total
[1]  
[Anonymous], 1998, DIGITAL CELLULAR TEL
[2]  
[Anonymous], 2000, DIGITAL CELLULAR TEL
[3]   A robust voice activity detector for wireless communications using soft computing [J].
Beritelli, F ;
Casale, S ;
Cavallaro, A .
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 1998, 16 (09) :1818-1829
[4]  
BERITELLI F, 2000, P INT C SIGN PROC BE, V2, P69
[5]  
BERITELLI F, 2001, P IEEE ICASSP 01 SAL, V3, P1425
[6]  
Cho YD, 2001, INT CONF ACOUST SPEE, P737, DOI 10.1109/ICASSP.2001.941020
[7]  
DAVIS A, 2003, P JOINT INT C INF CO, V1, P119
[8]   SPEECH ENHANCEMENT USING A MINIMUM MEAN-SQUARE ERROR SHORT-TIME SPECTRAL AMPLITUDE ESTIMATOR [J].
EPHRAIM, Y ;
MALAH, D .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1984, 32 (06) :1109-1121
[9]  
Freeman D.K., 1989, P INT C AC SPEECH SI, P369
[10]   Spectral subtraction using reduced delay convolution and adaptive averaging [J].
Gustafsson, H ;
Nordholm, SE ;
Claesson, I .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (08) :799-807