Speech/non-speech discrimination based on contextual information integrated bispectrum LRT

Cited by: 28
Authors
Ramirez, Javier [1]
Gorriz, Juan Manuel
Segura, Jose Carlos
Puntonet, Carlos G.
Rubio, Antonio J.
Affiliations
[1] Periodista Daniel Saucedo Aranda, Dept Teoria Senal Telemat & Comunicac, Granada 18071, Spain
[2] Periodista Daniel Saucedo Aranda, Dept Arquitectura & Tecnol Comp, Granada 18071, Spain
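Keywords
contextual likelihood ratio test; higher order statistics; robust speech recognition; voice activity detection
DOI
10.1109/LSP.2006.873147
Chinese Library Classification number
TM [Electrical technology]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract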
This letter presents an effective statistical voice activity detection (VAD) algorithm based on the integrated bispectrum, which is defined as a cross spectrum between the signal and its square. The integrated bispectrum inherits the ability of higher order statistics to detect signals in noise, with additional advantages: 1) its computation as a cross spectrum leads to significant computational savings, and 2) the variance of the estimator is of the same order as that of the power spectrum estimator. The decision rule is formulated in terms of an average likelihood ratio test (LRT) involving successive integrated bispectrum speech features. With these and other innovations, the proposed method reports significant improvements in speech/pause discrimination as well as in speech recognition over standardized techniques such as the ITU-T G.729, ETSI AMR, and AFE VADs, and over recently published VADs.
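The abstract defines the integrated bispectrum as a cross spectrum between the signal and its square. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the per-frame mean removal of the squared signal, the use of non-overlapping frames, and the conjugation convention are all assumptions made for illustration. A naive DFT is used so that the block depends only on the Python standard library.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def integrated_bispectrum(signal, frame_len):
    """Estimate the cross spectrum between x(t) and y(t) = x(t)^2 - mean(x^2).

    Hypothetical helper: per-frame cross periodograms Y(k) * conj(X(k)) / N
    are averaged over non-overlapping frames. The conjugation convention and
    the per-frame estimate of the mean of x^2 are assumptions.
    """
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    acc = [0j] * frame_len
    for i in range(n_frames):
        x = signal[i * frame_len:(i + 1) * frame_len]
        mean_sq = sum(v * v for v in x) / frame_len
        y = [v * v - mean_sq for v in x]  # zero-mean squared signal
        x_spec, y_spec = dft(x), dft(y)
        for k in range(frame_len):
            acc[k] += y_spec[k] * x_spec[k].conjugate() / frame_len
    return [a / n_frames for a in acc]


# Example: a short synthetic signal split into four 16-sample frames.
example = [math.sin(0.3 * t) + 0.5 * math.sin(0.6 * t + 1.0) for t in range(64)]
spectrum = integrated_bispectrum(example, 16)
```

Each frame costs only two transforms and one element-wise product, which illustrates the computational saving the abstract attributes to the cross-spectrum formulation. The likelihood ratio test itself is not sketched here because the abstract does not specify its statistical model.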
Pages: 497-500
Number of pages: 4