An Improved Robust Statistical Voice Activity Detection based on Sub-band Periodic Intensity

Cited by: 0
Authors
He, Weijun [1]
Feng, Xiaohui [1]
Zhu, Zhengyu [1]
Zhou, Weili [1]
Affiliation
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China
Source
2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION | 2015
Keywords
speech recognition; voice activity detection; likelihood ratio; statistical model; reserved coefficient;
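DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract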
An investigation of voice activity detection (VAD) based on the statistical model likelihood ratio test revealed a false alarm problem when detecting nonverbal vocalization signals. This paper proposes an improved statistical model-based VAD method for adverse noise environments that employs a reserved coefficient in the decision rule. The reserved coefficient is determined by the sub-band periodic intensity, with sub-bands divided according to human auditory sensing characteristics. The final decision depends on the geometric mean of the reserved sub-band likelihood ratios. Simulations carried out on the CADCC and NOISEX-92 databases show promising performance compared with traditional robust VAD methods under both stationary and nonstationary noise conditions, in terms of improved false alarm rate and receiver operating characteristic (ROC) curve.
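The decision rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes that each sub-band already has a log likelihood ratio, that the reserved coefficient is a binary keep/drop flag per sub-band, and that a frame is declared speech when the geometric mean of the kept likelihood ratios exceeds a threshold. The function names, the threshold value, and the handling of a frame with no reserved sub-bands are all hypothetical.

```python
import math


def reserved_geometric_mean_lr(log_lrs, reserved):
    """Geometric mean of the sub-band likelihood ratios that are reserved.

    log_lrs  -- per-sub-band log likelihood ratios (assumed precomputed)
    reserved -- per-sub-band reserved coefficients, assumed binary (1 = keep)
    """
    kept = [llr for llr, keep in zip(log_lrs, reserved) if keep]
    if not kept:
        # Assumption: with no reserved sub-band, report 0 so the frame
        # is treated as non-speech by any positive threshold.
        return 0.0
    # The geometric mean of LRs equals exp of the arithmetic mean of log LRs.
    return math.exp(sum(kept) / len(kept))


def vad_decision(log_lrs, reserved, threshold=1.0):
    """Return True (speech) when the reserved geometric mean exceeds threshold."""
    return reserved_geometric_mean_lr(log_lrs, reserved) > threshold


# Hypothetical frame with three sub-bands whose likelihood ratios are 4, 1, 0.25.
frame_log_lrs = [math.log(4.0), math.log(1.0), math.log(0.25)]
speech = vad_decision(frame_log_lrs, reserved=[1, 1, 0])
```

Working in the log domain avoids multiplying many small or large ratios directly, which is a common numerical choice rather than something stated in the abstract.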
Pages: 2171-2175
Number of pages: 5