Horizontal Spectral Entropy with Long-Span of Time for Robust Voice Activity Detection

Cited by: 0
Author
Wang, Kun-Ching [1]
Affiliation
[1] Shih Chien Univ, Taipei, Taiwan
Keywords
voice activity detection; horizontal spectral entropy; long-term; Mel-scaled filter bank; noise
DOI
10.1587/transinf.E96.D.2156
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This letter introduces an innovative voice activity detection (VAD) method based on horizontal spectral entropy with long-span of time (HSELT) feature sets to improve mobile automatic speech recognition (ASR) performance in low signal-to-noise ratio (SNR) conditions. Because the signal characteristics of nonstationary noise change with time, long-term information from the noisy speech signal is needed to define a more robust, high-accuracy decision rule. We find that HSELT measures can horizontally enhance the transition between speech and non-speech segments. Based on this finding, we use the HSELT measures to detect speech with high accuracy in various stationary and nonstationary noises.
Pages: 2156-2161
Number of pages: 6