Spectro-temporal Power Spectrum Features for Noise Robust ASR

Cited by: 0
Authors
Hamed Riazati Seresht
Seyed Mohammad Ahadi
Sanaz Seyedin
Affiliation
[1] Amirkabir University of Technology, Department of Electrical Engineering
Source
Circuits, Systems, and Signal Processing | 2017 / Vol. 36
Keywords
Automatic speech recognition; Spectro-temporal feature extraction; 2-D filtering; Spectral peak movements;
DOI
Not available
Abstract
In this paper, we present a new technique for extracting a noise-robust representation of speech signals, called the spectro-temporal power spectrum. This technique is based on applying a simple 2-D filter to the speech spectrogram to highlight the movements of spectral peaks. Because speech spectral peaks constitute the high-SNR (signal-to-noise ratio) regions of the speech spectrogram, we expect that applying our filter will improve recognition performance. In addition, by applying the 2-D filter, the spectro-temporal information around each frequency component is encoded into the frequency representation of the speech signal. This information helps the recognizer better identify the true state to which each frame should be allocated. Experimental results on the Aurora 2 task show that error rate improvements of about 40% and 35% are obtained for test sets A and B, respectively, in comparison with the baseline system when combined with cepstral mean and variance normalization. Further improvement was achieved when the proposed features were extracted from enhanced spectra obtained by applying the advanced front-end routine. Moreover, a phone recognition task evaluated on the TIMIT database showed that the proposed method is preferable to the baseline methods. These improvements are obtained with a very simple, easy-to-implement routine, which makes the method suitable for practical systems.
Pages: 3222-3242
Number of pages: 20