NOISE-ROBUST SPEECH RECOGNITION WITH EXEMPLAR-BASED SPARSE REPRESENTATIONS USING ALPHA-BETA DIVERGENCE

Times cited: 0
Authors
Yilmaz, Emre [1]
Gemmeke, Jort F. [1]
Van Hamme, Hugo [1]
Affiliation
[1] Katholieke Univ Leuven, Dept ESAT, Leuven, Belgium
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
exemplar-based speech recognition; sparse representations; alpha-beta divergence; noise-robustness; NONNEGATIVE MATRIX FACTORIZATION
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we investigate the performance of a noise-robust recognizer based on sparse representations (SR) that uses the Alpha-Beta (AB) divergence to compare noisy speech segments with exemplars. The baseline recognizer approximates noisy speech segments as a linear combination of speech and noise exemplars of variable length. It uses the generalized Kullback-Leibler divergence to quantify the approximation quality. Because the recognizer uses a reconstruction error-based back-end, its recognition performance depends strongly on how well the divergence measure matches the speech features used. The AB-divergence has two tuning parameters, alpha and beta, and provides improved robustness against background noise and outliers. These parameters can be adjusted for better performance depending on the distribution of speech and noise exemplars in the high-dimensional feature space. Moreover, several well-known distance and divergence measures are special cases of the AB-divergence for particular (alpha, beta) values. These include the Euclidean distance, the generalized Kullback-Leibler divergence, the Itakura-Saito divergence and the Hellinger distance. The goal of this work is to investigate which divergence is optimal for mel-scaled magnitude spectral features. To do so, we perform recognition experiments at several SNR levels with different (alpha, beta) pairs. The results demonstrate the effectiveness of the AB-divergence compared to the generalized Kullback-Leibler divergence, especially at lower SNR levels.
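The special cases listed in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation. It evaluates the AB-divergence between two strictly positive vectors, for example a feature frame P and its exemplar-based reconstruction Q. It uses the piecewise definition published by Cichocki, Cruces and Amari (Entropy, 2011). When alpha, beta and alpha + beta are all nonzero, it computes D_AB(P||Q) = -1/(alpha*beta) * sum(p^alpha q^beta - alpha/(alpha+beta) p^(alpha+beta) - beta/(alpha+beta) q^(alpha+beta)). The remaining branches are the continuous limits of that expression. The names `ab_divergence` and `SPECIAL_CASES` are hypothetical. The positivity check is an assumption of this sketch, not something the paper specifies.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical lookup table: (alpha, beta) pairs at which the AB-divergence
# reduces to measures named in the abstract (up to the noted constant factors).
SPECIAL_CASES: Dict[str, Tuple[float, float]] = {
    "generalized Kullback-Leibler D_KL(P||Q)": (1.0, 0.0),
    "half squared Euclidean distance": (1.0, 1.0),
    "Itakura-Saito D_IS(P||Q)": (1.0, -1.0),
    "2 x sum (sqrt(p) - sqrt(q))^2, Hellinger-type": (0.5, 0.5),
}


def ab_divergence(p: Sequence[float], q: Sequence[float],
                  alpha: float, beta: float) -> float:
    """Sum of elementwise AB-divergence terms D_AB^(alpha,beta)(P || Q).

    Piecewise definition after Cichocki, Cruces and Amari (2011); the
    branches for alpha == 0, beta == 0 or alpha + beta == 0 are the
    continuous limits of the general formula.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    if any(x <= 0 for x in p) or any(x <= 0 for x in q):
        # Assumption of this sketch: strictly positive inputs
        # (a practical system would floor zeros to a small constant).
        raise ValueError("entries must be strictly positive")
    total = 0.0
    for pi, qi in zip(p, q):
        if alpha != 0 and beta != 0 and alpha + beta != 0:
            # General case.
            s = alpha + beta
            total -= (pi**alpha * qi**beta
                      - alpha / s * pi**s
                      - beta / s * qi**s) / (alpha * beta)
        elif beta == 0 and alpha != 0:
            # Limit beta -> 0 (alpha = 1 gives generalized KL).
            pa, qa = pi**alpha, qi**alpha
            total += (pa * math.log(pa / qa) - pa + qa) / alpha**2
        elif alpha == 0 and beta != 0:
            # Limit alpha -> 0 (beta = 1 gives generalized KL with P, Q swapped).
            pb, qb = pi**beta, qi**beta
            total += (qb * math.log(qb / pb) - qb + pb) / beta**2
        elif alpha != 0:
            # Here alpha == -beta (alpha = 1 gives Itakura-Saito).
            r = (qi / pi) ** alpha
            total += (math.log(r) + 1.0 / r - 1.0) / alpha**2
        else:
            # alpha == beta == 0: log-Euclidean case.
            total += 0.5 * (math.log(pi) - math.log(qi)) ** 2
    return total


if __name__ == "__main__":
    p = [0.5, 1.0, 2.0]  # e.g. an observed feature frame (illustrative values)
    q = [0.6, 0.8, 2.5]  # e.g. its exemplar-based reconstruction
    for name, (a, b) in SPECIAL_CASES.items():
        print(f"{name}: {ab_divergence(p, q, a, b):.6f}")
```

The special cases work out as follows:

- Setting (alpha, beta) = (1, 0) reproduces the generalized Kullback-Leibler divergence used by the baseline recognizer.
- (1, 1) gives half the squared Euclidean distance.
- (1, -1) gives the Itakura-Saito divergence.
- (0.5, 0.5) gives a quantity proportional to the squared Hellinger distance.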
Pages: 5