Multitaper Estimation of Frequency-Warped Cepstra With Application to Speaker Verification

Cited by: 23
Authors
Sandberg, Johan [1]
Hansson-Sandsten, Maria [1]
Kinnunen, Tomi [2]
Saeidi, Rahim [2]
Flandrin, Patrick [3]
Borgnat, Pierre [3]
Affiliations
[1] Lund Univ, Ctr Math Sci, SE-22100 Lund, Sweden
[2] Univ Eastern Finland, Dept Comp Sci & Stat, Speech & Image Proc Unit, FIN-80101 Joensuu, Finland
[3] Ecole Normale Super Lyon, CNRS, UMR 5672, Phys Lab, F-69364 Lyon, France
Funding
Swedish Research Council
Keywords
Cepstral analysis; MFCC; multiple windows; multitapers; speaker verification; speech analysis; Gaussian mixture models; spectral estimation; recognition; statistics
DOI
10.1109/LSP.2010.2040228
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject classification codes
0808; 0809
Abstract
Usually, the mel-frequency cepstral coefficients are estimated either from a periodogram or from a windowed periodogram. We state a general estimator that also includes multitaper estimators. We propose approximations of the variance and bias of the estimate of each coefficient, and Monte Carlo computations demonstrate that these approximations are accurate. Using the proposed formulas, the peak-matched multitaper estimator is shown to have a low mean square error (squared bias plus variance) on speech-like processes. It is also shown to perform slightly better in the NIST 2006 speaker verification task than the Hamming window conventionally used in this context.
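One natural reading of the "general estimator" (an assumption about notation, not taken from the paper) is a weighted sum of tapered periodograms, S_hat(f) = sum_j lambda_j |sum_t x(t) h_j(t) exp(-i 2 pi f t)|^2: a single flat taper gives the periodogram, a single Hamming taper the windowed periodogram, and K orthonormal tapers with weights lambda_j a multitaper estimate. The MFCCs are then the DCT of the log mel filterbank energies of S_hat. The sketch below is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: sine tapers stand in for the peak-matched tapers (whose construction is not reproduced here), a hypothetical AR(2) process with a resonance near 500 Hz plays the role of a speech-like process, and 8 kHz sampling, 64-sample frames, 20 mel filters and 13 coefficients are arbitrary choices. All function names are hypothetical. It estimates bias, variance and MSE per coefficient by brute-force Monte Carlo; it does not implement the paper's analytical approximations.

```python
import cmath
import math
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def _twiddles(n):
    """Cosine/sine tables for bins k = 0..n/2 of an n-point DFT (cached per n)."""
    ks = range(n // 2 + 1)
    cos_t = [[math.cos(2 * math.pi * k * t / n) for t in range(n)] for k in ks]
    sin_t = [[math.sin(2 * math.pi * k * t / n) for t in range(n)] for k in ks]
    return cos_t, sin_t


def tapered_periodogram(x, h):
    """|sum_t x[t] h[t] exp(-i 2 pi k t / n)|^2 for k = 0..n/2."""
    cos_t, sin_t = _twiddles(len(x))
    y = [a * b for a, b in zip(x, h)]
    return [sum(c * v for c, v in zip(ck, y)) ** 2 + sum(s * v for s, v in zip(sk, y)) ** 2
            for ck, sk in zip(cos_t, sin_t)]


def general_estimate(x, tapers, weights):
    """S_hat(f_k) = sum_j weights[j] * tapered_periodogram(x, tapers[j])[k]."""
    spec = [0.0] * (len(x) // 2 + 1)
    for h, lam in zip(tapers, weights):
        for k, p in enumerate(tapered_periodogram(x, h)):
            spec[k] += lam * p
    return spec


def rectangular(n):  # plain periodogram: a single flat unit-energy taper
    return [[1.0 / math.sqrt(n)] * n], [1.0]


def hamming(n):  # windowed periodogram: a single Hamming taper scaled to unit energy
    w = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    e = math.sqrt(sum(v * v for v in w))
    return [[v / e for v in w]], [1.0]


def sine_multitaper(n, k):  # ASSUMPTION: sine tapers stand in for peak-matched tapers
    tapers = [[math.sqrt(2.0 / (n + 1)) * math.sin(math.pi * (j + 1) * (t + 1) / (n + 1))
               for t in range(n)] for j in range(k)]
    return tapers, [1.0 / k] * k  # uniform weights summing to 1


def mel_filterbank(n_filters, n, fs):
    """Triangular filters equally spaced on the mel scale, evaluated at the DFT bins."""
    top = 2595.0 * math.log10(1.0 + (fs / 2) / 700.0)  # mel value of the Nyquist frequency
    edges = [700.0 * (10.0 ** (top * i / (n_filters + 1) / 2595.0) - 1.0)
             for i in range(n_filters + 2)]
    bins = [k * fs / n for k in range(n // 2 + 1)]
    return [[(f - lo) / (ce - lo) if lo <= f <= ce else (hi - f) / (hi - ce) if ce < f <= hi else 0.0
             for f in bins] for lo, ce, hi in zip(edges, edges[1:], edges[2:])]


def mfcc(spec, bank, n_ceps):
    """Log mel filter energies followed by an unnormalized DCT-II."""
    log_e = [math.log(max(sum(w * s for w, s in zip(row, spec)), 1e-12)) for row in bank]
    m_tot = len(log_e)
    return [sum(v * math.cos(math.pi * q * (m + 0.5) / m_tot) for m, v in enumerate(log_e))
            for q in range(n_ceps)]


def ar2_frame(n, a1, a2, rng, burn=200):
    """One frame of x[t] = a1 x[t-1] + a2 x[t-2] + e[t], e ~ N(0, 1), after a burn-in."""
    x1 = x2 = 0.0
    out = []
    for t in range(n + burn):
        x1, x2 = a1 * x1 + a2 * x2 + rng.gauss(0.0, 1.0), x1
        if t >= burn:
            out.append(x1)
    return out


def monte_carlo(estimator, n=64, fs=8000.0, n_filters=20, n_ceps=13, reps=200, seed=0):
    """Per-coefficient (bias, variance, MSE) relative to the MFCCs of the true AR(2) spectrum."""
    a1, a2 = 2 * 0.95 * math.cos(2 * math.pi * 500 / fs), -(0.95 ** 2)  # resonance near 500 Hz
    bank = mel_filterbank(n_filters, n, fs)
    true_spec = [1.0 / abs(1 - a1 * cmath.exp(-1j * w) - a2 * cmath.exp(-2j * w)) ** 2
                 for w in (2 * math.pi * k / n for k in range(n // 2 + 1))]
    c_true = mfcc(true_spec, bank, n_ceps)
    tapers, weights = estimator
    rng = random.Random(seed)
    draws = [mfcc(general_estimate(ar2_frame(n, a1, a2, rng), tapers, weights), bank, n_ceps)
             for _ in range(reps)]
    stats = []
    for q in range(n_ceps):
        vals = [d[q] for d in draws]
        mean = sum(vals) / reps
        var = sum((v - mean) ** 2 for v in vals) / reps
        bias = mean - c_true[q]
        stats.append((bias, var, bias ** 2 + var))  # MSE = squared bias + variance
    return stats
```

For example, `monte_carlo(sine_multitaper(64, 6))` returns 13 (bias, variance, MSE) tuples whose entries for c1 to c12 can be summed and compared with `monte_carlo(hamming(64))` or `monte_carlo(rectangular(64))`. Because sine tapers replace the peak-matched tapers and the test process is made up, the numbers say nothing about the paper's reported results; they only illustrate how averaging several tapered periodograms lowers the variance of the cepstral estimates at the price of some extra bias.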
Pages: 343-346
Number of pages: 4