Robustness to telephone handset distortion in speaker recognition by discriminative feature design

Cited by: 43
Authors
Heck, LP
Konig, Y
Sönmez, MK
Weintraub, M
Affiliations
[1] Nuance Commun, Menlo Pk, CA 94025 USA
[2] Utopy Inc, San Francisco, CA 94102 USA
[3] SRI Int, Menlo Pk, CA 94025 USA
Keywords
speaker recognition; speaker verification; speaker identification; channel compensation; channel robustness; telephone handset distortion; feature extraction; neural network; discriminative design
DOI
10.1016/S0167-6393(99)00077-1
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A method is described for designing speaker recognition features that are robust to telephone handset distortion. The approach transforms features such as mel-cepstral features, log spectrum, and prosody-based features with a non-linear artificial neural network. The neural network is discriminatively trained to maximize speaker recognition performance specifically in the setting of telephone handset mismatch between training and testing. The algorithm requires neither stereo recordings of speech during training nor manual labeling of handset types either in training or testing. Results on the 1998 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation corpus show relative improvements as high as 28% for the new multilayered perceptron (MLP)-based features as compared to a standard mel-cepstral feature set with cepstral mean subtraction (CMS) and handset-dependent normalizing impostor models. (C) 2000 Elsevier Science B.V. All rights reserved.
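The two ingredients named in the abstract, the cepstral mean subtraction (CMS) baseline and a non-linear neural-network feature transform, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the layer sizes, the single tanh hidden layer, and the synthetic frames are all assumptions made for illustration, and the network weights here are random. In the method described above they would be learned discriminatively from speaker-labeled speech recorded over mismatched handsets.

```python
import math
import random


def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean from each cepstral coefficient.

    A fixed linear channel adds a constant offset in the cepstral domain,
    so removing the utterance mean cancels that offset.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


def mlp_features(frame, w_hidden, b_hidden):
    """Pass one frame through a single tanh hidden layer.

    The hidden activations play the role of the transformed features.
    (Hypothetical architecture; the abstract does not specify layer sizes.)
    """
    return [
        math.tanh(sum(w * x for w, x in zip(row, frame)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]


rng = random.Random(0)
DIM, HIDDEN, N_FRAMES = 4, 3, 50  # illustrative sizes only

# Synthetic "cepstral" frames and a constant handset offset (both made up).
clean = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_FRAMES)]
offset = [0.5, -1.0, 0.25, 2.0]
distorted = [[x + o for x, o in zip(f, offset)] for f in clean]

norm_clean = cepstral_mean_subtraction(clean)
norm_distorted = cepstral_mean_subtraction(distorted)

# Untrained random weights stand in for a discriminatively trained network.
w_hidden = [[rng.gauss(0.0, 0.5) for _ in range(DIM)] for _ in range(HIDDEN)]
b_hidden = [0.0] * HIDDEN
features = [mlp_features(f, w_hidden, b_hidden) for f in norm_distorted]
```

In this toy setting CMS makes the clean and distorted utterances coincide because the simulated distortion is a pure constant offset. Real handset distortion also has non-linear components that CMS cannot remove, which is the gap the trained non-linear transform in the abstract is meant to address.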
Pages: 181-192
Number of pages: 12